
 |

|
|
Second Generation Searching on the Web |
|
|
|
This tutorial covers some of the newer search services on the Web.
These services use technology that organizes search results by concept,
site, domain, popularity and linking, in contrast to the more common
ranking by term relevancy. This newer type of ranking often looks at
"off-the-page" information to determine which results are retrieved and
the order in which they appear. Search engines that employ this
alternative may be thought of as second generation search services.
For example:
|
Google ranks by the number of
links from pages ranked high by the service |
|
Direct Hit ranks according to
sites other searchers have chosen from their results to similar queries |
|
Inference Find ranks by concept
and top-level domain |
|
Northern Light sorts results
into Custom Search Folders representing concepts and/or types of sites |
|
Oingo performs a concept processing
of your search and attempts to identify the probable intended meaning |
|
|
Here are a few of the trends to watch with second-generation services:
|
|
|
The human element: concept processing. Second generation
services such as
Ask Jeeves,
Northern Light,
Oingo and
SurfWax
apply different kinds of concept processing to a search statement to
determine the probable intent of a search. This is often accomplished
by the use of human generated indexes. With these services, the burden
of coming up with precise or extensive terminology is shifted from the user
to the engine. These services are therefore taking on the role of thesauri. |
|
The human element: collective judgment. Search services such as
Google and
Direct Hit
derive their results from the behavior of millions of Web users. |
|
The human element: directories. First generation search
services have gotten into the act by partnering with second
generation services and/or including content from human gathered
directories with their search results to supplement documents retrieved
from the spider-indexed Web. Examples include
AltaVista,
GO.com (Infoseek),
Lycos and many others. |
|
|
|
|
Search engines covered in this tutorial
|
|
|
|
|
|
Direct Hit
|
|
Exercise: Organization of results according to popularity with Direct
Hit
|
|
|
Online help
Direct Hit calls itself a Popularity Engine because it ranks results
according to sites other searchers have chosen from their results to
similar queries.
|
|
|
Special Features:
|
|
|
Measures what pages users select from search results and roughly how
long they spend visiting these pages |
|
Compiles results from daily data feeds of millions of records from
a number of sources, including its major search engine partners |
|
Results can work quite well if other users are choosing high-quality,
relevant sites |
|
Offers related queries to try out to enhance
your initial query; these are hyperlinked to a list of results |
|
The service is also available at HotBot,
Lycos and MSN Web Search |
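
As a rough illustration of the popularity idea described above, the following Python sketch ranks pages for a query by how often searchers clicked them and how long they stayed. The log format, the scoring formula (one point per click plus capped dwell minutes) and the names popularity_rank and click_log are assumptions made for this example; Direct Hit's actual method is not documented here.

```python
from collections import defaultdict


def popularity_rank(click_log):
    """Rank URLs per query by clicks and time spent on the page.

    click_log: iterable of (query, url, seconds_on_page) tuples.
    Returns {normalized_query: [url, ...]}, most popular URL first.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for query, url, seconds in click_log:
        # Normalize case and spacing so similar queries pool their data.
        key = " ".join(query.lower().split())
        # Assumed formula: 1 point per click plus dwell minutes, capped at 5.
        scores[key][url] += 1.0 + min(seconds, 300) / 60.0
    return {
        q: sorted(urls, key=lambda u: (-urls[u], u))
        for q, urls in scores.items()
    }


log = [
    ("jazz history", "a.example/jazz", 240),
    ("Jazz  history", "b.example/jazz", 10),
    ("jazz history", "a.example/jazz", 120),
    ("jazz history", "c.example/ads", 2),
]
ranking = popularity_rank(log)
```

In this sample, the page that was chosen twice and read for several minutes ranks first, while a page visited for two seconds ranks last.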
|
|
Drawbacks:
|
|
|
Results lists change dynamically over time as new data is retrieved by
this service, so you may not get the best results at any particular time |
|
Popularity is not necessarily the best method of finding quality or
relevant sites |
|
|

|
|
Google
|
|
Exercise: Retrieving results by link ranking using Google.
|
|
|
Online help
Google ranks results by the number of links from pages ranked high by the
service. This unique ranking system can be quite effective.
|
|
|
Special Features:
|
|
|
Returns results ranked by the number of links from pages ranked high by
the service; high ranking pages are also determined by the number of links
to them |
|
In determining relevancy ranking, the engine also looks at various
textual clues including linking text |
|
Search results include sites from the Open
Directory Project, offering an interesting mix of sites from the
wider Web and those chosen by editors for inclusion into the directory.
See also Google's own version, the
Google Web Directory. |
|
Requires no syntax:
simply type keywords and Google defaults to the Boolean AND with term
proximity |
|
OR searching is supported if "OR" is typed in CAPS, e.g.,
university OR college; it works only between single words |
|
Attempts to return results in which multiple query words are in
close proximity within the source document |
|
For more refined searches, use quotations for phrases
("El Nino") or a minus sign (-) for the Boolean NOT |
|
Engine searches on stop words (and, the, etc.) with the following
syntax: "rules +of +the game" |
|
Engine does not stem
words; it searches on your word form exactly as it is typed |
|
Results include the text
from the source document that matches your query |
|
I'm Feeling Lucky
option returns the top-ranked source for a query |
|
Offers searching of Web
pages in a number of languages; the Google site can also be set
to display its tips and instructions in a different language |
|
Offers a spell check
operation. Example: spell:priviledge |
|
Returns a link to a map
from Yahoo! or MapBlast if you enter an address, e.g., 198 central
avenue albany ny |
|
Returns stock prices if you enter a company's ticker symbol |
|
Displays links to news headlines when they are relevant to a search |
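
The query conventions listed above can be sketched as a small parser. This is only an illustrative reading of the rules in this tutorial, not Google's own parser; the function name parse_query and the layout of the returned dictionary are hypothetical.

```python
import shlex


def parse_query(query):
    """Split a query into required terms, excluded terms and OR groups.

    Note: shlex treats apostrophes as quote marks, so a query containing
    a bare apostrophe would need extra handling that is omitted here.
    """
    tokens = shlex.split(query)  # keeps "quoted phrases" as one token
    required, excluded, any_of = [], [], []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])       # minus sign: Boolean NOT
        elif token == "OR":
            pass                             # stray OR with no partner
        else:
            group = [token.lstrip("+")]      # "+" marks a stop word to keep
            # Only an uppercase OR joins terms; lowercase "or" is a keyword.
            while i + 2 < len(tokens) and tokens[i + 1] == "OR":
                group.append(tokens[i + 2])
                i += 2
            if len(group) > 1:
                any_of.append(group)
            else:
                required.append(group[0])    # default: Boolean AND
        i += 1
    return {"required": required, "excluded": excluded, "any_of": any_of}


parsed = parse_query('"El Nino" -storm university OR college')
```

The sample query yields one required phrase, one excluded word and one group of alternatives.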
|
|
Drawbacks:
|
|
|
New Web pages will not appear in your results, as it takes time for the
creators of other Web pages to link to new resources, and for this activity to
be reflected at Google |
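
The link ranking described in this section, and the reason a new page with no inbound links scores poorly, can be illustrated with a simplified iterative calculation. This is a sketch in the spirit of link-based ranking, not Google's actual algorithm; the damping value of 0.85, the iteration count and the tiny sample Web are assumptions.

```python
def link_rank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: score}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small base score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among its links, so a
                # link from a highly ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


web = {
    "hub": ["a", "b"],
    "a": ["b"],
    "b": ["hub"],
    "new": ["b"],  # a new page: it links out, but nothing links to it yet
}
scores = link_rank(web)
ordered = sorted(scores, key=scores.get, reverse=True)
```

Here the page named "new" ends up last because no other page links to it, which mirrors the drawback noted above.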
|
|

|
|
Inference Find
|
|
Exercise: Organization of results into concepts and/or types of
sites with Inference Find
|
|
|
Online help
Inference Find is a meta search engine that searches six search engines
simultaneously. The search engine merges the results, removes duplicate
files, and organizes the results into sections.
|
|
|
Special Features:
|
|
|
Retrieves the maximum number of results each search engine will allow by
searching target engines in parallel. For example, GO.com is searched
three times in parallel. |
|
Groups results into sections by concepts and/or by top-level domain, e.g.,
educational institution, non-profit site, European site, federal government,
etc. |
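
The merge, de-duplicate and group steps described for this service might look like the following sketch. The section labels and the rule for spotting duplicates (same host and path, ignoring case and a trailing slash) are assumptions for illustration; Inference Find's own rules are not documented here.

```python
from urllib.parse import urlsplit

# Illustrative labels only, keyed by top-level domain.
SECTION_LABELS = {
    "edu": "Educational institutions",
    "org": "Non-profit sites",
    "gov": "Federal government",
    "com": "Commercial sites",
}


def merge_and_group(result_lists):
    """Merge several engines' URL lists, drop duplicates, group by domain."""
    seen = set()
    sections = {}
    for results in result_lists:
        for url in results:
            parts = urlsplit(url)
            host = (parts.hostname or "").lower()
            key = (host, parts.path.rstrip("/"))
            if key in seen:
                continue  # another engine already returned this page
            seen.add(key)
            top_level = host.rsplit(".", 1)[-1]
            label = SECTION_LABELS.get(top_level, "Other sites")
            sections.setdefault(label, []).append(url)
    return sections


engine_a = ["http://www.example.edu/library/", "http://www.example.com/"]
engine_b = ["http://WWW.EXAMPLE.EDU/library", "http://www.example.gov/data"]
sections = merge_and_group([engine_a, engine_b])
```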
|
|
Drawbacks:
|
|
|
Gives no syntax directions. Suggests the use of Boolean operators but
cautions about inconsistent results |
|
List of results contains only titles of Web pages, so the relevancy of
the source document is not always easy to determine without visiting the page
|
|
|

|
|
TracerLock
and The Informant
|
|
Exercise: Storing Queries for regularly updated results with TracerLock
and The Informant
|
|
|
TracerLock and The Informant are two services that save your search query,
process it at regular intervals, and e-mail you when new pages are found
containing your search terms. Both services are free, and require users
to register. Each service works a little differently from the other.
|
|
|
Special Features: TracerLock
|
|
|
Stores Boolean search statements for regular processing |
|
Searches AltaVista every night for pages matching your terms that
AltaVista indexed between the date stored with your search terms and
the date three days before the search; the stored date is adjusted daily
to keep the search fresh |
|
You can reset the search date window at any time |
|
The first ten results are sent to you by e-mail |
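
The TracerLock date window can be worked through with a short sketch. The function names and the choice to return None for an empty window are assumptions; only the three-day lag and the stored start date come from the description above.

```python
from datetime import date, timedelta


def search_window(stored_date, today, lag_days=3):
    """Return (start, end) for tonight's search, or None if the window is empty."""
    end = today - timedelta(days=lag_days)  # "three days ago"
    if stored_date > end:
        return None
    return stored_date, end


def is_reported(indexed_on, stored_date, today):
    """True if a page indexed on `indexed_on` falls inside the window."""
    window = search_window(stored_date, today)
    return window is not None and window[0] <= indexed_on <= window[1]


window = search_window(date(2000, 5, 1), date(2000, 5, 10))
```

With a stored date of May 1 and a search run on May 10, the window covers pages indexed from May 1 through May 7; a page indexed on May 8 would be picked up by a later run.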
|
|
Special Features: The Informant
|
|
|
Searches up to three sets of keywords |
|
Searches on these keywords at a periodic interval of 3, 7, 14, 30 or
60 days |
|
Uses AltaVista, Lycos, Excite, and GO.com |
|
If a new page appears in the top ten most relevant pages, or if a
page from your previous top ten list has been updated, The Informant
will send you an e-mail |
|
Also tracks up to 5 URLs and e-mails you when page changes occur |
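
The Informant's notification rule amounts to comparing two snapshots of a top ten list. The sketch below assumes each snapshot maps a URL to some change marker, such as a last-modified stamp; the marker, the function name pages_to_report and the simplification that only pages still in the current list are checked for updates are all assumptions.

```python
def pages_to_report(previous_top, current_top):
    """Compare two {url: change_marker} snapshots of a top ten list.

    Returns (new_pages, updated_pages). An e-mail would be warranted
    if either list is non-empty.
    """
    new_pages = [url for url in current_top if url not in previous_top]
    updated_pages = [
        url
        for url, marker in previous_top.items()
        if url in current_top and current_top[url] != marker
    ]
    return new_pages, updated_pages


previous = {"a.example/page": "2000-05-01", "b.example/page": "2000-04-20"}
current = {"a.example/page": "2000-05-09", "c.example/page": "2000-05-08"}
new_pages, updated_pages = pages_to_report(previous, current)
```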
|
|
Note: Karnak is a
similar service to try. The free service tracks only one query,
but the premium services track multiple queries. A strength
of this service is the large number of sources it uses to
locate documents.
|
|

|
|
Ixquick Metasearch
|
|
Exercise: Tapping into the ranking schemes of several engines with
Ixquick Metasearch
|
|
|
Online help
Ixquick Metasearch is a meta search engine that searches multiple engines
and directories and returns only those documents that rank in the top 10
results of at least one source.
|
|
|
Special Features:
|
|
|
Returns the most relevant results as ranked in the top 10 by a number
of individual sources |
|
Uses a "star" system whereby the number of stars indicates the
number of sites ranking each result in the top 10 |
|
Shows the sources that have ranked the page and the placement within
the top 10 list, e.g., Google (1) |
|
Offers a variety of search options including full Boolean, implied
Boolean, natural language search, truncation, case sensitivity and field
searching; Ixquick sends your query to the engines that support these
options |
|
Also searches for news, MP3 music files and pictures |
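
The star system can be sketched as a count of how many sources place a page in their top 10. The tie-breaking rule used here (best single placement, then URL) and the names star_results and top_lists are assumptions; the tutorial does not say how Ixquick orders results with equal stars.

```python
from collections import defaultdict


def star_results(top_lists):
    """top_lists: {source_name: [urls in rank order]}.

    Returns a list of (url, stars, [(source, position), ...]),
    with the most-starred pages first.
    """
    placements = defaultdict(list)
    for source, urls in top_lists.items():
        # Only the top 10 of each source is considered.
        for position, url in enumerate(urls[:10], start=1):
            placements[url].append((source, position))

    def sort_key(url):
        hits = placements[url]
        best_position = min(position for _, position in hits)
        return (-len(hits), best_position, url)

    return [
        (url, len(placements[url]), placements[url])
        for url in sorted(placements, key=sort_key)
    ]


sources = {
    "Google": ["x.example", "y.example"],
    "AltaVista": ["y.example", "z.example"],
    "Lycos": ["y.example", "x.example"],
}
results = star_results(sources)
```

In this sample, y.example earns three stars because all three sources rank it, and its entry records each placement, for example ("Google", 2).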
|
|
Drawbacks:
|
|
|
Because it offers only the top 10 results from any source, obscure sites
will not appear in its results |
|
Some search syntax options do not work well; for example, natural language
searching is offered, but its results are not always successful |
|
|

|
|
Northern Light
|
|
Exercise: Grouping of results into concept folders with Northern Light
|
|
|
Online help:
Main Screen |
Power Search
Northern Light organizes results into Custom Search Folders that represent
concepts and/or types of sites. Results within these folders are relevancy
ranked. With this system, you can ignore the folders that are irrelevant and
choose those that fit your query best. This may be more convenient than
working through one master list of results.
|
|
|
Special Features:
|
|
|
Sorts search results into folders by subjects, types (e.g., press
releases, maps), source sites, or languages |
|
Within folder levels, a new group of folders is presented |
|
Relevancy ranked results are available on the same screens as the folders
|
|
Clusters the relevancy ranked results list by presenting one hit per site |
|
Offers a Special Collection database of relevant articles from thousands
of sources for a small fee |
|
Contains a very large database of searchable files |
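
Two of the features above, sorting results into folders and showing one hit per site, can be sketched as follows. The record fields ("url", "type", "language"), the "Other" fallback folder and the function names are hypothetical; how Northern Light actually assigns pages to folders is not described in this tutorial.

```python
from urllib.parse import urlsplit


def into_folders(results, facet):
    """Group result records by one facet, keeping relevancy order inside each folder."""
    folders = {}
    for result in results:
        folders.setdefault(result.get(facet, "Other"), []).append(result["url"])
    return folders


def one_hit_per_site(ranked_urls):
    """Keep only the first (highest ranked) URL from each host."""
    seen_hosts = set()
    clustered = []
    for url in ranked_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host not in seen_hosts:
            seen_hosts.add(host)
            clustered.append(url)
    return clustered


results = [
    {"url": "http://news.example.com/el-nino", "type": "Press releases", "language": "English"},
    {"url": "http://maps.example.org/pacific", "type": "Maps", "language": "English"},
    {"url": "http://news.example.com/la-nina", "type": "Press releases", "language": "Spanish"},
]
by_type = into_folders(results, "type")
clustered = one_hit_per_site([r["url"] for r in results])
```

Calling into_folders again on the contents of a single folder, with a different facet such as "language", gives the nested folder levels mentioned above.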
|
|
Drawbacks:
|
|
|
Folders may not be consistently useful for all queries; however, you
can simply skip over irrelevant folders |
|
|
If you like Northern Light, check out
Vivisimo. This service lets you choose a search engine and then
organizes your results into categories.
|
|

|
|
Oingo
|
|
Exercise: Concept processing with Oingo
|
|
|
Online help
Oingo offers what it calls "meaning-based" searching. This service parses
each search statement to identify the probable intended meaning.
This is done by processing searches through a network of interconnected
meanings called the Oingo Lexicon and attempting to determine probabilities
of relevancy. Oingo also offers topics related to the term(s)
if you wish to pursue a different aspect of your topic.
|
|
|
Special Features:
|
|
|
Parses search statements to identify the probable intended meaning
and presents results based on this meaning |
|
With each search result list, offers a variety of concepts related
to your search term(s) from which you can choose for further identifying
and searching the desired concept |
|
Concept choices can be very wide-ranging and relieve you
of coming up with precise or extensive terminology |
|
Works well with complex or ambiguous queries |
|
Takes its results from dmoz Open Directory
Project and AltaVista |
|
|
Drawbacks:
|
|
|
Search terms entered as a phrase may end up with their concepts split
in the search results |
|
Concept searching is not applied to AltaVista results |
|
Concept options may not be useful to a search |
|
Not all chosen concepts yield relevant results |
|
No field searching is available |
|
|
TIP:
Simplified is a
search service that offers a similar option to choose among alternative
meanings.
|
|

|
|
SurfWax
|
|
Exercise: Concept searching with SurfWax
|
|
|
Online help
SurfWax is a meta engine that offers options to see a quick view of
the content of sites in your search results list, along with
search terms to broaden or narrow a subsequent search.
|
|
|
Special Features:
|
|
|
Offers "SiteSnaps" that display summaries of retrieved sites
including Author Description, Key Points, Emphasis and FocusWords |
|
FocusWords may be chosen to be added to your Personal Searcher for a
future search [Note: you must have Preferences set to turn on personalization]
|
|
Focus feature may be applied to your search terms, allowing you to choose
broader or narrower search terms to apply to subsequent searches |
|
Various personalization options are available |
|
|
Drawbacks:
|
|
|
Not all sites have SiteSnaps and FocusWords, though the number
is growing |
|
|
|
|
Any comments, concerns,
questions or suggestions should be directed to:
Library
Webmaster
Paul Meek Library
University of Tennessee at Martin
Martin, TN 38238
|