Second Generation Searching on the Web


This tutorial covers some of the newer search engine services on the Web. It includes a group of search services that make use of technology that organizes search results by concept, site, domain, popularity and linking. This is in contrast to the more common term relevancy ranking. This newer type of ranking often looks at "off-the-page" information to determine the retrieval and order of your search results. Search engines that employ this alternative may be thought of as second generation search services. For example:
Google ranks by the number of links from pages ranked high by the service
Direct Hit ranks according to sites other searchers have chosen from their results to similar queries
Inference Find ranks by concept and top-level domain
Northern Light sorts results into Custom Search Folders representing concepts and/or types of sites
Oingo performs a concept processing of your search and attempts to identify the probable intended meaning
Here are a few of the trends to watch with second-generation services:
  • The human element: concept processing. Second generation services such as Ask Jeeves, Northern Light, Oingo and SurfWax apply different kinds of concept processing to a search statement to determine the probable intent of a search. This is often accomplished by the use of human generated indexes. With these services, the burden of coming up with precise or extensive terminology is shifted from the user to the engine. These services are therefore taking on the role of thesauri.
  • The human element: collective judgment. Search services such as Google and Direct Hit derive their results from the behavior of millions of Web users.
  • The human element: directories. First generation search services have gotten into the act by partnering with second generation services and/or including content from human gathered directories with their search results to supplement documents retrieved from the spider-indexed Web. Examples include AltaVista, GO.com (Infoseek), Lycos and many others.

    Search engines covered in this tutorial


    Direct Hit
    Exercise: Organization of results according to popularity with Direct Hit

    Direct Hit - http://www.directhit.com/

    Online help

    Direct Hit calls itself a Popularity Engine because it ranks results according to sites other searchers have chosen from their results to similar queries.

    Special Features:
    Measures what pages users select from search results and roughly how long they spend visiting these pages
    Compiles results from daily data feeds of millions of records from a number of sources, including its major search engine partners
    Results can work quite well if other users are choosing quality and relevant sites
    Offers related queries to try out to enhance your initial query; these are hyperlinked to a list of results
    The service is also available at HotBot, Lycos and MSN Web Search
    Drawbacks:
    Results lists change dynamically over time as new data is retrieved by this service, so you may not get the best results at any particular time
    Popularity is not necessarily the best method of finding quality or relevant sites
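
The popularity approach described above can be sketched in a few lines of Python. This is only an illustration, not Direct Hit's actual method: the click log format, the function name popularity_rank, and the scoring rule (one point per selection plus up to one point for time spent, capped at 60 seconds) are all assumptions made for the example.

```python
from collections import defaultdict

def popularity_rank(click_log, max_dwell=60.0):
    """Rank URLs for one query by selections plus capped dwell time.

    click_log: iterable of (url, seconds_on_page) tuples (hypothetical format).
    """
    scores = defaultdict(float)
    for url, seconds in click_log:
        # One point per selection, plus up to one point for time on the page.
        scores[url] += 1.0 + min(seconds, max_dwell) / max_dwell
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda u: (-scores[u], u))

log = [("a.example", 5), ("b.example", 90), ("b.example", 30), ("c.example", 60)]
print(popularity_rank(log))
```

Note that a page nobody has clicked yet never enters the ranking at all, which mirrors the drawback that popularity does not guarantee quality or relevance.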


    Google
    Exercise: Retrieving results by link ranking using Google.

    Google - http://www.google.com/

    Online help

    Google ranks results by the number of links from pages ranked high by the service. This unique ranking system can be quite effective.

    Special Features:
    Returns results ranked by the number of links from pages ranked high by the service; high ranking pages are also determined by the number of links to them
    In determining relevancy ranking, the engine also looks at various textual clues including linking text
    Search results include sites from the Open Directory Project, offering an interesting mix of sites from the wider Web and those chosen by editors for inclusion into the directory. See also Google's own version, the Google Web Directory.
    Requires no syntax: simply type keywords and Google defaults to the Boolean AND with term proximity
    OR searching is supported if "OR" is typed in CAPS, e.g., university OR college; works only with multiple single words
    Attempts to return results in which multiple query words are in close proximity within the source document
    For more refined searches, use quotations for phrases ("El Nino") or a minus sign (-) for the Boolean NOT
    Engine searches on stop words (and, the, etc.) with the following syntax: "rules +of +the game"
    Engine does not stem words; it searches on your word form exactly as it is typed
    Results include the text from the source document that matches your query
    I'm feeling lucky option returns the top-ranked source for a query
    Offers searching of Web pages in a number of languages; the Google site can also be set to display its tips and instructions in a different language
    Offers a spell check operation. Example: spell:priviledge
    Returns a link to a map from Yahoo! or MapBlast if you enter an address, e.g., 198 central avenue albany ny
    Returns stock prices if you enter a company's ticker symbol
    Displays links to news headlines when they are relevant to a search
    Drawbacks:
    New Web pages will not appear in your results, as it takes time for the creators of other Web pages to link to new resources, and for this activity to be reflected at Google
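
The idea that a page ranks high when it is linked from other high-ranking pages can be sketched as a simple iterative calculation. This is a simplified illustration in the spirit of link analysis, not Google's actual implementation: the function name link_rank, the damping value of 0.85, and the fixed number of iterations are assumptions chosen for the example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighted by the score of the linking page.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = link_rank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this small example, page c scores highest because three pages link to it, and page a outranks b and d because its single link comes from the high-ranking page c. Pages b and d, which nothing links to, stay at the baseline, which illustrates why new pages are slow to appear in results.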


    Inference Find
    Exercise: Organization of results into concepts and/or types of sites with Inference Find

    Inference Find - http://www.infind.com/

    Online help

    Inference Find is a meta search engine that searches six search engines simultaneously. The search engine merges the results, removes duplicate files, and organizes the results into sections.

    Special Features:
    Retrieves the maximum number of results each search engine will allow by searching target engines in parallel. For example, GO.com is searched three times in parallel.
    Groups results into sections by concepts and/or by top-level domain, e.g., educational institution, non-profit site, European site, federal government, etc.
    Drawbacks:
    Gives no syntax directions. Suggests the use of Boolean operators but cautions about inconsistent results
    List of results contains only titles of Web pages, so the relevancy of the source document is not always easy to determine without visiting the page
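
The merge, de-duplicate and group steps described above might look like the following sketch. It is not Inference Find's code: the function name merge_and_group, the section labels in TLD_LABELS, and the rule for spotting duplicates (ignoring case and a trailing slash) are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical section labels; the real service's section names may differ.
TLD_LABELS = {
    "edu": "Educational institutions",
    "org": "Non-profit sites",
    "gov": "Federal government",
    "com": "Commercial sites",
}

def merge_and_group(result_lists):
    """Merge result lists from several engines, drop duplicates,
    and group the remaining URLs by top-level domain."""
    seen = set()
    sections = {}
    for results in result_lists:
        for url in results:
            # Treat URLs that differ only by case or a trailing slash as duplicates.
            key = url.lower().rstrip("/")
            if key in seen:
                continue
            seen.add(key)
            host = urlparse(url).hostname or ""
            tld = host.rsplit(".", 1)[-1]
            sections.setdefault(TLD_LABELS.get(tld, "Other"), []).append(url)
    return sections

engine_a = ["http://www.utm.edu/", "http://www.example.org/page"]
engine_b = ["http://www.utm.edu", "http://www.example.com/"]
for section, urls in merge_and_group([engine_a, engine_b]).items():
    print(section, urls)
```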


    TracerLock and The Informant
    Exercise: Storing Queries for regularly updated results with TracerLock and The Informant

    TracerLock - http://www.peacefire.org/tracerlock/
    The Informant - http://informant.dartmouth.edu/

    TracerLock and The Informant are two services that save your search query, process it at regular intervals, and e-mail you when new pages are found containing your search terms. Both services are free, and require users to register. Each service works a little differently from the other.

    Special Features: TracerLock
    Stores Boolean search statements for regular processing
    Searches AltaVista every night for pages matching your terms that AltaVista indexed no later than three days ago and no earlier than the date stored with your search terms; the stored date is adjusted daily to keep the search fresh
    You can reset the search date window at any time
    The first ten results are sent to you by e-mail
    Special Features: The Informant
    Searches up to three sets of keywords
    Searches on these keywords at a periodic interval of 3, 7, 14, 30 or 60 days
    Uses AltaVista, Lycos, Excite, and GO.com
    If a new page appears in the top ten most relevant new pages, or if pages from your previous top ten list have been updated, The Informant will send you an e-mail
    Also tracks up to 5 URLs and e-mails you when page changes occur
    Note: Karnak is another similar service to try. The free service will track only one query, but the premium services will track multiple queries. A strength of this service is the significant number of sources which it uses to locate documents.
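
The TracerLock date window described above can be expressed as a simple check. This is a sketch based only on the description in this tutorial, not the service's own code; the function name in_search_window and its parameter names are hypothetical.

```python
from datetime import date, timedelta

def in_search_window(indexed_on, stored_date, today, lag_days=3):
    """Return True if a page's index date falls inside the nightly window:
    on or after the stored date, and on or before the date lag_days ago."""
    latest = today - timedelta(days=lag_days)
    return stored_date <= indexed_on <= latest

today = date(2000, 6, 10)
stored = date(2000, 6, 1)
print(in_search_window(date(2000, 6, 7), stored, today))  # inside the window
print(in_search_window(date(2000, 6, 8), stored, today))  # indexed too recently
```

Resetting the search date window corresponds to changing the stored date passed in as stored_date.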


    Ixquick Metasearch
    Exercise: Tapping into the ranking schemes of several engines with Ixquick Metasearch

    Ixquick Metasearch - http://ixquick.com/

    Online help

    Ixquick Metasearch is a meta search engine that searches multiple engines and directories and returns only those documents that appear in the top 10 of any search results.

    Special Features:
    Returns the most relevant results as ranked in the top 10 by a number of individual sources
    Uses a "star" system whereby the number of stars indicates the number of sites ranking each result in the top 10
    Shows the sources that have ranked the page and the placement within the top 10 list, e.g., Google (1)
    Offers a variety of search options including full Boolean, implied Boolean, natural language search, truncation, case sensitivity and field searching; Ixquick sends your query to the engines that support these options
    Also searches for news, MP3 music files and pictures
    Drawbacks:
    Because it offers only the top 10 results from any source, obscure sites will not appear in its results
    Some search syntax options do not work well, e.g., natural language searching is offered but does not always return useful results
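
The "star" system can be sketched as follows: each source contributes its top 10, and a result earns one star for every source that lists it. This is an illustration only, not Ixquick's code; the function name star_ratings, the input format, and the tie-breaking rule (best placement, then URL) are assumptions.

```python
def star_ratings(top_tens):
    """Combine the top-10 lists of several sources.

    top_tens: dict mapping a source name to its ordered list of URLs.
    Returns (url, stars, placements) tuples, most stars first.
    """
    placements = {}
    for source, urls in top_tens.items():
        # Only the first ten results from each source are considered.
        for position, url in enumerate(urls[:10], start=1):
            placements.setdefault(url, []).append((source, position))
    ranked = sorted(
        placements.items(),
        # More stars first, then the best single placement, then the URL.
        key=lambda item: (-len(item[1]), min(pos for _, pos in item[1]), item[0]),
    )
    return [(url, len(sources), sources) for url, sources in ranked]

top_tens = {"Google": ["x.example", "y.example"], "AltaVista": ["y.example", "z.example"]}
for url, stars, sources in star_ratings(top_tens):
    print(url, "*" * stars, sources)
```

A page ranked eleventh or lower by every source never enters the placements table, which is why obscure sites do not appear.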


    Northern Light
    Exercise: Grouping of results into concept folders with Northern Light

    Northern Light - http://www.nlsearch.com/

    Online help: Main Screen | Power Search

    Northern Light organizes results into Custom Search Folders that represent concepts and/or types of sites. Results within these folders are relevancy ranked. With this system, you can ignore the folders that are irrelevant and choose those that fit your query best. This may be more convenient than working through one master list of results.

    Special Features:
    Sorts search results into folders by subjects, types (e.g., press releases, maps), source sites, or languages
    Within folder levels, a new group of folders is presented
    Relevancy ranked results are available on the same screens as the folders
    Clusters results on this results list by presenting one hit per site
    Offers a Special Collection database of relevant articles from thousands of sources for a small fee
    Contains a very large database of searchable files
    Drawbacks:
    Folders may not be consistently useful for all queries; however, you can simply skip over irrelevant folders
    If you like Northern Light, check out Vivisimo. This service allows you to choose a search engine and it will organize your results into categories.
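
Clustering a results list so that each site contributes only one hit can be sketched as shown here. This is a minimal illustration, not Northern Light's implementation; the function name one_hit_per_site is hypothetical, and the sketch assumes that "site" means the host name in the URL.

```python
from urllib.parse import urlparse

def one_hit_per_site(ranked_urls):
    """Keep only the highest-ranked result from each host,
    preserving the original relevancy order."""
    seen_hosts = set()
    clustered = []
    for url in ranked_urls:
        host = urlparse(url).hostname
        if host not in seen_hosts:
            seen_hosts.add(host)
            clustered.append(url)
    return clustered

results = [
    "http://www.example.com/a",
    "http://www.example.com/b",
    "http://www.example.org/",
    "http://www.example.com/c",
]
print(one_hit_per_site(results))
```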


    Oingo
    Exercise: Concept processing with Oingo

    Oingo - http://www.oingo.com/

    Online help

    Oingo offers what it calls "meaning-based" searching. This service parses each search statement to identify the probable intended meaning. This is done by processing searches through a network of interconnected meanings called the Oingo Lexicon and attempting to determine probabilities of relevancy. Oingo also offers topics related to the term(s) if you wish to pursue a different aspect of your topic.

    Special Features:
    Parses search statements to identify the probable intended meaning and presents results based on this meaning
    With each search result list, offers a variety of concepts related to your search term(s) from which you can choose for further identifying and searching the desired concept
    Concept choices can be very wide-ranging and relieve you of coming up with precise or extensive terminology
    Works well with complex or ambiguous queries
    Takes its results from dmoz Open Directory Project and AltaVista
    Drawbacks:
    Search terms entered as a phrase may end up with their concepts split in the search results
    Concept searching is not applied to AltaVista results
    Concept options may not be useful to a search
    Not all chosen concepts yield relevant results
    No field searching is available
    TIP: Simplified is a search service that offers a similar option to choose from alternative meanings.
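
One simple way to picture "meaning-based" searching is to choose, for each ambiguous word, the meaning whose related terms overlap most with the rest of the query. The sketch below is a toy illustration and does not reflect how the Oingo Lexicon actually works: the LEXICON contents and the function name probable_meaning are invented for the example.

```python
# Toy lexicon, invented for illustration: each meaning lists related words.
LEXICON = {
    "java": {
        "programming language": {"code", "software", "applet", "compiler"},
        "island": {"indonesia", "travel", "jakarta"},
        "coffee": {"beans", "espresso", "roast"},
    },
}

def probable_meaning(query):
    """Guess the intended meaning of each word found in the lexicon."""
    words = query.lower().split()
    guesses = {}
    for word in words:
        meanings = LEXICON.get(word)
        if not meanings:
            continue
        context = set(words) - {word}
        # Pick the meaning whose related words overlap most with the
        # other words in the query; ties go to the first meaning listed.
        guesses[word] = max(meanings, key=lambda m: len(meanings[m] & context))
    return guesses

print(probable_meaning("java travel jakarta"))
print(probable_meaning("Java espresso"))
```

When the query gives no context, the sketch simply falls back to the first meaning listed, which is one reason a service might also offer related concepts for the user to choose from.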


    SurfWax
    Exercise: Concept searching with SurfWax

    SurfWax - http://www.surfwax.com/

    Online help

    SurfWax is a meta engine that offers options to see a quick view of the content of sites in your search results list, along with search terms to broaden or narrow a subsequent search.

    Special Features:
    Offers "SiteSnaps" that display summaries of retrieved sites including Author Description, Key Points, Emphasis and FocusWords
    FocusWords may be chosen to be added to your Personal Searcher for a future search [Note: you must have Preferences set to turn on personalization]
    Focus feature may be applied to your search terms, allowing you to choose broader or narrower search terms to apply to subsequent searches
    Various personalization options are available
    Drawbacks:
    Not all sites have SiteSnaps and FocusWords, though the number is growing

    Any comments, concerns, questions or suggestions should be directed to:

    Library Webmaster
    Paul Meek Library
    University of Tennessee at Martin
    Martin, TN 38238