Checklist of Internet Research Tips


The Internet is a self-publishing medium. It is not a library of evaluated publications selected by professionals. Rather, the Internet is a bulletin board containing everything from the definitive to the spurious. Everything must be analyzed for its appropriateness for research use. For guidelines on how to do this, see Evaluating Internet Resources.
Be sure to try out a handful of sites when researching a topic on the Internet. Do not rely on only one site or one type of site.
Three major resources for locating Internet materials are the subject directory, the search engine, and databases on the so-called "invisible Web." Be sure you understand the difference:

    SUBJECT DIRECTORY

    Definition: A subject directory is a service that offers a collection of links to Internet resources submitted by site creators or evaluators and organized into subject categories. Directory services use selection criteria for choosing links to include, though the selectivity varies among services. Most directories offer a search engine mechanism to query the service.
    When using subject directories, keep in mind that:
    There are two basic types of directories: academic and professional directories, often created and maintained by subject experts to support the needs of researchers; and commercial portals, which cater to the general public and compete for traffic. Be sure you use the directory that meets your needs.
    Subject directories differ significantly in selectivity. Consider the policies of any directory that you visit. One challenge is that not all directory services are willing to disclose either their policies or the names and qualifications of their site reviewers. This is especially true of commercial portals.
    Many people don't make enough use of subject directories, but instead go straight to search engines. Keep in mind that academic subject directories contain carefully chosen and annotated lists of quality Internet sites. Don't overlook subject directories when searching for quality on the Internet.
    INFOMINE is a good example of an academic subject directory. Yahoo! is a good example of a commercial portal. A more complete list of both types of directories may be found on the page Internet Subject Directories.

    SEARCH ENGINES

    Definition: A search engine is a searchable database of Internet files collected by a computer program (called a spider, crawler, robot, worm, or wanderer). An index is built from the collected files, using fields such as title, full text, size, and URL. There are no selection criteria for the collection of files.
    A search engine might well be called a search engine service or a search service. As such, it consists of three components:
    Spider: Program that traverses the Web from link to link, identifying and reading pages
    Index: Database containing a copy of each Web page gathered by the spider
    Search engine mechanism: Software that enables users to query the index and that usually returns results in relevancy ranked order
    GO.com is a good example of a search engine. A more complete list may be found on the page Internet Search Engines.
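    The three components can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real search service is built: the tiny "Web" (WEB) and its example.* URLs are hypothetical, the spider follows links breadth-first, the index is a simple inverted index (each word mapped to the URLs that contain it), and the query mechanism looks up a single word without ranking. Note that the page nothing links to is never reached by the spider.

```python
from collections import deque

# Hypothetical miniature "Web": each URL maps to (page text, outgoing links).
WEB = {
    "a.example/home": ("Bird watching guide", ["a.example/migration", "b.example/news"]),
    "a.example/migration": ("Seasonal bird migration routes", ["a.example/home"]),
    "b.example/news": ("Local news and weather", []),
    "c.example/orphan": ("Rare bird sightings", []),  # no page links here
}


def spider(start_url):
    """Follow links breadth-first from start_url and keep a copy of each page reached."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        pages[url] = text  # store a copy of the page
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Inverted index: each lowercase word maps to the set of URLs that contain it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, word):
    """Query mechanism: look up one word; ranking is left out of this sketch."""
    return sorted(index.get(word.lower(), set()))


pages = spider("a.example/home")
index = build_index(pages)
print(search(index, "bird"))  # ['a.example/home', 'a.example/migration']
```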

    INVISIBLE WEB

    Definition: The so-called "invisible" or "deep" Web consists of information stored in searchable databases mounted on the Web. These databases usually cover a targeted topic or aspect of a topic, though entire Web sites may be contained within a database. Search engine spiders cannot or will not index this information.
    When dealing with the invisible Web, keep in mind that:
    There are Web sites that specialize in collecting links to databases available on the Web. One such site is actually called The InvisibleWeb and links to 10,000 Web-accessible databases. You may also want to visit other sites that collect links to Web databases.
    Topical coverage on the Invisible Web is extremely varied. This presents a challenge, since it is impossible to anticipate what might turn up in a database. In addition, this coverage will be fluid as databases proliferate on the Web.
    Information whose content changes frequently tends to appear on the Invisible Web. Examples include news, job postings, available airline flights, etc.
    Subject directories can be highly useful when you have a broad topic and wish to view recommended sites relevant to that topic. Search engines are appropriate to use when you are looking for a specific site or have a narrow topic to pursue.
    Yahoo! is one of the most popular sites on the Web. It is one of the Web's largest commercial portals. But it is not a reliable or adequate research tool and should not be used for this purpose. 
    Beware of its drawbacks:
    Yahoo! is merely the passive recipient of sites submitted to it
    Yahoo!'s staff does not carefully evaluate content when choosing to add items to the database; therefore scholarly sites are haphazardly mixed in with everything else
    When you do a search in Yahoo!, you are searching only the title and the short descriptive blurb about the site; by contrast, search engines usually give you access to the full text of the document
    Yahoo! tends to index only the major landing page of a site; therefore, any significant subsidiary pages on a related or different topic may not show up on this site
    The editors at Yahoo! don't have time to review all submitted sites, so many submitted sites are never added.
    It is very helpful to understand the principles of Boolean search logic when using a search engine on the Web. This search logic is manifested in three distinct ways on Web search engines. Review Boolean Searching on the Internet.
    Other search strategies can also help you use Web search engines accurately. Be sure to check these out.
    When you enter more than one word in a Web search engine, the space between the words has a logical meaning that directly affects your results. This is known as the default syntax. For example:
    In AltaVista, Infoseek, and Excite, a search on the words
    birds    migration
    means that you will get back documents that contain either the word birds, the word migration, or both. The space between the words defaults to the Boolean OR. This is probably not what you want for this search.
    In HotBot, Lycos and Northern Light, a search on the words
    birds migration
    means that you will get back documents that contain both the words birds and migration. The space between the words defaults to the Boolean AND. This is more appropriate.
    Be sure you know the default syntax of the search engine you are using. For an overview of the default syntax of major search engines, see Quick Reference Guide to Search Engine Syntax.
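    The effect of the default syntax can be illustrated with a toy collection. In this sketch the four documents (DOCS) and the search() helper are hypothetical; the only point is that the same query returns different results depending on whether the space between words is read as OR or AND.

```python
# Hypothetical toy collection: document number -> set of words it contains.
DOCS = {
    1: {"birds", "migration", "routes"},
    2: {"birds", "feeders"},
    3: {"animal", "migration", "patterns"},
    4: {"weather", "news"},
}


def search(query, default_operator):
    """Return matching document numbers, reading each space in the query as OR or AND."""
    if default_operator not in ("OR", "AND"):
        raise ValueError("default_operator must be 'OR' or 'AND'")
    terms = query.split()  # split() treats any run of spaces as one separator
    combine = any if default_operator == "OR" else all
    return [doc for doc, words in DOCS.items() if combine(t in words for t in terms)]


print(search("birds    migration", "OR"))   # [1, 2, 3]
print(search("birds migration", "AND"))     # [1]
```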
    When using Web search engines, a de facto search language is emerging, especially for basic search (i.e., main screen) interfaces. When in doubt, use the following syntax:
    use a + for mandatory words:  +birds    +migration
    enclose phrases in double quotation marks:  "human rights"
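    The + and double-quotation conventions can be modeled as a filter. The sketch below is an assumption about how such a filter might work, not any engine's actual parser: it treats +words as required, quoted phrases as words that must appear consecutively, and ignores plain words (which, in a real engine, would more likely affect ranking). The helper names tokenize, parse_query, contains_phrase, and matches are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query):
    """Return (phrases, required_words) for a query using "..." and + syntax."""
    phrases = [tokenize(p) for p in re.findall(r'"([^"]+)"', query)]
    remainder = re.sub(r'"[^"]+"', " ", query)  # drop the phrases already captured
    required = [w for token in remainder.split() if token.startswith("+")
                for w in tokenize(token[1:])]
    # Plain words without "+" are optional; in this sketch they do not filter results.
    return phrases, required


def contains_phrase(tokens, phrase):
    """True if the words of phrase occur consecutively in tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matches(text, query):
    """A document matches only if it contains every +word and every quoted phrase."""
    tokens = tokenize(text)
    phrases, required = parse_query(query)
    return (all(w in tokens for w in required)
            and all(contains_phrase(tokens, p) for p in phrases))


print(matches("Birds and their migration routes", "+birds +migration"))  # True
print(matches("Migration of monarch butterflies", "+birds +migration"))  # False
print(matches("Rights of human workers", '"human rights"'))             # False
print(matches("A history of human rights law", '"human rights"'))       # True
```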
    If you want to search without the hassle of perfecting a technique, try Ask Jeeves. This search service finds answers to questions stated in plain English.
    Search engines offer numerous features that help you home in on what you want. For a review of these features, and the search engines that support them, see How to Choose a Search Engine or Research Database.
    Search engines return results in a ranked order. Most search engines use various criteria to compute a relevancy rating for each hit and present your search results in that order. Criteria can include: search terms in the title, URL, first heading, or HTML META tag; the number of times search terms appear in the document; search terms appearing early in the document; search terms appearing close together; etc.
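    A toy scoring function shows how several of these criteria can be combined into one rating. The weights (3 per title match, 1 per occurrence, 2 per early appearance, 2 for proximity), the 20-word "early" cutoff, and the 10-word proximity window are arbitrary values chosen for illustration; real engines use their own, usually undisclosed, formulas. The sample pages are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query, title, body, early=20, window=10):
    """Toy relevancy rating; all weights and cutoffs are illustrative assumptions."""
    terms = tokenize(query)
    title_words = set(tokenize(title))
    words = tokenize(body)
    score = 3 * sum(t in title_words for t in terms)      # search terms in the title
    score += sum(words.count(t) for t in terms)           # how often the terms appear
    score += 2 * sum(t in words[:early] for t in terms)   # terms appearing early
    close = any(all(t in words[i:i + window] for t in terms) for i in range(len(words)))
    if close:
        score += 2                                        # terms appearing close together
    return score


# Hypothetical pages: name -> (title, body text).
PAGES = {
    "A": ("Birds migration routes", "Birds follow the same migration routes every year."),
    "B": ("Garden birds", "Feeding garden birds in winter is easy. Some birds skip migration entirely."),
    "C": ("Weather news", "Storms delayed the migration of geese."),
}

query = "birds migration"
ranked = sorted(PAGES, key=lambda name: relevance(query, *PAGES[name]), reverse=True)
print(ranked)  # ['A', 'B', 'C']
```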
    One of the most interesting developments in search engine technology is the organization of search results by concept, site, domain, popularity and linking rather than by relevancy. This type of ranking looks at "off-the-page" information to determine the order of your search results. Search engines that employ this alternative may be thought of as second generation search services. For example:
    Direct Hit ranks according to sites other searchers have chosen from their results to similar queries
    Google! ranks by the number of links from pages ranked high by the service
    Inference Find ranks by concept and top-level domain
    Northern Light sorts results into Custom Search Folders representing concepts and/or types of sites
    A more detailed look at second generation search services may be found in the tutorial Second Generation Searching on the Web.
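    The idea behind link-based ranking can be sketched as a simplified version of the published PageRank method: every page passes a share of its own score to the pages it links to, and the process is repeated until the scores settle. The damping factor of 0.85, the iteration count, and the four-page link graph are illustrative assumptions; this is not the formula any search service actually uses.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scores: a page scores high when high-scoring pages link to it."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                for target in targets:
                    new_rank[target] += damping * rank[page] / len(targets)
            else:
                # A page with no outgoing links spreads its score over every page.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["A", "C"]}
scores = link_rank(LINKS)
print(sorted(scores, key=lambda p: (-scores[p], p)))  # ['C', 'A', 'B', 'D']
```

    Page C ranks first because three pages link to it; A ranks second because it is linked from C, itself a high-scoring page.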
    Don't be impressed, or even necessarily worried, by a large number of hits in response to a well-formulated search. Often multiple pages are returned from a single site because they all contain your search terms. AltaVista, Infoseek, HotBot, Northern Light and Lycos avoid this with a technique called results grouping, whereby all the results from one site are clustered together into one result. You are then given the opportunity to view all the retrieved pages from that site if you choose. With these engines, you may get a smaller number of results from a search, but each result comes from a different site.
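    Results grouping can be approximated by clustering a ranked list of URLs by host name. In this minimal sketch (hypothetical URLs, with the host name used as a stand-in for "site"), each site is shown once, at the position of its best-ranked page, with the rest of its pages available on request.

```python
from urllib.parse import urlparse


def group_by_site(ranked_urls):
    """Cluster ranked URLs by host; each site keeps the position of its best-ranked page."""
    groups = {}
    for url in ranked_urls:
        host = urlparse(url).netloc.lower()
        groups.setdefault(host, []).append(url)  # dicts remember first-insertion order
    return groups


# Hypothetical ranked results from a single search.
RESULTS = [
    "http://birds.example.org/migration.html",
    "http://birds.example.org/routes.html",
    "http://news.example.com/geese",
    "http://birds.example.org/faq.html",
]

for host, urls in group_by_site(RESULTS).items():
    print(urls[0], f"(+{len(urls) - 1} more from {host})")
```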
    If you have too many search results, or results that are not relevant:
    Add concept words
    Use vocabulary that is specific to your topic, e.g., Honda rather than cars.
    Link appropriate terms with the Boolean AND ( + ) so that each term is required to appear in the record
    Use term proximity operators if they are available
    Narrow your search to individual parts of the Web page such as title, first page level, etc.
    Use the Boolean NOT to keep out records containing terms you don't want
    If you have too few search results:
    Drop off the least important concept(s) to broaden your subject
    Use more general vocabulary
    Add alternate terms or spellings for individual concepts and connect with the Boolean OR
    Try the option available on some engines to find related documents to one or more of your relevant hits. Excite, HotBot and Infoseek all offer this type of feature.
    Meta search engines simultaneously search multiple search engines. They are also referred to as parallel search engines, multithreaded search engines, or mega search engines. These are useful when:
    you have an obscure topic
    you are not having luck finding anything when you search
    your search is not complex
    you want to retrieve a relatively small number of relevant results
    There are two types of meta search engines:
    One type searches a number of engines and does not collate the results. This means you must look through a separate list of results from each engine that was searched; you will often see the same result more than once. Some engines require you to visit each site to view your results, while others will fetch the results back to their own sites. When results are brought back to the site, a certain limitation is placed on what is allowed to be retrieved (more on this in the next point). With this type of meta search engine, you can retrieve comprehensive, and sometimes overwhelming, results. An example of this type of engine is Dogpile.
    The other type is more common and returns a single list of results, often with the duplicate hits removed. This type of meta engine always brings the results back to its own site for viewing. In these cases, the engine retrieves a certain maximum number of documents from the individual engines it has searched, cut off after a certain point as the search is processed. Inference Find claims to return the maximum number of results that its targeted search engines will allow. Other meta search engines stop processing a query after a certain amount of time. Still others give the user a certain degree of control over the number of documents returned in a search. All these factors have two implications:
    These meta search engines return only a portion of the documents available to be retrieved from the individual engines they have searched
    Results retrieved by these engines can be highly relevant, since they are usually grabbing the first items from the relevancy-ranked list of hits returned by the individual search engines

    The better meta search engines remove duplicate files and give you some information along with the document title. To see a list of meta search engines, visit Internet Search Engines.
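    The second type of meta search engine can be sketched as follows. The assumptions are stated loudly: the engine names and URLs are hypothetical, each engine contributes only its first per_engine hits (the cutoff described above), hits are interleaved by rank, and two URLs count as duplicates if they differ only in letter case or a trailing slash, which is a simplification.

```python
def normalize(url):
    """Treat URLs that differ only in letter case or a trailing slash as the same (a simplification)."""
    return url.lower().rstrip("/")


def meta_search(engine_results, per_engine=10):
    """Merge ranked lists from several engines into one list with duplicates removed.

    Only the first per_engine hits from each engine are used, and hits are
    interleaved by rank: every engine's 1st hit, then every engine's 2nd hit, and so on.
    """
    truncated = [hits[:per_engine] for hits in engine_results.values()]
    merged, seen = [], set()
    for rank in range(per_engine):
        for hits in truncated:
            if rank < len(hits) and normalize(hits[rank]) not in seen:
                seen.add(normalize(hits[rank]))
                merged.append(hits[rank])
    return merged


# Hypothetical relevancy-ranked results from two engines.
RESULTS = {
    "EngineA": ["http://a.example/1", "http://b.example/2", "http://c.example/3"],
    "EngineB": ["http://b.example/2/", "http://d.example/4"],
}
print(meta_search(RESULTS, per_engine=2))
# ['http://a.example/1', 'http://b.example/2/', 'http://d.example/4']
```

    Note that http://c.example/3 never appears: it fell below EngineA's cutoff, which illustrates why these engines return only a portion of the documents available from the individual engines.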

    Keep in mind that search engines do not index all the documents available on the Web. For example, most search engines cannot index files on password-protected sites, behind firewalls, or configured by the host server to be left alone. Still other Web pages may not be picked up if no other pages link to them; a search engine spider crawling from one page to the next will miss them. Search engines rarely contain the most recent documents posted to the Internet; do not look for yesterday's news on a search engine.
    Finally, watch for converging content. Many well-known sites now contain information from an array of sources. This can increase the usefulness of search sites, but it can also create confusion about the source of the information. For example:
    Spider gathered index: The mechanism for searching a spider-gathered index is the feature people usually associate with a search engine.
    Results from other search services: It is increasingly common for a search engine to return results from other services with which it has partnered. Examples include Ask Jeeves, Direct Hit, RealNames and many others. Each partner service offers an enhancement over the more traditional term-ranked results. This represents an interesting combination of first and second generation search technologies appearing on the same site.
    Directory: Many search services offer a directory on their sites. This directory may be a name brand such as LookSmart or the Open Directory Project, or a directory compiled by a site's own editors. Results from the directory may appear automatically with results from the spider-crawled Web, or the directory may be searched or browsed separately.
    Specialty databases: Many search services offer the option to search databases offering specific content. Included may be news, business, shopping, multimedia files, and so on. These databases constitute a small subset of the Invisible Web.

    Any comments, concerns, questions or suggestions should be directed to:

    Library Webmaster
    Paul Meek Library
    University of Tennessee at Martin
    Martin, TN 38238