Library Instruction

Investigating Search Engines

Search engines use software "bots" to scout the Web and assemble databases, or indexes, of Web pages. When you enter a query at a search engine Web site, your input is checked against the search engine's keyword indexes, and the best matches are returned to you as hits. A search engine has two parts: a "robot", "crawler", or "spider" that travels to pages on the Web and assembles an expansive index, and a program that receives your search request, compares it to the index entries, and returns results to you.
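The two parts described above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular search engine is built: the page names and texts are invented, and `build_index` and `search` are hypothetical helper names. The first function plays the role of the spider's index; the second plays the role of the program that answers queries.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word of the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Invented sample pages standing in for what a spider might collect.
pages = {
    "a.html": "library hours and library instruction",
    "b.html": "search engine instruction",
}
index = build_index(pages)
print(search(index, "library instruction"))
```

Real search engines rank their hits rather than simply listing them, but the lookup step works on the same principle: the query is compared to the index, not to the live Web.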

Keyword Searching

Keyword searching is the most common way to search the Web. It works by matching the text of your query against the database's indexed text. Authors often specify the keywords they want search engines to use, but often they do not, or the search engine's spider looks for additional information on the page itself. Words that appear toward the top of a document and words that are repeated throughout it are more likely to be judged important. Some search engines index every word on every page; others index only part of each document.
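As a rough illustration of how position and repetition might influence importance, the sketch below gives a word one point for each time it appears and a bonus if it appears near the top of the document. The function name `keyword_score`, the bonus of 2, and the 50-word cutoff are all assumptions made for this example; actual search engines use their own, usually unpublished, formulas.

```python
def keyword_score(text, keyword, top_words=50):
    """Score a keyword: one point per occurrence, plus a bonus near the top."""
    words = text.lower().split()
    keyword = keyword.lower()
    count = words.count(keyword)
    # Assumed rule: a fixed bonus if the keyword is among the first top_words words.
    bonus = 2 if keyword in words[:top_words] else 0
    return count + bonus

# The same word scores higher when it appears early in the text.
print(keyword_score("dogs are great pets", "dogs", top_words=2))
print(keyword_score("pets such as dogs", "dogs", top_words=2))
```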

Lycos, for example, indexes the first 20 lines of text, in addition to the title, headings, subheadings, and hyperlinks to other sites. Full-text indexing systems pick up every word in the text except commonly occurring stop words such as "a," "an," "the," "is," "and," "or," and "www." AltaVista uses full-text indexing without excluding the common articles "a," "an," and "the." Some search engines discriminate between upper case and lower case; others store all words without reference to capitalization. Some search engines use the information contained in the markup tags of the coded Web page, while others do not. Search engines also vary in how they deal with singular and plural words, verb tenses, and stemming. For example, if you enter the word "dog," should you receive "dogma" as a hit? Despite these inconsistencies and potential points of confusion, keyword searching via search engines can be very powerful.
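The choices described above (stop words, capitalization, and how loosely a word is matched) can be made concrete with a short sketch. The stop word list is the one quoted in the text; the function names and the sample sentence are invented for illustration, and the prefix match shown here is a deliberately crude stand-in for stemming that reproduces the "dog" versus "dogma" problem.

```python
import re

# Stop words taken from the examples given in the text.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}

def tokenize(text, drop_stop_words=True, fold_case=True):
    """Split text into words, optionally lowercasing and dropping stop words."""
    words = re.findall(r"[A-Za-z]+", text)
    if fold_case:
        words = [w.lower() for w in words]
    if drop_stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return words

def exact_match(term, words):
    """Return only the words identical to the search term."""
    return [w for w in words if w == term]

def prefix_match(term, words):
    """Return every word that begins with the search term."""
    return [w for w in words if w.startswith(term)]

words = tokenize("The dog and the Dogma of dogs")
print(exact_match("dog", words))   # only the word itself
print(prefix_match("dog", words))  # also picks up "dogma" and "dogs"
```

A search engine that matches loosely finds the plural "dogs" but also the unrelated "dogma"; one that matches exactly avoids the false hit but misses the plural.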

Spider-Indexed Search Engines
Human-Indexed Search Engines (Directories)
Blog Search Engines
Visual Search Engines
Meta Search Engines

These Web searching tools operate in basically the same way as other search engines, except that they contain no indexed database of their own. When you submit a query, the meta search engine sends copies of it to other search engines and returns their results. Because search engines use differing algorithms to determine which pages are most relevant to your search, meta search engines are useful: they provide a way to find common results among the returned sets of pages and a means of comparing and evaluating individual search engines. Although meta search engines are helpful, almost none of the freely available ones search Google, a search engine that indexes one of the largest numbers of pages on the Web.
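The idea of forwarding one query to several engines and looking for common results can be sketched as follows. The two "engines" here are hypothetical stand-in functions that return fixed, invented result lists; a real meta search engine would contact live services over the Web. The merge rule (pages returned by more engines come first) is just one plausible choice.

```python
def meta_search(query, engines):
    """Send the query to every engine and merge the returned result lists."""
    found = {}  # url -> names of the engines that returned it
    for name, engine in engines.items():
        for url in engine(query):
            sources = found.setdefault(url, [])
            if name not in sources:
                sources.append(name)
    # Pages returned by more engines come first; ties sort alphabetically.
    return sorted(found.items(), key=lambda item: (-len(item[1]), item[0]))

# Hypothetical engines with invented results, for illustration only.
def engine_a(query):
    return ["x.example/dogs", "y.example/dogs"]

def engine_b(query):
    return ["y.example/dogs", "z.example/dogs"]

results = meta_search("dogs", {"A": engine_a, "B": engine_b})
for url, sources in results:
    print(url, sources)
```

Listing which engines returned each page also gives a simple way to compare the engines themselves, as the paragraph above suggests.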

Multimedia Search Engines

Using a general search engine can often yield frustrating results when you are trying to find a specific type of multimedia, such as image or audio files. Fortunately, there are a variety of multimedia search engines on the Web.

Specialized Search Engines

It can often be useful to use a search engine that looks specifically for one type of document or specialized information. For example, if you were looking for medical information, a standard search engine might leave you sifting through hundreds of Web sites, while a search engine that specializes in medical information would likely yield more relevant results.

More Search Engine Information
WSU Libraries Search Engine Guides