Investigating Search Engines

About Search Engines

Search engines use software "bots" to scout the Web and assemble databases, or indexes, of Web pages. When you enter a query at a search engine Web site, your input is checked against the search engine's keyword indexes, and the best matches are returned to you as hits. A search engine has two parts: a "robot", "crawler", or "spider" that travels to pages on the Web and assembles an expansive index, and a program that receives your search request, compares it to the index entries, and returns results to you.
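The two parts described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular search engine works: the page texts stand in for whatever a crawler has fetched, and the names build_index and search and the example.org addresses are hypothetical.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word in the query, sorted by URL."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        # Keep only pages that also contain this word.
        hits &= index.get(word, set())
    return sorted(hits)

# Hypothetical pages standing in for what a crawler might have fetched.
pages = {
    "example.org/cats": "cats are small domestic animals",
    "example.org/dogs": "dogs are loyal domestic animals",
}
index = build_index(pages)
print(search(index, "domestic animals"))
print(search(index, "loyal"))
```

The first query matches both pages because each contains both words; the second matches only the page about dogs.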

Keyword Searching

Keyword searching is the most common way to search the Web. It works by matching the text of your query to the database's indexed text. Authors often specify the keywords they want search engines to use, but when they do not, or when the search engine wants additional information, its spider draws keywords from the page itself. Words that are mentioned towards the top of a document and text that is repeated throughout the document are more likely to be judged as important. Some search engines index every word on every page; others index only part of the document.
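The idea that words near the top of a page and words that are repeated count for more can be illustrated with a small scoring function. This is a rough sketch under assumed values: the function name keyword_scores, the cutoff top_words, and the weight top_bonus are all hypothetical, and real search engines use their own formulas.

```python
def keyword_scores(text, top_words=10, top_bonus=2.0):
    """Score each word by its occurrences.

    Each occurrence counts 1.0, or top_bonus if it falls within
    the first top_words words of the text.
    """
    scores = {}
    for position, word in enumerate(text.lower().split()):
        weight = top_bonus if position < top_words else 1.0
        scores[word] = scores.get(word, 0.0) + weight
    return scores

text = "gardening tips for spring gardening and more gardening advice"
scores = keyword_scores(text, top_words=3)
print(scores["gardening"], scores["tips"], scores["advice"])
```

Here "gardening" scores highest because it appears both near the top and repeatedly, while "advice" appears once near the end and scores lowest.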

Lycos, for example, indexes the first 20 lines of text, in addition to the title, headings, subheadings and the hyperlinks to other sites. Full-text indexing systems pick up every word in the text except commonly occurring stop words such as "a," "an," "the," "is," "and," "or," and "www." AltaVista utilizes full-text indexing without excluding the common articles, "a," "an," and "the." Some search engines discriminate between upper case and lower case; others store all words without reference to capitalization. Some search engines use the information contained in the markup tags of the coded Web page, while others do not. Search engines also vary in how they deal with singular and plural words, verb tenses, and stemming. For example, if you enter the word "dog," should you receive "dogma" as a hit? Even in light of these shortcomings and potential points of confusion, keyword searching via search engines can be very powerful.
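The choices described above (stop words, capitalization, and whether "dog" should match "dogma") can be made concrete with a short Python sketch. The stop word list is the one given in the text; the function names tokenize and matches and their parameters are hypothetical, and punctuation handling is left out to keep the example small.

```python
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}

def tokenize(text, fold_case=True, drop_stop_words=True):
    """Split text into index terms according to the chosen policies."""
    words = text.split()
    if fold_case:
        # Store all words without reference to capitalization.
        words = [w.lower() for w in words]
    if drop_stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return words

def matches(query, word, prefix=False):
    """Exact match by default; with prefix=True, "dog" also matches "dogma"."""
    return word.startswith(query) if prefix else word == query

print(tokenize("The Dog and the dogma"))
print(tokenize("The Dog and the dogma", fold_case=False))
print(matches("dog", "dogma"), matches("dog", "dogma", prefix=True))
```

With case folding on, "Dog" and "dog" become the same index term; with it off, they stay distinct. Whether a query for "dog" returns "dogma" depends entirely on whether the engine matches whole words or prefixes.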

Search Engines


Spider-Indexed Search Engines

Google
Alta Vista
Northern Light
AllTheWeb
HotBot
Lycos

Human-Indexed Search Engines

Yahoo
Librarians' Index to the Internet
About.com
Open Directory
LookSmart
Ask Jeeves

Meta Search Engines

These Web searching tools operate in basically the same way as other search engines, except that they contain no indexed database of their own. When you submit a query, the meta search engine sends copies of the query to other search engines and returns their results. Search engines use differing algorithms to determine which pages are most relevant to your search. Meta search engines are useful because they provide a way to find common results among the sets of returned pages and a means of comparing and evaluating individual search engines. Although meta search engines are helpful, almost none of the freely available ones search Google, a search engine that indexes one of the largest numbers of pages on the Web.

Webcrawler
RedeSearch
Copernic Basic 2001
Exploratorious
Ixquick
Dogpile
Metacrawler
Ask Jeeves
Vivisimo
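The way a meta search engine forwards a query and finds common results can be sketched as follows. This is a simplified illustration, not the method of any engine listed above: engine_a and engine_b are hypothetical stand-ins that return fixed lists, where a real tool would contact other search engines over the network.

```python
from collections import Counter

def meta_search(query, engines):
    """Send the query to each engine and count how many engines returned each URL."""
    counts = Counter()
    for engine in engines.values():
        # set() ensures one engine cannot count twice for the same URL.
        counts.update(set(engine(query)))
    # Most widely returned pages first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical stand-ins for real search engines.
def engine_a(query):
    return ["example.org/a", "example.org/b"]

def engine_b(query):
    return ["example.org/b", "example.org/c"]

results = meta_search("dog training", {"A": engine_a, "B": engine_b})
print(results)
```

The page returned by both engines is listed first, which is one simple way to surface the common results among the sets of returned pages.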

Multimedia Search Engines

Using a general search engine to find a specific type of multimedia, such as image or audio files, can often yield frustrating results. Fortunately, there are a variety of multimedia search engines on the Web.

Web Clip Art

Searches the About.com databases for all kinds of useful clip art

Scour

Finds audio, video and images on the Web, including MP3 files

Google Image Search BETA

Seeks relevant pictures from more than 150 million images; one of the most comprehensive image searching tools on the Web

MIDI Explorer

Allows searchers to look for MIDI (Musical Instrument Digital Interface) files

Audiofind

Lets you browse by artist, date or filename, or search by keyword, for MP3 (MPEG Layer 3, a compressed audio format) files

Specialized Search Engines

It can often be useful to use a search engine that looks specifically for one type of document or specialized information. For example, if you were looking for medical information, using a standard search engine might leave you sifting through hundreds of Web sites, while using a search engine that specializes in medical information would likely yield more relevant results.

Scirus

Scirus is a product developed by Elsevier Science that concentrates solely on collecting scientific content. It searches both the open Web and membership sources such as ScienceDirect, MEDLINE on BioMedNet, and Beilstein on ChemWeb. Scirus features an array of scientific information in data and chart form.

GEIN

The German Environmental Information Network search engine gathers public affairs information distributed across Web sites run by public institutions in Germany, such as environmental authorities and agencies and ministries at the federal level. It functions as an information broker for environmental information in Germany.

SportSearch

SportSearch is a specialized directory of sports Web sites.

Artcyclopedia

Artcyclopedia is an index of hundreds of museum sites and image archives. Visitors can search to find where works by more than 5,500 artists can be viewed online.

MedHunt

MedHunt uses both humans and Web crawling to build its index of medical information. Searches can be narrowed by region and a French interface is available.

More Search Engine Information

Search Engine Watch
searchengines.com

WSU Libraries Search Engine Guides

Internet Search Engine Guide
Investigating Search Engines
