Search engines use software "bots" to scout the Web and assemble databases or indexes of Web pages. When you enter a query at a search engine Web site, your input is checked against the search engine's keyword indexes. The best matches are then returned to you as hits. A search engine has two parts: A "robot", "crawler", or "spider" that travels to pages on the Web and assembles an expansive index, and a program that receives your search request, compares it to the index entries, and returns results to you.
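The two parts described above can be sketched in a few lines. This is a minimal illustration, not how any particular search engine is built: the `pages` dictionary stands in for what a crawler would have fetched, and the names `build_index` and `search` and the example URLs are hypothetical.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

# Stand-in for pages a crawler has already visited (made-up URLs).
pages = {
    "example.org/a": "web crawlers assemble an index",
    "example.org/b": "a query is compared to the index",
}
index = build_index(pages)
print(search(index, "index"))
print(search(index, "query index"))
```

The index is built once, ahead of time, so answering a query only requires looking up each query word rather than rereading every page.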
Keyword searching is the most common way to search the Web, and it works by matching the text of your query to the database's indexed text. Authors can specify the keywords they want search engines to use, but often they do not select any, or the search engine spider looks for additional information on the page. Words that are mentioned towards the top of a document and text that is repeated throughout the document are more likely to be judged as important. Some search engines index every word on every page; others index only part of the document.
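One way to picture how position and repetition might feed into importance is a simple score. The sketch below is an assumption for illustration only: the function name `score`, the cutoff `top_words`, and the weight `top_bonus` are invented here, and real search engines use their own undisclosed formulas.

```python
def score(text, keyword, top_words=20, top_bonus=2.0):
    """Score a keyword by how often it appears in the text.

    Each occurrence counts 1.0, except occurrences among the first
    `top_words` words, which count `top_bonus` instead.
    """
    words = text.lower().split()
    keyword = keyword.lower()
    total = 0.0
    for position, word in enumerate(words):
        if word == keyword:
            total += top_bonus if position < top_words else 1.0
    return total

# "dog" appears at position 0 (inside the top 3 words) and at position 5.
print(score("dog training tips for every dog owner", "dog", top_words=3))
```

A page that mentions the keyword early and often ends up with a higher score than one that mentions it once near the end.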
Lycos, for example, indexes the first 20 lines of text, in addition to the title, headings, subheadings and the hyperlinks to other sites. Full-text indexing systems pick up every word in the text except commonly occurring stop words such as "a," "an," "the," "is," "and," "or," and "www." AltaVista utilizes full-text indexing without excluding the common articles "a," "an," and "the." Some search engines discriminate between upper case and lower case; others store all words without reference to capitalization. Some search engines use the information contained in the markup tags of the coded Web page, while others do not. Search engines also vary in how they deal with singular and plural words, verb tenses, and stemming. For example, if you enter the word "dog," should you receive "dogma" as a hit? Even in light of these shortcomings and potential points of confusion, keyword searching via search engines can be very powerful.
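The differences described above can be made concrete with a small sketch. It is a simplified illustration, not the behavior of Lycos, AltaVista, or any other engine: the stop-word list is the one quoted in the text, and the function names and the `case_sensitive`, `drop_stop_words`, and `prefix` switches are hypothetical.

```python
import re

# Stop words quoted in the text above.
STOP_WORDS = {"a", "an", "the", "is", "and", "or", "www"}

def tokenize(text, case_sensitive=False, drop_stop_words=True):
    """Split text into words, optionally folding case and dropping stop words."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not case_sensitive:
        words = [w.lower() for w in words]
    if drop_stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    return words

def matches(query, tokens, prefix=False):
    """Whole-word match by default; with prefix=True, 'dog' also hits 'dogma'."""
    if prefix:
        return any(t.startswith(query) for t in tokens)
    return query in tokens

tokens = tokenize("The Dog and the dogma")
print(tokens)
print(matches("dog", ["dogma"]))
print(matches("dog", ["dogma"], prefix=True))
```

Each switch corresponds to one of the choices an engine has to make, which is why the same query can return different hits on different engines.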
Google Web Directory enables people to browse and search through websites that have been organized into categories. Our directory combines Google search technology with the categorization developed by the Open Directory Project; it is available in 75 languages.
Google Blog Search is the easiest way to search for blog content on the web. Using the same technology that powers Google's web search, Google Blog Search provides fresh, relevant search results from millions of feed-enabled blogs. Users can search for blog posts, blog names, authors, or a specific date range. Google Blog Search also features Google's SafeSearch technology, giving users control over the content of their search results.
The leading blog search engine, Technorati.com, indexes millions of blog posts in real time and surfaces them in seconds. The site has become the definitive source for the top stories, opinions, photos and videos emerging across news, entertainment, technology, lifestyle, sports, politics and business. Technorati.com not only tracks the authority and influence of blogs but also maintains the most comprehensive and current index of who and what is most popular in the Blogosphere.
Meta search engines operate in basically the same way as other search engines, except that they contain no indexed database of their own. When you submit a query, the meta search engine sends copies of the query to other search engines and returns those results. Search engines have differing algorithms for determining which pages are most relevant to your search. Meta search engines are useful because they provide a way to find common results among the sets of returned pages and a means of comparing and evaluating individual search engines. Although meta search engines are helpful, almost none of the freely available ones search Google, a search engine that indexes one of the largest numbers of pages on the Web.
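The forwarding-and-merging idea can be sketched as follows. This is an illustrative assumption, not the method of any real meta search engine: `engine_a` and `engine_b` are hypothetical stand-ins that return fixed lists instead of contacting a real service, and ranking by how many engines returned a page is just one possible way to surface common results.

```python
from collections import Counter

def meta_search(query, engines):
    """Send the query to every engine and merge the returned URLs.

    `engines` maps an engine name to a callable that takes a query
    and returns a list of URLs.
    """
    results = {name: fetch(query) for name, fetch in engines.items()}
    # Count how many engines returned each URL (once per engine).
    counts = Counter(url for urls in results.values() for url in set(urls))
    # URLs found by more engines come first; ties are ordered alphabetically.
    merged = sorted(counts, key=lambda url: (-counts[url], url))
    return merged, results

# Hypothetical engines with canned results (no network access).
engines = {
    "engine_a": lambda q: ["example.org/1", "example.org/2"],
    "engine_b": lambda q: ["example.org/2", "example.org/3"],
}
merged, per_engine = meta_search("dogs", engines)
print(merged)
```

Returning the per-engine results alongside the merged list keeps it possible to compare what each individual engine found.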
Using a general search engine often yields frustrating results when you are trying to find a specific type of multimedia, such as image or audio files. Fortunately, there are a variety of multimedia search engines on the Web.
A searchable index of billions of images found across the web. Includes advanced features such as search by image size, format and coloration and restricting searches to specific web sites or domains.
Yahoo! Image Search allows you to search billions of images from across the Web.
Bing image search includes infinite scroll, which lets you easily browse image results without clicking to a new page. Bing also has powerful filtering tools for images to make it easier to find just the right image.
Searches the about.com databases for all kinds of useful clip art.
Finds audio, video and images on the Web, including MP3 files.
Allows searchers to look for Musical Instrument Digital Interface (MIDI) files.
It can often be useful to use a search engine that looks specifically for a type of document or specialized information. For example, if you were looking for medical information, a standard search engine might leave you sifting through hundreds of Web sites, while a search engine that specializes in medical information would likely yield more relevant results.
Google Book Search is an index of book content that makes it easy to find books that interest you. Like a card catalog, it helps you learn where to get the full book from a bookstore or a library. Use the Google Book Search homepage to get only book results, or Google.com to see book results as part of your regular search results. Google Book Search makes the full text of millions of books (including out-of-print and public domain books) instantly searchable, and makes those books discoverable. For authors and publishers, it means that millions of books are instantly discoverable and able to be purchased. Google Book Search acts as a free marketing program that protects copyright while dramatically expanding the potential audience for, in theory, every book in the world.
Google News gathers information from nearly 10,000 news sources worldwide and presents news stories in a searchable format within minutes of their online publication. The leading stories are presented as headlines on the Google News home page. These headlines are selected for display entirely by a computer algorithm, without regard to viewpoint or ideology. Google News uses an automated process to pull together related headlines, which enables readers to see many different viewpoints on the same story. Topics are updated continuously throughout the day and you can view new stories by checking the Google News website, subscribing to Google News Alerts via email, or activating a newsfeed. Currently, the Google News service is tailored to 22 international editions.
Scirus is a product developed by Elsevier Science, concentrating solely on collecting scientific content. It searches both the open Web and membership sources such as ScienceDirect, MEDLINE on BioMedNet, and Beilstein on ChemWeb. Scirus features an array of scientific information in data and chart form.
The German Environmental Information Network search engine gathers public affairs information distributed across Web sites run by public institutions in Germany, such as environmental authorities and agencies and ministries at the federal level. It functions as an information broker for environmental information in Germany.
Artcyclopedia is an index of hundreds of museum sites and image archives. Visitors can search for where the works of over 5,500 different artists can be viewed online.
Health on the Net Foundation uses both humans and Web crawling to build its index of medical information. Searches can be narrowed by region and a French interface is available.