Sometimes a reporter does not know where to look for information or know if the needed information is available online. In these cases, journalists turn to Internet subject directories and search engines. To avoid the frustration of searches that might identify more than 50,000 documents, journalists learn the differences among search engines and how to use advanced search methods. With practice, reporters can easily find fewer, but more pertinent, results to their inquiries.
Help Sections and Advanced Searching Areas: Reporters who do not understand how search engines work simply type in keywords and receive hundreds of responses, many of them irrelevant. Frustrated at having to look through so much information, they abandon search engines. Yet subject directories and search engines are a big help when used correctly.
Learning about one or two search engines makes searching more efficient and productive. Reporters should look at two areas in search engines to learn how to use them well. The first area is the “help” section. Help sections usually indicate how search engines sort through the millions of Web sites on the Internet to present links to pages they think match the topic.
This section often has lists of frequently asked questions and tips for doing searches. A second helpful area to review is “advanced searching.” Using a search engine’s advanced searching capabilities, reporters can refine their searches so that the computer responds with a handful of highly relevant Web sites.
How Search Engines Work: When journalists type the same keyword in different search engines, the results vary because each search engine operates in a different way. All search engines contain large databases. These databases are the search engine’s copied version of the Internet. Sophisticated software (called a spider, bot or crawler) within each search engine goes out onto the Internet and compares each Web page it finds with the copy of that page in its database. If a page on the Internet has changed since the last time the software saw it, the software replaces the page in its database with the newer version. If the software encounters a new Web site, it adds the site to the search engine’s database.
Search engines vary in the frequency of Web crawling: some search engine programs index the entire Internet in four weeks; some index it in six. Thus, some search engines respond to a query with more recent Web sites than others. Furthermore, search engine databases vary in volume, offering different numbers of responses to their searches.
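The update step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SearchIndex` class, its `refresh` method and the `crawl` function are hypothetical names, and the "live Web" is a plain dictionary rather than real pages fetched over a network.

```python
from dataclasses import dataclass, field


@dataclass
class SearchIndex:
    """Hypothetical stand-in for a search engine's copied version of the Web."""
    pages: dict = field(default_factory=dict)  # URL -> stored page text

    def refresh(self, url: str, live_text: str) -> str:
        """Compare a live page with the stored copy and update the database."""
        if url not in self.pages:
            # A page the crawler has never seen is added to the database.
            self.pages[url] = live_text
            return "added"
        if self.pages[url] != live_text:
            # A changed page replaces the older stored version.
            self.pages[url] = live_text
            return "replaced"
        return "unchanged"


def crawl(index: SearchIndex, live_web: dict) -> dict:
    """Visit every page in live_web (URL -> text) and report what happened."""
    return {url: index.refresh(url, text) for url, text in live_web.items()}
```

Because each engine runs such a pass on its own schedule, two engines can hold different stored copies of the same page on the same day, which is one reason their results differ.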
Relevancy Ranking: Search engines are not thinking people. Unlike librarians, they cannot ask a journalist to tell them more about the topic or ascertain in what sense the journalist is using a keyword. The search engine identifies all Web sites containing that keyword, no matter the definition or context.
The search engine then lists the Web sites containing the keyword, according to how pertinent the Web site is to the journalist’s keyword search. This is called relevancy ranking. Search engines’ programming tells them how to compare the keyword with other words on a Web page for relevancy ranking.
The search engine programs use a location-and-frequency method to rank sites. Using the location variable, the search engine compares the keyword with the Web site’s title, assuming that if the keyword appears in a URL or a Web page title (which appears in the window’s title bar), then the site must have something to do with that subject. It also compares the keyword to words appearing near the top of the Web page, such as in the headline or in the first few paragraphs. Using the frequency variable, the search engine analyzes how often the keyword appears in relation to other words in the page. The more frequently the keyword appears, the more relevant the engine judges the Web site to be. Some search engines include the number of links to a Web page in their relevancy rankings. The help section of a search engine describes its particular method of relevancy ranking.
Formulating Queries: Search engines use varied combinations of searching methods. To formulate a query, journalists must identify the concepts of their topic. They determine likely keywords and specify logical relationships between the words. Search engines support different methods to identify relationships and employ various search procedures. Some of those procedures are described below and may be explained more fully in a search engine’s help section or used in an advanced searching area:
- Boolean logic employs connectors (“OR,” “AND” and “NOT”) to determine the relationship between words. “OR” means that the Web page can have either word. For example, to search for information about gas taxes, a reporter might specify “fee OR tax” because different writers may use different words to mean the same thing. “AND” signifies that both words, such as “tax AND gas,” must appear somewhere on the same Web page. “NOT” removes unnecessary Web pages that otherwise might be included as relevant pages. In this instance, a reporter would type “gas NOT natural.” Thus, the search string might look like: “fee OR tax AND gas NOT natural.”
Some search engines recognize implied Boolean logic, which uses symbols in place of “OR,” “AND” and “NOT.” A space between words implies the connector “OR,” a plus sign replaces “AND,” and a minus sign (a hyphen) signifies “NOT.”
Boolean Logic        Implied Boolean Logic
fee OR tax           fee tax
tax AND gas          +tax +gas
gas NOT natural      +gas -natural
Once journalists learn about implied Boolean logic, they understand why a search using the keywords “sex discrimination” yields thousands of Web pages about sex as well as sex discrimination.
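The three connectors can be sketched as a small Python filter. This is a minimal illustration, not any search engine’s actual query parser: the function names are hypothetical, and it assumes the example string is grouped as (fee OR tax) AND gas AND NOT natural, since the text does not state how connectors are grouped.

```python
import re


def words_on(page_text):
    """Return the set of lowercase words that appear on a page."""
    return set(re.findall(r"[a-z0-9]+", page_text.lower()))


def matches(page_text, any_of=(), all_of=(), none_of=()):
    """Apply OR (any_of), AND (all_of) and NOT (none_of) to one page."""
    words = words_on(page_text)
    # OR: at least one of the listed words must appear (if any are listed).
    has_any = not any_of or any(w.lower() in words for w in any_of)
    # AND: every listed word must appear.
    has_all = all(w.lower() in words for w in all_of)
    # NOT: none of the listed words may appear.
    has_none = not any(w.lower() in words for w in none_of)
    return has_any and has_all and has_none
```

With `any_of=("fee", "tax")`, `all_of=("gas",)` and `none_of=("natural",)`, a page reading “The state raised the gas tax” passes, while “A new fee on natural gas” is removed by the NOT condition. Calling `matches(page, any_of=("sex", "discrimination"))` mirrors the implied OR of a space between words: a page containing only “sex” still matches.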
- Truncation characters substitute for precise keyword spelling. For example, the exclamation mark replaces an undetermined number of letters at the end of a word. “Gas!” will produce pages with the word “gas” or “gasoline,” but it might also yield pages with the word “gaseous.” In many search engines, an asterisk substitutes for only one character. For instance, “wom*n” will result in Web pages containing “woman” or “women” or “womyn.”
- Proximity operators tell a search engine how close keywords need to be in relation to each other. For example, journalists may use “Elizabeth w/3 Ebony” to include Web pages that have “Elizabeth” and “Ebony” within three words of each other. Results would include “Elizabeth Ebony,” “Elizabeth ‘Beth’ Ebony” or “Elizabeth W. Ebony.” Other proximity operators may be “near” and “adjacent.”
- Field searching tells the search engine where to look for the keywords. “T: plutonium” means that the word “plutonium” must appear in the title of the document. Journalists can also search according to author, among other fields.
- Phrase searching, with words inside parentheses or quote marks, helps journalists who need an exact phrase. For instance, typing in “radio-controlled airplanes” tells the search engine to find these exact words in this exact order.
- Relevancy ranking tells reporters how relevant the Web pages probably are to their search.
- Concept searching enables the reporter to choose synonyms to help with keyword meaning.
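Truncation, proximity and phrase searching from the list above can be approximated with Python regular expressions. The sketch below is illustrative only: the function names are hypothetical, the symbols differ from engine to engine, and “within three words” is assumed here to mean that the two word positions differ by no more than three.

```python
import re


def truncation_pattern(stem):
    """Like "gas!": the stem followed by any number of further letters."""
    return re.compile(r"\b" + re.escape(stem) + r"\w*\b", re.IGNORECASE)


def wildcard_pattern(term):
    """Like "wom*n": each asterisk stands for exactly one character."""
    parts = [re.escape(part) for part in term.split("*")]
    return re.compile(r"\b" + r"\w".join(parts) + r"\b", re.IGNORECASE)


def within(text, first, second, max_distance):
    """Like "first w/3 second": both words occur within max_distance words."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(a - b) <= max_distance
               for a in first_positions for b in second_positions)


def has_phrase(text, phrase):
    """Phrase search: the exact words in the exact order."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None
```

For example, `truncation_pattern("gas")` finds “gas,” “gasoline” and “gaseous” alike, which shows why truncation can widen a search more than the reporter intended.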
Most reporters experiment with a variety of subject directories and search engines. Then they decide which one or two they like most and become familiar with their unique characteristics and search procedures.
Choosing a Search Engine or Subject Directory
The programs used to search the Internet fall into three groups: subject directories, search engines and meta-search engines. This section lists some of the most popular search programs. More information about each can be found at its Web site.
A subject directory is a database of Internet files selected by site creators or evaluators and organized into subject categories. These vary considerably in selectivity. Reporters should consult a directory’s policies when choosing a search program:
- Yahoo!-http://www.yahoo.com
- Magellan-http://magellan.excite.com
- The Argus Clearinghouse-http://www.clearinghouse.net
A search engine is a database of Internet files collected by a computer program (called a spider, bot or crawler). No selection criteria exist for the database:
- InfoSeek-http://www.infoseek.com
- Alta Vista-http://www.altavista.com
- HotBot-http://hotbot.lycos.com
- Excite-http://www.excite.com
- Lycos-http://www.lycos.com
A meta-search engine searches multiple search engines and subject directories simultaneously. Many journalists prefer to use a meta-search engine because it will retrieve the most relevant (but not all) documents from each engine it searches:
- MetaCrawler-http://www.metacrawler.com
- Cyber411-http://www.cyber411.com
- Inference Find-http://www.inference.com
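The way a meta-search engine combines results can be sketched as follows. This is a simplified illustration under stated assumptions: `meta_search` is a hypothetical function, each engine is represented by a stand-in Python function that returns an already ranked list of URLs, and the engines are queried one after another rather than simultaneously. No real search service is contacted.

```python
def meta_search(query, engines, top_n=3):
    """Collect the top_n results from each engine, skipping duplicates.

    engines maps an engine name to a function that takes a query string
    and returns a list of URLs ranked from most to least relevant.
    """
    merged = []
    seen = set()
    for name, search in engines.items():
        # Keep only the most relevant results from this engine, not all of them.
        for url in search(query)[:top_n]:
            if url not in seen:
                seen.add(url)
                merged.append((url, name))
    return merged
```

Given two stand-in engines that both return the same page, the merged list keeps that page once and records the first engine that supplied it. The `top_n` cutoff reflects the trade-off described above: the reporter sees the most relevant documents from each engine but not every document.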