Short Course: Searching the Web with AltaVista
by Denny Curtin
I. INTRODUCTION
II. TUTORIAL
III. ALTAVISTA REFERENCE
IV. PRACTICING SIMPLE QUERIES
V. SELF-TEST
Notice
This material may be printed out by individuals for personal
use. It may not be copied or distributed in any way without
the express permission of Tramline, Inc. Any individuals or
organizations seeking authorization to make multiple copies
or to otherwise use this material beyond individual personal
use should contact info@tramline.com.
I. INTRODUCTION
Surfing the Web by clicking links to jump from page to page is
fun but no way to do serious research. To find things quickly, you
need to use a search engine like AltaVista. In this guide, you'll explore
how to use this powerful search engine to locate the information
you want. To make things interesting, our theme will be Yosemite,
one of America's most beautiful national parks. We'll see how much
information we can find about this area, especially information
on the people such as Carleton Watkins and Ansel Adams who photographed
it and showed its beauty to the rest of the world.
[Figure: View from Artist's Point]
Before we begin, here's a brief chronology of Yosemite to familiarize
you with the area's history.
- 14,000 years ago, glaciers carved the valley.
- 1851: First recorded visit by non-Native Americans, who proposed
the name "Yosemity."
- 1859: First known photographs taken by C.L. Weed.
- 1861: Carleton Watkins begins photographing Yosemite.
- 1868: John Muir makes his first visit.
- 1872: Eadweard Muybridge begins photographing with large glass-plate
negatives.
- 1890: Yosemite National Park created by an Act of Congress.
- 1892: Sierra Club organized.
- 1916: Ansel Adams begins photographing with a Brownie camera
on a family vacation.
II. TUTORIAL
In this tutorial, you are guided step by step through the techniques
of successful searching. Before you begin, you might want to browse
the reference section that contains more detailed explanations of
the search techniques you'll be using.
This tutorial deals only with AltaVista simple queries for pages
on the Web. A separate tutorial covers advanced queries. However,
the term "simple" is misleading. Even AltaVista has this
to say about its simple and advanced searches: "Advanced search
is for very specific searches and not for general searching. Almost
everything you need to search for can be found quickly and with
better results using the standard search box, where the AltaVista
search service sorts the results by placing the most relevant content
first. However, if you need to find documents within a certain range
of dates or if you have to do some complex Boolean searches, there
isn't a more powerful tool on the Web."
As you complete this tutorial, you'll learn how to:
- List pages that contain any word or phrase you specify
- Use upper- and lowercase letters in words and phrases
- Use wildcards in words
- List only those pages that contain all of the words or phrases
you specify
- Eliminate sites that contain specific words or phrases
- Search for images
- Find sites linked to another site
- Locate URLs containing specific words
1. Searching for a Word
To begin, use your browser to go to the AltaVista
site at http://www.altavista.com/. The AltaVista screen changes
all the time, but for searching, the key element is the Search
box. When you type a word or phrase (called a query) into
this box and then click the Search button, AltaVista
lists all of the pages in its index containing the words you entered.
It ranks pages so those with the most matches are listed first.
Let's see what we can find on "Yosemite."
[Figure: The Search box]
Search: To begin, click in the Search box, type
Yosemite and click AltaVista's Search button.
Write down the number of pages AltaVista has found: ________________
Result: The results are listed, and there are a
lot of them. Notice that the Results page has a number of elements:
- "AltaVista knows the answer to this question" is a new feature
that tries to lure you away from your search.
- "AltaVista found 256570 Web pages for you" (your number will be
different because the index is dynamic). This number is a
good indicator of how precise your search has been. Obviously,
a quarter-million listings isn't very useful.
- The numbered listings with links to sites are the real section of
interest. Although there are a lot of pages containing the word
you searched for, AltaVista tries to rank them in order of interest.
Check out the first few pages to see how relevant they are.
[Figure: Numbered Listing]
Each listing has a title that is underlined to indicate it's a
link. When you point to it, the mouse pointer changes to a pointing
finger, and the URL of the page you will jump to if you click it
is shown at the bottom of your browser window. Below the title is a
description of the page, its URL, and finally the date it was last modified.
2. Exploring Case
Now, let's see if the case of characters has any effect on our
search. In the last query, you searched for Yosemite with an uppercase
letter "Y." This time let's use only lowercase letters
in the query and search for yosemite.
Search: Click in the Search box and select the
current entry or delete it. Type yosemite and then
click AltaVista's Search button.
Write down the number of pages AltaVista has found: ________________
Result: You should get slightly more hits when
you use all lowercase. When your query contains uppercase letters,
AltaVista lists only pages where the word appears in exactly the same
case. However, when you use lowercase, it lists pages with the word
in any case.
Here the difference is minor because you've searched for a place
name. However, you'll find a big difference in some situations.
It's usually better to use lowercase at first so you don't miss
possibilities.
3. Using Wildcards
The asterisk is a wildcard that stands for between zero and five
lowercase letters. One of its main functions is to be sure you get
all versions of a word including singular, plural, and possessive.
Search: Click in the Search box to select the current entry
or delete it. Type Yosemite* and then click AltaVista's Search
button.
Write down the number of pages AltaVista has found: ________________
Result: Pages that have both plurals and possessives such
as Yosemite's beauty are now listed.
4. Using Multiple Words
To expand a search, you can enter more than one word in your query.
Documents that contain either word will be listed although those
that contain all of the words will be at the top of the list. Let's
see what we can find about the photographer Carleton Watkins and
his photographs of Yosemite.
Search: Enter the query Carleton Watkins Yosemite and
click AltaVista's Search button.
Write down the number of pages AltaVista has found: ________________
Result: This search turns up any pages with one or more
of the words, and there are a lot of them. It's called an OR query
because it tells the computer to find any page containing Carleton
OR Watkins OR Yosemite. A page only has to have one of the three
words to be listed.
5. Searching for Phrases
The words Carleton and Watkins in the previous query are a photographer's
name. Let's treat them as a phrase to see what effect that has.
Search: Enter the query "Carleton Watkins"
Yosemite and click AltaVista's Search button.
Write down the number of pages AltaVista has found: ________________
Result: Now the number of hits drops because "Carleton
Watkins" is no longer treated as two separate words. Pages must
contain either the full name or the word Yosemite to be listed.
Documents with just Carleton or Watkins won't be listed.
6. Forcing Matches
Up until now, pages have been listed when they contain any of the
specified words or phrases. Let's use the plus sign (+) to limit
the matches to those pages that contain both the photographer's
name and Yosemite.
Search: Enter the query +"Carleton Watkins"
+Yosemite and click AltaVista's Search button.
Write down the number of pages AltaVista has found: ________________
Result: Only pages that contain both the name and the place
are listed. This is called an AND query because to be listed, pages
must contain both "Carleton Watkins" AND Yosemite.
7. Preventing Matches
To prevent matches and eliminate some pages while displaying others,
use a minus (-) sign in front of the word or phrase you don't want
in the listed pages. Let's see how many references there are to
Carleton Watkins that don't also refer to Yosemite.
Search: Enter the query +"Carleton Watkins"
-Yosemite and click AltaVista's Search button.
Write down the number of pages AltaVista has found: ________________
Result: Only pages containing the photographer's name but
not the place are listed. This is called a NOT query because, to
be listed, pages must contain "Carleton Watkins"
but NOT Yosemite.
8. Finding Images
The image: keyword is used to find images on the Web. Let's
see if we can find any of Yosemite. We'll use a wildcard
in the search so we find images in any format.
Search: Enter the query image:yosemite.* and click
AltaVista's Search button.
Write down the number of pages AltaVista has found: ________________
Result: Pages containing any graphic file named yosemite,
with any extension (GIF, JPEG, etc.), are listed.
9. Checking Linked Sites
When you find a site you like, it's easy to find other sites that
like it too. You do this using the link: keyword that gives
you a list of all sites that have established links to the site
you're curious about. Let's see what sites have linked to the yosemite.org
site.
Search: Enter the query link:yosemite.org and click
AltaVista's Search button.
Write down the number of pages AltaVista has found: ________________
Result: All pages containing links to yosemite.org
are listed. These pages probably have content related to Yosemite
since they linked to a Yosemite site.
10. Checking URLs
There may be sites that have a domain name or folder containing
the word yosemite. Let's use the url: keyword to find
out.
Search: Enter the query url:yosemite and click AltaVista's
Search button.
Write down the number of pages AltaVista has found: ________________
Result: Any pages with yosemite anywhere in their
URL are listed. Since an entire page has been given this name, these
pages probably have content related to Yosemite and not just a passing
reference to the name.
11. Listing Pages on a Host
To get a list of pages on a host computer, you use the host:
keyword. Let's use it to see what's on the yosemite.org computer.
Search: Enter the query host:yosemite.org and click
AltaVista's Search button.
Write down the number of pages AltaVista has found: ________________
Result: All of the pages on the yosemite.org site
are listed. This is like a table of contents of the site and makes
it easier to locate specific information without having to browse
through the entire site.
III. ALTAVISTA REFERENCE
There are many search engines on the Web that help you zero in
on information you want to find and AltaVista is one of the most
powerful and most popular. AltaVista is continually searching the
Web for new pages that it adds to its index. It has indexed billions
of words from millions of Web pages. When you enter a word or phrase
in the Search box and then click the Search button, AltaVista
searches its index for Web pages containing those words and displays
a list of them on the Results page. Any pages that it finds are
called matches or hits because at least one word on the page matches
one of the words in your query. (Query is just a computer
word for question.) AltaVista ranks the Web pages it finds
using a set of rules (called an algorithm) and assigns
a score to each page. Those pages with a higher score are listed
first. When entering queries, it's important to phrase your queries
so the documents you want get higher scores. Higher scores are obtained
when the words or phrases in your query:
- appear in the first few words of the document, perhaps in the
document's title.
- are located close to one another in the document.
- appear more than once in the document.
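To get a feel for how these three factors could combine, here is a toy scoring function. It is only a sketch: AltaVista's actual algorithm and weights are not described here, so the function name toy_score, the early_window parameter, and the bonus values are all made up for illustration.

```python
import re

def tokenize(text):
    """Split text into lowercase words on any non-alphanumeric character."""
    return re.findall(r"[a-z0-9]+", text.lower())

def toy_score(query_words, document, early_window=10):
    """Return a made-up relevance score; higher means 'listed earlier'.

    The weights are arbitrary assumptions, not AltaVista's real values.
    """
    words = tokenize(document)
    terms = [w.lower() for w in query_words]
    positions = {t: [i for i, w in enumerate(words) if w == t] for t in terms}
    score = 0.0
    for t in terms:
        hits = positions[t]
        if not hits:
            continue
        score += len(hits)            # the word appears more than once
        if hits[0] < early_window:    # the word appears in the first few words
            score += 2
    # Proximity bonus, computed for the first two terms that were found.
    found = [positions[t] for t in terms if positions[t]]
    if len(found) >= 2:
        gap = min(abs(a - b) for a in found[0] for b in found[1])
        score += 3 / (1 + gap)        # closer together gives a bigger bonus
    return score
```

With this sketch, a short page that opens with "Carleton Watkins photographed Yosemite" outscores a long page that mentions each word once, far apart and late in the text.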
Getting good at searching the Web takes some practice.
- Use the results of one search as a guide to the next one.
- If you get a match that is what you are looking for, see if
the page contains unique words or word patterns that might guide
you in a more refined search to locate other pages.
If you're curious about how AltaVista lets you search the Web,
you might be surprised to learn that it uses a type of program called
a spider (AltaVista named theirs Scooter). This program tirelessly
wanders from link to link on the Web day and night. When it finds
a new or updated page, it sends the entire page back to headquarters.
There, other software takes all of the meaningful words in the document
and lists them in an index along with the address of the page they
are from.
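The index described above can be pictured as a lookup table from each word to the addresses of the pages that contain it. The sketch below builds such a table for two made-up pages; the example.org addresses and the function names are hypothetical, and a real index would store far more detail (word positions, dates, and so on).

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def lookup(index, word):
    """Return the sorted addresses of pages containing the word."""
    return sorted(index.get(word.lower(), set()))

# Two hypothetical pages standing in for what a spider sends back.
pages = {
    "http://example.org/watkins.html": "Carleton Watkins photographed Yosemite",
    "http://example.org/adams.html": "Ansel Adams and Yosemite",
}
index = build_index(pages)
print(lookup(index, "Yosemite"))
```

Because the table is built ahead of time, answering a query only requires looking up each word rather than rereading every page.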
1. The AltaVista Screen Display
When you connect to the AltaVista Web site, its screen display
contains a Search box and lots of other information that changes
continually.
The Search Box
To look for pages on the Web, you first try to think of some rare
or unique words that might appear in a page you are looking for.
The more unusual the words, the more likely you'll find what you're
looking for. You enter these words in the Search box and then click
AltaVista's Search button.
The Results page
When you execute a query, any pages that contain your words are
listed in the order AltaVista thinks is most relevant. Above the
list of matches is a summary of the search that indicates how many
times each of your search words was listed in the AltaVista index.
It then lists the number of Web pages in its index that contain
those words.
The Listed Pages
Below the summary area on the screen is a list of all of the Web
pages that match your query. The title of each listed page is a
hyperlink; you can click it to display the actual page. (Note that
the page's title may not actually appear on the document itself.
It is a name assigned to the document by its author.) The description
that follows is taken from the first few lines of the document.
This is followed by the actual URL, the size of the page, and the
date it was last edited. If more than one URL is listed for a site
it means the pages differ in some respect.
The order in which pages are listed on the Results page is determined
by a ranking algorithm. Each listed document is given a grade based
on how many of your search terms it contains, where those words
are in the document, and how close to each other they are.
The Page Number List
When the list of sites is too long to be displayed on a single
page, a list of Results pages is displayed at the bottom of the
screen. You will probably have to scroll to see it. Clicking one
of the page numbers displays the sites listed on that page. You
can also click [next >>] and [<< prev] to
scroll through the pages. Different colors are used to indicate
the page you are currently on, pages you have visited already, and
pages you haven't yet visited.
2. Searching for Words
Knowing what to search for and how to phrase your query are basic
skills you should acquire so you can quickly zero in on the information
you want.
Searching for words such as digital or printer can
give thousands or tens of thousands of matches. Searching for Where
gives over 6 million matches, far too many to be useful. On
the other hand, searching for the wrong word form might give too
few. For example, searching for microprocessor (the singular) won't
find microprocessors (the plural). When entering queries, it helps
to know that AltaVista considers a word to be any series of letters
and digits bounded on each side by:
- white space (spaces, tabs, line ends)
- non-alphanumeric characters such as & % $ / # _ ~
- the start or end of the document
When you enter more than one word, AltaVista treats these as OR
queries and lists documents containing any of the words. However,
it does place at the top of the list those pages that contain all
of the words.
Examples
- Library of Congress finds documents containing any of the
words Library, of, or Congress. Only some of the documents will refer
to the Library of Congress.
- Watkins Yosemite will list documents containing the
names Watkins OR Yosemite. Only some of the documents
will refer to Yosemite.
Here are some tips to improve your results.
- Start with just one word in your query, and then slowly add
others, examining the results of each search. Generally, the more
words you use, the more matches you'll get.
- If you get too few matches, check your spelling. Searching for
the misspelled word micorprocessor won't get any hits unless
someone else has also misspelled the word in the same way.
- Search for synonyms instead of the original words. For example,
instead of searching for chip search for microprocessor,
or even "Pentium Pro".
3. Understanding Case and Accents
AltaVista is case-sensitive so the case you use to enter search
words is important.
[Figure: Uppercase and lowercase letters]
- Entering a word in lowercase finds words in any case.
For example, searching for buffalo in a query
will match buffalo, Buffalo, BuFFalo,
or BUFFALO.
- Using an uppercase letter forces AltaVista to find only exact
matches for the word. For example, the capitalized word Buffalo
in a query will only match Buffalo in the document, and
ignore other capitalization variants.
- Often words are in one case when they are the first word in
a sentence, or in a title or heading and in another case when
they fall elsewhere in the document. When running your first search,
it's best to use all lowercase so you find all occurrences of
the word or phrase. You should use uppercase only when you want
to force a match to an exact spelling.
- Accents are treated in the same way as capitalization. An accented
word in a query forces an exact match of the entire word. For
example, searching for résumé will not find resume.
To find all occurrences, don't use accents and use all lowercase.
For example, searching for resume will find both
résumé and resume.
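The case and accent rules above can be summed up in a few lines of code. This is a minimal sketch of the rule as described in this section, not AltaVista's implementation; the function names are made up, and it assumes that any uppercase or accented letter in the query word forces an exact match of the whole word.

```python
import unicodedata

def strip_accents(text):
    """Remove combining accent marks, so 'résumé' becomes 'resume'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def word_matches(query_word, page_word):
    """Apply the case and accent rule to one query word and one page word."""
    has_upper = query_word != query_word.lower()
    has_accent = query_word != strip_accents(query_word)
    if has_upper or has_accent:
        # Exact match required; normalize so equal-looking strings compare equal.
        return (unicodedata.normalize("NFC", query_word)
                == unicodedata.normalize("NFC", page_word))
    # Plain lowercase query: ignore case and accents on the page.
    return query_word == strip_accents(page_word).lower()
```

For example, word_matches("buffalo", "BuFFalo") is true, while word_matches("Buffalo", "buffalo") and word_matches("résumé", "resume") are both false.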
4. Using Wildcards
To find all forms of a word, such as words in both singular and
plural, possessives, or words with a similar pattern, use the asterisk
(*). This is a wildcard that will match between zero and five lowercase
letters (not numbers) that occur at its position in the string.
Examples
- librar* finds library, library's, and
libraries
- microprocessor* finds microprocessor, microprocessors,
and even microprocessor's
TIP: You can also use the asterisk in the middle
of a word, provided it is preceded by at least three characters
(otherwise it will find way too many matches). If there are too
many matches, AltaVista ignores the query. Uppercase letters and
numbers will not be matched.
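One way to picture the wildcard is as a pattern in which each asterisk stands for zero to five lowercase letters. The sketch below translates a query word into a regular expression on that assumption. The function names are made up, the three-character rule is assumed to apply wherever the asterisk appears, and the case rule from the previous section is left out to keep the example short.

```python
import re

def wildcard_to_regex(pattern):
    """Turn a query word containing * into a compiled regular expression."""
    star = pattern.find("*")
    if star != -1 and star < 3:
        # Assumption: the * must always follow at least three characters.
        raise ValueError("the * must follow at least three characters")
    parts = [re.escape(p) for p in pattern.split("*")]
    # Each * becomes "zero to five lowercase letters".
    return re.compile("[a-z]{0,5}".join(parts))

def wildcard_matches(pattern, text):
    """Return the words in text that the wildcard pattern matches."""
    regex = wildcard_to_regex(pattern)
    # Punctuation separates words, so "library's" is seen as "library" and "s".
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [w for w in words if regex.fullmatch(w)]

print(wildcard_matches("librar*", "The library's libraries and librarianship"))
```

Here librarianship is not matched because it needs seven letters after librar, two more than the wildcard allows.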
5. Searching for Phrases
Searching for phrases instead of words can dramatically narrow
your search because matches only occur when a Web page has the same
words in the same order. A phrase is a series of adjacent words
separated by white space or punctuation. To search for phrases,
enclose them in quotes. For example, searching for "Library
of Congress" finds only documents with the complete phrase.
(You can also use punctuation to glue words together into phrases,
but it isn't recommended.) Since punctuation is treated as white
space, Carleton;Watkins is the same query as "Carleton
Watkins".
Examples
- "Ansel Adams"
- "Eadweard Muybridge"
- "Carleton Watkins"
- "Glacier Point"
- "Half Dome"
When AltaVista indexes pages on the Web, it ignores punctuation
marks and white space except to indicate where words begin and end.
For this reason, you can't search for punctuation or white
space, but you can use either to join words in a phrase so it's treated
as a unit. For example, searching for "Library of Congress",
Library/of/Congress, or Library-of-Congress gives the
same results. (Don't use the asterisk for this purpose since it
has a special meaning.)
Note that because of the way AltaVista handles punctuation, strings
such as AT&T or yosemite.com are treated as two
words joined together in a phrase.
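The way punctuation turns into word boundaries is easy to try out for yourself. The following sketch splits text into words the way this section describes and then checks whether a phrase occurs as adjacent words in order. It is an illustration only: the function names are made up, and case is ignored for simplicity.

```python
import re

def words_of(text):
    """Split text into words, treating punctuation and white space as separators."""
    return re.findall(r"[A-Za-z0-9]+", text)

def contains_phrase(phrase, document):
    """True if the words of phrase appear adjacent and in order in document."""
    target = [w.lower() for w in words_of(phrase)]
    words = [w.lower() for w in words_of(document)]
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )

print(words_of("Library/of/Congress"))   # the same words as "Library of Congress"
print(words_of("AT&T"))                  # two words: AT and T
```

Since Carleton;Watkins and "Carleton Watkins" break into the same two words, contains_phrase treats them as the same phrase.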
6. Forcing and Preventing Matches
When you use two or more words or phrases in a query, documents
that contain any of the words are listed. Some of the documents
will not contain all of the words. To ensure that only documents
with a specific word are listed, put a plus sign in front of the
word in your query. Be sure not to require too many such words or
phrases because you may eliminate documents that would be of interest.
+ (plus sign) specifies that the document must contain the word
Many queries display long lists of matches that are of no use.
If they have a word or phrase in common, such as a company name,
you can prevent them from being listed. To do so, place a minus
sign in front of the word or phrase.
- (minus sign) specifies that the document must not contain the word
Examples
- "Carleton Watkins" Yosemite will list documents
containing the name Carleton Watkins OR Yosemite.
- +"Carleton Watkins" +Yosemite will only list
documents containing both Carleton Watkins AND Yosemite.
- +"Carleton Watkins" Yosemite will find documents
that contain Carleton Watkins and may or may not contain
Yosemite.
- +"Carleton Watkins" -Yosemite will only list
documents containing the name Carleton Watkins and NOT Yosemite.
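The four example queries can be checked against a few sample sentences with a short program. This sketch only decides whether a page would be listed at all; it ignores ranking and the case rule, and the function names are invented for the example rather than taken from AltaVista.

```python
import re

def normalize(text):
    """Lowercase text and collapse punctuation and white space to single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def parse_query(query):
    """Split a simple query into (sign, term) pairs; sign is '+', '-', or ''."""
    pairs = []
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]*)"|(\S+))', query):
        pairs.append((sign, normalize(phrase or word)))
    return pairs

def page_is_listed(query, page_text):
    """Decide whether a page would be listed for the query (ranking is ignored)."""
    padded = f" {normalize(page_text)} "

    def has(term):
        return f" {term} " in padded

    pairs = parse_query(query)
    if any(sign == "+" and not has(term) for sign, term in pairs):
        return False   # a required word or phrase is missing
    if any(sign == "-" and has(term) for sign, term in pairs):
        return False   # an excluded word or phrase is present
    # Otherwise the page needs at least one of the non-excluded terms.
    return any(has(term) for sign, term in pairs if sign != "-")

page = "Carleton Watkins opened a gallery in San Francisco."
print(page_is_listed('+"Carleton Watkins" -Yosemite', page))
```

Try changing the sample sentence so it mentions Yosemite, and the same query stops listing it.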
7. Finding Specific Things on Web Pages
To limit your search to the structured parts of a document, you
use a keyword (in lowercase), a colon, and then the word or phrase
you are searching for.
Searching for Titles and Hyperlinks
- title:"The Yosemite Observer" matches
pages with the phrase The Yosemite Observer in the title.
The title this searches for isn't displayed on the page; it's
a name that has been assigned to the page by the author. The page's
title is what's displayed on the browser's title bar when you
are viewing the page, and it is the first item in each entry in the
list of pages AltaVista displays.
- anchor:"Yosemite by Ansel Adams" matches pages
with the visible phrase Yosemite by Ansel Adams in the
text of a hyperlink. The anchor is the part of the link that's
visible on the page and highlighted.
- link:yosemite.org displays pages that contain the specified
link, that is, at least one link to a page with yosemite.org
in its URL. The link isn't visible on the page; you see it on
the browser's status bar when you point to an anchor in your browser.
Searching for Text and Images
- text:yosemite matches pages that contain the
word yosemite in any part of the visible text of a page.
- image:yosemite.* lists pages with yosemite followed
by any extension in an image tag.
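It can help to see where the title, anchor, link, and image parts of a page actually live in its HTML. The sketch below uses Python's standard html.parser module to pull those parts out of a small made-up page. The class name PageParts and the sample page are hypothetical, and this is not how AltaVista itself processes pages.

```python
from html.parser import HTMLParser

class PageParts(HTMLParser):
    """Collect the parts of a page that title:, anchor:, link:, and image: refer to."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.anchors = []   # visible text of each hyperlink
        self.links = []     # address each hyperlink points to
        self.images = []    # file named in each image tag
        self._in_title = False
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self._in_anchor = True
            self.links.append(attrs["href"])
            self.anchors.append("")
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_anchor:
            self.anchors[-1] += data

# A made-up page used only for this illustration.
sample = """<html><head><title>The Yosemite Observer</title></head>
<body><a href="http://www.yosemite.org/">Yosemite by Ansel Adams</a>
<img src="yosemite.gif"></body></html>"""

parts = PageParts()
parts.feed(sample)
print(parts.title, parts.anchors, parts.links, parts.images)
```

Notice that the anchor text is what a reader sees, while the link address and the image file name are only present in the tags.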
Searching for URLs
- url:yosemite looks at all parts of the page's
URL, in this case for yosemite. It will find www.yosemite.org
and www.parks.gov/yosemite.htm.
- host:yosemite looks only at the domain name part of
the page's URL, in this case yosemite, and lists all of
the pages on that site. It will find www.yosemite.org.
- domain:yosemite looks only at the topmost level of the
page's domain name, in this case for yosemite. Folders
with the same name won't be listed, so your search will be significantly
narrowed. There are currently only a few top-level domain names, including
.gov, .com, .org, .edu, .net, and country codes, although the list
is being expanded.
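To see which part of an address each of these three keywords looks at, you can split a URL with Python's standard urllib.parse module. The sketch below is an interpretation of the descriptions above, not AltaVista's code: it assumes url: checks the whole address, host: checks the computer name, and domain: compares against the last label of that name (such as org or gov). The function names are made up.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Return the piece of the URL that each keyword is assumed to examine."""
    host = urlsplit(url).hostname or ""
    return {
        "url": url.lower(),                  # the whole address
        "host": host,                        # the computer name only
        "domain": host.rsplit(".", 1)[-1],   # the last label, e.g. "org"
    }

def keyword_matches(keyword, value, url):
    """Check one keyword:value pair against one URL."""
    target = url_parts(url)[keyword]
    value = value.lower()
    if keyword == "domain":
        return target == value
    return value in target

print(keyword_matches("url", "yosemite", "http://www.parks.gov/yosemite.htm"))
print(keyword_matches("host", "yosemite", "http://www.parks.gov/yosemite.htm"))
```

The first check succeeds because yosemite appears in the file name, while the second fails because the computer name is www.parks.gov.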
Searching for Applets and Objects
- applet:NervousText matches pages containing
the name of the Java applet class found in an applet tag; in this
case, NervousText.
- object:Marquee matches pages containing the name of
the ActiveX object found in an object tag; in this case, Marquee.
8. Why Can't I Locate A Page?
There are times when you can't seem to locate a page, even when
you know what's on it. Generally this is because the page hasn't
yet been indexed. However, it may be listed but the summary is too
meaningless for you to recognize it. If the page isn't listed, there
could be a number of reasons:
- The document is on a computer behind a gateway or firewall.
- AltaVista may not have been able to follow a link to the page
because to do so you have to fill out a form or take some other action
that AltaVista can't do.
- The page's author has requested that it not be indexed by robots
or spiders.
- The page may be an island on the Web with no other sites pointing
to it. The only way it would be indexed is if someone sent its
URL to AltaVista.
- AltaVista may not have been able to reach the page because the
computer it's on was out of service or congested.
- The page may have been renamed or removed by the owner since
it was last indexed.
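One common way for an author to ask robots and spiders not to index pages is a robots.txt file placed on the site. The sketch below uses Python's standard urllib.robotparser module to show how such a file is read. The file contents and the example.org addresses are made up, and using Scooter as the name the spider announces itself by is an assumption for the sake of the example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: Scooter is asked to stay out of /private/,
# and every other robot is allowed everywhere.
robots_txt = """\
User-agent: Scooter
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def may_index(url, agent="Scooter"):
    """Return True if the robots.txt rules let this robot fetch the URL."""
    return parser.can_fetch(agent, url)

print(may_index("http://www.example.org/index.html"))
print(may_index("http://www.example.org/private/notes.html"))
```

A page under /private/ would never reach the index if the spider honors these rules, which is one reason a page you know exists may not turn up in a search.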
IV. PRACTICING SIMPLE QUERIES
There's nothing like a little practice to hone your search techniques.
Go to the AltaVista site and try locating information on the following
topics.
- How was Yosemite carved out by glaciers?
- What was the role of the Mariposa Battalion?
- What did Dr. Lafayette H. Bunnell do?
Locations to learn about
- Glacier Point
- Bridalveil Falls
- El Capitan
- Yosemite Falls
- Half Dome
- The Ahwahnee Hotel
- Tuolumne Meadows
- Merced River and Lake
- Tioga Pass
- Vernal Falls
- Mariposa Grove
V. SELF-TEST
Once you have finished this tutorial and read the reference section,
you might test your understanding before you leave. Just answer
these questions.
1. What is the plus sign (+) used for?
2. What is the minus sign (-) used for?
3. How do you indicate a phrase?
4. What documents are displayed when you use two words such as
Vernal Falls?
5. What documents are displayed when you use a phrase such as "Vernal
Falls"?
6. What keyword do you use to look for images?
7. What's the major difference between the domain: and url:
keywords?
8. What would be the effect of entering Buffalo instead
of buffalo in your query?
9. What would you enter to be sure you found glacier, glaciers,
and glacier's?
10. What pages would be listed by the query host:yosemite.com?