2. Additional Factors That Influence Link Value
Classic PageRank isn’t the only factor that influences the value
of a link. The following subsections discuss several additional
factors that affect the value a link passes.
2.1. Anchor text
Anchor text refers to the clickable part of a link from one web
page to another. As an example, Figure 10 shows a snapshot of
a part of the Alchemist Media Home Page at http://www.alchemistmedia.com.
The anchor text for Link #3 in Figure 10 is “SEO Web Site
Design”. The search engine uses this anchor text to help it understand
what the page receiving the link is about. As a result, the search
engine will interpret Link #3 as saying that the page receiving the
link is about “SEO Web Site Design”.
The impact of anchor text can be quite powerful. For example, if
you link to a web page that has no search-engine-visible content
(perhaps it is an all-Flash site), the search engine will still look
for signals to determine what the page is about. Inbound anchor text
becomes the primary driver in determining the relevance of a page in
that scenario.
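To make the idea concrete, here is a minimal sketch of how inbound
anchor text could be aggregated into a term profile for a page that
has no readable content of its own. The link records, the example
URLs, and the anchor_profile function are hypothetical illustrations,
not a description of how any search engine actually implements this.

```python
from collections import Counter, defaultdict

# Hypothetical link records: (source URL, target URL, anchor text).
LINKS = [
    ("http://a.example/", "http://flash-site.example/", "SEO Web Site Design"),
    ("http://b.example/", "http://flash-site.example/", "web site design tips"),
    ("http://c.example/", "http://flash-site.example/", "SEO consulting"),
    ("http://a.example/", "http://other.example/", "used cars"),
]


def anchor_profile(links):
    """Count the anchor-text terms pointing at each target page."""
    profile = defaultdict(Counter)
    for _source, target, anchor in links:
        # Lowercase and split on whitespace; a real system would do far more.
        profile[target].update(anchor.lower().split())
    return profile


profile = anchor_profile(LINKS)
# The most frequent inbound terms hint at what the target page is about.
print(profile["http://flash-site.example/"].most_common(3))
```

The same counting also shows why Google Bombing could work: many links
repeating one phrase would dominate the profile of the target page,
whatever that page actually contains.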
The power of anchor text also resulted in SEOs engaging in
Google Bombing. The idea is that if you link to a
given web page from many places with the same anchor text, you can get
that page to rank for queries related to that anchor text, even if the
page is unrelated.
One notorious Google Bomb was a campaign that targeted the
Whitehouse.gov biography page for George W. Bush with the anchor text
“miserable failure”. As a result, that page ranked
#1 for searches on “miserable failure” until Google
tweaked its algorithm to reduce the effectiveness of this
practice.
However, the technique continues to work in Yahoo! Search (as of
May 2009), as shown in Figure 11.
President Obama has crept in here too, largely because of a
redirect put in place by the White House’s web development
team.
2.2. Relevance
Links that originate from sites/pages on the same topic as the
publisher’s site, or on a closely related topic, are worth more than
links that come from a site on an unrelated topic.
Think of the relevance of each link as being evaluated in the
specific context of the search query a user has just entered. So, if
the user enters “used cars in Phoenix” and the
publisher has a link to its Phoenix used cars page from the
Phoenix Chamber of Commerce, that link will reinforce the search
engine’s belief that the page really does relate to Phoenix.
Similarly, if a publisher has another link from a magazine site
that has done a review of used car websites, this will reinforce the
notion that the site should be considered a used car site. Taken in
combination, these two links could be powerful in helping the
publisher rank for “used cars in Phoenix”.
2.3. Authority
Authority has been the subject of much research. One of the more
famous papers, written by Apostolos Gerasoulis and others at Rutgers
University and titled “DiscoWeb: Applying Link Analysis to Web Search”
(http://www.cse.lehigh.edu/~brian/pubs/1999/www8/),
became the basis of the Teoma algorithm. Teoma was later acquired by
AskJeeves, and its algorithm became part of the Ask algorithm.
What made Teoma’s approach unique was the focus on evaluating links
on the basis of their relevance to the linked page. Google’s original
PageRank algorithm did not incorporate the notion of topical
relevance, and although Google’s algorithm clearly does so today,
Teoma was in fact the first to offer a commercial implementation of
link relevance.
Teoma built its ranking on the notion of hubs, which
are sites that link to most of the important sites relevant to a
particular topic, and authorities, which are
sites that are linked to by most of the sites relevant to a particular
topic.
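The mutual reinforcement between hubs and authorities can be sketched
with the iteration from Jon Kleinberg’s published HITS algorithm: a
page’s authority score is the sum of the hub scores of the pages
linking to it, and its hub score is the sum of the authority scores of
the pages it links to. This is offered only as an illustration of the
concept, not as Teoma’s or Ask’s actual implementation; the graph, the
site names, and the hits function below are hypothetical.

```python
import math


def hits(graph, iterations=50):
    """Return (hub, authority) score dicts for a dict-of-lists link graph."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {n: 0.0 for n in nodes}
        for source, targets in graph.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {n: sum(auth[t] for t in graph.get(n, [])) for n in nodes}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Hypothetical topical community around used cars.
GRAPH = {
    "car-directory": ["dealer-a", "dealer-b", "car-reviews"],
    "auto-portal": ["dealer-a", "car-reviews"],
    "dealer-a": [],
    "dealer-b": [],
    "car-reviews": ["dealer-a"],
}
hub, auth = hits(GRAPH)
best_hub = max(hub, key=hub.get)
best_authority = max(auth, key=auth.get)
print(best_hub, best_authority)
```

In this toy graph the directory that links to every relevant site
comes out as the strongest hub, and the dealer that every other site
links to comes out as the strongest authority.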
The key concept here is that each topic area that a user can
search on will have authority sites specific to that topic area. The
authority sites for used cars are different from the authority sites
for baseball.
Refer to Figure 12 to get a sense
of the difference between hub and authority sites.
So, a publisher with a site about used cars should seek links
from websites that the search engines consider to be authorities on
used cars (or perhaps more broadly, on cars). However, the search
engines will not tell publishers which sites they consider
authoritative, which makes the publisher’s job that much more
difficult.
The model of organizing the Web into topical communities and
pinpointing the hubs and authorities is an important model to
understand (read more about it in Mike Grehan’s paper, “Filthy Linking
Rich!” at http://www.search-engine-book.co.uk/filthy_linking_rich.pdf).
The best link builders understand this model and leverage it to their
benefit.
2.4. Trust
Trust is distinct from authority. Authority, on its own, doesn’t
sufficiently take into account whether the linking page or the domain
is easy or difficult for spammers to infiltrate. Trust, on the other
hand, does.
Evaluating the trust of a website likely involves reviewing its
link neighborhood to see what other trusted sites link to it. More
links from other trusted sites would convey more trust.
In 2004, Yahoo! and Stanford University published a paper titled
“Combating Web Spam with TrustRank” (http://www.vldb.org/conf/2004/RS15P3.PDF). The paper
proposed starting with a trusted seed set of pages (selected by manual
human review) to perform PageRank analysis, instead of a random set of
pages as was called for in the original PageRank thesis.
Using this tactic reduces the risk inherent in a purely
algorithmic approach to determining the trust of a site, namely
false positives and false negatives.
The trust level of a site would be based on how many clicks away
it is from seed sites. A site that is one click away accrues a lot of
trust; two clicks away, a bit less; three clicks away, even less; and
so forth. Figure 13 illustrates the
concept of TrustRank.
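One way to picture this decay is a seed-biased PageRank, in which the
random jump always lands on a trusted seed page, so trust can only
reach a page by flowing along links from the seeds and is damped at
every hop. The following is a simplified sketch under that assumption
rather than a faithful reproduction of the paper’s method; the graph,
the page names, the damping value, and the trust_scores function are
all hypothetical.

```python
def trust_scores(graph, seeds, damping=0.85, iterations=50):
    """Propagate trust from seed pages through a dict-of-lists link graph."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    # The jump vector is spread evenly over the manually chosen seeds only.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for source, targets in graph.items():
            if not targets:
                # Simplification: trust held by pages with no outlinks is dropped.
                continue
            # Each page splits its damped trust evenly among its outlinks.
            share = damping * trust[source] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical chain of pages leading away from one trusted seed.
GRAPH = {
    "seed": ["one-hop"],
    "one-hop": ["two-hops"],
    "two-hops": ["three-hops"],
    "three-hops": [],
    "spam": ["three-hops"],
}
trust = trust_scores(GRAPH, seeds={"seed"})
for page in ("seed", "one-hop", "two-hops", "three-hops", "spam"):
    print(page, round(trust[page], 4))
```

In this sketch each additional click away from the seed lowers the
score, and the page that no trusted page links to receives no trust at
all, even though it links out to a page that does.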
The authors of the TrustRank paper also wrote a paper
describing the concept of spam mass (http://ilpubs.stanford.edu:8090/697/1/2005-33.pdf).
This paper focuses on evaluating the effect of spammy links on a
site’s (unadjusted) rankings. The greater the impact of those links,
the more likely the site itself is spam. A large percentage of a
site’s links being purchased is seen as a spam indicator as well. You
can also consider the notion of reverse
TrustRank, where linking to spammy sites will lower a
site’s TrustRank.
It is likely that Google, Yahoo!, and Bing all use some form of
trust measurement to evaluate websites, and that this trust metric can
be a significant factor in rankings. For SEO practitioners, obtaining
measurements of trust can be difficult. Currently, mozTrust from
SEOmoz’s Linkscape is the only publicly available estimate
of a page’s TrustRank.
3. How Search Engines Use Links
The search engines use links primarily to discover web pages and
to count those links as votes for the pages they point to. But how do
they use this information once they acquire it? Let’s take a look:
Index inclusion
Search engines need to decide what pages to include in their
index. Crawling the Web (following links) is
one way they discover web pages (the other is through the use of
XML Sitemap files). In addition, the search engines do not include
pages that they deem to be of low value, because cluttering their
index with those pages will not lead to a good experience for
their users. The cumulative link value, or link juice, of a page
is a factor in making that decision.
Crawl rate/frequency
Search engine spiders go out and crawl a portion of the Web
every day. This is no small task, and it starts with deciding
where to begin and where to go. Google has publicly indicated that
it starts its crawl in PageRank order. In other words, it crawls
PageRank 10 sites first, PageRank 9 sites next, and so on. Higher
PageRank sites also get crawled more deeply than other sites. It
is likely that other search engines start their crawl with the
most important sites first as well.
This would make sense, because changes on the most important
sites are the ones the search engines want to discover first. In
addition, if a very important site links to a new resource for the
first time, the search engines tend to place a lot of trust in
that link and want to factor the new link (vote) into their
algorithms quickly.
Ranking
Links play a critical role in ranking. For example, consider
two sites where the on-page content is equally relevant to a given
topic. Perhaps they are the shopping sites Amazon.com and (the
less popular) JoesShoppingSite.com.
The search engine needs a way to decide who comes out on
top: Amazon or Joe. This is where links come in. Links cast the
deciding vote. If more sites, and more important sites, link to
Amazon, it must be more important, so Amazon wins.
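The crawl ordering described under Crawl rate/frequency can be
pictured as a priority queue in which the most important known page
is always fetched next. The sketch below is only an illustration of
that idea: the importance scores stand in for PageRank, and the URLs,
the link map, and the crawl function are hypothetical.

```python
import heapq


def crawl(seed_urls, links, importance, limit=10):
    """Visit pages in descending importance, discovering new ones via links."""
    # heapq is a min-heap, so scores are negated to pop the highest first.
    frontier = [(-importance.get(url, 0), url) for url in seed_urls]
    heapq.heapify(frontier)
    seen = set(seed_urls)
    visited = []
    while frontier and len(visited) < limit:
        _neg_score, url = heapq.heappop(frontier)
        visited.append(url)
        # Following links is how previously unknown pages enter the queue.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-importance.get(target, 0), target))
    return visited


# Hypothetical importance scores and link structure.
IMPORTANCE = {"portal": 10, "news": 9, "blog": 3, "new-page": 1}
LINKS = {"portal": ["blog", "news"], "news": ["new-page"], "blog": []}

order = crawl(["portal"], LINKS, IMPORTANCE)
print(order)
```

Although the blog is discovered before the news site, the news site is
crawled first because it carries the higher score, and the new page is
reached only because an important site links to it.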