The concept of using links as a way to measure a site’s importance
was first made popular by Google with the implementation of its PageRank
algorithm (others had previously written about it, but Google’s rapidly
increasing user base popularized it). In simple terms, each link to a web
page is a vote for that page, and the page with the most votes
wins.
The key to this concept is the notion that links represent an
“editorial endorsement” of a web document. Search engines rely heavily on
editorial votes. However, as publishers learned about the power of links,
some began to manipulate links through a variety of methods.
This created situations in which the intent of the link was not editorial
in nature, and led to many algorithm enhancements, which we will discuss
in this chapter.
To help you understand the origins of link algorithms, the
underlying logic of which is still in force today, let’s take a look at
the original PageRank algorithm in detail.
1. The Original PageRank Algorithm
The PageRank algorithm was built on the original PageRank thesis
authored by Sergey Brin and Larry Page while they were
graduate students at Stanford University.
In the simplest terms, the paper states that each link to a web
page is a vote for that page. However, votes do not have equal weight.
So that you can better understand how this works, we’ll explain the
PageRank algorithm at a high level. First, all pages are given an innate
but tiny amount of PageRank, as shown in Figure 1.
Pages can then increase their PageRank by receiving links from
other pages, as shown in Figure 2.
How much PageRank can a page pass on to other pages through links?
That ends up being less than the page’s PageRank. In Figure 3 this is
represented by f(x), meaning that the passable PageRank is a function
of x, the total PageRank.
If this page links to only one other page, it passes all of its
PageRank to that page, as shown in Figure 4, where Page B receives
all of the passable PageRank of Page A.
However, the scenario gets more complicated because most pages
link to more than one other page. When that happens, the passable
PageRank gets divided among all the pages receiving links. We show that
in Figure 5, where Page B and Page C each receive half of the passable
PageRank of Page A.
In the original PageRank formula, link weight is divided equally
among the links on a page. This almost certainly does not hold true
today, but it is still valuable in understanding the original intent.
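To make the equal split concrete, here is a minimal sketch in Python. The text says only that the passable PageRank, f(x), is less than the page's total PageRank; the linear form and the 0.85 factor used here are assumptions for illustration (0.85 is the damping value commonly cited for the original paper), and the function names are hypothetical.

```python
def passable_pagerank(pagerank, damping=0.85):
    """f(x): the portion of a page's PageRank it can pass on.

    Assumption: f is linear, f(x) = damping * x. The text only
    requires that f(x) be less than x.
    """
    return damping * pagerank


def share_per_link(pagerank, outbound_links, damping=0.85):
    """Split the passable PageRank equally among outbound links."""
    if outbound_links == 0:
        return 0.0  # a page with no links passes nothing
    return passable_pagerank(pagerank, damping) / outbound_links


# Page A links only to Page B: B receives all of A's passable PageRank.
one_link = share_per_link(1.0, 1)

# Page A links to Pages B and C: each receives half of it.
two_links = share_per_link(1.0, 2)
```

Under these assumptions, adding a second link halves what each linked page receives, which is the situation depicted in Figure 5.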
Now take a look at Figure 6, which depicts a more complex example
that shows PageRank flowing back and forth between pages that link to
one another.
Cross-linking makes the PageRank calculation much more complex. In
Figure 6, Page B now links back to Page A and passes some PageRank,
f(y), back to Page A. Figure 7 should give you a better understanding
of how this affects the PageRank of all the pages.
The key observation here is that when Page B links to Page A to
make the link reciprocal, the PageRank of Page A (x) becomes dependent
on f(y), the passable PageRank of Page B, which in turn is dependent on
f(x). In addition, the PageRank that Page A passes
to Page C is also impacted by the link from Page B to Page A. This makes
for a very complicated situation where the calculation of the PageRank
of each page on the Web must be determined by recursive analysis.
We have defined new parameters to represent this:
q, which is the PageRank that accrues to Page B
from the link that it has from Page A (after all the iterative
calculations are complete); and z, which is the
PageRank that accrues to Page A from the link that it has from Page B
(again, after all iterations are complete).
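The iterative calculation described above can be sketched as follows. This is a simplified illustration, not Google's implementation: it assumes the commonly cited form of the original formula, in which each page starts each pass with a small base amount (1 - d) and receives d times each linking page's PageRank divided by that page's link count, with d = 0.85. The function name and the three-page graph (A links to B and C, B links back to A) are our own stand-ins for the figures.

```python
def iterate_pagerank(links, damping=0.85, iterations=50):
    """Repeatedly recompute PageRank until the values settle.

    links maps each page to the list of pages it links to.
    Every link target must also appear as a key in links.
    """
    rank = {page: 1.0 for page in links}
    for _ in range(iterations):
        # Every page starts each pass with a small innate amount.
        new_rank = {page: 1.0 - damping for page in links}
        for source, targets in links.items():
            if not targets:
                continue  # a page with no links passes nothing
            # The passable PageRank is divided equally among links.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Page A links to B and C; Page B links back to A (a reciprocal link).
links = {"A": ["B", "C"], "B": ["A"], "C": []}
ranks = iterate_pagerank(links)
```

Because Page A's value depends on Page B's and vice versa, no single pass gives the final answer; the loop simply repeats the calculation until the numbers stop changing in any meaningful way. In this small graph, Page A ends up with more PageRank than Page B or Page C because it receives all of Page B's passable PageRank, while B and C each receive only half of Page A's.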
The scenario in Figure 8 adds complexity by introducing a link
from Page B to Page D. In this example, Pages A, B, and C are pages on
one domain, and Page D represents a different site (shown as Wikipedia).
In the original PageRank formula, internal and external links passed
PageRank in exactly the same way. This was exposed as a flaw when
publishers started to realize that links to other sites were “leaking”
PageRank away from their own site, as you can see in Figure 8.
Because Page B links to Wikipedia, some of the passable PageRank
is sent there, instead of to the other pages that Page B is linking to
(Page A in our example). In Figure 8, we
represent that with the parameter w, which is the
PageRank not sent to Page A because of the link to Page D.
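A short sketch shows how w arises under the original equal-split rule. The passable PageRank of 1.0 for Page B is a made-up value chosen for easy arithmetic, and the function name is hypothetical.

```python
def share_to_each_target(passable, targets):
    """Divide a page's passable PageRank equally among its link targets."""
    return {target: passable / len(targets) for target in targets}


passable_b = 1.0  # hypothetical passable PageRank of Page B

# Before: Page B links only to Page A.
before = share_to_each_target(passable_b, ["A"])

# After: Page B also links to Page D (the external site).
after = share_to_each_target(passable_b, ["A", "D"])

# w is the PageRank Page A no longer receives because of the link to D.
w = before["A"] - after["A"]
```

With two links instead of one, half of Page B's passable PageRank goes to Page D, so w equals the amount Page D gains. This holds only for the original formula, in which internal and external links were treated identically.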
The PageRank “leak” concept presented a fundamental flaw in the
algorithm once it became public. Like opening Pandora’s box, once those
who were creating pages to rank at Google investigated PageRank’s
founding principles, they would realize that linking out from their own
sites would cause more harm than good. If a great number of websites
adopted this philosophy, it could negatively impact the “links as votes”
concept and actually damage Google’s potential. Needless to say, Google
corrected this flaw in its algorithm. As a result of these changes,
worrying about PageRank leaks is not recommended. Quality sites should
link to other relevant quality pages around the Web.
Even after these changes, internal links from pages still pass
some PageRank, so they still have value, as shown in Figure 9.
Google has changed and refined the PageRank algorithm many times.
However, familiarity and comfort with the original algorithm are
certainly beneficial to those who practice optimization of Google
results.