The Art of SEO : How Links Influence Search Engine Rankings (part 1) - The Original PageRank Algorithm

1/8/2011 3:56:22 PM
The concept of using links as a way to measure a site’s importance was first made popular by Google with the implementation of its PageRank algorithm (others had previously written about it, but Google’s rapidly increasing user base popularized it). In simple terms, each link to a web page is a vote for that page, and the page with the most votes wins.

The key to this concept is the notion that links represent an “editorial endorsement” of a web document. Search engines rely heavily on editorial votes. However, as publishers learned about the power of links, some of them started to manipulate links through a variety of methods. This created situations in which the intent of the link was not editorial in nature, and it led to many algorithm enhancements, which we will discuss in this chapter.

To help you understand the origins of link algorithms, the underlying logic of which is still in force today, let’s take a look at the original PageRank algorithm in detail.

1. The Original PageRank Algorithm

The PageRank algorithm was based on the original PageRank paper written by Sergey Brin and Larry Page while they were graduate students at Stanford University.

In the simplest terms, the paper states that each link to a web page is a vote for that page. However, votes do not have equal weight. So that you can better understand how this works, we’ll explain the PageRank algorithm at a high level. First, all pages are given an innate but tiny amount of PageRank, as shown in Figure 1.

Figure 1. Some PageRank for every page

Pages can then increase their PageRank by receiving links from other pages, as shown in Figure 2.

Figure 2. Pages receiving more PageRank through links

How much PageRank can a page pass on to other pages through links? That ends up being less than the page’s PageRank. In Figure 3 this is represented by f(x), meaning that the passable PageRank is a function of x, the total PageRank.

Figure 3. Some of a page’s PageRank passable to other pages
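For readers who want the arithmetic behind Figures 1 through 3, here is the formula as published in Brin and Page’s 1998 paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” T_1 through T_n are the pages that link to page A, C(T) is the number of links going out of page T, and d is a damping factor that the paper suggests setting to 0.85. Reading the (1 − d) term as the innate PageRank of Figure 1, and d·x as the passable PageRank f(x) of Figure 3, is our interpretation of the figures rather than something they state.

```latex
PR(A) = (1 - d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right),
\qquad f(x) = d\,x
```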

If this page links to only one other page, it passes all of its passable PageRank to that page, as shown in Figure 4, where Page B receives all of the passable PageRank of Page A.

Figure 4. Passing of PageRank through a link

However, the scenario gets more complicated because pages usually link to more than one other page. When that happens, the passable PageRank gets divided among all the pages receiving links. We show that in Figure 5, where Page B and Page C each receive half of the passable PageRank of Page A.

Figure 5. Simple illustration of how PageRank is passed
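The splitting shown in Figures 4 and 5 can be sketched in a few lines of Python. This is a minimal illustration, assuming the passable PageRank is d·x with a damping factor of 0.85 (the value suggested in Brin and Page’s paper) and that each link counts as one equal vote; the helper name distribute is hypothetical.

```python
# Minimal sketch: split a page's passable PageRank equally among its links.
# Assumption: passable PageRank f(x) = d * x, with damping factor d = 0.85.
DAMPING = 0.85


def distribute(pagerank: float, outlinks: list, d: float = DAMPING) -> dict:
    """Return the PageRank each linked page receives from one page (hypothetical helper)."""
    if not outlinks:
        return {}  # a page with no links passes nothing in this sketch
    passable = d * pagerank           # f(x): the part of x that links can pass on
    share = passable / len(outlinks)  # original formula: equal split per link
    received = {}
    for target in outlinks:           # each link is one vote for its target
        received[target] = received.get(target, 0.0) + share
    return received


# Figure 4: Page A links only to Page B, so B gets all of A's passable PageRank.
print(distribute(1.0, ["B"]))       # {'B': 0.85}
# Figure 5: Page A links to Pages B and C, so each gets half.
print(distribute(1.0, ["B", "C"]))  # {'B': 0.425, 'C': 0.425}
```

The equal split mirrors the division by C(T) in the original formula; as the text notes, modern link weighting almost certainly differs.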

In the original PageRank formula, link weight is divided equally among the number of links on a page. This undoubtedly does not hold true today, but it is still valuable in understanding the original intent. Now take a look at Figure 6, which depicts a more complex example that shows PageRank flowing back and forth between pages that link to one another.

Figure 6. Cross-linking between pages

Cross-linking makes the PageRank calculation much more complex. In Figure 6, Page B now links back to Page A and passes some PageRank, f(y), back to Page A. Figure 7 should give you a better understanding of how this affects the PageRank of all the pages.

Figure 7. Iterative PageRank calculations

The key observation here is that when Page B links to Page A to make the link reciprocal, the PageRank of Page A (x) becomes dependent on f(y), the passable PageRank of Page B, which in turn depends on f(x). In addition, the PageRank that Page A passes to Page C is also affected by the link from Page B to Page A. This makes for a very complicated situation in which the PageRank of each page on the Web must be determined by repeated (iterative) calculation rather than in a single pass.

We have defined new parameters to represent this: q, which is the PageRank that accrues to Page B from the link that it has from Page A (after all the iterative calculations are complete); and z, which is the PageRank that accrues to Page A from the link that it has from Page B (again, after all iterations are complete).
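Because x, f(y), q, and z all depend on one another, the values in Figure 7 are found by repeating the calculation until they stop changing. The sketch below does this for the three pages of Figure 6 (A links to B and C, and B links back to A). It assumes the Brin and Page formula with d = 0.85 and that a page with no outgoing links (Page C here) simply passes nothing; the function name pagerank and the convergence tolerance are illustrative choices, not taken from the book.

```python
# Minimal sketch of the iterative PageRank calculation for Figure 6.
# Assumptions: PR(p) = (1 - d) + d * sum(PR(t) / C(t)), d = 0.85,
# and pages without outgoing links pass on nothing.
DAMPING = 0.85


def pagerank(links: dict, d: float = DAMPING, tol: float = 1e-10,
             max_iter: int = 1000) -> dict:
    """Repeat the PageRank update until no page changes by more than tol."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}  # arbitrary starting guess
    for _ in range(max_iter):
        new = {page: 1.0 - d for page in pages}  # innate PageRank (Figure 1)
        for source, targets in links.items():
            if targets:
                share = d * pr[source] / len(targets)  # f(x) split per link
                for target in targets:
                    new[target] += share
        change = max(abs(new[page] - pr[page]) for page in pages)
        pr = new
        if change < tol:
            break
    return pr


# Figure 6: A links to B and C; B links back to A; C has no links.
links = {"A": ["B", "C"], "B": ["A"], "C": []}
pr = pagerank(links)
q = DAMPING * pr["A"] / 2  # PageRank that reaches B through its link from A
z = DAMPING * pr["B"] / 1  # PageRank that reaches A through its link from B
for page in sorted(pr):
    print(page, round(pr[page], 4))
print("q =", round(q, 4), "z =", round(z, 4))
```

Under these assumptions Page A settles near 0.434 and Pages B and C near 0.335. Change any link and every value shifts, which is why the calculation has to be repeated rather than done once.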

The scenario in Figure 8 adds further complexity by introducing a link from Page B to Page D. In this example, pages A, B, and C are internal pages on one domain, and Page D represents a different site (shown as Wikipedia). In the original PageRank formula, internal and external links passed PageRank in exactly the same way. This was exposed as a flaw once publishers realized that links to other sites were “leaking” PageRank away from their own site, as you can see in Figure 8.

Figure 8. PageRank being leaked

Because Page B links to Wikipedia, some of its passable PageRank is sent there instead of to the other pages that Page B links to (Page A in our example). In Figure 8, we represent that with the parameter w, which is the PageRank not sent to Page A because of the link to Page D.
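To see the leak in numbers, the sketch below compares Page A’s PageRank with and without the link from Page B to Page D, under the same assumptions as the original formula (d = 0.85, an equal split per link, and pages without links passing nothing). The helper name pagerank and the fixed number of update rounds are hypothetical choices; w is computed as the share of Page B’s passable PageRank that goes to Page D instead of Page A.

```python
# Minimal sketch of the PageRank "leak" in Figure 8 (original formula only).
# Assumptions: d = 0.85, equal split per link, pages with no links pass nothing.
DAMPING = 0.85


def pagerank(links: dict, d: float = DAMPING, iterations: int = 200) -> dict:
    """Run a fixed number of PageRank update rounds and return the result."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new = {page: 1.0 - d for page in pages}  # innate PageRank for every page
        for source, targets in links.items():
            for target in targets:
                new[target] += d * pr[source] / len(targets)
        pr = new
    return pr


without_leak = pagerank({"A": ["B", "C"], "B": ["A"]})
with_leak = pagerank({"A": ["B", "C"], "B": ["A", "D"]})  # D is the external site

# w: PageRank that Page B sends to Page D rather than to Page A.
w = DAMPING * with_leak["B"] / 2

print("PR(A) without link to D:", round(without_leak["A"], 4))
print("PR(A) with link to D:   ", round(with_leak["A"], 4))
print("w =", round(w, 4))
```

In this toy model Page A drops from about 0.434 to about 0.261, a larger loss than w alone, because A now passes less to B, which in turn passes less back to A.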

The PageRank “leak” concept exposed a fundamental flaw in the algorithm once it became public. Like opening Pandora’s box, once those who were creating pages to rank at Google investigated PageRank’s founding principles, they would realize that linking out from their own sites could cause more harm than good. If a great number of websites adopted this philosophy, it could undermine the “links as votes” concept and ultimately damage Google’s potential. Needless to say, Google changed its algorithm to correct this flaw. As a result of these changes, worrying about PageRank leaks is not recommended. Quality sites should link to other relevant quality pages around the Web.

Even after these changes, internal links from pages still pass some PageRank, so they still have value, as shown in Figure 9.

Figure 9. Internal links still passing some PageRank

Google has changed and refined the PageRank algorithm many times. However, familiarity and comfort with the original algorithm are certainly beneficial to those who practice optimization of Google results.
