The Art of SEO: How Links Influence Search Engine Rankings (Part 2) - Additional Factors That Influence Link Value

2. Additional Factors That Influence Link Value

Classic PageRank isn’t the only factor that influences the value of a link. In the following subsections, we discuss some additional factors that influence the value a link passes.

2.1. Anchor text

Anchor text refers to the clickable part of a link from one web page to another. As an example, Figure 10 shows part of the Alchemist Media home page at http://www.alchemistmedia.com.

Figure 10. Anchor text: a strong ranking element


The anchor text for Link #3 in Figure 10 is “SEO Web Site Design”. The search engine uses this anchor text to help it understand what the page receiving the link is about; as a result, it will interpret Link #3 as a signal that the linked page is about “SEO Web Site Design”.

The impact of anchor text can be quite powerful. For example, if you link to a web page that has no search-engine-visible content (perhaps it is an all-Flash site), the search engine will still look for signals to determine what the page is about. Inbound anchor text becomes the primary driver in determining the relevance of a page in that scenario.
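
To make this concrete, here is a minimal Python sketch, using invented link data, of how inbound anchor text could be tallied to infer the topic of a page that has no crawlable text of its own; real engines weight these signals in far more sophisticated ways.

    from collections import Counter

    # Hypothetical inbound links pointing at a page with no crawlable text
    # (for example, an all-Flash site). The URLs and anchors are made up.
    inbound_links = [
        {"source": "http://example-blog.com/post", "anchor": "SEO Web Site Design"},
        {"source": "http://example-directory.com/seo", "anchor": "SEO Web Site Design"},
        {"source": "http://example-forum.com/thread", "anchor": "Alchemist Media"},
    ]

    # Tally each anchor phrase; the most common phrases become the strongest
    # signal of what the linked page is about.
    anchor_counts = Counter(link["anchor"].lower() for link in inbound_links)

    for phrase, count in anchor_counts.most_common():
        print(f"{phrase}: {count} inbound link(s)")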

The power of anchor text also resulted in SEOs engaging in Google Bombing. The idea is that if you link to a given web page from many places with the same anchor text, you can get that page to rank for queries related to that anchor text, even if the page is unrelated.

One notorious Google Bomb was a campaign that targeted the Whitehouse.gov biography page for George W. Bush with the anchor text “miserable failure”. As a result, that page ranked #1 for searches on “miserable failure” until Google tweaked its algorithm to reduce the effectiveness of this practice.

However, as of May 2009 this still worked in Yahoo! Search, as shown in Figure 11.

Figure 11. Anchor text making pages rank for unrelated terms


President Obama’s page has crept in here too, largely because of a redirect put in place by the White House’s web development team.

2.2. Relevance

Links that originate from sites/pages on the same topic as the publisher’s site, or on a closely related topic, are worth more than links that come from a site on an unrelated topic.

Think of the relevance of each link as being evaluated in the specific context of the search query a user has just entered. So, if the user enters “used cars in Phoenix” and the publisher’s Phoenix used cars page has a link from the Phoenix Chamber of Commerce, that link will reinforce the search engine’s belief that the page really does relate to Phoenix.

Similarly, if a publisher has another link from a magazine site that has done a review of used car websites, this will reinforce the notion that the site should be considered a used car site. Taken in combination, these two links could be powerful in helping the publisher rank for “used cars in Phoenix”.
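
As a rough illustration of the idea, the following Python sketch applies an invented relevance weight to each link’s (also invented) base value; it is a toy model of relevance-weighted link value, not how any engine actually scores links.

    # Toy model for the "used cars in Phoenix" example. The relevance scores
    # and base values below are invented for illustration.
    inbound_links = [
        # Phoenix Chamber of Commerce: strong geographic relevance.
        {"source": "phoenixchamber.example", "base_value": 3.0, "relevance": 0.9},
        # Magazine review of used car websites: strong topical relevance.
        {"source": "car-magazine.example", "base_value": 3.0, "relevance": 0.8},
        # Unrelated knitting blog: same base value, but little of it counts.
        {"source": "knitting-blog.example", "base_value": 3.0, "relevance": 0.1},
    ]

    for link in inbound_links:
        passed = link["base_value"] * link["relevance"]
        print(f'{link["source"]}: passes ~{passed:.1f} of a possible {link["base_value"]}')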

2.3. Authority

Authority has been the subject of much research. One of the more famous papers, written by Apostolos Gerasoulis and others at Rutgers University and titled “DiscoWeb: Applying Link Analysis to Web Search” (http://www.cse.lehigh.edu/~brian/pubs/1999/www8/), became the basis of the Teoma algorithm; Teoma was later acquired by Ask Jeeves, and its technology became part of the Ask algorithm.

What made this paper unique was its focus on evaluating links on the basis of their relevance to the linked page. Google’s original PageRank algorithm did not incorporate the notion of topical relevance; although Google’s algorithm clearly does so today, Teoma was the first to offer a commercial implementation of link relevance.

Teoma built its approach around the notion of hubs, which are sites that link to most of the important sites relevant to a particular topic, and authorities, which are sites that are linked to by most of the sites relevant to a particular topic.

The key concept here is that each topic area that a user can search on will have authority sites specific to that topic area. The authority sites for used cars are different from the authority sites for baseball.
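
The following Python sketch shows the hub/authority iteration in miniature on an invented four-page link graph (this is the HITS-style computation that Teoma commercialized, not Teoma’s actual code): pages earn authority from the hubs that link to them, and earn hub value from the authorities they link to.

    # Invented link graph: page -> pages it links to.
    links = {
        "hub1": ["auth1", "auth2"],
        "hub2": ["auth1", "auth2"],
        "auth1": [],
        "auth2": ["auth1"],
    }

    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(20):  # a fixed number of iterations keeps the sketch simple
        # A page is a good authority if good hubs link to it.
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        # A page is a good hub if it links to good authorities.
        hub = {p: sum(auth[t] for t in links[p]) for p in pages}
        # Normalize so the scores do not grow without bound.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}

    print("authorities:", sorted(auth, key=auth.get, reverse=True))
    print("hubs:", sorted(hub, key=hub.get, reverse=True))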

Refer to Figure 12 to get a sense of the difference between hub and authority sites.

Figure 12. Hubs and authorities


So, if the publisher has a site about used cars, it should seek links from websites that the search engines consider to be authorities on used cars (or perhaps, more broadly, on cars). However, the search engines will not tell you which sites they consider authoritative, which makes the publisher’s job that much more difficult.

The model of organizing the Web into topical communities and pinpointing the hubs and authorities is an important model to understand (read more about it in Mike Grehan’s paper, “Filthy Linking Rich!” at http://www.search-engine-book.co.uk/filthy_linking_rich.pdf). The best link builders understand this model and leverage it to their benefit.

2.4. Trust

Trust is distinct from authority. Authority, on its own, doesn’t sufficiently take into account whether the linking page or the domain is easy or difficult for spammers to infiltrate. Trust, on the other hand, does.

Evaluating the trust of a website likely involves reviewing its link neighborhood to see what other trusted sites link to it. More links from other trusted sites would convey more trust.

In 2004, researchers from Yahoo! and Stanford University published a paper titled “Combating Web Spam with TrustRank” (http://www.vldb.org/conf/2004/RS15P3.PDF). The paper proposed starting with a trusted seed set of pages (selected by manual human review) to perform PageRank analysis, instead of a random set of pages as was called for in the original PageRank thesis.

Using this tactic reduces the risk inherent in a purely algorithmic approach to determining the trust of a site, which can produce false positives/negatives.

The trust level of a site would be based on how many clicks away it is from seed sites. A site that is one click away accrues a lot of trust; two clicks away, a bit less; three clicks away, even less; and so forth. Figure 13 illustrates the concept of TrustRank.
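
Below is a minimal Python sketch of the distance-based decay just described, using an invented link graph and decay factor; note that the actual TrustRank paper propagates trust through a biased PageRank computation rather than the simple breadth-first pass shown here.

    from collections import deque

    # Invented link graph: page -> pages it links to.
    links = {
        "seed.example": ["a.example", "b.example"],
        "a.example": ["c.example"],
        "b.example": ["c.example", "d.example"],
        "c.example": [],
        "d.example": ["e.example"],
        "e.example": [],
    }

    seeds = {"seed.example"}   # hand-picked trusted pages
    decay = 0.5                # each extra click halves the trust passed on

    trust = {page: 0.0 for page in links}
    queue = deque((s, 0) for s in seeds)
    seen = set(seeds)

    while queue:
        page, distance = queue.popleft()
        trust[page] = max(trust[page], decay ** distance)
        for target in links[page]:
            if target not in seen:
                seen.add(target)
                queue.append((target, distance + 1))

    for page, score in sorted(trust.items(), key=lambda kv: -kv[1]):
        print(f"{page}: trust ~ {score:.3f}")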

Figure 13. TrustRank illustrated


The authors of the TrustRank paper also wrote a paper describing the concept of spam mass (http://ilpubs.stanford.edu:8090/697/1/2005-33.pdf). This paper focuses on evaluating the effect of spammy links on a site’s (unadjusted) rankings: the greater the impact of those links, the more likely it is that the site itself is spam. A large percentage of a site’s links being purchased is seen as a spam indicator as well. You can also consider the notion of reverse TrustRank, in which linking out to spammy sites lowers a site’s own TrustRank.
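
As a toy illustration of the spam mass idea, the sketch below estimates what fraction of a page’s link value comes from known spammy sources; the pages, values, and threshold are invented, and the actual paper works by comparing PageRank scores computed with and without a trusted core of the Web.

    # Invented inbound links with invented values and spam labels.
    inbound = [
        {"source": "trusted-news.example", "value": 4.0, "spammy": False},
        {"source": "link-farm-1.example", "value": 3.0, "spammy": True},
        {"source": "link-farm-2.example", "value": 2.5, "spammy": True},
    ]

    total = sum(link["value"] for link in inbound)
    from_spam = sum(link["value"] for link in inbound if link["spammy"])

    relative_spam_mass = from_spam / total if total else 0.0
    print(f"relative spam mass: {relative_spam_mass:.2f}")

    # A high ratio suggests the page's rankings depend heavily on spammy links,
    # which the paper treats as a signal that the page itself may be spam.
    if relative_spam_mass > 0.5:   # invented threshold
        print("flag for review")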

It is likely that Google, Yahoo!, and Bing all use some form of trust measurement to evaluate websites, and that this trust metric can be a significant factor in rankings. For SEO practitioners, getting measurements of trust can be difficult. Currently, mozTrust from SEOmoz’s Linkscape is the only publicly available measured estimation of a page’s TrustRank.

3. How Search Engines Use Links

The search engines use links primarily to discover web pages, and to count the links as votes for those web pages. But how do they use this information once they acquire it? Let’s take a look:


Index inclusion

Search engines need to decide which pages to include in their index. Crawling the Web (following links) is one way they discover web pages (the other is through the use of XML Sitemap files). In addition, the search engines do not include pages that they deem to be of low value, because cluttering their index with those pages will not lead to a good experience for their users. The cumulative link value, or link juice, of a page is a factor in making that decision.


Crawl rate/frequency

Search engine spiders go out and crawl a portion of the Web every day. This is no small task, and it starts with deciding where to begin and where to go. Google has publicly indicated that it starts its crawl in PageRank order. In other words, it crawls PageRank 10 sites first, PageRank 9 sites next, and so on. Higher PageRank sites also get crawled more deeply than other sites. It is likely that other search engines start their crawl with the most important sites first as well.

This would make sense, because changes on the most important sites are the ones the search engines want to discover first. In addition, if a very important site links to a new resource for the first time, the search engines tend to place a lot of trust in that link and want to factor the new link (vote) into their algorithms quickly.
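
A minimal sketch of the idea of crawling in PageRank order: URLs sit in a priority queue, and the highest-scored ones are fetched first. The URLs and scores are invented, and real crawl scheduling also weighs freshness, politeness, server load, and many other signals.

    import heapq

    frontier = []  # min-heap; scores are negated so high-PageRank URLs come out first

    def schedule(url: str, pagerank: float) -> None:
        heapq.heappush(frontier, (-pagerank, url))

    schedule("http://important-site.example/", 9.1)
    schedule("http://small-blog.example/post", 2.3)
    schedule("http://new-resource.example/", 6.7)

    while frontier:
        neg_score, url = heapq.heappop(frontier)
        print(f"crawl {url} (score {-neg_score})")
    # -> important-site first, then new-resource, then small-blog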


Ranking

Links play a critical role in ranking. For example, consider two sites where the on-page content is equally relevant to a given topic. Perhaps they are the shopping sites Amazon.com and (the less popular) JoesShoppingSite.com.

The search engine needs a way to decide who comes out on top: Amazon or Joe. This is where links come in. Links cast the deciding vote: if more sites, and more important sites, link to Amazon, then it must be more important, so Amazon wins.
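
The following toy Python sketch shows that tiebreak: with equal on-page relevance, the link-based score decides the order. The sites’ scores and the 50/50 weighting are invented for illustration.

    # Invented scores for two equally relevant pages.
    candidates = {
        "Amazon.com": {"on_page_relevance": 0.80, "link_score": 9.5},
        "JoesShoppingSite.com": {"on_page_relevance": 0.80, "link_score": 1.2},
    }

    def final_score(signals: dict) -> float:
        # On-page relevance is equal here, so the link score decides the ordering.
        return 0.5 * signals["on_page_relevance"] + 0.5 * signals["link_score"]

    ranked = sorted(candidates, key=lambda site: final_score(candidates[site]), reverse=True)
    print(ranked)  # ['Amazon.com', 'JoesShoppingSite.com']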
