On occasion, it can be valuable to show search engines one version
of content and show humans a different version. This is technically called
cloaking, and the search engines’ guidelines have near-universal policies
restricting this. In practice, many websites, large and small, appear to
use content delivery effectively and without being penalized by the search
engines. However, use great care if you implement these techniques, and
know the risks that you are taking.
1. Cloaking and Segmenting Content Delivery
Before we discuss the risks and potential benefits of
cloaking-based practices, take a look at Figure 1, which illustrates how cloaking works.
Figure 1. How cloaking works
Google’s Matt Cutts, head of Google’s webspam team, has made
strong public statements indicating that all forms of cloaking (other
than First Click Free) are subject to penalty. This position was largely
backed up by statements from Google’s John Mueller in a May 2009 interview,
which you can read at http://www.stonetemple.com/articles/interview-john-mueller.shtml.
Google makes its policy pretty clear in its Guidelines on Cloaking:
Serving up different results based on user agent may cause your
site to be perceived as deceptive and removed from the Google index.
There are two critical pieces in the preceding quote:
“may” and “user agent.” It is true
that if you cloak in the wrong ways, with the wrong intent, Google and
the other search engines may remove you from their
index, and if you do it egregiously, they certainly
will. But in some cases, it may be the right thing
to do, both from a user experience perspective and from an engine’s
perspective.
The key is intent: if the engines feel you are attempting to
manipulate their rankings or results through cloaking, they may take
adverse action against your site. If, however, the intent of your
content delivery doesn’t interfere with their goals, you’re less likely
to be subject to a penalty, as long as you don’t violate important
technical tenets (which we’ll discuss shortly).
What follows are some examples of websites that perform some level
of cloaking:
Search for google toolbar,
google translate,
adwords, or any number of Google properties,
and note how the URL you see in the search results and the one you
land on almost never match. What’s more, on many of these pages,
whether you’re logged in or not, you might see some content that
is different from what’s in the cache.
The interstitial ads, the request to log in/create an
account after five clicks, and the archive inclusion all
show different content to engines versus humans.
In addition to some redirection based on your path, there’s
the state overlay forcing you to select a shipping location prior
to seeing any prices (or any pages). That’s a form the engines
don’t have to fill out.
Geotargeting through cookies is a very
popular form of local targeting that hundreds, if not thousands,
of sites use.
At SMX Advanced 2008 there was quite a lot of discussion
about how Amazon does some cloaking (http://www.naturalsearchblog.com/archives/2008/06/03/amazons-secret-to-dominating-serp-results/).
In addition, Amazon does lots of fun things with its Buybox.amazon.com subdomain
and with the navigation paths and suggested products if your
browser accepts cookies.
Trulia was found to be doing some interesting redirects on
partner pages and its own site (http://www.bramblog.com/trulia-caught-cloaking-red-handed/).
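Cookie-based geotargeting of the sort described above can be sketched as follows. This is an illustrative example only: the cookie name, regions, and prices are hypothetical, and the key point is that search engine crawlers, which typically send no cookies, receive the same default page as any cookieless visitor.

```python
# Hypothetical sketch of cookie-based geotargeting with a crawler-safe default.

DEFAULT_REGION = "US"

PRICING = {
    "US": "$49.99",
    "UK": "£39.99",
    "DE": "€44.99",
}

def parse_cookies(cookie_header):
    """Parse a raw Cookie header into a dict (minimal, for illustration only)."""
    cookies = {}
    for part in cookie_header.split(";"):
        if "=" in part:
            name, _, value = part.strip().partition("=")
            cookies[name] = value
    return cookies

def price_for_request(cookie_header):
    """Return the price string for this visitor's region.

    Visitors without a region cookie -- including search engine
    spiders -- get the default region, so engines and cookieless
    humans see the same page.
    """
    cookies = parse_cookies(cookie_header)
    region = cookies.get("region", DEFAULT_REGION)
    return PRICING.get(region, PRICING[DEFAULT_REGION])

# A UK visitor with a cookie sees local pricing...
print(price_for_request("session=abc123; region=UK"))  # £39.99
# ...while a cookieless crawler sees the default.
print(price_for_request(""))                           # $49.99
```

Because the default is what the engines index, the localized variations never diverge from what a first-time, cookieless visitor would see.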
The message should be clear. Cloaking isn’t always evil, it won’t
always get you banned, and you can do some pretty smart things with it.
The key to all of this is your intent. If you are doing it for reasons
that are not deceptive and that provide a positive experience for users
and search engines, you might not run into problems. However, there is
no guarantee of this, so use these types of techniques with great care,
and know that you may still get penalized for it.
2. When to Show Different Content to Engines and Visitors
There are a few common causes for displaying content differently
to different visitors, including search engines. Here are some of the
most common ones:
Multivariate and A/B split testing
Testing landing pages for conversions requires that you show
different content to different visitors to test performance. In
these cases, it is best to display the test variations to human
visitors while showing search engines a
canonical version of the page that doesn’t change with every new
spidering (though this won’t necessarily hurt you). Google offers
software called Google Website Optimizer to perform this type of
testing.
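A deterministic bucketing scheme along these lines might look like the following sketch. The user-agent signatures, variant names, and function are illustrative assumptions, not any particular testing tool's implementation; real tools handle this for you.

```python
# Hypothetical sketch: bucket human visitors into test variants, but
# always serve known crawlers the canonical page.
import hashlib

VARIANTS = ["canonical", "variant-b"]
BOT_SIGNATURES = ("googlebot", "bingbot", "slurp")  # partial UA matches

def is_crawler(user_agent):
    """Naive crawler detection via user-agent substrings (illustrative)."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

def choose_variant(visitor_id, user_agent):
    """Crawlers always get the canonical page; humans are hashed into a
    bucket deterministically, so each visitor sees a stable variant."""
    if is_crawler(user_agent):
        return "canonical"
    digest = hashlib.sha256(visitor_id.encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

print(choose_variant("visitor-42", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
# -> canonical
```

Hashing the visitor ID (rather than randomizing per request) keeps each human's experience consistent across page loads, while the crawler branch keeps the indexed version stable.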
Content requiring registration and First Click Free
If you force registration (paid or free) on users to view
specific content pieces, it is best to keep the URL the same for
both logged-in and non-logged-in users and to show a snippet (one
to two paragraphs is usually enough) to non-logged-in users and
search engines. If you want to display the full content to search
engines, you have the option to provide some rules for content
delivery, such as showing the first one to two pages of content to
a new visitor without requiring registration, and then requesting
registration after that grace period. This keeps your intent more
transparent to visitors while showing the full pieces to the engines.
In this scenario, you might also opt to participate in a
specific program from Google called First Click Free, wherein
websites can expose “premium” or login-restricted content to
Google’s spiders, as long as users who click from the engine’s
results are given the ability to view that first article for free.
Many prominent web publishers employ this tactic, including the
popular site Experts-Exchange.com.
To be specific, to implement First Click Free, the publisher
must grant Googlebot (and presumably the other search engine
spiders) access to all the content they want indexed, even if
users normally have to log in to see the content. The user who
visits the site will still need to log in, but the search engine
spider will not have to do so. This will lead to the content
showing up in the search engine results when applicable. However,
if a user clicks on that search result, you must permit him to
view the entire article (all pages of a given article if it is a
multiple-page article). Once the user clicks to look at another
article on your site, you can still require him to log in.
For more details, visit Google’s First Click Free program
page at http://googlewebmastercentral.blogspot.com/2008/10/first-click-free-for-web-search.html.
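The First Click Free gating described above could be sketched roughly as follows. The function name, the substring-based referrer check, and the view-count threshold are illustrative assumptions for this sketch, not Google's specification; consult the program page above for the actual requirements.

```python
# Hedged sketch of First Click Free access logic; all names are hypothetical.

GOOGLEBOT_TOKEN = "googlebot"
SEARCH_REFERRERS = ("google.",)  # naive referrer match, for illustration

def can_view_full_article(user_agent, referrer, is_logged_in,
                          articles_viewed_this_visit):
    """Decide whether to serve the full article or the login wall.

    - Crawlers always get the full content so it can be indexed.
    - A visitor arriving from a search result gets the first article free.
    - Everyone else must log in.
    """
    if GOOGLEBOT_TOKEN in user_agent.lower():
        return True
    if is_logged_in:
        return True
    came_from_search = any(tok in referrer for tok in SEARCH_REFERRERS)
    if came_from_search and articles_viewed_this_visit == 0:
        return True  # the "first click" is free
    return False

print(can_view_full_article("Googlebot/2.1", "", False, 5))                     # True
print(can_view_full_article("Mozilla", "https://www.google.com/search", False, 0))  # True
print(can_view_full_article("Mozilla", "", False, 0))                           # False
```

Note that in a real implementation you would verify Googlebot via reverse DNS rather than trust the user-agent string, since anyone can spoof it.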
Navigation unspiderable by search engines
If your navigation is in Flash, JavaScript, a Java
application, or another unspiderable format, you should consider
showing search engines a version that has spiderable, crawlable
content in HTML. Many sites do this simply with CSS layers,
displaying a human-visible, search-invisible layer and a layer for
the engines (and less capable browsers, such as mobile browsers).
You can also employ the noscript tag for this purpose, although
it is generally riskier, as many spammers have applied noscript as a way to hide content. Adobe
recently launched a portal on SEO and Flash and provides best
practices that have been cleared by the engines to help make Flash
content discoverable. Take care to make sure the content shown in
the search-visible layer is substantially the same as the content
in the human-visible layer.
If a significant portion of a page’s content is duplicated,
you might consider restricting spider access to it by placing it
in an iframe that’s restricted by robots.txt. This ensures that you can
show the engines the unique portion of your pages, while
protecting against duplicate content problems. We will discuss
this in more detail in the next section.
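A minimal sketch of the iframe approach, assuming the duplicated material lives under a hypothetical /shared/ directory: the page embeds it with something like <iframe src="/shared/product-description.html"></iframe>, and robots.txt blocks spiders from crawling that directory, so only the unique portion of the page is indexed.

```
# Hypothetical robots.txt entry keeping spiders out of the
# directory that serves the duplicated iframe content
User-agent: *
Disallow: /shared/
```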
Different content for different users
At times you might target content uniquely to users from
different geographies (such as different product offerings that
are more popular in their area), with different screen resolutions
(to make the content fit their screen size better), or who entered
your site from different navigation points. In these instances, it
is best to have a “default” version of content that’s shown to
users who don’t exhibit these traits, and to show that default
version to search engines as well.
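One way to sketch this “default version” logic is shown below. The trait names and content keys are hypothetical; the point is that spiders, which present no cookies, geolocation, or screen information, fall through to the same default page as a trait-less human visitor.

```python
# Sketch of trait-based content selection with a crawler-safe default.
# Trait names and content keys are hypothetical.

CONTENT = {
    "default": "Standard product page",
    "geo:DE": "German product page",
    "narrow-screen": "Compact layout page",
}

def select_content(traits):
    """Return the content key for a visitor's detected traits.

    An empty traits dict (what a search engine spider presents)
    yields the default, so engines index the baseline experience.
    """
    if not traits:
        return "default"
    if traits.get("country") == "DE":
        return "geo:DE"
    if traits.get("screen_width", 10**6) < 480:
        return "narrow-screen"
    return "default"

print(select_content({}))                    # default (what engines see)
print(select_content({"country": "DE"}))     # geo:DE
print(select_content({"screen_width": 320})) # narrow-screen
```

Treating the default as the canonical, indexable version keeps the targeted variations a user-experience refinement rather than a separate set of pages the engines never see.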