In association with heise online

17 February 2009, 14:21

Search engines fight duplication with canonical


Google, Microsoft and Yahoo are now providing a way for web site developers to specify a preferred URL for any piece of content on a web site. The problem has been that sites may, legitimately, have the same content on different URLs. This causes problems for search engines which can't easily differentiate between these duplications. Now, a new type of "canonical" link reference can allow a page to express the preferred URL for a search engine to use.

Google explains the process with a step-by-step example. In the example, they show some URLs that point to the same page:

  • (preferred URL)
  • 5678

The differences in the examples are caused first by a category parameter and secondly by a tracking id and session id. To set the preferred URL, the page maintainer adds a link element, with the rel attribute set to "canonical" and the href attribute set to the preferred URL, to the head section of the page. For the examples above, this would be:

<link rel="canonical" href="" />
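For sites that generate pages dynamically, the preferred URL can be worked out by stripping the parameters that do not change the content. The following is a minimal sketch, not taken from any of the announcements: it assumes the category, tracking id and session id parameters are literally named `category`, `trackingid` and `sessionid`, and the `www.example.com` URL is hypothetical.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names: the article only says the duplicates differ
# by a category parameter, a tracking id and a session id.
NON_CANONICAL_PARAMS = {"category", "trackingid", "sessionid"}


def canonical_url(url: str) -> str:
    """Drop parameters that do not change the content, keep the rest in order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in NON_CANONICAL_PARAMS]
    # The fragment is dropped as well, since it never reaches the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's head section."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)


print(canonical_link_tag(
    "http://www.example.com/product.php?item=widget&trackingid=1234&sessionid=5678"))
```

Escaping the href matters once more than one parameter survives, because a bare `&` inside an HTML attribute should be written as `&amp;`.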

Google search will now understand that the duplicate URLs belong to the preferred URL and, according to Google, properties such as PageRank and related information will be transferred to it as well.
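On the crawler side, the effect can be pictured as reading each page's canonical link and grouping duplicate URLs under the declared target. The sketch below is only an illustration using Python's standard `html.parser`; it is not how Google, Yahoo or Microsoft actually implement it, it does not model the transfer of PageRank, and the example URLs are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> tags also arrive here via the default
        # handle_startendtag implementation.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated link types.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(page_url: str, html: str) -> str:
    """Return the declared canonical URL, or the page's own URL if none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.canonical is None:
        return page_url
    # Relative hrefs are resolved against the URL the page was fetched from.
    return urljoin(page_url, finder.canonical)


def group_duplicates(pages: dict) -> dict:
    """Map each canonical URL to the crawled URLs that declare it."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[find_canonical(url, html)].append(url)
    return dict(groups)


head = '<html><head><link rel="canonical" href="/product.php?item=widget" /></head></html>'
pages = {
    "http://www.example.com/product.php?item=widget": head,
    "http://www.example.com/product.php?item=widget&sessionid=5678": head,
}
print(group_duplicates(pages))
```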

W3C (the World Wide Web Consortium) specifically provided the rel attribute in the link element for use by web developers to define relationships between pages for consumption by search engines.

More details of how canonical will be used by the search engines can be found in Yahoo's announcement, Microsoft's announcement and, of course, Google's announcement.

Matt Cutts, a Google engineer, also published a presentation on the link element and pointed out that canonical plug-ins have already appeared for WordPress, Magento and Drupal.

