Duplicate content can be a killer for websites, especially blogs and news sites, if their organization is not handled correctly. These sites are usually organized into categories and then interlinked in other ways, such as sidebar widgets, related-post plugins, and (in WordPress) tags. With all of these ways of organizing a site, plus the reality of pagination, we can quickly end up with a hot mess of nearly identical pages. These pages add nothing to the user experience, and search engines could treat them as duplicate content.
So how do we decide what content we want the search engines to index and rank, and, once we’ve decided, how do we make it happen?
In this post I’m going to introduce you (or remind you, if you already know them) to a few <meta> and <link> tags that go in the <head> section of your pages and help you deal with duplicate content. At the end, if you’re using WordPress, I’ll show you how to set them up with Yoast’s SEO plugin.
Important Meta Tags
Let’s look at the meta tags that are going to be most important to us as we try to trim down duplicate or close-to-duplicate content on our sites.
Rel=”next” and Rel=”prev”
Rel=”next” and rel=”prev” were introduced by Google in September 2011 as a way for SEOs and webmasters to indicate that a page is part of a paginated series. Paginated pages generally make poor search results on their own, because a visitor landing on page 3 of a category sees an arbitrary slice of posts. These tags tell Google how the pages in the series relate to one another.
How to implement rel=”next” and rel=”prev”
Let’s say we have http://www.domain.com/category/, http://www.domain.com/category/page2/, and http://www.domain.com/category/page3/ where /page2/ and /page3/ are paginated results under /category/. You would implement the tags in the following way:
On http://www.domain.com/category/, put the following in the <head> section:
<link rel="next" href="http://www.domain.com/category/page2/" />
On http://www.domain.com/category/page2/, the second page in the series, put the following:
<link rel="next" href="http://www.domain.com/category/page3/" />
<link rel="prev" href="http://www.domain.com/category/" />
Finally, on http://www.domain.com/category/page3/, put the following:
<link rel="prev" href="http://www.domain.com/category/page2/" />
Note that the first page designates only a “next” page, while the last page designates only a “prev” page.
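If your templates generate these tags rather than you typing them by hand, the rule above is easy to automate. The following is a minimal Python sketch. It assumes the URL scheme from the example: page 1 is the category URL itself, and later pages append pageN/. The function names page_url and pagination_links are made up for illustration and are not part of any CMS or plugin.

```python
from html import escape


def page_url(base: str, n: int) -> str:
    """URL of page n under the example scheme (an assumption):
    page 1 is the category URL itself, later pages append 'pageN/'."""
    return base if n == 1 else f"{base}page{n}/"


def pagination_links(base: str, n: int, total: int) -> list[str]:
    """Return the rel="prev"/rel="next" <link> tags for page n of total."""
    if not 1 <= n <= total:
        raise ValueError(f"page {n} is outside 1..{total}")
    tags = []
    if n > 1:  # every page except the first points back
        tags.append(f'<link rel="prev" href="{escape(page_url(base, n - 1))}" />')
    if n < total:  # every page except the last points forward
        tags.append(f'<link rel="next" href="{escape(page_url(base, n + 1))}" />')
    return tags


if __name__ == "__main__":
    base = "http://www.domain.com/category/"
    for n in range(1, 4):
        print(f"page {n}:")
        for tag in pagination_links(base, n, 3):
            print("  " + tag)
```

Running it prints one set of tags per page. The boundary checks are the part that matters: page 1 gets no “prev” tag, and the last page gets no “next” tag.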
Meta Robots
The next tag to be familiar with is the robots meta tag (<meta name="robots">). This post isn’t meant to teach you how to use it in every circumstance, so I’ll just refer you to the official Robotstxt.org tutorial. For dealing with duplicate content on paginated results, you only need to know the following:
<meta name="robots" content="noindex,follow" />
In plain terms, this tag tells Google “don’t index this page, but do follow all of its links.” It keeps the page out of the index without blocking the crawler entirely, which is what a Disallow rule in robots.txt would do. Link equity therefore still flows through to the pages and articles it links to.
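In a template, this usually means emitting the tag only on the paginated pages. Below is a minimal Python sketch. It assumes, as is common (though the choice is yours), that page 1 of the category stays indexable and only pages 2 and beyond get noindex,follow. The helper name robots_meta is hypothetical.

```python
from typing import Optional


def robots_meta(page_number: int) -> Optional[str]:
    """Robots meta tag for a paginated archive page.

    Assumption: page 1 (the category page itself) stays indexable, so no tag
    is returned. Pages 2+ get noindex,follow: kept out of the index, while
    crawlers still follow their links through to the articles.
    """
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    if page_number == 1:
        return None
    return '<meta name="robots" content="noindex,follow" />'


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, robots_meta(n))
```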
Rel=canonical
The last tag you need to know about is rel=canonical, which you have hopefully come across if you’ve been in SEO for any length of time. It should not be confused with a 301 redirect, which actually sends visitors to the other URL. Instead, rel=canonical deals with duplicate content by pointing from one page to another and telling Google, “This page is actually a duplicate of that page over there.” It’s illustrated well by this graphic from SEOmoz:
A common question I hear is, “So why not just point the canonical tag at the first page in the series?” There are two reasons not to:
- Pointing every page’s canonical at the first page tells Google that the deeper pages are duplicates of it. The links on those deeper pages, and the link equity they pass to older articles, may then be ignored, which weakens your internal linking.
- The pages are not EXACT duplicates, and their content changes constantly as new posts are published, so the canonical tag is the wrong tool here.
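To make the alternative concrete: rather than pointing every page at the top of the series, each paginated page can carry a canonical tag that points at itself (a self-referencing canonical). Here is a minimal Python sketch, using a hypothetical canonical_tags helper and the example URLs from earlier.

```python
from html import escape


def canonical_tags(series: list[str]) -> dict[str, str]:
    """Map each page URL in a paginated series to its canonical <link> tag.

    Each page canonicalizes to itself rather than to series[0]; pointing
    every page at the first page would claim the deeper pages are
    duplicates of it, which they are not.
    """
    return {url: f'<link rel="canonical" href="{escape(url)}" />' for url in series}


if __name__ == "__main__":
    series = [
        "http://www.domain.com/category/",
        "http://www.domain.com/category/page2/",
        "http://www.domain.com/category/page3/",
    ]
    for url, tag in canonical_tags(series).items():
        print(url, "->", tag)
```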
That’s Great. Now show me how in WordPress
Alright, so now you know the thought process behind handling the indexation (or non-indexation) of paginated pages. So how do you implement it? I’ll show you how to do it in WordPress using Yoast’s SEO plugin. On other platforms, you’ll have to bug your dev friends to do it for you.
After you have installed the plugin, go to SEO -> Indexation.
Then check the “Subpages of archives and taxonomies” option.
Boom! That’s it. The plugin automagically includes the rel=”next” and rel=”prev” tags as well as the canonical tags.
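Whichever way the tags get onto the page, it’s worth checking what actually ends up in the HTML. The following small sketch uses Python’s standard-library html.parser to pull the next/prev, canonical, and robots values out of a page’s source. The sample markup is hand-written to match the recommendations in this post. It is not captured output from Yoast, whose exact markup may differ between versions. HeadTagAudit and audit are illustrative names.

```python
from html.parser import HTMLParser


class HeadTagAudit(HTMLParser):
    """Record rel="next"/"prev"/"canonical" links and the robots meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.found: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        rel = (a.get("rel") or "").lower()
        if tag == "link" and rel in ("next", "prev", "canonical"):
            self.found[rel] = a.get("href") or ""
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.found["robots"] = a.get("content") or ""


def audit(html_text: str) -> dict[str, str]:
    """Return the pagination-related tags found in an HTML document."""
    parser = HeadTagAudit()
    parser.feed(html_text)
    parser.close()
    return parser.found


# Hand-written sample for page 2 of the example series (not real plugin output).
SAMPLE = """<html><head>
<link rel="canonical" href="http://www.domain.com/category/page2/" />
<link rel="prev" href="http://www.domain.com/category/" />
<link rel="next" href="http://www.domain.com/category/page3/" />
<meta name="robots" content="noindex,follow" />
</head><body></body></html>"""

if __name__ == "__main__":
    for key, value in audit(SAMPLE).items():
        print(f"{key}: {value}")
```

To audit one of your own pages, copy its source (View Source in your browser) and pass it to audit() in place of SAMPLE.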
Questions? Suggestions? Disagree with me? Let me know in the comments!