Duplicate content can be a killer for websites, especially blogs and news sites, if the site's organization is not handled correctly. These sites are often organized into categories and then interlinked by other means, such as sidebar widgets, related-post plugins, and (in WordPress) tags. With all of these different ways of organizing a site, plus the reality of pagination, we can quickly end up with a hot mess of nearly identical pages that add no value to the user experience and could be treated as duplicate content by the search engines.
So how do we decide what content we want the search engines to index and rank, and once we decide, how do we make this happen?
In this post I am going to introduce you (or remind you, if you already know about them) to a few meta tags, placed in the <head> section of your site, that will help you deal with duplicate content. At the end, if you’re using WordPress, I’ll show you how to do it using Yoast’s SEO plugin.
Important Meta Tags
Let’s look at the meta tags that are going to be most important to us as we try to trim down duplicate or close-to-duplicate content on our sites.
Rel="next" and Rel="prev"
Rel="next" and rel="prev" were introduced by Google in September 2011 as a way for SEOs and webmasters to designate that a page is part of a paginated series. Because paginated pages generally do not provide the best user experience as search results, these tags can be used to show Google the hierarchy of the pages in the series.
How to implement rel="next" and rel="prev"
Let’s say we have http://www.domain.com/category/, http://www.domain.com/category/page2/, and http://www.domain.com/category/page3/ where /page2/ and /page3/ are paginated results under /category/. You would implement the tags in the following way:
On http://www.domain.com/category/, put the following in the <head> section:
<link rel="next" href="http://www.domain.com/category/page2/" />
On http://www.domain.com/category/page2/, the second page in the series, put the following:
<link rel="next" href="http://www.domain.com/category/page3/" />
<link rel="prev" href="http://www.domain.com/category/" />
Finally, on http://www.domain.com/category/page3/, put the following:
<link rel="prev" href="http://www.domain.com/category/page2/" />
Note: on the first page only the "next" page should be designated, while on the last page only the "prev" page should be designated.
Meta Robots
The next tag to be familiar with is the robots meta tag. The purpose of this post is not to teach you how to use the robots meta tag in all the different circumstances, so I’ll just refer you to the official Robotstxt.org tutorial. For the purpose of dealing with duplicate content on paginated results, you only need to know the following:
<meta name="robots" content="noindex,follow" />
If you don’t already know, what this tag tells Google is “don’t index this page, but follow all the links.” This way, we keep the page from being indexed (without blocking the crawler completely, as we would with a Disallow rule in the robots.txt file), but also allow link equity to flow through to the pages and articles it links to.
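As a rough sketch of how this fits together with the pagination tags from the previous section, the <head> of the second page in the example series might look something like the following. The domain and paths are the same placeholder URLs used above, and the <title> text is made up purely for illustration.

```html
<head>
  <!-- Title text is a placeholder, not taken from a real site -->
  <title>Category - Page 2</title>

  <!-- Keep this paginated page out of the index, but let crawlers follow its links -->
  <meta name="robots" content="noindex,follow" />

  <!-- Page 2 sits in the middle of the series, so it points in both directions -->
  <link rel="prev" href="http://www.domain.com/category/" />
  <link rel="next" href="http://www.domain.com/category/page3/" />
</head>
```

Under this approach the first page of the series would normally stay indexable, so it would carry only its rel="next" tag and no noindex.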
Rel=Canonical
The last tag you need to know about is the rel=canonical tag, which hopefully you know about if you have been in SEO for any period of time. The rel=canonical tag, not to be confused with a 301 redirect, can be used to deal with duplicate content by pointing it at another page to tell Google “This page is actually a duplicate of that page over there.” It’s illustrated well by this graphic from SEOmoz:
A common question I hear now is “So why not just point the canonical tag to the top page in the hierarchy?” There are two reasons to not do this:
- The canonical tag will drive the link equity back to the top page, so the crawler may no longer follow the links on the deeper pages, which weakens your internal linking;
- The pages are not EXACT duplicates, and they are constantly changing as new information is published, so the canonical tag is not the best choice here.
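For contrast, here is roughly what the canonical tag looks like in a case where it is a good fit: an exact duplicate. This sketch assumes a hypothetical URL with a tracking parameter, http://www.domain.com/article/?utm_source=newsletter, that serves exactly the same content as http://www.domain.com/article/. Both URLs are invented for illustration.

```html
<!-- Placed in the <head> of http://www.domain.com/article/?utm_source=newsletter (hypothetical URL) -->
<!-- Tells search engines that the clean URL is the preferred version of this page -->
<link rel="canonical" href="http://www.domain.com/article/" />
```

Unlike a 301 redirect, visitors still see the page they requested; the tag is only a hint to the search engines about which version to index.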
That’s Great. Now show me how in WordPress
Alright, so now you know the thought process behind dealing with indexation (or the lack thereof) of pagination. So how do you implement it? I’m going to show you how to do it in WordPress using Yoast’s SEO plugin, and will tell you to bug your dev friends to do it on other platforms.
After you have installed the plugin, go to SEO -> Indexation:
Now check the “Subpages of archives and taxonomies” option:
Boom! That’s it. The plugin automagically includes the rel=”next” and rel=”prev” tags as well as the canonical tags.
Questions? Suggestions? Disagree with me? Let me know in the comments!
John,
Thanks for a very useful article. Just yesterday Google surprised me with a huge number of new 404 error pages. The problem was caused by the Page tag plugin I was using, but I solved it (I hope) with Yoast’s SEO plugin.
Thanks again.
Random question. What would happen if the page you have specified in the rel="canonical" is set to noindex,follow? Will the link juice still be passed to the page? Also, which page do you think Google would rank in the SERPs?
Hi John, thanks for the meta robots explanation. I was looking for how to set noindex,follow in WordPress. Also, thanks a lot for showing us the Yoast WordPress SEO plugin, it’s very useful. Thank you 🙂
Hi, I’ve been using Yoast’s SEO plugin for years now, but unfortunately it does not always prevent Google from indexing pagination.
Matt Cutts, the Google technical expert and representative, says in his video that the ‘noindex’ meta tag is a weak method of trying to prevent Google from indexing a page, since Googlebot will not always obey this tag.
Thanks Doherty... You are awesome... You really saved me a good amount of money. I was about to hire a WordPress developer to fix a duplicate tag problem... I was using SEO by Yoast but did not know about the noindex feature...
Thanks a lot
Hi John Doherty,
Thanks for the tutorial. I just wanted to noindex the category and tag pages on my blog.
Hi John, great tutorial and very nice explanation. But just for your information, Yoast SEO has been updated to a newer version and now the indexation options are in the Taxonomies section within the plugin settings area. It’s quite easy to put a noindex tag in various areas, when you are using the Yoast SEO plugin.