Duplicate content is a growing concern among site owners. WordPress creates several pages with duplicate content, including date archives, categories, tags, and more. Apart from these, content scrapers are also a threat to newer sites, and sometimes the scraped copies can even rank higher than the original site. In this article, I will talk about protecting your WordPress site from duplicate content issues and dealing with content scrapers.
Identifying Duplicate Content
Search engines are smart enough to identify the permalink of an article and differentiate between archives and single entries. One common myth among new bloggers is that search engines will penalize them for their archive pages. This is not true.
However, in some cases an archive page can be nearly identical to several other pages on your site. Let's take a look at some scenarios where you may accidentally create duplicate content.
1. A category archive with just one post will be an exact copy of the original post.
2. A tag archive with just one post will be a copy of the original post.
3. On a single-author blog, the author archive will be exactly the same as your main index and date archives.
If your site has many tags that you have used only once, and your tag archive templates show the full post, then you have many tag pages that are almost identical to some of your posts. This makes it difficult for search engines to figure out which one should be treated as the main link, and this is where the trouble begins. Search engines may even consider this a deliberate attempt to create duplicate content for higher rankings.
First of all, you should learn how to effectively use categories and tags to sort your content. If you are in the habit of creating too many categories and tags, or terms with overlapping names, then you are not doing it right. To deal with this situation, you can use the Term Management Tools plugin to merge categories, tags, and other taxonomies.
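Before merging anything, it helps to know which tags are actually the problem. Here is a minimal sketch (assuming it runs from your theme's functions.php file or a small site-specific plugin; the function name my_list_single_use_tags is just an example) that lists tags attached to only one post:

// List tags that are attached to only one post, so you know which ones to merge or remove.
function my_list_single_use_tags() {
	$tags = get_terms(array(
		'taxonomy'   => 'post_tag',
		'hide_empty' => false,
	));
	if (is_wp_error($tags)) {
		return;
	}
	foreach ($tags as $tag) {
		if (1 === (int) $tag->count) {
			// Output the tag name; you could also log it or build an admin notice instead.
			echo esc_html($tag->name) . '<br />';
		}
	}
}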
Another way to deal with the archive issue is to instruct search engines not to index or follow those pages. This can be achieved using the WordPress SEO plugin. After installing and activating the plugin, go to SEO » Titles & Metas. Click on the Other tab, and you will find Meta Robots settings for your author and date archives. You can noindex, nofollow them, or even disable those archives entirely. If you are running a single-author blog, then I recommend that you noindex, nofollow, and disable author archives on your site.
You can also do the same for categories and tags by clicking on the Taxonomies tab.
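If you would rather not rely on a plugin, the same effect can be approximated with a few lines of code. This is only a rough sketch (the function name my_noindex_archives is made up for this example); it prints a noindex, nofollow robots meta tag on author archives, date archives, and search results:

// Ask search engines not to index author archives, date archives, or search results.
add_action('wp_head', 'my_noindex_archives');
function my_noindex_archives() {
	if (is_author() || is_date() || is_search()) {
		echo '<meta name="robots" content="noindex,nofollow" />' . "\n";
	}
}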
Excerpts vs. Full Content
An easy way to avoid duplicate content is to use excerpts on all your archive and index pages. Using excerpts not only helps you avoid duplicate content issues, but it also increases your page views and improves page load times across your website.
To replace full posts with excerpts, you may need to edit your theme or child theme's template files, such as archive.php, category.php, and tag.php. You need to find the instances of:
<?php the_content(); ?>
and replace those with:
<?php the_excerpt(); ?>
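For context, here is roughly what that change looks like inside a typical archive template loop (your theme's markup will differ; this is just an illustration):

<?php if (have_posts()) : while (have_posts()) : the_post(); ?>
	<h2><a href="<?php the_permalink(); ?>"><?php the_title(); ?></a></h2>
	<?php the_excerpt(); // previously: the_content(); ?>
<?php endwhile; endif; ?>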
Another way to do this throughout your site is by adding the following code to your theme's functions.php file or a site-specific WordPress plugin.
// Add filter to the_content
add_filter('the_content', 'my_excerpts');

function my_excerpts($content = false) {
	// If this is the home page, an archive, or search results
	if (is_front_page() || is_archive() || is_search()) :
		global $post;
		$content = $post->post_excerpt;

		// If an excerpt is set in the Optional Excerpt box
		if ($content) :
			$content = apply_filters('the_excerpt', $content);
		// If no excerpt is set, build one from the post content
		else :
			$content = $post->post_content;
			$content = strip_shortcodes($content);
			$content = str_replace(']]>', ']]&gt;', $content);
			$content = strip_tags($content);
			$excerpt_length = 55;
			$words = explode(' ', $content, $excerpt_length + 1);
			if (count($words) > $excerpt_length) :
				array_pop($words);
				array_push($words, '...');
				$content = implode(' ', $words);
			endif;
			$content = '<p>' . $content . '</p>';
		endif;
	endif;

	// Make sure to return the content
	return $content;
}
This code simply filters the_content with our own function and displays an excerpt instead of the full post on the front page, archive pages, and search results.
Excerpts in RSS Feed
An easier way to deal with content scrapers is to show excerpts instead of full articles in your RSS feeds. Many content scraping sites use RSS aggregators to automatically fetch and republish content from other sites. To deal with this, simply go to Settings » Reading and select Summary for your feeds.
One disadvantage of this is that your subscribers will not be able to read full articles in their feed readers. But on a positive note, this also means more pageviews on your site.
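For what it's worth, the Summary option maps to the rss_use_excerpt setting, so if you prefer to manage this in code (for example, from a site-specific plugin or a setup script), something like this should have the same effect:

// Equivalent of choosing "Summary" under Settings » Reading:
// show excerpts instead of full text in feeds.
update_option('rss_use_excerpt', '1');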
Using Google Webmaster Tools
Google Webmaster Tools is a collection of tools that the search engine provides to webmasters. These tools allow you to see how your site is doing in search results. Not only does it provide useful information about the search queries used to find your site, URLs from your site in Google's index, and structured data, it also shows you crawl errors, warnings, and other issues that might be affecting your search rankings, including duplicate titles and descriptions.
If you are worried about content scrapers, then Webmaster Tools is an effective way to help Google figure out which version of the content appeared first.
To summarize, the best way to improve your site's performance in search engines is to create great quality content. Don't worry too much about duplicate content; keep your efforts focused on new, quality content and you will see your site start ranking higher. If you liked this article, then please join us on Twitter and Google+.