How To Prevent Duplicate Content in Google And Other Search Engines on Blogger
In my short time blogging on Blogger, I have run into an interesting issue that seems very common. By default, Blogger blogs allow archive pages, and in some cases even post label (category) pages, to be indexed by search engines. This is not what you want: most search engines frown on duplicate content, and these pages will fill the search results for your site with many entries that are useless to potential visitors.
One of the most important aspects of your blog is how it appears in search results across the major search engines such as Google, Yahoo, and Bing. If your archive and label pages are being indexed, it could hurt your rankings, or worse, those archive or label pages could rank higher than the actual article. The good news is we can take a few simple steps to remedy this issue.
Disclaimer:
You must follow these directions to the letter; failure to do so could result in you blocking search engines from your blog, getting de-indexed, etc. I am not responsible for any errors or mistakes that could be made, or for any typos in this post. You follow these instructions at your own risk, and I am in no way responsible. You, the reader, assume all risk. If you are not comfortable performing any of these steps, then please refrain from doing so or contact an experienced webmaster.
Implement Canonical URLs
Canonicalization of your URLs allows you to tell the major search engines which URL is the preferred version of a specific page. Since it's possible to have many versions of the same page (archives, for example), it's important to specify a canonical URL so search engines know which one you prefer. To implement canonical URLs across your Blogger blog, please follow the steps below.
1. On your Blogger dashboard, click "Template" and then click "Edit HTML" (Back up your template first!)
2. Search for the following string:
<b:include data='blog' name='all-head-content'/>
3. If that string is found, your blog template already supports canonical URLs and you don't need to do anything else to implement them. If it's not found, follow step 4.
4. Search your blog template for </head>. Next, put the following code just before the </head> tag. See the example below:
<link rel='canonical' expr:href='data:blog.url'/>
</head>
5. Click Save. That's it! Now you have canonical URLs implemented across your site.
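If you would rather check the result with a script than read the page source by eye, here is a minimal Python sketch that looks for a canonical link in a page's HTML. The sample markup and the example.blogspot.com address are made up for illustration; on a real blog you would paste in the source of one of your own pages.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split it into tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens:
            self.canonicals.append(attributes.get("href"))


def find_canonicals(html):
    """Return a list of canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page source, roughly what a rendered post page might contain.
sample_page = """
<html><head>
<title>Example post</title>
<link rel='canonical' href='http://example.blogspot.com/2013/01/example-post.html'/>
</head><body>Post body</body></html>
"""

print(find_canonicals(sample_page))
```

An empty list would mean the page declares no canonical URL, and more than one entry would suggest the tag was added twice.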
Implement noindex and noarchive meta tags to archive pages
By default, archive pages will be crawled on Blogger. If you don't add the noindex and noarchive meta tags to your archive pages, they will get indexed by the major search engines and will show up in the search results. This is obviously not what you want, so to rectify the situation, we need to add the correct meta tags to just these archive pages so they won't get indexed. To do this, please follow the simple steps below:
1. On your Blogger dashboard, click on "Template" and then click on "Edit HTML" (Back up your template first!)
2. Find the <head> tag and copy the code below just after it:
<!--Begin noindex,noarchive archive pages -->
<b:if cond='data:blog.pageType == "archive"'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
<!--End noindex,noarchive archive pages -->
3. Click Save, and that's it! Now only your archive pages will have the noindex, noarchive meta tags. The rest of your site will be unaffected. Browse your archive pages and check the source code to see the tags, then browse to a post or the blog homepage and confirm they are not there. That's how you want it.
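The source-code check described in this section can also be scripted. The following is a small Python sketch, not an official tool, that reads the robots meta tag out of a page's HTML. Both sample pages are invented stand-ins for what an archive page and a post page might look like after the change.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Gathers the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives inside the content attribute are comma-separated.
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the given HTML."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical page sources for illustration only.
archive_page = (
    "<html><head><meta content='noindex,noarchive' name='robots'/></head>"
    "<body>Archive listing</body></html>"
)
post_page = "<html><head><title>A post</title></head><body>Post body</body></html>"

print(sorted(robots_directives(archive_page)))
print(sorted(robots_directives(post_page)))
```

The archive page should report both directives, while a post or the homepage should report none.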
Add rel="nofollow" attribute to post labels and the label cloud
If you have labels attached to your blog posts, or are using a label cloud in your sidebar (like this site), then you want to add rel="nofollow" to those label links to tell search engine crawlers such as Google not to follow them. This one is a bit more difficult, but it's doable. All you have to do is follow the steps below, and you should get it implemented correctly.
For post labels - To add rel="nofollow" to labels attached to your posts, follow the steps below.
1. Open your Blogger dashboard, click "Template" and then click "Edit HTML" (Back up your template first!)
2. Search your blogger template for the following string:
<a expr:href='data:label.url' rel='tag'>
3. Now change it and add nofollow to the string, as you see below. Note that multiple rel values are separated by a space, not a comma.
<a expr:href='data:label.url' rel='tag nofollow'>
4. Click Save, and that's it! Now all the labels attached to your posts should have the rel="nofollow" attribute.
Open a post in Firefox, right-click on a post label, and select "Inspect Element". You should see something similar to rel="nofollow" or rel="tag nofollow". If you do, you have done it correctly. If you are also using a label cloud in your sidebar, you have a little more work to do.
For label cloud - To add rel="nofollow" to your label cloud, follow the steps below:
1. Open your Blogger dashboard, click "Template" and then click "Edit HTML" (Back up your template first!)
2. Search your Blogger template for the following string (you may find it twice):
<a expr:dir='data:blog.languageDirection' expr:href='data:label.url'><data:label.name/></a>
3. Now change it to:
<a expr:dir='data:blog.languageDirection' expr:href='data:label.url' rel='nofollow'><data:label.name/></a>
You may find that string twice. If you do, add rel='nofollow' after expr:href='data:label.url' in both places.
4. Once you have added it, if you have no errors, click Save. That's it! Now all your label links are nofollow.
You can test this by right-clicking a link in your label cloud in Firefox and selecting "Inspect Element". You should see rel="nofollow" on the link. If you do, you have done it correctly.
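Inspecting links one at a time gets tedious on a blog with many labels, so here is a rough Python sketch that scans a page's HTML and lists any label links that still lack nofollow. It assumes label links can be recognized by /search/label/ in their address, and the sample markup and example.blogspot.com URLs are invented for the demonstration.

```python
from html.parser import HTMLParser

# Assumption: label pages are identified by this fragment in the link address.
LABEL_PATH = "/search/label/"


class LabelLinkAuditor(HTMLParser):
    """Records label links whose rel attribute has no nofollow token."""

    def __init__(self):
        super().__init__()
        self.followed_label_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        # rel values are space-separated tokens, e.g. "tag nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if LABEL_PATH in href and "nofollow" not in rel_tokens:
            self.followed_label_links.append(href)


def label_links_missing_nofollow(html):
    """Return the hrefs of label links that crawlers may still follow."""
    auditor = LabelLinkAuditor()
    auditor.feed(html)
    return auditor.followed_label_links


# Hypothetical rendered markup: two post labels, one cloud label, one post link.
sample_page = """
<div class='post-labels'>
<a href='http://example.blogspot.com/search/label/SEO' rel='tag nofollow'>SEO</a>
<a href='http://example.blogspot.com/search/label/Blogger' rel='tag'>Blogger</a>
</div>
<div class='label-cloud'>
<a dir='ltr' href='http://example.blogspot.com/search/label/Tips' rel='nofollow'>Tips</a>
</div>
<a href='http://example.blogspot.com/2013/01/example-post.html'>Example post</a>
"""

print(label_links_missing_nofollow(sample_page))
```

In this sample only the Blogger label is reported, because its rel attribute contains tag but not nofollow. An empty list would mean every label link on the page is covered.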
Using Robots.txt to block access to search and label pages
Lastly, we will use the robots.txt file to tell search engine crawlers that the /search directory, which on Blogger also holds the label pages (/search/label/), is off limits. To do this, please follow the steps below:
Important
You must follow these directions to the letter; failure to do so could result in you blocking search engines from your blog.
1. Go to your Blogger Dashboard.
2. Click Settings --> then Search Preferences
3. Click Enable Custom Robots.txt
4. Click Edit next to Custom Robots.txt
5. Delete the Contents of the Robots.txt and replace it with the following:
Important: Replace yourblogurlhere in the Sitemap line below with your own blog URL. Failure to do so will result in errors for search engines on your blog.
User-agent: *
Disallow: /search
Disallow: /search/label/
Allow: /
Sitemap: http://www.yourblogurlhere/feeds/posts/default?orderby=UPDATED
6. Click Save, and that's it.
Now your robots.txt file will tell search engine crawlers that the search and label directories are off limits. All crawlers that adhere to the robots.txt standard will honor these directives, which keeps label and Blogger search pages from being crawled and, in most cases, out of the index. As a bonus, we have also included a Sitemap directive and pointed it at your RSS/Atom feed. This helps search robots index your content in a timely manner.
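Before relying on the new rules, you can dry-run them locally. Below is a minimal sketch using Python's standard urllib.robotparser module. The rules are trimmed to the two lines that decide the outcome (the /search prefix already covers the label pages underneath it), and the example.blogspot.com address and the sample paths are placeholders, not real pages.

```python
from urllib.robotparser import RobotFileParser

# The rules that matter from the custom robots.txt; the Sitemap line is left
# out because it does not affect which URLs a crawler may fetch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, user_agent="*"):
    """Return True if the rules let the given crawler fetch the path."""
    # Placeholder host; only the path is compared against the rules.
    return parser.can_fetch(user_agent, "http://example.blogspot.com" + path)


# Hypothetical paths: a label page, a search query, a post, and the homepage.
for path in ["/search/label/SEO", "/search?q=blogger",
             "/2013/01/example-post.html", "/"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

The label page and the search query should come back as blocked, while the post and the homepage stay allowed. If a post ever shows up as blocked, the rules were entered incorrectly and should be fixed before saving.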
I hope this post helped some of you out there. Have a great day!
Attribution: Post Image By Roulex_45 (Own work) [GFDL or CC-BY-SA-3.0-2.5-2.0-1.0], via Wikimedia Commons