In a default WordPress installation, the same content is repeated in several places throughout the site. Take a post, for example: it appears on the homepage, on its own single page, in its category listing, on the author pages, in search results, and so on. You are probably not comfortable having duplicated content indexed and letting Search Engines guess which version you consider the most relevant. That’s why it’s worth taking a few things into consideration.
What to index
There’s a lot of discussion around the web about what you should and shouldn’t index. Unfortunately for us, there isn’t one solution that fits every site perfectly. Achieving the best results usually requires planning ahead, and even some experimenting and tweaking along the way. The type of content, the blog structure, and even the theme structure and its elements all affect these plans and should be considered before making a decision. For example, not all themes show duplicate content. If your theme’s category pages show excerpts that you took the time to write, then it is certainly good to index those pages. The same can be true of tag, author and archive pages. If they provide different or extra content, why would you not index them?
How can you control this?
Create a robots.txt file
I’m sure most of you know what a robots.txt file is for. For those who don’t, it’s a plain-text file in the root folder of your site that tells Search Engine crawlers which files or directories they are allowed or disallowed to crawl. Here is a sample of a WordPress robots.txt file:
User-agent: *
Disallow: /tag/
Disallow: /author/
To prevent duplicated content, I decided in the sample above to disallow crawling of everything under /tag and /author. Remember that not all themes and sites need these same instructions, so choose carefully what you want to remove. As a live example, you can see my robots.txt file. If you feel uncomfortable editing files on your server, there’s a nice plugin that creates this file from within the Admin panel.
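Before uploading a robots.txt file, it can help to check which URLs the rules actually block. The sketch below uses Python’s standard-library urllib.robotparser to evaluate two versions of the rules against a few hypothetical paths on a placeholder domain (example.com). robots.txt rules are matched as path prefixes, so a rule written as /tag also catches an ordinary post whose slug happens to start with "tag", while /tag/ only matches the tag archive directory. Some search engines support extensions such as wildcards that this simple parser does not model, so treat it as a rough check rather than an exact simulation of any particular crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical placeholder site and paths, used only for illustration.
SITE = "https://example.com"
PATHS = [
    "/tag/seo/",             # a tag archive
    "/author/admin/",        # an author archive
    "/tagging-your-posts/",  # an ordinary post whose slug starts with "tag"
    "/2009/05/my-post/",     # an ordinary single post
]

# Rules written as bare prefixes.
PREFIX_RULES = "User-agent: *\nDisallow: /tag\nDisallow: /author\n"
# The same rules restricted to the archive directories.
DIR_RULES = "User-agent: *\nDisallow: /tag/\nDisallow: /author/\n"


def allowed(robots_txt: str, path: str) -> bool:
    """Return True if a generic crawler ("*") may fetch SITE + path."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", SITE + path)


if __name__ == "__main__":
    for path in PATHS:
        print(path, "| /tag:", allowed(PREFIX_RULES, path),
              "| /tag/:", allowed(DIR_RULES, path))
```

Running it shows that both versions block the tag and author archives, but only the prefix version also blocks /tagging-your-posts/.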
Add the correct robots meta tags to your theme pages
For more specific control, or if you don’t fully trust your robots.txt file, you can always use the robots meta tag. This tag tells Search Engines not to index the content of a page, and/or not to scan it for links to follow. Joost de Valk has put together a great plugin that handles this easily from within the WordPress control panel.
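To make the two switches (index and follow) concrete, here is a minimal sketch of the decision a theme or plugin makes for each page: which robots meta tag to print in the page’s head. The page-type names and the policy table are hypothetical assumptions chosen to match the duplicate-content discussion above. In an actual WordPress theme this logic would live in PHP, for example in header.php using conditional tags such as is_tag() and is_author(). Python is used here only to illustrate the logic.

```python
# Hypothetical policy: (index, follow) per page type. Adjust to your own theme.
ROBOTS_POLICY = {
    "home": (True, True),
    "single": (True, True),    # the main copy of a post: index it
    "category": (True, True),  # assumed to show unique, hand-written excerpts
    "tag": (False, True),      # duplicate listings: keep them out of the index...
    "author": (False, True),   # ...but still let crawlers follow their links
    "search": (False, True),
}


def robots_meta(index: bool, follow: bool) -> str:
    """Build a robots meta tag from the two independent switches."""
    content = ",".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return f'<meta name="robots" content="{content}" />'


def meta_for(page_type: str) -> str:
    """Return the tag for a page type, defaulting to index,follow."""
    index, follow = ROBOTS_POLICY.get(page_type, (True, True))
    return robots_meta(index, follow)


if __name__ == "__main__":
    for page_type in ["single", "category", "tag", "author"]:
        print(page_type, meta_for(page_type))
```

Because index,follow is what crawlers assume when no robots meta tag is present, a theme could simply print nothing for those pages. One caveat: a crawler only reads this tag on pages it is allowed to fetch, so a URL disallowed in robots.txt will never have its noindex seen.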
I hope you enjoyed this. Thanks!