Yes - I am surprised when pros get it wrong. I created my robots.txt when I installed WordPress. At that time, my only reason for creating a robots.txt file was to keep robots out of my admin area. With experience, though, I learned that a robot will not crawl into my admin area unless I or someone else specifically links to it. So I was surprised to see that some of the advice given by pros like Michael Gray didn’t get it quite right, and that experienced bloggers like Twenty Steps are confused.
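For reference, the admin-area rule I started with can be sketched and checked with Python's standard `urllib.robotparser`. This is only a minimal sketch: `/wp-admin/` is the standard WordPress admin directory, but the post path and the `is_allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt that only keeps robots out of the WordPress admin area.
ADMIN_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""

def is_allowed(robots_txt, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# The admin area is blocked; an ordinary (hypothetical) post URL is not.
print(is_allowed(ADMIN_ROBOTS_TXT, "/wp-admin/options.php"))   # False
print(is_allowed(ADMIN_ROBOTS_TXT, "/seo/my-robots-txt/"))     # True
```

Note that Python's parser does plain prefix matching, so this check says nothing about wildcard rules that some search engines also accept.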
The idea of excluding duplicate content is an excellent one. Google does mark content as duplicate, and you don’t want a cluttered Google index. However, the related advice to file each post under a single category, because it would otherwise get several category-based URLs and so present duplicate content, is not the best idea.
The reason you have categories and tags is that you want users to find your content through them. So if an article makes sense for more than one category, a user may end up finding it through any of those categories, maybe because he or she has a special interest in that tag, or for whatever other reason.
So if you have a post that makes sense for more than one category, mark it with all of them. Just don’t allow WordPress to generate different URLs for the same post. I don’t know about older installations of WordPress, but the permalink structure I am using certainly does exactly that for category-based URLs: it uses only the first category as part of the URL. And for me, using the category name is also important for having relevant keywords in the URLs.
The same goes for archives: barring robots from entering them makes sense only when the archive presents different URLs based on chronology. If it doesn’t, there is certainly no need to keep robots out. But unlike with multiple categories, you don’t lose anything if you do block them. If you have a good sitemap, you already have all the posts in your blog linked and available for a spider to crawl.
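If you do decide to block date-based archives, a sketch like the following shows the idea. It assumes archives live under year prefixes such as `/2007/` while posts use category-based permalinks; the specific paths, the `example.com` sitemap URL, and the `can_crawl` helper are placeholders, not taken from any real blog.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the yearly archive trees and point to a sitemap
# so every post is still discoverable.
ARCHIVE_ROBOTS_TXT = """\
User-agent: *
Disallow: /2006/
Disallow: /2007/
Sitemap: http://example.com/sitemap.xml
"""

def can_crawl(robots_txt, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Chronological archive pages are blocked...
print(can_crawl(ARCHIVE_ROBOTS_TXT, "/2007/06/"))              # False
# ...but the post itself, under its category-based URL, stays crawlable.
print(can_crawl(ARCHIVE_ROBOTS_TXT, "/seo/my-robots-txt/"))    # True
```

This only works as intended when post permalinks do not start with the year themselves; with a date-based permalink structure the same rules would block the posts too.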

1 graywolf // Jun 25, 2007 at 9:24 am
Here’s the problem though: if you don’t use the more tag, the content will exist in full on the archive page for each category you put it in and on the single post page. You are then in the situation of hoping Google figures it out correctly. While they are right more than they are wrong, they still do get it wrong and I’m not willing to take the chance. Does that make sense?
2 John Doe // Jun 25, 2007 at 10:24 am
Yes - that makes sense. But firstly, I always use the more tag, so content is never reproduced in full on more than one page.
Secondly, I put a lot more value on content being filed in more than one category, if it fits, so that it is easier for end users to find according to their own interests.
After all, my blog is for the users first, and it seems like we are forgetting the poor user in SEOizing the blogs.
3 graywolf // Jun 28, 2007 at 12:53 pm
Come up with a list of X main categories; every post has to be in one of those, and everything else is extra. Block all of the non-main categories with robots.txt and you solve the duplicate issue and serve the users at the same time.
If you confuse the bots, the user you are trying to serve will never find you from a search engine.
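The suggestion in comment 3 could be sketched roughly as follows. The category slugs are invented, the `/category/<slug>/` layout is assumed to be the WordPress default category base, and `build_robots_txt` is a hypothetical helper, so treat this as an illustration rather than a ready-made configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical category slugs: every post belongs to one main category,
# and any other category it carries counts as "extra".
MAIN_CATEGORIES = {"seo", "wordpress"}
ALL_CATEGORIES = ["seo", "wordpress", "tips", "rants"]

def build_robots_txt(all_categories, main_categories):
    """Build robots.txt text that blocks every non-main category archive."""
    lines = ["User-agent: *"]
    for slug in all_categories:
        if slug not in main_categories:
            lines.append(f"Disallow: /category/{slug}/")
    return "\n".join(lines) + "\n"

robots_txt = build_robots_txt(ALL_CATEGORIES, MAIN_CATEGORIES)
print(robots_txt)

# Check the result: main category archives stay open, extra ones are blocked.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "/category/seo/"))    # True
print(parser.can_fetch("*", "/category/tips/"))   # False
```

Readers can still browse the extra category pages on the site; only crawlers that honor robots.txt are kept out of them.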