Block an entire subdomain with robots.txt?

kylesuss

Is it possible to block an entire subdomain with robots.txt?

I write for a blog that has its root domain as well as a subdomain pointing to the exact same IP. Getting rid of the subdomain is not an option, so I'd like to explore other ways to avoid duplicate content. Any ideas?

kylesuss

Awesome! That did the trick -- thanks for your help. The site is no longer listed.

sprynewmedia

Fact is, the robots file alone will never work (the link has a good explanation why - short form: all it does is stop the bots from indexing again).

Best to request removal then wait a few days.

kylesuss

Yeah. As of yet, the site has not been de-indexed. We placed the conditional rule in htaccess and are getting different robots.txt files for the domain and subdomain -- so that works. But I've never done this before so I don't know how long it's supposed to take?

I'll try to verify via Webmaster Tools to speed up the process. Thanks

sprynewmedia

You should do a remove request in Google Webmaster Tools. You have to first verify the sub-domain then request the removal.

See this post on why the robots file alone won't work...

http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts

kylesuss

Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea.

Will report back to confirm that the subdomain has been de-indexed.

sprynewmedia

Option 1 could come with a small performance hit if you have a lot of txt files being used on the server.

There shouldn't be any negative side effects to option 2 if the rewrite is clean (i.e., not accidentally a redirect) and the contents of the two files are robots compliant.

Good luck

kylesuss

Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation

kylesuss

We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain. I'll look around to see if I can tweak the settings

sprynewmedia

Sounds like (from other discussions) you may be stuck requiring a dynamic robots.txt file which detects what domain the bot is on and changes the content accordingly. This means the server has to run all .txt files as (I presume) PHP.

Or, you could conditionally rewrite the /robots.txt URL to a new file according to sub-domain:

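RewriteEngine on
# Only match requests that arrive on the subdomain (dots escaped, case-insensitive)
RewriteCond %{HTTP_HOST} ^subdomain\.website\.com$ [NC]
# Internally rewrite (not redirect) robots.txt to the subdomain-specific file
RewriteRule ^robots\.txt$ robots-subdomain.txt [L]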

Then add:

User-agent: *
Disallow: /

to the robots-subdomain.txt file

(untested)
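
For comparison, the first option above (a dynamic robots.txt that changes with the host the bot is on) could be sketched roughly as follows. This is only an illustration in Python rather than the PHP setup mentioned above; the host name `subdomain.website.com`, the function `robots_txt_for_host`, and the WSGI app are assumptions made for the example, not something from the actual site.

```python
# Hypothetical host name(s) that should be blocked; substitute your own.
BLOCKED_HOSTS = {"subdomain.website.com"}

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow = nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # block the whole host


def robots_txt_for_host(host_header: str) -> str:
    """Return the robots.txt body for the host the crawler requested."""
    # Drop an optional port and normalise case before comparing host names.
    host = host_header.split(":", 1)[0].strip().lower()
    return BLOCK_ALL if host in BLOCKED_HOSTS else ALLOW_ALL


def app(environ, start_response):
    """Minimal WSGI app that answers only /robots.txt."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = robots_txt_for_host(environ.get("HTTP_HOST", "")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]
```

The rewrite rule is likely the simpler choice on a shared Apache/WordPress host, since it needs no extra code running for every robots.txt request.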

john4math

Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain?

Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's ok to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's ok...
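
A rough sketch of that second idea, assuming the root domain is `website.com` (a placeholder) and that the page template can see the requested URL. The helper name `canonical_link_tag` is made up for illustration; a WordPress plugin would do the equivalent in PHP.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical root domain that every canonical tag should point to.
CANONICAL_HOST = "website.com"


def canonical_link_tag(request_url: str) -> str:
    """Build a canonical <link> tag that always points at the root domain."""
    parts = urlsplit(request_url)
    # Keep scheme, path and query; swap the host and drop any fragment.
    canonical = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, "")
    )
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'
```

Because the host is replaced unconditionally, the same tag comes out whether the page is viewed through the subdomain or the root domain, so no host detection is needed.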

kylesuss

Hey Ryan,

I wasn't directly involved with the decision to create the subdomain, but I'm told that it is necessary to create in order to bypass certain elements that were affecting the root domain.

Nevertheless, it is a blog and the users now need to login to the subdomain in order to access the Wordpress backend to bypass those elements. Traffic for the site still goes to the root domain.

AdoptionHelp

They both point to the same location on the server? So there's not a different folder for the subdomain?

If that's the case, then I suggest adding a rule to your htaccess file to 301 the subdomain back to the main domain, in exactly the same way people redirect from non-www to www or vice versa. However, you should ask why the server is configured to have a duplicate subdomain. You might just edit your Apache settings to get rid of that subdomain (usually done through a cPanel interface).

Here is what your htaccess might look like:

<IfModule mod_rewrite.c>
RewriteEngine on
# Redirect non-www to www
RewriteCond %{HTTP_HOST} !^www\.mydomain\.org [NC]
RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]
</IfModule>

AndyKuiper

Not to me, LOL. I think you'll need someone with a bit more expertise in this area than I to assist in this case. Kyle, I'm sorry I couldn't offer more assistance... but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozers will quickly come to the rescue on this one.

Andy

kylesuss

Hey Andy,

Herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file.

Does that make sense?

AndyKuiper

Hi Kyle. Yes, you can block an entire subdomain via robots.txt. However, you'll need to create a robots.txt file and place it in the root of the subdomain, then add the following rules to direct the bots to stay away from the entire subdomain's content:

User-agent: *
Disallow: /
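
If you want to sanity-check rules like these before deploying them, one possible approach is Python's standard `urllib.robotparser`. The helper `is_allowed` and the example URL are made up for illustration, and the check only tells you whether a compliant crawler may fetch a URL, not whether pages already in the index will be dropped.

```python
from urllib.robotparser import RobotFileParser

# The rules from above, as they would appear in the subdomain's robots.txt.
RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the text directly, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL on the subdomain.
    print(is_allowed(RULES, "http://subdomain.website.com/any/page"))
```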

hope this helps
