Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Disallow wildcard match in Robots.txt
-
This is in my robots.txt file, does anyone know what this is supposed to accomplish, it doesn't appear to be blocking URLs with question marks
Disallow: /?crawler=1
Disallow: /?mobile=1Thank you
-
This is a good reply.
Everyone gets really confused because Robots.txt has very minor, partial wildcard support and that makes people think that Robots.txt files use Regex, which they do not. Instead of having some weird half and half implementation, it would be much better IMO if the Robots.txt initiative / directive were updated to say "yes, you can use full regular expressions with regards to URL string matching".
Many people are left in a kind of silly guessing game because Google doesn't 'properly' elaborate or invest in expanding the definitions to their currently (publicly) assumed end-game.
People assume that if "*" will match any string of characters, "?" will match any individual character when used in a robots.txt file. This would make sense, but it's not the case. AFAIK there are only one or two supported wildcard characters in Robots.txt and that's why people get confused, looking for escape characters and the suchlike.
-
Hi Amanda,
Those lines tell GoogleBot not to crawl urls that have that text fragments.
For example, wont crawl: domain.com/category/product**?mobile=1**BUT, that doesnt mean that will not crawl every URL with question marks. For that, the line should be like this:
Disallow: /*?I do highly recommend you to read this guides:
About /robots.txt - Official site - Robotstxt.org
Robots.txt - Moz
Robots.txt: the ultimate guide - YOAST
The Complete Guide to Robots.txt - PORTENTHope it helps.
Best luck.
GR
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt Syntax for Dynamic URLs
I want to Disallow certain dynamic pages in robots.txt and am unsure of the proper syntax. The pages I want to disallow all include the string ?Page= Which is the proper syntax?
Technical SEO | | btreloar
Disallow: ?Page=
Disallow: ?Page=*
Disallow: ?Page=
Or something else?0 -
Do I need a separate robots.txt file for my shop subdomain?
Hello Mozzers! Apologies if this question has been asked before, but I couldn't find an answer so here goes... Currently I have one robots.txt file hosted at https://www.mysitename.org.uk/robots.txt We host our shop on a separate subdomain https://shop.mysitename.org.uk Do I need a separate robots.txt file for my subdomain? (Some Google searches are telling me yes and some no and I've become awfully confused!
Technical SEO | | sjbridle0 -
One robots.txt file for multiple sites?
I have 2 sites hosted with Blue Host and was told to put the robots.txt in the root folder and just use the one robots.txt for both sites. Is this right? It seems wrong. I want to block certain things on one site. Thanks for the help, Rena
Technical SEO | | renalynd270 -
Good robots txt for magento
Dear Communtiy, I am trying to improve the SEO ratings for my website www.rijwielcashencarry.nl (magento). My next step will be implementing robots txt to exclude some crawling pages.
Technical SEO | | rijwielcashencarry040
Does anybody have a good magento robots txt for me? And what need i copy exactly? Thanks everybody! Greetings, Bob0 -
2 sitemaps on my robots.txt?
Hi, I thought that I just could link one sitemap from my site's robots.txt but... I may be wrong. So, I need to confirm if this kind of implementation is right or wrong: robots.txt for Magento Community and Enterprise ...
Technical SEO | | Webicultors
Sitemap: http://www.mysite.es/media/sitemap/es.xml
Sitemap: http://www.mysite.pt/media/sitemap/pt.xml Thanks in advance,0 -
Blocked jquery in Robots.txt, Any SEO impact?
I've heard that Google is now indexing links and stuff available in javascript and jquery. My webmastertools is showing that some links are blocked in robots.txt of jquery. Sorry I'm not a developer or designer. I want to know is there any impact of this on my SEO? and also how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | | hammadrafique0 -
Allow or Disallow First in Robots.txt
If I want to override a Disallow directive in robots.txt with an Allow command, do I have the Allow command before or after the Disallow command? example: Allow: /models/ford///page* Disallow: /models////page
Technical SEO | | irvingw0 -
Exact match subdomains
Hi, I have seen significant SEO benefits from owning exact match domains and was wondering whether exact match subdomains hold the same (or some) of these benefits? eg. halloweencostumes.co.uk vs. halloween [dot] costumes.co.uk Many thanks.
Technical SEO | | martyc0