Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
What does Disallow: /french-wines/?* actually do - robots.txt
-
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*
Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?
Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL?
I think this has been done to block URLs containing query strings.
Thanks, Luke
-
Glad to help, Luke!
-
Thanks Logan for your help with this - much appreciated. Really helpful!
-
Disallow: /?* is the same thing as Disallow: /? because a trailing asterisk is a wildcard that adds nothing to the match. Both of those disallows prevent any URL that begins with /? from being crawled.
And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let through. I highly recommend testing any new disallows there before releasing them into the wild.
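For a quick local check of the equivalence described above, here is a minimal Python sketch. It assumes Google-style matching (a rule is anchored at the start of the URL path, * matches any run of characters, and a trailing $ anchors the end). The names rule_to_regex and is_blocked are made up for this illustration and are not part of any Search Console tooling.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; every other character is literal.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape the literal pieces and join them with '.*' where a '*' stood.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_blocked(rule: str, path: str) -> bool:
    # re.match only matches at the start, so the rule must apply from
    # the first character of the path.
    return rule_to_regex(rule).match(path) is not None

samples = ["/?page=2", "/?", "/french-wines/?page=2", "/about/"]
for path in samples:
    # Print the verdict of both rules side by side for comparison.
    print(path, is_blocked("/?*", path), is_blocked("/?", path))
```

Both rules give the same verdict for every sample path, and neither one catches /french-wines/?page=2, because that path does not begin with /?.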
-
Thanks again Logan.
What would Disallow: /?* do? That is what the site I am looking at has implemented. Perhaps it works both ways around?
I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.
-
Disallow: /*?
This disallow literally says to crawlers 'if a URL starts with a slash (all URLs do) and contains a question mark, don't crawl it'. The * is a wildcard, so anything between the / and the ? is covered by the disallow.
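As a rough illustration of that wildcard behaviour, the sketch below walks through a rule piece by piece without using regular expressions. It assumes the rule is anchored at the start of the path and that * matches any run of characters. The $ end anchor is deliberately not handled, and the function name matches is hypothetical.

```python
def matches(rule: str, path: str) -> bool:
    """Return True if `path` is caught by a robots.txt rule with '*' wildcards.

    Assumption: the rule is anchored at the start of the path and '*'
    matches any run of characters. The '$' end anchor is not handled.
    """
    first, *rest = rule.split("*")
    # The text before the first '*' must sit at the very start of the path.
    if not path.startswith(first):
        return False
    pos = len(first)
    for segment in rest:
        # Each later piece may appear anywhere after the previous one.
        found = path.find(segment, pos)
        if found == -1:
            return False
        pos = found + len(segment)
    return True

for path in [
    "/french-wines/?page=2",
    "/french-wines/rhone-region/?sort=price",
    "/french-wines/",
]:
    print(path, matches("/*?", path))
```

Under these assumptions, /*? catches any path with a ? in it, however deep the folder, and leaves the parameter-free /french-wines/ alone.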
It's very easy to disallow the wrong thing, especially with regard to parameters. For this reason I always do these 2 things rather than using robots.txt:
- Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure them for your site
- Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A self-referring canonical pointing to the root level of that URL (the version without parameters) will help prevent indexing of URLs with parameters.
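To make the canonical idea concrete, here is a small Python sketch that derives the parameter-free version of a URL and builds the matching link tag. It assumes every query parameter on the page is safe to drop (sorting, filtering, tracking and so on). Parameters that genuinely change the content would need different handling. The helper names canonical_url and canonical_tag are invented for the example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path.

    Assumption: no query parameter changes the page content.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def canonical_tag(url: str) -> str:
    # Escape the URL so it is safe inside an HTML attribute value.
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'

print(canonical_tag("https://www.yoursite.com/french-wines/?sort=price"))
```

The tag would go in the head of both the parameterised and the clean version of the page. Crawlers still need to be able to fetch the parameterised URL to see it, which is why this approach does not combine well with a robots.txt disallow on the same URLs.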
Hope that's helpful!
-
Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?
-
Hi Luke,
You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) before or after the folder name, the URL would have to start with /french-wines/? to match. This disallow is really only preventing crawling on the single URL www.yoursite.com/french-wines/ with any parameters appended. Deeper folders such as /french-wines/rhone-region/ are not covered, even when their URLs include a question mark.
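Because the only wildcard in this rule is the trailing asterisk, the match boils down to a plain "starts with" test. The Python sketch below shows that under the assumption that rules are anchored at the start of the path. The function name blocked_by_prefix_rule is hypothetical, and it refuses any rule with a wildcard elsewhere or a $ anchor.

```python
def blocked_by_prefix_rule(rule: str, path: str) -> bool:
    """Check a rule whose only wildcard is a trailing '*'.

    A trailing '*' adds nothing to a prefix match, so the rule reduces
    to "the path starts with this string".
    """
    prefix = rule.rstrip("*")
    if "*" in prefix or "$" in prefix:
        raise ValueError("this sketch only handles a trailing wildcard")
    return path.startswith(prefix)

RULE = "/french-wines/?*"
for path in [
    "/french-wines/?page=2",
    "/french-wines/",
    "/french-wines/rhone-region/?page=2",
    "/french-wines/rhone-region/",
]:
    verdict = "blocked" if blocked_by_prefix_rule(RULE, path) else "allowed"
    print(f"{path}: {verdict}")
```

Only the first path is blocked. The folder itself and everything under /french-wines/rhone-region/ stay crawlable, with or without a query string.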