Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I use meta noindex and robots.txt disallow?
-
Hi, we have an alternate "list view" version of every one of our search results pages.
The list view has its own URL, indicated by a URL parameter.
I'm concerned about wasting our crawl budget on all these list view pages, which effectively double the number of pages that need crawling.
When they were first launched, I had the noindex meta tag placed on all list view pages, but I'm concerned that they are still being crawled.
Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or will Googlebot/Bingbot also stop crawling those pages over time? I assume that noindex still means "crawl"...
Thanks

-
Hi,
Thanks, I will do some testing to confirm that this behaves how I would like it to
-
If all the pages are 100% not indexed, then I would block them in robots.txt. Google's John Mueller confirmed to me that Googlebot will continue to crawl every link to check whether a nofollow or noindex has changed status.
So as a result we blocked our pages with robots.txt and saw a great increase in index/crawl rates on the pages we want Google to pay attention to. It also reduces wasted server resources.
However, if any of the pages are already indexed and you block them in robots.txt, Googlebot will never be able to crawl the URL to determine that it should be noindex. This means the page could stay in a permanent state of being indexed.
I hope that answers all your questions?
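To check which URLs a parameter-based disallow would cover before deploying it, here is a minimal Python sketch of Google-style robots.txt path matching (`*` matches any run of characters, a trailing `$` anchors the end of the URL). The parameter name `view=list` and the two rules are assumptions for illustration, since the thread does not name the actual parameter. The sketch only handles Disallow rules and ignores Allow rules and rule precedence.

```python
import re
from typing import Iterable


def rule_to_regex(rule: str) -> "re.Pattern":
    """Translate one robots.txt path rule into an anchored regex.

    '*' matches any run of characters; a trailing '$' pins the end of the URL.
    Without '$' the rule is a prefix match, as in robots.txt.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_rules: Iterable[str]) -> bool:
    """Return True if any Disallow rule matches the path plus query string."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)


# Hypothetical rules: cover the parameter whether it comes first or later
# in the query string.
RULES = ["/*?view=list", "/*&view=list"]

print(is_disallowed("/search?view=list", RULES))          # True
print(is_disallowed("/search?q=shoes&view=list", RULES))  # True
print(is_disallowed("/search?q=shoes", RULES))            # False
```

Two rules are used because the parameter can follow either `?` or `&`; a single `/*view=list` rule would also work but could catch unrelated parameters that merely end in `view`.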
-
When you say:
nofollow will tell the crawlers to not crawl the page
I believe you mean that this tells the crawlers not to crawl the links on the page; the page itself is still "crawled", is it not?
But yes, you are right to say that once the robots.txt disallow is in place, the meta tag will not be seen and is thus moot (at which point I may as well take it off).
It would be nice to be able to say "don't crawl this and don't put it in the index"... but is there a way?
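The interaction described in this thread can be written down as a small decision function. This is a simplified model of the behaviour the posters describe, not a statement of how any search engine is implemented; the names `Outcome` and `outcome` are made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    crawled: bool           # the crawler fetches the page
    noindex_seen: bool      # the crawler gets to read the meta tag
    may_stay_indexed: bool  # an already indexed page is not dropped


def outcome(disallowed: bool, has_noindex: bool, already_indexed: bool) -> Outcome:
    # A robots.txt disallow stops the fetch, so nothing on the page is read.
    crawled = not disallowed
    # The noindex meta tag only counts if the page is actually fetched.
    noindex_seen = crawled and has_noindex
    # Per the thread: a page that is already indexed can only be removed
    # once the crawler has seen the noindex.
    may_stay_indexed = already_indexed and not noindex_seen
    return Outcome(crawled, noindex_seen, may_stay_indexed)


# Disallow added while the page is still indexed: the tag is never read.
print(outcome(disallowed=True, has_noindex=True, already_indexed=True))
# noindex only: the page is still crawled, but it can drop out of the index.
print(outcome(disallowed=False, has_noindex=True, already_indexed=True))
```

Under this model there is no single setting that gives both "not crawled" and "guaranteed out of the index" for a page that is already indexed, which is why the order of the two steps matters.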
-
noindex only tells the search crawlers to not include the page in the index but still allows for them to crawl the page. nofollow will tell the crawlers to not crawl the page.
robots.txt will accomplish this as well, but I think using both would be overkill.
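For testing which directives a given list view page actually carries, a minimal sketch using Python's standard `html.parser` module follows. The sample HTML is invented for illustration, and the sketch only reads the generic `robots` meta tag, not crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated and case insensitive.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Invented sample page for the sketch.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```

Running this against the fetched HTML of a list view URL shows whether the noindex tag is really present on the page a crawler would receive.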