Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Should I set up a disallow in the robots.txt for catalog search results?
-
When the crawl diagnostics came back for my site its showing around 3,000 pages of duplicate content. Almost all of them are of the catalog search results page. I also did a site search on Google and they have most of the results pages in their index too. I think I should just disallow the bots in the /catalogsearch/ sub folder, but I'm not sure if this will have any negative effect?
-
One step at a time = long term success. I wish you the best with it Jordan.
-
Thanks Alan, you are right this site has quite a long way to go. The first crawl was just finished and I notice that the most errors were due to dupe content so I decided I would try and tackle that first. Thank you for all the pointers, I will be taking a look at all those as soon as I can.
-
Totally agree with Alan, it can cause circular navigation problems for crawlers too.
-
Jordan,
Others might have a different view, however that's exactly what I recommend to clients. but only if you've got other html link based ways for bots to get to all the content in a direct manner, and have a good sitemap.xml file to reinforce that.
I am happy to see that you have a sound overall site architecture, however I see no robots.txt file at your root so I'm not sure what's up with that. Also your sitemap.xml file only has 43 URLs in it. that's a problem not because google can't find content by other means, it's just that I've found Google likes that reinforcement, and Bing especially does a better job discovering content with a proper sitemap.xml submitted through their webmaster system (they're less efficient at discovering content by other means).
I'd also suggest you have a big push ahead in dealing with near-duplicate content.
For example:
http://www.durafaucet.com/mk850-orb.html
http://www.durafaucet.com/kitchen-faucets/mk850.html
Sure, these are unique products. Except there's already so little unique content on either page that the common content compounded by the site-wide replication of top, sidebar and footer content means the total weight of uniqueness is on the very minor end of the spectrum.
And then there's the issue of a complete lack of inbound link authority - OpenSiteExplorer.org might be wrong, but currently shows almost no inbound links. Not only will you need inbound links to the home page, but also to as many inner pages as is realistic in terms of implementation capabilities go. This is especially true for category level pages. (including a variety of inbound link anchor text - brand, domain, keyword phrase and generic text).
So if you don't address those type of issues, removing all the dupes that show up in search now won't result in as much long-term value as you'll need.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page disappears from Google search results
Hi, I recently encountered a very strange problem.
Technical SEO | | JoelssonMedia
One of the pages I published in my website ranked very well for a couple of days on top 5, then after a couple of days, the page completely vanished, no matter how direct I search for it, does not appear on the results, I check GSC, everything seems to be normal, but when checking Google analytics, I find it strange that there is no data on the page since it disappeared and it also does not show up on the 'active pages' section no matter how many different computers i keep it open. I have checked to page 9, and used a couple of keyword tools and it appears nowhere! It didn't have any back links, but it was unique and high quality. I have checked on the page does still exist and it is still readable. Has this ´happened to anyone before? Any thoughts would be gratefully received.0 -
I have two robots.txt pages for www and non-www version. Will that be a problem?
There are two robots.txt pages. One for www version and another for non-www version though I have moved to the non-www version.
Technical SEO | | ramb0 -
Robots.txt & meta noindex--site still shows up on Google Search
I have set up my robots.txt like this: User-agent: *
Technical SEO | | RoxBrock
Disallow: / and I have this meta tag in my on a Wordpress site, set up with SEO Yoast name="robots" content="noindex,follow"/> I did "Fetch as Google" on my Google Search Console My website is still showing up in the search results and it says this: "A description for this result is not available because of this site's robots.txt" This site has not shown up for years and now it is ranking above my site that I want to rank for this keyword. How do I get Google to ignore this site? This seems really weird and I'm confused how a site with little content, that has not been updated for years can rank higher than a site that is constantly updated and improved.1 -
Do I need a separate robots.txt file for my shop subdomain?
Hello Mozzers! Apologies if this question has been asked before, but I couldn't find an answer so here goes... Currently I have one robots.txt file hosted at https://www.mysitename.org.uk/robots.txt We host our shop on a separate subdomain https://shop.mysitename.org.uk Do I need a separate robots.txt file for my subdomain? (Some Google searches are telling me yes and some no and I've become awfully confused!
Technical SEO | | sjbridle0 -
2 sitemaps on my robots.txt?
Hi, I thought that I just could link one sitemap from my site's robots.txt but... I may be wrong. So, I need to confirm if this kind of implementation is right or wrong: robots.txt for Magento Community and Enterprise ...
Technical SEO | | Webicultors
Sitemap: http://www.mysite.es/media/sitemap/es.xml
Sitemap: http://www.mysite.pt/media/sitemap/pt.xml Thanks in advance,0 -
Set Canonical for Paginated Content
Hi Guys, This is a follow up on this thread: http://azwa.1clkaccess.in/community/q/dynamic-url-parameters-woocommerce-create-404-errors# I would like to know how I can set a canonical link in Wordpress/Woocommerce which points to "View All" on category pages on our webshop.
Technical SEO | | jeeyer
The categories on my website can be viewed as 24/48 or All products but because the quanity constantly changes viewing 24 or 48 products isn't always possible. To point Google in the right direction I want to let them know that "View All" is the best way to go.
I've read that Google's crawler tries to do this automatically but not sure if this is the case on on my website. Here is some more info on the issue: https://support.google.com/webmasters/answer/1663744?hl=en
Thanks for the help! Joost0 -
Robots.txt and Multiple Sitemaps
Hello, I have a hopefully simple question but I wanted to ask to get a "second opinion" on what to do in this situation. I am working on a clients robots.txt and we have multiple sitemaps. Using yoast I have my sitemap_index.xml and I also have a sitemap-image.xml I do put them in google and bing by hand but wanted to have it added into the robots.txt for insurance. So my question is, when having multiple sitemaps called out on a robots.txt file does it matter if one is before the other? From my reading it looks like you can have multiple sitemaps called out, but I wasn't sure the best practice when writing it up in the file. Example: User-agent: * Disallow: Disallow: /cgi-bin/ Disallow: /wp-admin/ Disallow: /wp-content/plugins/ Sitemap: http://sitename.com/sitemap_index.xml Sitemap: http://sitename.com/sitemap-image.xml Thanks a ton for the feedback, I really appreciate it! :) J
Technical SEO | | allstatetransmission0 -
Invisible robots.txt?
So here's a weird one... Client comes to me for some simple changes, turns out there are some major issues with the site, one of which is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. Looks like an issue because even the main homepage isn't showing up with a "site:domain.com" So, I add to Webmaster Tools and, after an hour or so, I get the red bar of doom, "robots.txt is blocking important pages." I check it out in Webmasters and, sure enough, it's a "User agent: * Disallow /" ACK! But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before. Question is: can a robots.txt file be stored in a way that can't be seen? Thanks!
Technical SEO | | joshcanhelp0