Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to prevent Google from crawling our product filter?
-
Hi All,
We have a crawler problem on one of our sites, www.sneakerskoopjeonline.nl.
On this site, visitors can specify criteria to filter the available products. These filters are passed as HTTP GET arguments, so the number of possible filter URLs is virtually limitless.
To prevent duplicate content, or an insane number of pages in the search indices, our software automatically adds noindex, nofollow and noarchive directives to these filter result pages. However, we have not found a way to tell crawlers (Google in particular) to ignore these URLs.
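As an illustration of the setup described above, here is a minimal Python sketch of how a shop might decide which pages receive the noindex, nofollow and noarchive directives. It is not the site's actual software: the function names are hypothetical, and the list of filter parameters is an assumption built from the product_brand and product_date arguments mentioned elsewhere in this thread.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed filter argument names; the real list used by the shop is not known.
FILTER_PARAMS = {"product_brand", "product_date", "product_date2"}


def is_filter_url(url: str) -> bool:
    """Return True when the query string carries at least one filter argument."""
    query = urlsplit(url).query
    # keep_blank_values=True so "product_brand=" (empty value) still counts.
    names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
    return bool(names & FILTER_PARAMS)


def robots_meta(url: str) -> str:
    """Return the robots meta tag for filter pages, or "" for normal pages."""
    if is_filter_url(url):
        return '<meta name="robots" content="noindex, nofollow, noarchive">'
    return ""
```

Note that the directives only take effect on pages the crawler actually fetches, which is why they do not reduce crawling by themselves.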
We’ve already changed the on-page filter HTML to JavaScript, hoping this would cause the crawler to ignore it. However, it seems that Googlebot executes the JavaScript and crawls the generated URLs anyway.
What can we do to prevent Google from crawling all the filter options?
Thanks in advance for the help.
Kind regards,
Gerwin
-
The following has been added to our robots.txt. Now let's wait and see the results:
User-agent: *
Disallow: /admin/
Disallow: /?
Allow: /?product_date=&product_date2=*
Disallow: /?product_date=&product_date2=&
To check that the robots.txt works, I found a handy website.
-
The URL looks like this:
http://www.sneakerskoopjeonline.nl/herensneakers?product_brand=
So just adding:
User-agent: *
Disallow: /*?product_brand
should do the trick?
Most important is that herensneakers itself should be indexed, followed and crawled.
-
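One way to sanity-check the rule proposed above before deploying it is a small Python sketch of Google-style robots.txt matching: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. The helper names are hypothetical and this is a simplified approximation, not Google's own parser (Python's built-in urllib.robotparser is not used here because it does not interpret `*` wildcards).

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the path start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "match anything".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], url: str) -> bool:
    """rules holds ("allow" | "disallow", pattern) pairs for one user agent."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    best_len, allowed = -1, True  # no matching rule: crawling is allowed
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(target):
            is_allow = kind == "allow"
            # Longest pattern wins; on equal length, allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


RULES = [("disallow", "/*?product_brand")]
BASE = "http://www.sneakerskoopjeonline.nl/herensneakers"

print(is_allowed(RULES, BASE))                      # True: category page stays crawlable
print(is_allowed(RULES, BASE + "?product_brand="))  # False: filter URL is blocked
# product_size is a made-up parameter, used only to show the ordering problem.
print(is_allowed(RULES, BASE + "?product_size=42&product_brand=nike"))  # True
```

The last call shows a limitation of the pattern: it only matches when product_brand is the first argument after the `?`. Under these matching rules, a pattern such as `/*product_brand=` would also catch the parameter in later positions.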
I would use your robots.txt file to prevent Google from crawling the specific query strings / pages. In Google Webmaster Tools you can see all the information Google has on your site, along with any issues, and you can also specify robots.txt information there. That would be the best route, as Google obeys what is in the robots.txt file. If you want more information about robots.txt, go here.