Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How can I prevent duplicate pages being indexed because of load balancer (hosting)?
-
The site that I am optimising has a problem with duplicate pages being indexed as a result of the load balancer (which is required and set up by the hosting company).
The load balancer passes the site through to 2 different URLs:
Some how, Google have indexed 2 of the same URLs (which I was obviously hoping they wouldn't) - the first on www and the second on www2.
The hosting is a mirror image of each other (www and www2), meaning I can't upload a robots.txt to the root of www2.domain.com disallowing all. Also, I can't add a canonical script into the website header of www2.domain.com pointing the individual URLs through to www.domain.com etc.
Any suggestions as to how I can resolve this issue would be greatly appreciated!
-
There are two ways to handle load balancing, and it appears that your hosting company / server company chose to use the DNS round-robin routing option.
According to the Wikipedia page on load balancing:
http://en.wikipedia.org/wiki/Load_balancing_(computing)"Load balancing usually involves dedicated software or hardware, such as a multilayer switch or a Domain Name System server process."
Round Robin DNS Load Balancing: Basically you use the DNS routing system to handle requests. When someone visits your site, 50% of the people are routed to www.domain.com, and 50% are routed to ww1.domain.com. Both sites contain the same identical content; it's the URLs that are slightly different. Sometimes the domains are the same; but you have different IP addresses for www.domain.com.
Advantages: you don't need a dedicated load balancing piece of software or hardware, so it's less expensive.
Disadvantages: this technique exposes the individual web servers to the end user seeing the site. You can also suffer from duplicate content penalties, too. Finally, if you are relying on the round robin DNS system for load balancing, and a DNS server or one of the Web servers goes down, there's not an easy fail-over (as many DNS records are cached).More about Round Robin DNS: http://en.wikipedia.org/wiki/Round-robin_DNS
Hardware / Software Load Balancer:
In this case, your DNS zone file tells the end user to go to one IP address when they type in www.domain.com. The hardware or software load balancer then sees the request, and then hands off the content to one of the web servers in a cluster.Advantages: No duplicate content penalty; to the end user, they just see one web server and not individual sub-domains (www.domain.com and ww1.domain.com). A load balancer can also cache specific items like a CSS page, so the load on the Web server is even more minimal.
Disadvantages: You're introducing another piece of hardware or software (i.e. more cost); this piece could also be a single point of failure into the mix. You need someone to figure out how to set this up and make sure it all works.
More on this type of Load Balancing: http://en.wikipedia.org/wiki/Load_balancing_(computing)#Internet-based_services
Load balancing can get complicated as soon as you have databases involved, but with a good design, multiple front end Web servers can talk to one single backend database server. The goal would be to cache as much content as possible as "static" elements, using caching systems like Varnish, that essentially turn database-driven pages into static, old-school HTML pages. And then only when someone needs to save something from the database (i.e. making a purchase on an eCommerce site), the system then interacts with it.
My recommendation:
(1) Move from the Round Robin Robin DNS to a hardware or software load balancer.(2) If that isn't an easy solution, implement the Round Robin DNS solution to use identical A records for each server.
For example, you might have identical entries in your DNS zone files for both DNS servers:
www.domain.com A 69.94.15.10
NS2.domain.com:
www.domain.com A 75.64.18.12This should at least eliminate your duplicate content issue, but you still do have a few disadvantages (described above). This also could lead to server issues, as the servers might be confused if they are the authoritative ones.
And if both servers are sending email, pay special attention to your SPF record, to make sure that you are allowing both IP addresses to be able to send email. (This is often overlooked.)
Hope this is helpful!
-- Jeff
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My product category pages are not being indexed on google can someone help?
My website has been indexed on google and all of its pages can be found on google except for the product category pages - which are where we want our traffic heading to, so this is a big problem for us. Our website is www.skirtinguk.com And an example of a page that isn't being indexed is https://www.skirtinguk.com/product-category/mdf-skirting-board/
Intermediate & Advanced SEO | | chelseaskirtinguk0 -
How to check if the page is indexable for SEs?
Hi, I'm building the extension for Chrome, which should show me the status of the indexability of the page I'm on. So, I need to know all the methods to check if the page has the potential to be crawled and indexed by a Search Engines. I've come up with a few methods: Check the URL in robots.txt file (if it's not disallowed) Check page metas (if there are not noindex meta) Check if page is the same for unregistered users (for those pages only available for registered users of the site) Are there any more methods to check if a particular page is indexable (or not closed for indexation) by Search Engines? Thanks in advance!
Intermediate & Advanced SEO | | boostaman0 -
Contextual FAQ and FAQ Page, is this duplicate content?
Hi Mozzers, On my website, I have a FAQ Page (with the questions-responses of all the themes (prices, products,...)of my website) and I would like to add some thematical faq on the pages of my website. For example : adding the faq about pricing on my pricing page,... Is this duplicate content? Thank you for your help, regards. Jonathan
Intermediate & Advanced SEO | | JonathanLeplang0 -
What is the point of having images clickable loading to their own page?
Hello, Noticed a lot of sites, usually wordpress (seems to be the default) have the images in their posts clickable that load to their own page, showing just the image, usually a .jpg page. I know these pages seem to be easily indexed into google image search and can drive traffic to those specific pages... My questions are... 1. What is the point of driving traffic to a page that is just the image, there are no links to other pages, no ads, nothing... 2. can you redirect these .jpg pages to the actual post page? I ask because on google image search, there are 3 links to click (website, image link, image page), when you click to view the image, it loads the .jpg page, why not have that .jpg redirect to the real content page that has ads and also has other links. Is this white-hat? 3. Do these pages with just images have any negative effect on optimization since they are just images, no content? 4. Can you monetize these .jpg pages? 5. What is the best practice? I understand there is value in traffic, but what is the point of image traffic if I can't monetize those pages?
Intermediate & Advanced SEO | | WebServiceConsulting.com0 -
Whats the best way to remove search indexed pages on magento?
A new client ( aqmp.com.br/ )call me yestarday and she told me since they moved on magento they droped down more than US$ 20.000 in sales revenue ( monthly)... I´ve just checked the webmaster tool and I´ve just discovered the number of crawled pages went from 3.260 to 75.000 since magento started... magento is creating lots of pages with queries like search and filters. Example: http://aqmp.com.br/acessorios/lencos.html http://aqmp.com.br/acessorios/lencos.html?mode=grid http://aqmp.com.br/acessorios/lencos.html?dir=desc&order=name Add a instruction on robots.txt is the best way to remove unnecessary pages of the search engine?
Intermediate & Advanced SEO | | SeoMartin10 -
Duplicate internal links on page, any benefit to nofollow
Link spam is naturally a hot topic amongst SEO's, particularly post Penguin. While digging around forums etc, I watched a video blog from Matt Cutts posted a while ago that suggests that Google only pays attention to the first instance of a link on the page As most websites will have multiple instances of a links (header, footer and body text), is it beneficial to nofollow the additional instances of the link? Also as the first instance of a link will in most cases be within the header nav, does that then make the content link text critical or can good on page optimisation be pulled from the title attribute? I would appreciate the experiences and thoughts Mozzers thoughts on this thanks in advance!
Intermediate & Advanced SEO | | JustinTaylor880 -
Can PDF be seen as duplicate content? If so, how to prevent it?
I see no reason why PDF couldn't be considered duplicate content but I haven't seen any threads about it. We publish loads of product documentation provided by manufacturers as well as White Papers and Case Studies. These give our customers and prospects a better idea off our solutions and help them along their buying process. However, I'm not sure if it would be better to make them non-indexable to prevent duplicate content issues. Clearly we would prefer a solutions where we benefit from to keywords in the documents. Any one has insight on how to deal with PDF provided by third parties? Thanks in advance.
Intermediate & Advanced SEO | | Gestisoft-Qc1 -
Are duplicate links on same page alright?
If I have a homepage with category links, is it alright for those category links to appear in the footer as well, or should you never have duplicate links on one page? Can you please give a reason why as well? Thanks!
Intermediate & Advanced SEO | | dkamen0