Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How to de-index old URLs after redesigning the website?
-
Thank you for reading.
After redesigning my website (5 months ago), my crawl reports (Moz, Search Console) still show tons of 404 pages, which all seem to be URLs from my previous website (same root domain).
It would be nonsense to 301 redirect them as there are too many URLs. (Or would it be nonsense?)
What is the best way to deal with this issue?
-
Thank you Clever PhD, really valuable insights!
-
I completely agree with all of the above. I read her point the way I would see it myself: receiving thousands of 404 errors from pages that haven't existed for many months just gets annoying!

-
I respectfully disagree with all of the above. Please repeat after me, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic.
After redesigning my website (5 months ago), my crawl reports (Moz, Search Console) still show tons of 404 pages, which all seem to be URLs from my previous website (same root domain).
**Part 1: Internal links that 404 in the Moz crawl:** The 404s that show up in the Moz crawl can only come from internal links on your website. The Moz crawl only looks at internal links, not links from other websites. In other words, if you see 404s in your Moz crawl, that means that somewhere you are linking to those pages, and that is why the 404s are showing up. Download the CSV from your Moz crawl and you will find the linking pages there. Other tools such as Screaming Frog, Botify and DeepCrawl will show you a similar analysis.
Simple solution. Go through your code and remove the internal links on your site that direct the Moz crawler to those pages and the 404s will go away. (FYI this same approach will work for any internal 301s) These 404 errors in the Moz report are great diagnostic signals on where to fix your site. It is bad for users to click on a link within your website and get sent to a page that does not exist.
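As a rough sketch of that cleanup step, the Python below groups the 404s in a crawl export by the page that links to them, so you can see which pages or templates need editing. The column names (`URL`, `Status Code`, `Referrer`) and the sample rows are assumptions for illustration only; check the headers in your own CSV export.

```python
import csv
import io
from collections import defaultdict


def broken_links_by_source(csv_text, url_col="URL", status_col="Status Code",
                           referrer_col="Referrer"):
    """Map each linking page to the sorted list of 404 URLs it links to.

    The column names are hypothetical defaults; adjust them to your export.
    """
    sources = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[status_col].strip() == "404":
            sources[row[referrer_col]].add(row[url_col])
    return {page: sorted(urls) for page, urls in sources.items()}


# Made-up sample rows standing in for a real crawl export.
SAMPLE = """URL,Status Code,Referrer
https://example.com/old-page,404,https://example.com/
https://example.com/about,200,https://example.com/
https://example.com/old-shop,404,https://example.com/blog/post
https://example.com/old-page,404,https://example.com/blog/post
"""

report = broken_links_by_source(SAMPLE)
for page, dead_urls in sorted(report.items()):
    print(page, "->", dead_urls)
```

Grouping by the linking page rather than by the dead URL is a deliberate choice here: one navigation template that still points at old URLs often accounts for a large share of the errors.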
**Part 2: External links from Search Console:** The 404s that show up in Search Console can come from internal links on your site AND from external links on other sites. Google will keep trying to crawl these URLs because other sites link to pages on your site and because of your own internal links. For fixing internal links, see the suggestion above. For external links you need a different approach.
Look at the external links. Where are they coming from? Are they from quality websites? Do they go to formerly important pages on your website (i.e. pages that were good converters)? If so, use a 301 redirect to send them to the correct replacement page (and this is not always the home page). You get users to the correct page, and any link equity is passed along as well, which can help with your site rankings. If the link goes to a former page on your site that was not any good to start with, and the links that come into it are poor quality, then you just let the page 404. Tools such as Moz Open Site Explorer, Ahrefs or Majestic can help with this assessment, but usually you can just look at a site linking to you and tell if it is crap or not.
You need to consider the above even if your only goal is to get the 404ing pages out of the Google index. If you get Google to remove a page from the index while your site still links to it, Google will follow that internal link and find the 404 again. Once you have removed the links to the 404 pages from your site, Google will eventually stop crawling them and they will drop out of the index.
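The triage above can be sketched in a few lines of Python. This assumes an Apache server, where `Redirect 301` is the mod_alias directive for a permanent redirect; the paths and the `redirect_map` contents are hypothetical, and deciding which old URLs deserve an entry in that map is still a manual judgment about link quality.

```python
def triage_dead_urls(dead_paths, redirect_map):
    """Split dead paths into 301 rules and paths that are left to 404.

    redirect_map should contain only the old paths judged worth keeping
    (good inbound links or formerly important pages), each mapped to its
    closest replacement URL.
    """
    rules, leave_as_404 = [], []
    for path in dead_paths:
        target = redirect_map.get(path)
        if target:
            # Apache mod_alias syntax: Redirect 301 <old-path> <new-URL>
            rules.append(f"Redirect 301 {path} {target}")
        else:
            leave_as_404.append(path)
    return rules, leave_as_404


# Hypothetical old paths and replacements, for illustration only.
dead_paths = ["/en-GB/widgets", "/en-GB/old-promo", "/en-GB/contact"]
redirect_map = {
    "/en-GB/widgets": "https://example.com/products/widgets",
    "/en-GB/contact": "https://example.com/contact",
}

rules, leave_as_404 = triage_dead_urls(dead_paths, redirect_map)
print("\n".join(rules))
print("Left to 404:", leave_as_404)
```

Anything without an entry in the map simply keeps returning a 404, which matches the advice to redirect only where there is a real replacement page.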
Important note regarding the use of robots.txt: blocking Google from crawling the 404s will not remove the pages from the index; Google will just stop crawling them. Google has to be able to crawl the URL to see the 404, recognise that it is a bad page, and then remove it from the index. Blocking with robots.txt stops Google from doing that. As soon as you take the page out of robots.txt, Google will recrawl it and the 404 shows up again. Robots.txt treats a symptom that is a red herring; allowing the 404 to occur takes care of the issue permanently.
Dead pages are a natural part of the web. Let Google see the 404 (if it truly is a page that should 404 and has no link equity that should be passed along with a 301). Google will crawl the 404 several times, and you will see it in Search Console several times. That is OK. You are not penalized for X number of 404s. You may lose rankings if you 404 a page that Google used to rank well, but this is just because Google will not keep a page highly ranked that does not exist :-). Help Google out by cleaning up your internal link structure: when it sees that you no longer link to the page, that is a signal that the page is meant to 404. Google knows that, due to the nature of the web, pages will time out on occasion and show an error, so it will continue to recrawl a page just to make sure; it wants to give you the benefit of the doubt. Therefore, you have to give clear directives by not linking to dead pages, so that after Google double and triple checks the page, it will finally drop it. You will see the 404 in your Search Console for several months, then it will eventually go away.
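To see why a robots.txt block hides the 404 rather than resolving it, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `/en-GB/` rule and the `example.com` URLs are assumptions for illustration. The parser only answers whether a compliant crawler may request a URL; a crawler that may not request it never gets to see the 404 status.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the old section of the site.
ROBOTS_TXT = """User-agent: *
Disallow: /en-GB/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

dead_url = "https://example.com/en-GB/old-page"
live_url = "https://example.com/products/"

# A compliant crawler checks this before it ever sends a request.
blocked = not parser.can_fetch("Googlebot", dead_url)
allowed = parser.can_fetch("Googlebot", live_url)

print("dead URL blocked from crawling:", blocked)
print("live URL crawlable:", allowed)
```

With the rule in place, the dead URL is never requested, so its status code stays unknown to the crawler for as long as the block remains.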
Hope that makes sense. Good luck!
-
Hey Lana, if you really think that 301s do not make sense in this case, you can always add the URLs to the robots.txt file, and once Google recrawls your website, it will de-index the pages.
Another thing you can do is use the de-index feature in Google Webmaster Tools. Go to GWT, Optimization > Remove URLs, and submit the URLs there.
Hope this helps!
-
I see the point. Thanks, Liam. As most of our 404 pages start with /en-GB/, I will do it like this:
Disallow: /en-GB/
-
Hi Lana,
I've been having the same problem on one of our websites. I've 301 redirected over 5,000 URLs but still receive a lot of 404 errors. One of the main reasons these 404 errors still appear is that other bots, such as Bingbot, are still crawling the old URLs.
To resolve this, I would just block them in your robots.txt file. We blocked our old product URLs, which were under a "product" directory, like this:
User-agent: *
Disallow: /product/