Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How would you create and then segment a large sitemap?
-
I have a site with around 17,000 pages and would like to create a sitemap and then segment it into product categories.
Is it best to create a map and then edit it in something like xmlSpy or is there a way to silo sitemap creation from the outset?
-
Thanks Saijo,
We are trying to silo product types/categories and break them into different sitemaps. I'm familiar with SF but I don't think it will create sitemaps with the granularity that we are looking for.
I'm using XMLSpy but I'm finding it hard to break out blocks of content.
-
To my knowledge, Screaming Frog doesn't allow you to create an XML sitemap. Perhaps Excel allows you to format the output from SF but I'm not sure. I did find a utility called XMLSpy which, though pricey, allows me to do some of the sorting I was looking for. Once sorted, I can manually pull out sections to segment my sitemap. It is a pain in the neck because I can determine a silo and do it automatically. That being said, I think I can develop a sitemap template and have our new web programmer to develop a way to auto generate a group of segmented sitemaps.
Anyone know if there is a canned solution that works with IIS?
-
If you site is structured such that the urls contain the categories you wish to sort , you can use something like Screaming Frog ( http://www.screamingfrog.co.uk/seo-spider/ ) and export all the urls and sort them out via excel in to categories and go that way
NOTE : the free version has a 500 url limit, so you might want to look at paid ( ask them if it can handle 17,00 urls before getting it ) or look at http://home.snafu.de/tilman/xenulink.html ( I haven't used it myself , so don't know if you can export stuff to excel from there )
Good luck mate , sounds like you have a big job ahead of you.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Search Console Showing 404 errors for product pages not in sitemap?
We have some products with url changes over the past several months. Google is showing these as having 404 errors even though they are not in sitemap (sitemap shows the correct NEW url). Is this expected? Will these errors eventually go away/stop being monitored by Google?
Technical SEO | | woshea0 -
Google Search console says 'sitemap is blocked by robots?
Google Search console is telling me "Sitemap contains URLs which are blocked by robots.txt." I don't understand why my sitemap is being blocked? My robots.txt look like this: User-Agent: *
Technical SEO | | Extima-Christian
Disallow: Sitemap: http://www.website.com/sitemap_index.xml It's a WordPress site, with Yoast SEO installed. Is anyone else having this issue with Google Search console? Does anyone know how I can fix this issue?1 -
301 Redirects Relating to Your XML Sitemap
Lets say you've got a website and it had quite a few pages that for lack of a better term were like an infomercial, 6-8 pages of slightly different topics all essentially saying the same thing. You could all but call it spam. www.site.com/page-1 www.site.com/page-2 www.site.com/page-3 www.site.com/page-4 www.site.com/page-5 www.site.com/page-6 Now you decided to consolidate all of that information into one well written page, and while the previous pages may have been a bit spammy they did indeed have SOME juice to pass through. Your new page is: www.site.com/not-spammy-page You then 301 redirect the previous 'spammy' pages to the new page. Now the question, do I immediately re-submit an updated xml sitemap to Google, which would NOT contain all of the old URL's, thus making me assume Google would miss the 301 redirect/seo juice. Or do I wait a week or two, allow Google to re-crawl the site and see the existing 301's and once they've taken notice of the changes submit an updated sitemap? Probably a stupid question I understand, but I want to ensure I'm following the best practices given the situation, thanks guys and girls!
Technical SEO | | Emory_Peterson0 -
Sitemap indexed pages dropping
About a month ago I noticed my pages indexed from my sitemap are dropping.There are 134 pages in my sitemap and only 11 are indexed. It used to be 117 pages and just died off quickly. I still seem to be getting consistant search traffic but I'm just not sure whats causing this. There are no warnings or manual actions required in GWT that I can find.
Technical SEO | | zenstorageunits0 -
Will an XML sitemap override a robots.txt
I have a client that has a robots.txt file that is blocking an entire subdomain, entirely by accident. Their original solution, not realizing the robots.txt error, was to submit an xml sitemap to get their pages indexed. I did not think this tactic would work, as the robots.txt would take precedent over the xmls sitemap. But it worked... I have no explanation as to how or why. Does anyone have an answer to this? or any experience with a website that has had a clear Disallow: / for months , that somehow has pages in the index?
Technical SEO | | KCBackofen0 -
Is it bad to have same page listed twice in sitemap?
Hello, I have found that from an HTML (not xml) sitemap of a website, a page has been listed twice. Is it okay or will it be considered duplicate content? Both the links use same anchor text, but different urls that redirect to another (final) page. I thought ideal way is to use final page in sitemap (and in all internal linking), not the intermediate pages. Am I right?
Technical SEO | | StickyRiceSEO1 -
Sitemap for dynamic website with over 10,000 pages
If I have a website with thousands of products, is it a good idea to create a sitemap for this website for the search engines where you show maybe 250 products on a page so it makes it easy for the search engine to find the part and also puts that part closer to the home page? Seems like google likes pages that are the closest to the home page (less clicks the better)
Technical SEO | | roundbrix0 -
Does 'framing' a website create duplicate content?
Something I have not come across before, but hope others here are able offer advice based on experience: A client has independently created a series of mini-sites, aimed at targeting specific locations. The tactic has worked very well and they have achieved a large amount of well targeted traffic as a result. Each mini-site is different but then in the nav, if you want to view prices or go to the booking page, that then links to what at first appears to be their main site. However, you then notice that the URL is actually situated on the mini-site. What they have done is 'framed' the main site so that it appears exactly the same even when navigating through this exact replica site. Checking the code, there is almost nothing there - in fact there is actually no content at all. Below the head, there is a piece of code: <frameset rows="*" framespacing=0 frameborder=0> <frame src="[http://www.example.com](view-source:http://www.yellowskips.com/)" frameborder=0 marginwidth=0 marginheight=0> <noframes>Your browser does not support frames. Click [here](http://www.example.com) to view.noframes> frameset> Given that main site content does not appear to show in the source code, do we have an issue with duplicate content? This issue is that these 'referrals' are showing in Analytics, despite the fact that the code does not appear in the source, which is slightly confusing for me. They have done this without consultation and I'm very concerned that this could potentially be creating duplicate content of their ENTIRE main site on dozens of mini-sites. I should also add that there are no links to the mini-sites from the main site, so if you guys advise that this is creating duplicate content, I would not be worried about creating a link-wheel if I advise them to link directly to the main site rather than the framed pages. Thanks!
Technical SEO | | RiceMedia0