XML Sitemaps Explained: Everything You Need to Know
Every few months someone asks me whether XML sitemaps still matter, usually right after they've read a tweet claiming Google "doesn't need them anymore." Let me put that to rest immediately: sitemaps matter, they just don't work the way most people think they do. If you've been copying sitemap best-practice articles that were written in 2016 and never updated, some of what you believe is probably wrong.
This is going to be a thorough walkthrough — what sitemaps actually do, which fields are useful versus ceremonial, how size limits work in practice, when you need a sitemap index, and how to submit properly in Search Console. No filler.
What a Sitemap Actually Does (and Doesn't Do)
An XML sitemap is a file that lists URLs you want search engines to crawl and consider for indexing. That's it. It is a discovery hint, not a guarantee of indexing, not a ranking signal, not a crawl-priority override.
Google's crawlers follow links. On a well-linked site — meaning your internal link structure connects everything logically — Googlebot would find every important page eventually without a sitemap at all. The sitemap becomes genuinely valuable in three specific situations:
- Large sites with deep or orphaned pages. If you have a product catalog with 80,000 SKUs and some of those pages receive zero internal links because your navigation only goes three levels deep, Googlebot may not find them for months. The sitemap shortens that window.
- New sites with few or no inbound links. Before other sites link to you, your crawl budget is thin. A sitemap submitted to Search Console seeds the initial crawl.
- Media-heavy sites with non-HTML content. Google has specialized sitemap extensions for images, video, and news. If you publish videos on your own domain rather than YouTube, a video sitemap is one of the few ways to get rich video results in search.
For a small brochure site with 20 pages and a solid internal link structure, the sitemap's practical impact is minimal. I still recommend having one — it's trivially cheap to maintain — but it won't move the needle the way fixing your heading hierarchy or improving Core Web Vitals would.
The lastmod and priority Fields: Mostly Theater
This is where I'm going to disagree with approximately 90% of existing sitemap tutorials.
The <priority> field is effectively ignored by Google. It was intended to tell crawlers which pages matter most, scored from 0.0 to 1.0. In practice, most CMS plugins set every page to 0.5 or set the homepage to 1.0 and everything else to 0.8, which communicates nothing meaningful. When everyone scores 0.8, the signal disappears. Google confirmed years ago that they don't use priority as a crawl or ranking input. You can include it if your CMS generates it automatically, but don't spend time crafting meaningful values — it won't matter.
<lastmod> is more nuanced. Google does use it, but conditionally and skeptically. The key thing Gary Illyes and others from the Search Relations team have said repeatedly: Google only trusts your <lastmod> value if it's consistent and accurate. If your CMS updates the lastmod timestamp every time you change a sidebar widget or add a comment, Googlebot learns that your lastmod is unreliable and stops factoring it in. When that happens, you've turned a useful signal into noise.
Done correctly, lastmod is valuable. A news site that publishes updated articles should reflect the actual content revision date. An e-commerce site should update lastmod when product prices or availability change meaningfully. A page whose content hasn't changed since 2022 should still show 2022. Accurate lastmod helps Google allocate crawl budget more efficiently by skipping pages that haven't changed since the last crawl.
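One way to keep lastmod honest is to tie it to a fingerprint of the main content rather than to the CMS "last saved" timestamp. Here's a minimal sketch of that idea, assuming you can extract the main body of a page separately from sidebars and comments. The function names and the storage of a hash alongside each page are my own illustration, not something any particular CMS does out of the box.

```python
import hashlib
from datetime import date


def content_fingerprint(main_content: str) -> str:
    """Hash only the main content, with whitespace normalized.

    Hypothetical helper: what counts as "main content" is up to you.
    """
    normalized = " ".join(main_content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_lastmod(stored_hash: str, stored_lastmod: date,
                 main_content: str, today: date) -> tuple[str, date]:
    """Return the (hash, lastmod) pair to store after a save.

    lastmod only moves when the main content's fingerprint changes,
    so sidebar edits or new comments leave it untouched.
    """
    new_hash = content_fingerprint(main_content)
    if new_hash == stored_hash:
        return stored_hash, stored_lastmod
    return new_hash, today
```

With this in place, a page last revised in 2022 keeps its 2022 date no matter how many times the template around it is re-saved.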
<changefreq> is the other field that's widely misunderstood. Values like "daily" or "weekly" are suggestions, not schedules, and Google's own sitemap documentation says it ignores both <changefreq> and <priority>. Skip populating changefreq or leave it at "monthly" as a generic default; it won't help, but it also won't hurt.
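If you generate sitemaps yourself, a file that carries only loc and lastmod is enough. The sketch below uses Python's standard-library ElementTree. The function name and the (url, date) input shape are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries: list[tuple[str, date]]) -> bytes:
    """Serialize (url, lastmod) pairs as a sitemap with only loc and lastmod."""
    # Register the sitemap namespace as the default so tags have no prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as & in the URL for us.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
```

Leaving priority and changefreq out keeps the file smaller and removes two fields you would otherwise have to keep consistent.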
Size Limits You Need to Respect
The official limits are clear and worth memorizing:
- Maximum 50,000 URLs per sitemap file
- Maximum 50MB uncompressed (gzip makes the file smaller to transfer, but the limit is measured on the uncompressed size)
If your sitemap hits either limit, you need to split it. This is where most people run into trouble — they set up their CMS plugin years ago, never looked at the generated file again, and now have a 68,000-URL sitemap that search engines are partially or completely ignoring.
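Splitting is mostly a chunking problem. A minimal sketch, assuming you already have the full URL list in memory (the function name is hypothetical):

```python
from typing import Iterator

MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: list[str],
               size: int = MAX_URLS_PER_SITEMAP) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs, one per sitemap file."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]
```

For the 68,000-URL case above, this yields one file of 50,000 URLs and one of 18,000.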
The gzip compression point is practical. A sitemap with 50,000 URLs as raw XML can easily reach 10-15MB. Gzip-compressed, that same file is often under 2MB. Most web servers can serve .xml.gz files with the right MIME type and Content-Encoding headers. Yoast, Rank Math, and most enterprise sitemap generators handle compression automatically, but if you've hand-rolled your sitemap generation, add it.
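For a hand-rolled generator, compression is only a few lines with the standard library. This sketch checks the size before compressing, since the sitemaps.org protocol puts the 50MB limit (52,428,800 bytes) on the uncompressed file. The function name is my own.

```python
import gzip

# 50MB, measured on the uncompressed XML.
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024


def compress_sitemap(xml_bytes: bytes) -> bytes:
    """Gzip a sitemap, refusing input that is already over the size limit."""
    if len(xml_bytes) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError(
            f"sitemap is {len(xml_bytes)} bytes uncompressed; split it first"
        )
    return gzip.compress(xml_bytes)
```

Write the result to a .xml.gz file and reference that filename wherever you list the sitemap.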
One subtlety: the 50,000 URL limit counts every URL listed in the file, whether or not the page is indexable, so noindexed pages only eat into your allowance. Don't include pages with noindex meta tags in your sitemap at all; it's contradictory to tell Google about a URL and simultaneously tell it not to index that URL. The two signals don't cancel each other gracefully; they just create ambiguity. Keep your sitemap clean: only pages you want indexed, that return 200, with canonical tags pointing to themselves.
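Those three rules translate directly into a filter you can run before writing the file. A minimal sketch, assuming you have crawl data for each page. The Page record and its field names are hypothetical, and the canonical comparison here is a plain string match with no URL normalization.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical crawl record for one URL."""
    url: str
    status: int       # HTTP status code returned for the URL
    noindex: bool     # True if the page carries a noindex directive
    canonical: str    # value of the page's canonical tag


def sitemap_eligible(page: Page) -> bool:
    """Keep only pages that return 200, are indexable, and self-canonicalize."""
    return (
        page.status == 200
        and not page.noindex
        and page.canonical == page.url
    )
```

Anything this filter rejects is worth a second look anyway: a redirect, a noindex, or a canonical pointing elsewhere usually means the URL shouldn't be promoted in the first place.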
Sitemap Index Files: Organizing at Scale
Once you have more than 50,000 URLs, or once it makes sense to organize different content types separately, you need a sitemap index file. This is a meta-sitemap — a file that lists other sitemap files rather than listing URLs directly.
The structure looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-03-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2024-03-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2024-03-14</lastmod>
  </sitemap>
</sitemapindex>
You can have up to 50,000 child sitemaps in an index file, each containing up to 50,000 URLs — so the theoretical maximum across one index is 2.5 billion URLs. You're unlikely to need that ceiling.
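Generating the index file programmatically looks much like generating a regular sitemap. Here's a sketch with the standard-library ElementTree, assuming you already know each child sitemap's URL and the date it last changed. The function name and input shape are illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_CHILD_SITEMAPS = 50_000


def build_sitemap_index(children: list[tuple[str, date]]) -> bytes:
    """Serialize (child sitemap URL, lastmod) pairs as a sitemap index."""
    if len(children) > MAX_CHILD_SITEMAPS:
        raise ValueError("a sitemap index can list at most 50,000 sitemaps")
    # Register the sitemap namespace as the default so tags have no prefix.
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in children:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)
```

Calling it with the three child sitemaps from the example above reproduces the same structure, and the lastmod on each entry should reflect when that child file actually changed.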
The practical benefit of organized sitemap indexes goes beyond scale. When you split by content type — posts, products, categories, pages — you can diagnose crawl coverage issues more precisely in Search Console. If your sitemap-products.xml shows 12,000 submitted URLs but only 6,000 indexed, that's actionable. With a monolithic sitemap, spotting that pattern is harder.
For sites using Cloudflare or another CDN, consider whether your dynamically generated sitemaps are being cached aggressively. A cached sitemap that's six hours old when you've published 200 new articles creates an unnecessary discovery lag. Set a sensible Cache-Control header on sitemap files — something like max-age=3600 is usually reasonable.
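If your sitemap is served by application code, the header is set where the response is built. The sketch below is a bare WSGI app from the standard library's calling convention, purely to show where Cache-Control goes. The make_sitemap_app and render_sitemap names are hypothetical, and whether your CDN honors the header depends on how it is configured.

```python
from typing import Callable


def make_sitemap_app(render_sitemap: Callable[[], bytes], max_age: int = 3600):
    """Build a WSGI app that serves the sitemap with a Cache-Control header.

    render_sitemap is a hypothetical callable returning the sitemap XML bytes.
    """
    def app(environ, start_response):
        body = render_sitemap()
        headers = [
            ("Content-Type", "application/xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
            # Let caches keep the file for at most max_age seconds.
            ("Cache-Control", f"public, max-age={max_age}"),
        ]
        start_response("200 OK", headers)
        return [body]

    return app
```

In a framework like Django or Flask you would set the same header on the response object instead. The value is the part that matters.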
Submitting to Google Search Console
The submission process is straightforward, but there are a few details that matter.
In Search Console, go to Indexing → Sitemaps in the left navigation. Enter the path to your sitemap (relative to your verified property root) and submit. If you're using an index file, submit just the index — you don't need to submit each child sitemap individually.
After submission, Search Console shows you a status line for each submitted sitemap: when it was last read, how many URLs were discovered, and whether any errors were found. "Read" means Google fetched the file. "Discovered" doesn't mean indexed; it means Google found the URLs in the file and can queue them for crawling. Indexing status is tracked separately in the Page indexing report (previously called Coverage), under Indexing → Pages.
A few things to watch for after submission:
- HTTP errors on the sitemap itself. If your sitemap URL returns a 403 or 500, Search Console will report a fetch error. Check that your sitemap isn't blocked by authentication, IP restrictions, or a broken CDN rule.
- URLs in the sitemap blocked by robots.txt. This is a classic misconfiguration. If robots.txt disallows the /products/ path and your sitemap includes product URLs, Google won't crawl them. Search Console's URL Inspection tool can catch this: check individual URLs from your sitemap to verify they're crawlable.
- Sitemap not updating after content changes. This usually means your CMS is either generating sitemaps statically (as a cached file that regenerates infrequently) or the server is caching the dynamic output too aggressively. Worth checking if you notice your new content taking longer than expected to appear in Search Console's discovered URLs.
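The robots.txt conflict is easy to check offline before Search Console ever reports it. Here's a rough sketch using Python's standard-library urllib.robotparser. The blocked_urls function is my own naming, and the standard-library parser may not match Google's handling of wildcard patterns exactly, so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str],
                 user_agent: str = "Googlebot") -> list[str]:
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Feed it the contents of your robots.txt and the URL list from your sitemap. Anything it returns is a URL you're advertising and blocking at the same time.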
One thing Search Console does not do: it doesn't prioritize indexing based on sitemap submission alone. Submitting a 40,000-URL sitemap doesn't mean all 40,000 URLs will be indexed promptly or ever. Indexing decisions depend on Google's assessment of page quality, crawl budget allocated to your domain, and whether the pages have meaningful, unique content. The sitemap gets the URLs in front of the crawler — what happens after that is up to Google.
When to Revisit Your Sitemap Setup
Most sites set up their sitemap once and forget it. There are a handful of triggers that should prompt you to audit your sitemap configuration:
- Site migration (domain change, HTTPS switch, URL restructuring)
- New content types or sections added to the site
- Sudden drops in Search Console's indexed page count
- CMS or plugin upgrades that change sitemap generation behavior
- Crossing the 50,000 URL threshold
For auditing, I recommend fetching your sitemap directly with curl and checking the URL count, then running a sample of URLs through Search Console's URL Inspection to verify they're crawlable and canonical. Screaming Frog can validate a sitemap at scale — crawl your sitemap URLs and flag redirects, noindex pages, or 4xx responses that have crept into your list.
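If you'd rather script the first step than eyeball curl output, the same check can be done in a few lines. This sketch swaps curl for Python's urllib and counts loc entries. The function names are hypothetical, and a .xml.gz sitemap would need a gzip.decompress step before parsing.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def fetch_sitemap(url: str) -> bytes:
    """Download the raw sitemap bytes (plain XML only in this sketch)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def sitemap_locs(xml_bytes: bytes) -> list[str]:
    """Extract every <loc> value from a sitemap or sitemap index."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]


def over_limit(locs: list[str]) -> bool:
    """True if the file lists more URLs than one sitemap may contain."""
    return len(locs) > MAX_URLS_PER_SITEMAP
```

From there, take a sample of the returned URLs to URL Inspection or hand the full list to a crawler.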
The underlying point throughout all of this: a sitemap is infrastructure, not strategy. Getting it right removes obstacles to crawling and indexing. It doesn't generate rankings or traffic on its own. Build it cleanly, keep it accurate, and then spend the majority of your SEO effort on the things that actually move rankings — content quality, authority, and page experience. The sitemap is a prerequisite, not a lever.