XML Sitemap Generator
Paste your URLs below to generate a valid XML sitemap with lastmod, changefreq, and priority. Auto-splits at 50,000 URLs into a sitemap index.
How to Generate a Valid XML Sitemap: A Step-by-Step Guide
An XML sitemap is one of the most fundamental files you can place on your website. It tells search engine crawlers exactly which URLs exist on your site, when they were last updated, how often they change, and how important each page is relative to others. Without a sitemap, Googlebot and other crawlers must discover your pages entirely through link-following, a slow, unreliable process that can leave entire sections of your site unindexed for weeks.
This guide walks you through every aspect of building a correct, high-quality XML sitemap, from understanding the schema to handling large sites with tens of thousands of URLs.
What Exactly Is an XML Sitemap?
An XML sitemap is a structured text file written in the Sitemap Protocol format (defined at sitemaps.org). Every URL entry lives inside a <url> element within a <urlset> container. Each entry can include four fields:
- <loc>: The absolute URL of the page. This is the only required field.
- <lastmod>: The date the page was last meaningfully changed, in W3C Datetime format (for example YYYY-MM-DD; a full timestamp with a time zone offset is also allowed).
- <changefreq>: A hint about how often the content changes: always, hourly, daily, weekly, monthly, yearly, or never.
- <priority>: A value from 0.0 to 1.0 indicating relative importance within your site. Default is 0.5.
One important nuance: changefreq and priority are hints, not commands. Google's documentation states that Google ignores both values and decides its own crawl schedule, using lastmod only when it is consistently accurate. Other crawlers may give these fields more weight, so it is still worth setting them honestly.
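As a minimal sketch of the entry structure described above, the following Python snippet builds one <url> element with the standard library's xml.etree.ElementTree. The helper name build_url_entry and the example.com URL are illustrative assumptions, not part of the tool on this page.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The seven changefreq values allowed by the Sitemap Protocol.
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def build_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Build a <url> element; only loc is required (hypothetical helper)."""
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        raise ValueError(f"invalid changefreq: {changefreq!r}")
    if priority is not None and not 0.0 <= priority <= 1.0:
        raise ValueError(f"priority out of range: {priority!r}")

    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        # date.isoformat() yields YYYY-MM-DD, a valid W3C Datetime value.
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return url


entry = build_url_entry("https://example.com/", date(2025, 6, 1), "weekly", 1.0)
entry_xml = ET.tostring(entry, encoding="unicode")
print(entry_xml)
```

Omitting the optional arguments produces an entry that contains only <loc>, which is still a valid entry.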
Step 1: Compile Your URL List
Before generating anything, you need a clean list of URLs you want indexed. Here is where most site owners make mistakes. Your sitemap should only contain canonical, publicly accessible URLs that return a 200 HTTP status code. Never include:
- Pages with noindex meta tags or X-Robots-Tag headers
- Paginated pages beyond page 1 (unless each page has unique content worth indexing)
- URLs blocked by robots.txt
- URLs that redirect (301/302); list the final destination instead
- Duplicate URLs (non-canonical versions)
- Session IDs or UTM parameters in URLs
If you have a CMS like WordPress, you can export a URL list from your sitemap plugin, a crawl tool like Screaming Frog, or your server's access logs filtered to 200-status GET requests. For static sites, a simple directory scan works well.
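Part of this cleanup can be scripted. The sketch below is one possible pre-filter in Python: it keeps only absolute http(s) URLs, drops any URL carrying UTM or session parameters, and removes exact duplicates. The parameter names in EXCLUDED_PARAMS are an assumed list, and the script cannot see status codes, noindex tags, canonical tags, or robots.txt rules, so those checks still need a crawl.

```python
from urllib.parse import parse_qsl, urlsplit

# Query parameters treated as session noise (an assumed, incomplete list).
EXCLUDED_PARAMS = {"sessionid", "sid", "phpsessid"}


def is_sitemap_candidate(url):
    """Return True if the URL is absolute and free of tracking parameters."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    for name, _value in parse_qsl(parts.query, keep_blank_values=True):
        lowered = name.lower()
        if lowered.startswith("utm_") or lowered in EXCLUDED_PARAMS:
            return False
    return True


def clean_url_list(lines):
    """Strip whitespace, drop blanks and rejects, and remove exact duplicates."""
    seen = set()
    cleaned = []
    for line in lines:
        url = line.strip()
        if not url or url in seen or not is_sitemap_candidate(url):
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned
```

The function preserves input order, so a list sorted by importance stays sorted after cleaning.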
Step 2: Set the Right Priority Values
Priority is often misused: many sites set every page to 1.0, which defeats the purpose entirely. A useful approach is to assign priority based on page depth and importance:
- 1.0: Homepage only
- 0.8 to 0.9: Top-level category pages, main landing pages
- 0.6 to 0.7: Blog index, product listing pages
- 0.4 to 0.5: Individual blog posts, product detail pages
- 0.2 to 0.3: Tag pages, author pages, older archive content
For crawlers that honor priority, this tiered structure suggests where to spend crawl budget first, which matters most for large sites with limited crawl allocation.
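The tiers above could be approximated from the URL path alone. The following Python sketch is a heuristic, not a rule from the protocol: it assumes that path depth tracks importance and that tag and author pages live under /tag/ and /author/, which may not match your site's URL scheme.

```python
from urllib.parse import urlsplit

# Path prefixes assumed to mark low-value listing pages on this site.
LOW_VALUE_PREFIXES = ("tag", "author")


def priority_for(url):
    """Map a URL to a priority tier using its path (assumed heuristic)."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not segments:
        return 1.0  # homepage
    if segments[0] in LOW_VALUE_PREFIXES:
        return 0.3  # tag and author pages
    if len(segments) == 1:
        return 0.8  # top-level category or landing page
    if len(segments) == 2:
        return 0.6  # second-level listing
    return 0.4      # deeper detail pages
```

A site with flat URLs (every post at /post-slug) would need a lookup by page type instead of by depth.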
Step 3: Choose the Right changefreq
Match changefreq to reality. A news article published once and never updated should be never or yearly. A product page with a regularly updated price might be daily. A contact page almost never changes, so use monthly or yearly. Inflating changefreq with "always" across your entire site wastes crawl budget and trains crawlers to distrust your signals.
Step 4: Generate the XML Structure
Using the tool above, paste your URLs (one per line) and select your defaults. The tool handles the XML encoding automatically. This matters because URLs containing ampersands (&), quotes, or angle brackets will break a sitemap if they are not escaped as XML entities. For example, & must become &amp;.
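If you generate sitemaps yourself, the same escaping is available in Python's standard library. This is a small sketch; escape_loc is a hypothetical helper name, and it assumes the input URL has not already been entity-escaped (running it twice would turn &amp; into &amp;amp;).

```python
from xml.sax.saxutils import escape


def escape_loc(url):
    """Escape the five XML special characters in a URL."""
    # escape() handles &, <, and > by default; the extra map covers quotes.
    return escape(url, {'"': "&quot;", "'": "&apos;"})


print(escape_loc("https://example.com/search?q=shoes&color=red"))
# prints https://example.com/search?q=shoes&amp;color=red
```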
The generated file starts with the XML declaration, followed by the opening tag of the root element:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
The namespace declaration (xmlns attribute) is mandatory. Without it, validators and some crawlers will reject the file.
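For readers scripting their own output, here is a minimal Python sketch that writes a complete file with both the declaration and the namespace. The function name serialize_sitemap is an assumption for illustration. Note that ElementTree writes the declaration with single quotes, which is equally valid XML, and escapes ampersands in element text on its own.

```python
import io
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def serialize_sitemap(urls):
    """Return sitemap bytes with an XML declaration and namespaced root."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
    buffer = io.BytesIO()
    # xml_declaration=True forces the <?xml ...?> line into the output.
    ET.ElementTree(urlset).write(buffer, encoding="UTF-8", xml_declaration=True)
    return buffer.getvalue()


sitemap_bytes = serialize_sitemap([
    "https://example.com/",
    "https://example.com/search?q=shoes&color=red",
])
print(sitemap_bytes.decode("utf-8"))
```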
Step 5: Handle Sites With More Than 50,000 URLs
The Sitemap Protocol imposes a hard limit: a single sitemap file may contain no more than 50,000 URLs and must not exceed 50 MB uncompressed. If your site exceeds this (common for e-commerce, news, or large blogs), you need a sitemap index file.
A sitemap index lists the locations of multiple individual sitemap files. The structure looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-1.xml</loc>
    <lastmod>2025-06-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-2.xml</loc>
    <lastmod>2025-06-01</lastmod>
  </sitemap>
</sitemapindex>
The tool above handles this split automatically. When your URL count exceeds 50,000, it generates numbered sitemap files plus an index file. Enter your base URL (e.g., https://example.com) so the index file contains correct absolute <loc> values for each child sitemap.
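The split itself is simple to reproduce. The Python sketch below chunks a URL list at 50,000 entries and builds an index that follows the sitemap-N.xml naming from the example above. The function names are illustrative assumptions, and the sketch does not check the 50 MB size limit, which very long URLs could hit before the URL count does.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, chunk_count, lastmod):
    """Build a <sitemapindex> listing sitemap-1.xml through sitemap-N.xml."""
    base = base_url.rstrip("/")
    index = ET.Element("sitemapindex", {"xmlns": SITEMAP_NS})
    for n in range(1, chunk_count + 1):
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base}/sitemap-{n}.xml"
        ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="unicode")


urls = [f"https://example.com/page-{i}" for i in range(120_000)]
chunks = chunk_urls(urls)
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]
index_xml = build_index("https://example.com", len(chunks), "2025-06-01")
```

Each chunk would then be written to its own file, for example with the same element-building approach used for a single sitemap.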
Step 6: Validate Before Submitting
Validate your sitemap before uploading it. Validation catches structural errors that would otherwise cause crawlers to reject the file silently. Check these things:
- Every <loc> is an absolute URL starting with http:// or https://
- Special characters are XML-encoded (& becomes &amp;, etc.)
- The file is encoded as valid UTF-8
- No more than 50,000 URLs per file
- The namespace is present on the root element
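The checklist above can be partly automated. This Python sketch covers those five checks for a single urlset file and returns a list of human-readable problems. The function name validate_sitemap is an assumption, and the sketch is not a substitute for a full schema validator: it does not check date formats, changefreq values, or the file size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def validate_sitemap(data):
    """Return a list of problems found in sitemap bytes (empty if none)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    try:
        # An unescaped & or a stray < fails here as malformed XML.
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not <urlset> in the sitemap namespace")
    locs = [el.text or "" for el in root.iter(f"{{{SITEMAP_NS}}}loc")]
    if len(locs) > 50_000:
        problems.append("more than 50,000 URLs in one file")
    for loc in locs:
        if not loc.strip().startswith(("http://", "https://")):
            problems.append(f"<loc> is not an absolute URL: {loc!r}")
    return problems
```

An empty list means the file passed these checks; anything else should be fixed before submission.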
Step 7: Upload and Submit to Search Engines
Upload sitemap.xml (or your index file) to your site's root directory, making it accessible at https://yourdomain.com/sitemap.xml. Then:
- Google Search Console: Go to Sitemaps under Index, enter the path (sitemap.xml), and click Submit.
- Bing Webmaster Tools: Same process under the Sitemaps section.
- robots.txt reference: Add Sitemap: https://yourdomain.com/sitemap.xml at the bottom of your robots.txt file. This lets any crawler that reads robots.txt discover your sitemap automatically.
After submission, monitor the Sitemaps report in Search Console. Google will show how many URLs it discovered versus how many it indexed. This gap tells you about crawlability and indexing issues that go beyond the sitemap itself.
Common Mistakes to Avoid
Setting every page to priority 1.0 is the single most common error. Another frequent mistake is including URLs that redirect; submit the final destination URL only. Never include noindex pages in your sitemap; it sends contradictory signals (you are simultaneously telling Google "please index this" and "please don't index this"). Finally, keep your sitemap up to date. A stale sitemap with outdated lastmod dates or removed pages that now 404 actively hurts your crawl efficiency.
A well-maintained XML sitemap is not a one-time task. It is a living document that grows with your site. Automate its generation in your CMS or CI pipeline, and you will ensure search engines always have an accurate, up-to-date map of everything worth indexing on your site.