🗺️ XML Sitemap Generator

Last updated: June 15, 2026

Paste your URLs below to generate a valid XML sitemap with lastmod, changefreq, and priority. The tool auto-splits at 50,000 URLs into a sitemap index.

Paste raw URLs (one per line). Lines with # are comments; URLs containing "noindex" are automatically excluded.
Base URL: used for <loc> in the index file when the URL count exceeds 50,000.

How to Generate a Valid XML Sitemap: A Step-by-Step Guide

An XML sitemap is one of the most foundational files you can place on your website. It tells search engine crawlers exactly which URLs exist on your site, when they were last updated, how often they change, and how important each page is relative to others. Without a sitemap, Googlebot and other crawlers must discover your pages entirely by following links, a slow and unreliable process that can leave entire sections of your site unindexed for weeks.

This guide walks you through every aspect of building a correct, high-quality XML sitemap, from understanding the schema to handling large sites with tens of thousands of URLs.

What Exactly Is an XML Sitemap?

An XML sitemap is a structured text file written in the Sitemap Protocol format (defined at sitemaps.org). Every URL entry lives inside a <url> element within a <urlset> container. Each entry can include four fields:

  • <loc>: The absolute URL of the page. This is the only required field.
  • <lastmod>: The date the page was last meaningfully changed, in W3C Datetime format (for example YYYY-MM-DD; a full timestamp with a time zone is also allowed).
  • <changefreq>: A hint about how often the content changes: always, hourly, daily, weekly, monthly, yearly, or never.
  • <priority>: A value from 0.0 to 1.0 indicating relative importance within your site. Default is 0.5.
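
As an illustration of how the four fields fit together, here is a minimal Python sketch that builds one <url> entry with the standard library. The function name build_url_entry and the sample values are assumptions made for this example, not part of the tool above.

```python
import xml.etree.ElementTree as ET


def build_url_entry(loc, lastmod=None, changefreq=None, priority=None):
    """Return a <url> element; only <loc> is required (hypothetical helper)."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        ET.SubElement(url, "lastmod").text = lastmod
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        # One decimal place matches the 0.0 to 1.0 scale.
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return url


entry = build_url_entry(
    "https://example.com/blog/",
    lastmod="2025-06-01",
    changefreq="weekly",
    priority=0.6,
)
print(ET.tostring(entry, encoding="unicode"))
```

Optional fields are simply left out when they are not supplied, which keeps the entry valid with <loc> alone.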

One important nuance: Google's documentation states that it ignores the changefreq and priority values and decides its own crawl schedule. It does use lastmod when that value is consistently and verifiably accurate. Other crawlers may still read changefreq and priority, but treat them as hints at most, never as commands.

Step 1: Compile Your URL List

Before generating anything, you need a clean list of URLs you want indexed. Here is where most site owners make mistakes. Your sitemap should only contain canonical, publicly accessible URLs that return a 200 HTTP status code. Never include:

  • Pages with noindex meta tags or X-Robots-Tag headers
  • Paginated pages beyond page 1 (unless each page has unique content worth indexing)
  • URLs blocked by robots.txt
  • URLs that redirect (301/302); list the final destination instead
  • Duplicate URLs (non-canonical versions)
  • Session IDs or UTM parameters in URLs

If you have a CMS like WordPress, you can export a URL list from your sitemap plugin, a crawl tool like Screaming Frog, or your server's access logs filtered to 200-status GET requests. For static sites, a simple directory scan works well.
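
Some of the exclusion rules above can be applied mechanically to a pasted list. The following Python sketch is one possible approach, under stated assumptions: clean_url_list and TRACKING_PARAMS are hypothetical names, the parameter list is only a starting point, and stripping tracking parameters (rather than dropping the URL) assumes the stripped URL is the canonical one. It cannot check HTTP status codes, robots.txt rules, or noindex tags; those need a crawl.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking/session parameters; adjust for your site.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
    "sessionid", "sid",
}


def clean_url_list(raw_text):
    """Return de-duplicated absolute URLs from one-URL-per-line text."""
    seen = set()
    cleaned = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if "noindex" in line.lower():
            continue  # mirrors the tool's "noindex" exclusion rule
        parts = urlsplit(line)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # not an absolute http(s) URL
        # Drop tracking/session parameters, keep everything else in order.
        query = [
            (key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in TRACKING_PARAMS
        ]
        url = urlunsplit(
            (parts.scheme, parts.netloc, parts.path or "/", urlencode(query), "")
        )
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned
```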

Step 2: Set the Right Priority Values

Priority is often misused. Many sites set every page to 1.0, which defeats the purpose entirely. A useful approach is to assign priority based on page depth and importance:

  • 1.0: Homepage only
  • 0.8–0.9: Top-level category pages, main landing pages
  • 0.6–0.7: Blog index, product listing pages
  • 0.4–0.5: Individual blog posts, product detail pages
  • 0.2–0.3: Tag pages, author pages, older archive content

This tiered structure signals which pages you consider most important. Treat it as a hint only: Google ignores priority, so any effect on crawl budget depends on which other crawlers read the field.
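
One simple way to approximate these tiers is to derive priority from URL path depth. The Python sketch below is an assumption-laden illustration: priority_for is a hypothetical helper, and the depth-to-value mapping is only a rough stand-in for the page types listed above, since depth does not always match importance.

```python
from urllib.parse import urlsplit


def priority_for(url):
    """Map URL path depth to a priority tier (illustrative values only)."""
    path = urlsplit(url).path
    depth = len([segment for segment in path.split("/") if segment])
    tiers = {0: 1.0, 1: 0.8, 2: 0.6, 3: 0.4}
    # Anything deeper than three segments falls into the lowest tier.
    return tiers.get(depth, 0.2)


for sample in ("https://example.com/", "https://example.com/shoes/"):
    print(sample, priority_for(sample))
```

A site whose important pages live deep in the path would need an explicit per-section mapping instead.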

Step 3: Choose the Right changefreq

Match changefreq to reality. A news article published once and never updated should be never or yearly. A product page with a regularly updated price might be daily. A contact page almost never changes, so use monthly or yearly. Setting "always" across your entire site misrepresents how your content changes and gives crawlers a reason to discount your signals.
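
If you track how often each page is actually edited, the value can be derived instead of guessed. This Python sketch assumes you can supply the average number of days between edits; changefreq_for and its thresholds are hypothetical choices for illustration, not values defined by the Sitemap Protocol.

```python
def changefreq_for(avg_days_between_edits):
    """Pick a changefreq label from observed edit frequency.

    Pass None for pages that were published once and never edited.
    """
    if avg_days_between_edits is None:
        return "never"
    # (upper bound in days, label); thresholds are illustrative assumptions.
    thresholds = [(1, "daily"), (7, "weekly"), (31, "monthly")]
    for limit, label in thresholds:
        if avg_days_between_edits <= limit:
            return label
    return "yearly"
```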

Step 4: Generate the XML Structure

Using the tool above, paste your URLs (one per line) and select your defaults. The tool handles the XML encoding automatically. This matters because URLs containing ampersands (&), quotes, or angle brackets will break a sitemap if not properly escaped as XML entities. For example, & must become &amp;.
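
If you build the file yourself, the same escaping can be done with Python's standard library. This is a minimal sketch; escape_loc is a hypothetical helper name, and the sample URL is made up.

```python
from xml.sax.saxutils import escape


def escape_loc(url):
    """Escape &, <, > plus both quote characters for use inside XML."""
    return escape(url, {'"': "&quot;", "'": "&apos;"})


print(escape_loc("https://example.com/search?q=shoes&page=2"))
# prints: https://example.com/search?q=shoes&amp;page=2
```

Escape each URL exactly once. Running an already escaped URL through the function again would turn &amp; into &amp;amp;.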

The generated file starts with the XML declaration and the opening <urlset> tag:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

The namespace declaration (xmlns attribute) is mandatory. Without it, validators and some crawlers will reject the file.
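
For readers generating the file in code, the sketch below shows one way to produce a complete document with the declaration and namespace, using Python's xml.etree.ElementTree. The name build_urlset is an assumption for this example, and the xml_declaration argument requires Python 3.8 or later. ElementTree writes the declaration with single quotes, which is equally valid XML.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(urls):
    """Serialize a list of absolute URLs into sitemap bytes (UTF-8)."""
    # Passing xmlns as a plain attribute puts the namespace on the root.
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc in urls:
        url = ET.SubElement(root, "url")
        # ElementTree escapes &, < and > in text during serialization.
        ET.SubElement(url, "loc").text = loc
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


print(build_urlset(["https://example.com/"]).decode("utf-8"))
```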

Step 5: Handle Sites With More Than 50,000 URLs

The Sitemap Protocol imposes a hard limit: a single sitemap file may contain no more than 50,000 URLs and must not exceed 50 MB uncompressed. If your site exceeds this (common for e-commerce, news, or large blogs), you need a sitemap index file.

A sitemap index lists the locations of multiple individual sitemap files. The structure looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-1.xml</loc>
    <lastmod>2025-06-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-2.xml</loc>
    <lastmod>2025-06-01</lastmod>
  </sitemap>
</sitemapindex>

The tool above handles this split automatically. When your URL count exceeds 50,000, it generates numbered sitemap files plus an index file. Enter your base URL (e.g., https://example.com) so the index file contains correct absolute <loc> values for each child sitemap.
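
The splitting logic can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's actual code: split_urls and build_index are hypothetical names, and the sitemap-N.xml file naming simply follows the example above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the Sitemap Protocol


def split_urls(urls, size=MAX_URLS):
    """Cut the URL list into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, file_count, lastmod):
    """Build a sitemap index pointing at sitemap-1.xml .. sitemap-N.xml."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, file_count + 1):
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = (
            f"{base_url.rstrip('/')}/sitemap-{n}.xml"
        )
        ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


urls = [f"https://example.com/p/{i}" for i in range(120_001)]
chunks = split_urls(urls)
print([len(chunk) for chunk in chunks])  # prints: [50000, 50000, 20001]
index_xml = build_index("https://example.com", len(chunks), "2025-06-01")
```

This sketch splits by URL count only. A file with very long URLs could still exceed the 50 MB limit, so check the serialized size as well.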

Step 6: Validate Before Submitting

Before uploading your sitemap, validation catches structural errors that would silently cause crawlers to reject the file. Check these things:

  1. Every <loc> is an absolute URL starting with http:// or https://
  2. Special characters are XML-encoded (& → &amp;, etc.)
  3. The file is valid UTF-8 encoding
  4. No more than 50,000 URLs per file
  5. The namespace is present on the root element
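
The checklist above can be automated with a short script. The following Python sketch is one possible implementation, assuming the sitemap is available as raw bytes; validate_sitemap is a hypothetical name. An unescaped & or other encoding error surfaces as a parse failure, which covers check 2.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed


def validate_sitemap(data):
    """Return a list of problems found in sitemap bytes (empty if none)."""
    problems = []
    if len(data) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        # Unescaped characters such as a bare & end up here.
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not <urlset> in the sitemap namespace")
    locs = [el.text or "" for el in root.iter(f"{{{SITEMAP_NS}}}loc")]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceeds the 50,000 limit")
    for loc in locs:
        if not loc.strip().startswith(("http://", "https://")):
            problems.append(f"<loc> is not absolute: {loc!r}")
    return problems
```

The function checks a regular sitemap only; a sitemap index has a different root element and would need its own branch.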

Step 7: Upload and Submit to Search Engines

Upload sitemap.xml (or your index file) to your site's root directory, making it accessible at https://yourdomain.com/sitemap.xml. Then:

  • Google Search Console: Go to Sitemaps under Indexing, enter the path (sitemap.xml), and click Submit.
  • Bing Webmaster Tools: Same process under the Sitemaps section.
  • robots.txt reference: Add Sitemap: https://yourdomain.com/sitemap.xml to your robots.txt file (the line can go anywhere in the file). This lets any crawler that reads robots.txt discover your sitemap automatically.
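
If robots.txt is generated as part of a build, the Sitemap line can be added by script. This Python sketch assumes the file contents are already loaded as a string; add_sitemap_line is a hypothetical helper, and it leaves the text unchanged when the same directive is already present.

```python
def add_sitemap_line(robots_txt, sitemap_url):
    """Append a Sitemap directive unless the same one already exists."""
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        # Directive names are compared case-insensitively, URLs exactly.
        if key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt
    body = robots_txt.rstrip("\n")
    prefix = body + "\n\n" if body else ""
    return prefix + f"Sitemap: {sitemap_url}\n"


print(add_sitemap_line(
    "User-agent: *\nDisallow: /admin/\n",
    "https://yourdomain.com/sitemap.xml",
))
```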

After submission, monitor the Sitemaps report in Search Console. Google will show how many URLs it discovered versus how many it indexed. This gap tells you about crawlability and indexing issues that go beyond the sitemap itself.

Common Mistakes to Avoid

Setting every page to priority 1.0 is the single most common error. Another frequent mistake is including URLs that redirect; submit the final destination URL only. Never include noindex pages in your sitemap, because doing so sends contradictory signals (you are simultaneously telling Google "please index this" and "please don't index this"). Finally, keep your sitemap up to date. A stale sitemap with outdated lastmod dates or removed pages that now return 404 hurts your crawl efficiency.

A well-maintained XML sitemap is not a one-time task. It is a living document that grows with your site. Automate its generation on your CMS or CI pipeline, and you will ensure search engines always have an accurate, up-to-date map of everything worth indexing on your site.
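
For a static site, the automation can start with a directory scan in a build step. The Python sketch below is a rough illustration under several assumptions: scan_static_site is a hypothetical name, pages are .html files, index.html maps to its directory URL, and file modification time stands in for the last content change. Some build systems rewrite every file on each run, in which case a date from version control or the CMS is a better source for lastmod.

```python
from datetime import datetime, timezone
from pathlib import Path


def scan_static_site(root_dir, base_url):
    """Return (url, lastmod) pairs for every .html file under root_dir."""
    root = Path(root_dir)
    entries = []
    for path in sorted(root.rglob("*.html")):
        rel = path.relative_to(root).as_posix()
        if rel == "index.html":
            rel = ""  # site root
        elif rel.endswith("/index.html"):
            rel = rel[: -len("index.html")]  # keep the trailing slash
        # Assumption: mtime approximates the last meaningful change.
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        url = f"{base_url.rstrip('/')}/{rel}"
        entries.append((url, modified.strftime("%Y-%m-%d")))
    return entries
```

The resulting pairs can be fed into whatever code writes the <url> entries.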

FAQ

Do I need to include every page on my site in the sitemap?
No. Only include pages you want indexed. Exclude noindex pages, redirects, duplicate URLs, pages blocked by robots.txt, and any URL returning a non-200 HTTP status. Including unindexable pages wastes crawl budget and can confuse search engines with contradictory signals.
What is the difference between a sitemap and a sitemap index?
A regular sitemap (urlset) lists up to 50,000 individual URLs in one file. A sitemap index (sitemapindex) is a parent file that points to multiple individual sitemap files. It is required when your site exceeds 50,000 URLs, and it is also useful when you want to organize different content types (pages, images, videos) into separate sitemap files.
How important are changefreq and priority to Google?
Google's documentation says it ignores both values; Googlebot uses its own signals to determine crawl frequency and priority. Accurate values cost nothing to include and may be read by other crawlers, but do not expect them to change how Google crawls your site. The field Google does use is lastmod, provided it is consistently accurate.
Should lastmod reflect the date I generated the sitemap or the date the page was actually changed?
It should reflect the actual date the page content was last meaningfully changed. Setting lastmod to today's date for all pages regardless of real changes is a common mistake. Google has stated it ignores lastmod values that appear inaccurate or inconsistently maintained.
Where should I place my sitemap file and how do search engines find it?
Place it in your site's root directory, accessible at https://yourdomain.com/sitemap.xml. Submit it manually in Google Search Console and Bing Webmaster Tools. Also add the line 'Sitemap: https://yourdomain.com/sitemap.xml' to your robots.txt file. This allows any crawler that reads robots.txt to auto-discover it without manual submission.
Can I have multiple sitemaps for different content types?
Yes. You can create separate sitemaps for pages, blog posts, images, videos, and news content. Organize them using a sitemap index file that references each specialized sitemap. Google supports image sitemaps and video sitemaps with additional namespace extensions beyond the standard URL fields.