🔍 Sitemap Validator & Checker

Last updated: February 24, 2026

Validate sitemap.xml structure, URL count, lastmod dates, encoding and indexability issues — paste XML directly or enter a URL hint.

The Complete Sitemap Validation Checklist: Every Check That Determines If Google Indexes Your Pages

A sitemap.xml file is your direct communication channel to search engine crawlers. Done right, it tells Googlebot exactly which pages exist, when they were last updated, and how important they are relative to each other. Done wrong — even with subtle XML encoding issues or a wrong date format — it can silently cause Google to skip pages you spent months building. This checklist walks through every validation point that matters.

Check 1: Does Your XML Actually Parse?

The most basic failure mode is a sitemap that looks fine in a text editor but breaks the moment a crawler's XML parser hits it. Common culprits are stray characters before the XML declaration, mismatched tags, or an unclosed element after the last URL. Always run your sitemap through a strict XML parser — not just a browser, which often silently auto-corrects malformed markup. If your CMS generates the sitemap dynamically, test what's actually being served at the URL, not the template.

The XML declaration line should look exactly like this: <?xml version="1.0" encoding="UTF-8"?>. Nothing should precede it: no whitespace, no blank line, and no byte-order mark (BOM). A BOM is the invisible U+FEFF character that some Windows text editors add. The XML specification permits it in UTF-8 files, but stricter parsers and some sitemap tools reject or mishandle it, so the safest choice is to strip it.
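A minimal sketch of this check in Python, using only the standard library. The function name check_parse and the wording of the messages are illustrative assumptions, not part of any sitemap tool.

```python
import xml.etree.ElementTree as ET

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def check_parse(raw: bytes) -> list[str]:
    """Return a list of problems found in the raw sitemap bytes."""
    problems = []
    if raw.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 byte-order mark")
        raw = raw[len(UTF8_BOM):]  # strip it so the remaining checks still run
    if not raw.startswith(b"<?xml"):
        problems.append("file does not start with an XML declaration")
    try:
        ET.fromstring(raw)  # strict parse: raises on malformed markup
    except ET.ParseError as exc:
        problems.append(f"XML parse error: {exc}")
    return problems
```

Feed it the bytes actually served at the sitemap URL, not the template on disk, since the two can differ.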

Check 2: Root Element Must Be <urlset> or <sitemapindex>

A standard sitemap has <urlset> as its root element. A sitemap index — which lists other sitemaps rather than individual pages — uses <sitemapindex>. Neither is optional or interchangeable. Using the wrong root element, or having a root element with a typo (<UrlSet>, <url_set>), means the crawler cannot identify the document as a valid sitemap at all.

Check 3: Namespace Declaration

The sitemap namespace must be declared on the root element: xmlns="http://www.sitemaps.org/schemas/sitemap/0.9". Miss this and the document is ambiguous XML — technically parseable, but not identifiable as a sitemap. Some crawlers will still process it; others won't. Don't rely on the lenient path.
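Checks 2 and 3 can be sketched together, because ElementTree exposes the root tag as "{namespace}localname". The helper name check_root is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_ROOTS = {"urlset", "sitemapindex"}


def check_root(xml_text: str) -> list[str]:
    """Check the root element name and its namespace declaration."""
    root = ET.fromstring(xml_text)
    # A namespaced tag looks like "{http://...}urlset"; split it apart.
    if root.tag.startswith("{"):
        namespace, _, local = root.tag[1:].partition("}")
    else:
        namespace, local = "", root.tag
    problems = []
    if local not in VALID_ROOTS:  # XML is case-sensitive, so "UrlSet" fails
        problems.append(f"unexpected root element <{local}>")
    if namespace != SITEMAP_NS:
        problems.append("sitemap namespace is missing or wrong")
    return problems
```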

Check 4: The 50,000 URL and 50 MB Hard Limits

Google enforces two hard limits on sitemap files: 50,000 URLs per file and 50 MB uncompressed file size. Exceed either and the file may be rejected or processed only in part, so pages beyond the cutoff never get submitted. If your site has more than 50,000 URLs, you must split them across multiple sitemap files and reference them from a sitemap index. Gzip compression (sitemap.xml.gz) reduces transfer size, and Google handles compressed sitemaps natively; many sites cut their transfer size by 70–80% this way. Note that the 50 MB limit applies to the uncompressed file, so compression saves bandwidth without raising the limit.
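One way to stay under the URL limit is to chunk the URL list and generate an index over the resulting files. The sketch below assumes hypothetical function names (chunk_urls, build_index) and leaves writing the individual sitemap files to the caller.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Split a URL list into groups of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls: list[str]) -> str:
    """Build a sitemap index document that references each sitemap file."""
    entries = "".join(
        f"  <sitemap><loc>{escape(url)}</loc></sitemap>\n" for url in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )
```

For example, 120,001 URLs produce three chunks, and build_index would then list three sitemap files. The 50 MB size limit still needs a separate check on each generated file.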

Check 5: Every URL Must Be Absolute and HTTPS

The <loc> element inside each <url> entry must contain a fully qualified absolute URL including the scheme. /about is invalid. example.com/about is invalid. https://example.com/about is correct. Additionally, if your site serves over HTTPS (which it should), using HTTP URLs in the sitemap is a mismatch — Google may crawl the HTTP version and encounter a redirect, wasting crawl budget.
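The three examples above can be checked with urllib.parse. This is a sketch: the function name loc_problem and the require_https flag are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit


def loc_problem(loc: str, require_https: bool = True) -> Optional[str]:
    """Return a description of the problem with a <loc> value, or None."""
    parts = urlsplit(loc)
    # "/about" and "example.com/about" both parse with no scheme and no host.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return "not an absolute http(s) URL"
    if require_https and parts.scheme != "https":
        return "http URL on a site that serves https"
    return None
```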

Check 6: Escaped Special Characters in URLs

XML has reserved characters that must be escaped. The most common mistake is an ampersand in a URL query string. In HTML you might write ?color=red&size=large, but in XML that & must be written as &amp;. An unescaped ampersand is a parse error, and a strict parser rejects the whole file, not just the affected URL. Similarly, <, >, " and ' need their XML entities (&lt;, &gt;, &quot;, &apos;) if they appear in a <loc> value.
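When generating a sitemap in code, escaping is safer left to a library than done by hand. A minimal sketch using the standard library's xml.sax.saxutils.escape, where the helper name loc_element is an assumption:

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly here.
EXTRA_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def loc_element(url: str) -> str:
    """Wrap a raw (not yet escaped) URL in a <loc> element."""
    return "<loc>" + escape(url, EXTRA_ENTITIES) + "</loc>"
```

Pass only raw URLs: a value that already contains &amp; would be escaped a second time.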

Check 7: lastmod Format Must Be W3C Datetime

The <lastmod> tag accepts only W3C datetime format. The simplest valid form is YYYY-MM-DD (e.g., 2024-06-15). The full form includes time and timezone offset: 2024-06-15T14:30:00+05:30 or with UTC: 2024-06-15T09:00:00Z. What doesn't work: June 15, 2024, 15/06/2024, 2024-6-15 (no zero padding), or Unix timestamps.
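A sketch of a validator for the two forms described above: a plain date, and a date with time and timezone. It deliberately does not cover every variant of W3C datetime, and the name is_valid_lastmod is an assumption.

```python
import re
from datetime import date

LASTMOD_RE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"                     # YYYY-MM-DD, zero-padded
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d"               # Thh:mm
    r"(?::[0-5]\d(?:\.\d+)?)?"                     # optional :ss and fraction
    r"(?:Z|[+-](?:[01]\d|2[0-3]):[0-5]\d))?"       # Z or +hh:mm / -hh:mm
)


def is_valid_lastmod(value: str) -> bool:
    """Check the format and that the calendar date actually exists."""
    match = LASTMOD_RE.fullmatch(value)
    if not match:
        return False
    year, month, day = (int(g) for g in match.groups())
    try:
        date(year, month, day)  # rejects dates such as 2024-02-30
    except ValueError:
        return False
    return True
```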

More important than format: the date must be accurate. Google's John Mueller has explicitly said that inaccurate lastmod values (like setting every page to today's date, or never updating them) train Googlebot to ignore the field entirely. Only update lastmod when the page's content actually changes.

Check 8: changefreq and priority Are Hints, Not Commands

If you include <changefreq>, it must be one of exactly seven values: always, hourly, daily, weekly, monthly, yearly, or never. Any other string is invalid. For <priority>, the value must be a decimal between 0.0 and 1.0 inclusive. A value of 1 is valid (interpreted as 1.0), but 2.0 or -1 are not.
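These two rules are simple enough to check directly. The sketch below is lenient on priority because float() also accepts forms such as 1e-1; the function names are assumptions.

```python
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def is_valid_changefreq(value: str) -> bool:
    """Only the seven lowercase values are accepted."""
    return value in CHANGEFREQ_VALUES


def is_valid_priority(value: str) -> bool:
    """Accept a number between 0.0 and 1.0 inclusive."""
    try:
        number = float(value)
    except ValueError:
        return False
    return 0.0 <= number <= 1.0  # NaN and infinity fail this comparison
```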

Be aware that Google's documentation says it ignores both of these fields, and the sitemap protocol itself defines them as hints rather than commands. Crawl frequency is determined by Google's own scheduling, not your declared changefreq. Priority only has meaning relative to other URLs on the same site, and even then its impact is minimal. Don't invest significant effort optimizing these fields.

Check 9: No Duplicate URLs

Every URL in your sitemap should appear exactly once. Duplicate entries waste space, confuse crawlers, and signal poor sitemap generation hygiene. Duplicates commonly appear when sitemaps are generated from database queries without DISTINCT, or when sitemap plugins pull from multiple sources that overlap. If you have canonical URL handling on your site, ensure the sitemap contains only canonical URLs — not the non-canonical variants.
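Duplicate detection is a counting problem once the <loc> values are extracted. A minimal sketch, assuming a standard urlset document and a hypothetical helper name duplicate_locs:

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def duplicate_locs(xml_text: str) -> list[str]:
    """Return every <loc> value that appears more than once, sorted."""
    root = ET.fromstring(xml_text)
    locs = [(el.text or "").strip() for el in root.findall("sm:url/sm:loc", NS)]
    return sorted(url for url, count in Counter(locs).items() if count > 1)
```

This compares URLs as exact strings, so variants such as a trailing slash or different letter case are not treated as duplicates here.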

Check 10: URLs Must Match the Sitemap's Domain

A sitemap submitted through Google Search Console for example.com may only contain URLs on example.com. Cross-domain URLs are silently ignored. This catches teams that accidentally include CDN URLs, staging server URLs, or partner site links. It also matters if you recently migrated domains — old URLs from the previous domain won't be processed through the new sitemap.
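A sketch of the host comparison, assuming the sitemap's own URL is known. It compares hostnames only, which is a simplification, and the function name off_site_urls is an assumption.

```python
from urllib.parse import urlsplit


def off_site_urls(sitemap_url: str, locs: list[str]) -> list[str]:
    """Return the <loc> values whose host differs from the sitemap's host."""
    expected_host = urlsplit(sitemap_url).hostname  # already lowercased
    return [loc for loc in locs if urlsplit(loc).hostname != expected_host]
```

Subdomains count as different hosts in this sketch, which is what flags CDN and staging URLs.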

Check 11: Sitemap URL Must Be Discoverable

Beyond the file's contents, the sitemap must be discoverable. The standard is referencing it in robots.txt: Sitemap: https://example.com/sitemap.xml. You can also submit it directly in Google Search Console (which provides coverage reports and error details that passive crawling doesn't). Both methods work and aren't mutually exclusive — do both. The sitemap URL itself must return a 200 HTTP status code, not a redirect. Googlebot doesn't follow redirects to sitemaps.
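To confirm that robots.txt actually advertises the sitemap, the Sitemap lines can be extracted from its text. This sketch works on text that has already been downloaded and does not check the HTTP status of the sitemap URL; the function name is an assumption.

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on Sitemap: lines in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls
```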

What to Fix First

If your sitemap has multiple issues, prioritize in this order: XML parse errors first (nothing else matters if the file can't be parsed), then missing or incorrect namespace, then URL format issues (non-absolute, spaces, unescaped characters), then file size and count limits, then lastmod format errors, and finally optional tag validation. Structural issues stop the entire file from being processed; data quality issues only affect individual entries.

Run validation after every sitemap regeneration — especially after CMS updates, URL restructuring, or adding new content types. A valid sitemap isn't a one-time task; it's an ongoing maintenance item that directly affects how quickly new and updated content appears in search results.


FAQ

Why is my sitemap not being indexed by Google even though it exists?
The most common reasons are: the sitemap URL returns a redirect (Google doesn't follow sitemap redirects — it must return a 200 directly), the XML is malformed and fails to parse, the sitemap isn't referenced in robots.txt or submitted in Google Search Console, or the file exceeds the 50 MB / 50,000 URL limits. Use Google Search Console's Sitemaps report for the specific error Google encountered.
How many URLs can I put in a single sitemap.xml file?
Google's limit is 50,000 URLs per sitemap file, and the uncompressed file size must be under 50 MB. If your site has more URLs, split them across multiple sitemap files (e.g., sitemap-posts.xml, sitemap-pages.xml) and reference all of them from a sitemap index file (sitemapindex.xml). The index file itself is also limited to 50,000 entries.
Does the lastmod date actually affect how often Google crawls my pages?
Only if you keep it accurate. Google uses lastmod as a hint when the dates reflect genuine changes. If you set every URL's lastmod to today's date or never update it, Google learns to ignore the field. An accurate lastmod — updated only when content actually changes — signals Googlebot to revisit that specific page sooner, which is valuable for news sites and frequently updated content.
Should I include noindex pages in my sitemap?
No. Including a page in your sitemap signals that you want it indexed, while a noindex meta tag says you don't. These conflicting signals confuse crawlers and waste crawl budget. Keep your sitemap strictly to pages you want indexed — canonicalized, indexable, returning 200 status codes. Remove redirecting pages, error pages, duplicate content, and any URL with a noindex directive.
What's the difference between a sitemap and a sitemap index?
A sitemap (urlset) directly lists individual page URLs. A sitemap index (sitemapindex) lists other sitemap files rather than pages — it's a directory of sitemaps. Use a sitemap index when you have more than 50,000 URLs, want to separate sitemaps by content type (posts vs. products vs. images), or want to submit multiple sitemaps with a single entry in Google Search Console.
Does sitemap priority (0.0–1.0) actually affect search rankings?
No. Priority is a relative hint within your own sitemap about which pages matter more to you, but Google explicitly states it doesn't use priority values for ranking. It may slightly influence crawl priority among your own pages, but the effect is minimal and inconsistent. Most SEOs recommend either omitting the priority tag entirely or setting all values to 0.5, rather than spending time optimizing something search engines largely disregard.