Schema Markup and Sitemaps: Answering the Questions SEOs Actually Ask

Every few months I get the same Slack message from someone on my team: "Hey, quick question about schema..." and then what follows is never quick. It's a genuinely confusing corner of SEO where the official docs are technically correct but weirdly evasive about the real-world edge cases. So I've compiled the questions I actually get asked — the ones that show up in Reddit threads, SEO Discord servers, and panicked DMs on launch day.

No fluff. No "it depends" without an actual answer. Let's go.

Schema Markup Questions

Does Google actually use my schema, or is it just for show?

Both, depending on the type. Some schema types directly unlock rich results you can see in the SERPs: review stars from Review and AggregateRating, breadcrumbs, event dates, product prices. FAQ dropdowns from FAQPage used to belong on that list, but since August 2023 Google shows them only for well-known, authoritative government and health sites. Other types influence how Google understands your content without producing a visible badge. There's no official list of "these do nothing," which is the frustrating part. The pragmatic answer: implement the types with documented rich result eligibility first. Everything else is structured data that may feed Knowledge Graph entries or future features you can't predict.

I added JSON-LD schema. The Rich Results Test passes. Why am I not getting rich results?

Passing validation is not the same as eligibility. Google doesn't disclose its thresholds for actually showing rich results, but the factors that appear to matter are page quality signals, niche (some industries are more trust-restricted than others), whether the markup is accurate and not misleading, and crawl frequency. A page that gets crawled once a month won't have rich results applied quickly, even with flawless markup. First confirm in Search Console's URL Inspection tool that the page has been re-crawled since you added the markup, then give it at least 4-6 weeks.

Can I put schema on elements that load via JavaScript / lazy loading?

Yes, but with an asterisk. JSON-LD in the <head> is independent of your lazy-loaded content, which is why JSON-LD is almost always a better choice than Microdata for a JavaScript-heavy site. The problem comes when people inject JSON-LD with JavaScript after page load. Google's renderer does execute that JavaScript, but rendering can lag behind crawling, and if your hydration is slow the data might not be present when Google processes the page. The safest pattern: serve the JSON-LD in the initial HTML response rather than injecting it post-render. If your product or review data only exists in an API call that fires after page load, you have a deeper architectural problem to solve.
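
A minimal sketch of the "serve it in the initial HTML" pattern, in Python. It assumes a server-side template function (the hypothetical `render_product_page`) that already has the product data in hand when it builds the page. It also escapes `<` so that a product name containing `</script>` can't break out of the script tag.

```python
import html
import json


def render_json_ld(data):
    """Serialize structured data into a JSON-LD <script> tag."""
    payload = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    # Escape "<" so a value containing "</script>" cannot close the tag early.
    # "\u003c" is a valid JSON escape, so JSON parsers still read the original text.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


def render_product_page(product):
    """Hypothetical server-side template: the markup ships in the first HTML response."""
    json_ld = render_json_ld({
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "offers": {
            "@type": "Offer",
            "price": product["price"],
            "priceCurrency": product["currency"],
        },
    })
    return (
        "<!doctype html><html><head>"
        f"{json_ld}"
        "</head><body>"
        f"<h1>{html.escape(product['name'])}</h1>"
        "</body></html>"
    )
```

The design point is where the data comes from: `product` is already loaded on the server, so the structured data never waits on a client-side API call or hydration.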

What's the actual difference between JSON-LD, Microdata, and RDFa? Which should I use?

JSON-LD goes in a script block, completely separate from your HTML content. Microdata and RDFa are woven into the actual HTML elements with attributes. Google supports all three. In practice, use JSON-LD — it's easier to maintain, easier to debug, doesn't break when your front-end team refactors the HTML, and is what Google explicitly recommends in their own documentation. Microdata made sense in 2012. The only scenario where I'd still reach for Microdata is when you're working in a legacy CMS where you cannot add arbitrary script blocks but can add HTML attributes.

My schema has errors in Search Console but rich results are showing. Do I fix them?

Yes, fix them. What you're seeing is Google parsing around the errors, which is common; Google is extremely good at fuzzy interpretation. But errors mean something is wrong, and at any point Google could tighten validation, stop tolerating a field it currently parses, or silently drop the rich result. The most common errors I see: datePublished in the wrong format (it must be ISO 8601, e.g. 2024-03-05 or 2024-03-05T09:30:00+00:00), typos in @type values, and aggregateRating with neither ratingCount nor reviewCount. Each is a five-minute fix.
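
If you want to catch these errors before Search Console does, a small lint pass over your JSON-LD is enough. This is a sketch, not a replacement for the Rich Results Test. `KNOWN_TYPES` is a hypothetical allowlist you'd extend yourself, and the ISO 8601 check is an approximation built on `datetime.fromisoformat`. The rating rule treats `aggregateRating` as valid when it has either `ratingCount` or `reviewCount`, which is what Google's review snippet documentation accepts.

```python
from datetime import datetime
from typing import List

# Hypothetical allowlist for illustration; extend it with the types your site uses.
KNOWN_TYPES = {
    "Article", "Product", "Offer", "Review", "AggregateRating",
    "BreadcrumbList", "ListItem", "Organization", "FAQPage", "Event",
}


def is_iso_8601(value: str) -> bool:
    """Approximate ISO 8601 check using datetime.fromisoformat."""
    if value.endswith("Z"):
        # fromisoformat only accepts a trailing "Z" from Python 3.11 onward.
        value = value[:-1] + "+00:00"
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def lint_node(node: dict) -> List[str]:
    """Return problems found in one JSON-LD node and any nodes nested inside it."""
    problems = []
    types = node.get("@type", [])
    for t in types if isinstance(types, list) else [types]:
        if t not in KNOWN_TYPES:
            problems.append(f"unrecognized @type: {t!r}")
    for field in ("datePublished", "dateModified"):
        if field in node and not is_iso_8601(str(node[field])):
            problems.append(f"{field} is not ISO 8601: {node[field]!r}")
    rating = node.get("aggregateRating")
    if isinstance(rating, dict) and not any(k in rating for k in ("ratingCount", "reviewCount")):
        problems.append("aggregateRating needs ratingCount or reviewCount")
    # Recurse into nested objects such as offers, review, or aggregateRating.
    for value in node.values():
        for child in value if isinstance(value, list) else [value]:
            if isinstance(child, dict):
                problems.extend(lint_node(child))
    return problems
```

Run it over each parsed JSON-LD block in a template test, and typos like `"Prodcut"` or a `"March 5, 2024"` date fail the build instead of showing up weeks later in Search Console.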

Can you have multiple schema types on one page?

Absolutely. An e-commerce product page might legitimately carry Product, BreadcrumbList, FAQPage, and Organization all at once. Stack them as separate JSON-LD script blocks or nest them where the spec permits (e.g., Review inside Product). The one thing to avoid: using multiple conflicting instances of the same type that describe the same entity with different values. That's not multiple types — that's contradictory data and Google handles it unpredictably.
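
Here's a sketch of both halves of that advice in Python. The first is a product page stacking Organization, BreadcrumbList, and Product (with a nested Review) in one `@graph`. The second is a hypothetical `find_conflicts` check that flags two blocks describing the same entity with different values. Matching entities by `@id`, with a fallback to `@type` plus `name`, is my own heuristic, not something Google documents.

```python
from collections import defaultdict


def entity_key(node):
    """Identify an entity by @id when present, otherwise by (@type, name)."""
    if "@id" in node:
        return node["@id"]
    node_type = node.get("@type")
    if isinstance(node_type, list):
        node_type = tuple(node_type)  # lists aren't hashable
    return (node_type, node.get("name"))


def find_conflicts(blocks):
    """List (entity, field, first_value, conflicting_value) across JSON-LD blocks.

    Each block is one parsed <script type="application/ld+json"> payload: either a
    single node or an object with an "@graph" array of nodes.
    """
    seen = defaultdict(dict)  # entity key -> {field: first value seen}
    conflicts = []
    for block in blocks:
        for node in block.get("@graph", [block]):
            key = entity_key(node)
            for field, value in node.items():
                if field.startswith("@"):
                    continue  # skip @context, @type, @id
                if field not in seen[key]:
                    seen[key][field] = value
                elif seen[key][field] != value:
                    conflicts.append((key, field, seen[key][field], value))
    return conflicts


# Several types on one product page, stacked in a single @graph block.
PAGE_GRAPH = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": "https://example.com/#org", "name": "Example Co"},
        {"@type": "BreadcrumbList", "itemListElement": [
            {"@type": "ListItem", "position": 1, "name": "Shoes",
             "item": "https://example.com/shoes"},
        ]},
        {"@type": "Product", "@id": "https://example.com/shoes/trail#product",
         "name": "Trail Shoe",
         "offers": {"@type": "Offer", "price": "89.00", "priceCurrency": "USD"},
         "review": {"@type": "Review", "author": {"@type": "Person", "name": "Sam"},
                    "reviewRating": {"@type": "Rating", "ratingValue": 5}}},
    ],
}

# A second block (say, from a plugin) describing the same product with another price.
PLUGIN_BLOCK = {
    "@context": "https://schema.org", "@type": "Product",
    "@id": "https://example.com/shoes/trail#product", "name": "Trail Shoe",
    "offers": {"@type": "Offer", "price": "79.00", "priceCurrency": "USD"},
}
```

`find_conflicts([PAGE_GRAPH])` comes back empty, because different types on one page are fine. Adding `PLUGIN_BLOCK` produces one conflict on `offers`, which is exactly the contradictory-data case worth hunting down.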

Does schema help with AI Overviews or featured snippets?

Schema is not a direct trigger for featured snippets; those come from content structure and query match. For AI Overviews, the evidence is thinner, but the working theory (backed by anecdotal data from several SEOs I trust) is that entity-rich structured data helps Google confidently cite a source. FAQPage schema in particular seems to correlate with AI Overview inclusion, though nobody can claim causation with any certainty yet. Mark up content where the schema genuinely fits, but don't do it only for this reason.

Sitemap Questions

How often should I update my sitemap?

Google ignores the <changefreq> and <priority> tags, so don't stress about them. It does use <lastmod> when that value is consistently accurate, so keep it honest. What actually matters: your sitemap should reflect your current URL inventory with reasonable accuracy. For sites publishing daily content, regenerate it daily. For e-commerce with frequent product changes, automated sitemap regeneration on publish/unpublish is the right move. For a 30-page brochure site, once a week is fine. The bigger issue is stale sitemaps listing URLs that 404 or redirect, which is actively unhelpful to Googlebot.
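
A sketch of regenerating the sitemap from your current URL inventory with Python's standard library. It writes only `<loc>` and `<lastmod>`, since Google has said it ignores `<changefreq>` and `<priority>`. Two things here are assumptions about your CMS: where the `(url, last_modified)` pairs come from, and wiring this into a publish/unpublish hook.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a <urlset> document from (url, last_modified_date) pairs.

    Only <loc> and <lastmod> are emitted; <changefreq> and <priority> are left out.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # date.isoformat() gives YYYY-MM-DD, a valid W3C Datetime value.
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Because the input is the live inventory rather than a cached file, a URL that gets unpublished simply disappears on the next run instead of lingering as a 404.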

Should I split my sitemap into multiple files?

Only if you need to. The limit is 50,000 URLs or 50MB (uncompressed) per sitemap file; beyond that, use a sitemap index file that points to multiple sitemap files. The other good reason to split is segmentation. Separate sitemaps for blog posts, products, and category pages make it easy to see in Search Console whether one section is getting discovered but not indexed. This is genuinely useful data.
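
A sketch that handles splitting and segmenting in one pass. Each section gets its own sitemap files, chunked at the 50,000-URL limit, and a sitemap index lists them all. The `sitemap-{section}-{n}.xml` naming and the `base_url` layout are hypothetical. The sketch also doesn't enforce the 50MB size limit, so sites with very long URLs would need an extra check.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # protocol limit; the 50MB uncompressed cap is not checked here


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Split a URL list into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def urlset_xml(urls):
    """Serialize one child sitemap containing only <loc> entries."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_segmented_sitemaps(sections, base_url, size=MAX_URLS_PER_FILE):
    """Return {filename: xml} with per-section sitemap files plus a sitemap index.

    `sections` maps a segment name (e.g. "blog", "products") to its URLs, so each
    segment gets its own file(s) and can be checked separately in Search Console.
    """
    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name, urls in sections.items():
        for n, part in enumerate(chunk(urls, size), start=1):
            filename = f"sitemap-{name}-{n}.xml"
            files[filename] = urlset_xml(part)
            entry = ET.SubElement(index, "sitemap")
            ET.SubElement(entry, "loc").text = f"{base_url}/{filename}"
    files["sitemap-index.xml"] = ET.tostring(index, encoding="unicode")
    return files
```

You would submit only `sitemap-index.xml` in Search Console. The per-section files then show up individually, which is where the segmentation pays off.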

Why are URLs in my sitemap not getting indexed?

The sitemap is a suggestion, not a command; being in it doesn't guarantee indexing. The most common reasons sitemap URLs don't get indexed: thin or duplicate content (Google decides the page isn't worth indexing), poor crawl budget allocation (on large sites, low-authority pages may not all get crawled), no internal links pointing to the page so it has no context, or the page was indexed once but dropped later over a quality issue. Check URL Inspection for the specific page. It reports statuses like "Crawled - currently not indexed" or "Discovered - currently not indexed", and for other exclusions (a noindex tag, a duplicate canonicalized elsewhere) it gives the reason.
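
The "no internal links" case is easy to check mechanically. Compare the URLs in your sitemap with the set of URLs your crawl actually found links to. A minimal Python sketch follows. It assumes you've reshaped a site crawler's link export into a mapping from each crawled page to the internal URLs it links to; that format is my assumption. URLs are compared as exact strings, so normalize trailing slashes and protocols first.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Extract every <loc> value from a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)}


def orphaned_urls(xml_text, internal_links):
    """Return sitemap URLs that no crawled page links to, sorted.

    `internal_links` maps each crawled page URL to the set of internal URLs it
    links to (hypothetical format reshaped from a crawler export).
    """
    linked = set().union(*internal_links.values())
    return sorted(sitemap_urls(xml_text) - linked)
```

Anything this returns is a URL you're asking Google to index while giving it no internal context, so it's a good first list to cross-check against URL Inspection.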

Does my sitemap need to include every URL on the site?

No. Only include URLs you actually want indexed. That means no noindex pages, no parameter-based duplicate URLs (list only the canonical version), no pagination beyond page 1 in most cases, and no admin paths. Sitemaps that include hundreds of noindexed or canonicalized URLs are just noise that may waste crawl budget. Be intentional: the sitemap should represent your site's indexable content, not a raw dump of every URL your CMS generates.
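
Those rules translate into a simple filter over crawl data. In this sketch, `PageRecord` and its fields are hypothetical stand-ins for whatever your crawler or CMS exports, and `/admin` stands in for whatever private paths your site has. A URL survives only if it meets all four conditions:

- it returns 200;
- it isn't noindexed;
- it isn't under an excluded path;
- it is its own canonical (or declares none).

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical private paths to keep out of the sitemap.
EXCLUDED_PATH_PREFIXES = ("/admin",)


@dataclass
class PageRecord:
    """One row of crawl or CMS data; the field names are assumptions."""
    url: str
    status: int               # HTTP status code the URL returns
    noindex: bool             # meta robots or X-Robots-Tag noindex present
    canonical: Optional[str]  # rel=canonical target, or None if not declared


def sitemap_candidates(pages: List[PageRecord]) -> List[str]:
    """Keep URLs that return 200, are indexable, not private, and self-canonical."""
    keep = []
    for page in pages:
        if page.status != 200 or page.noindex:
            continue
        if urlparse(page.url).path.startswith(EXCLUDED_PATH_PREFIXES):
            continue
        if page.canonical is not None and page.canonical != page.url:
            continue  # a duplicate: its canonical URL is what belongs in the sitemap
        keep.append(page.url)
    return keep
```

Feeding the output of a filter like this into your sitemap generator, rather than the CMS's full URL list, is what keeps noindexed and canonicalized URLs out.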

Can I use a sitemap to speed up indexing of new content?

You can signal it faster. Submitting an updated sitemap via Search Console can prompt a crawl attempt, and an accurate <lastmod> tells Google which URLs changed. Don't rely on pinging Google's sitemap endpoint (https://www.google.com/ping?sitemap=YOUR_SITEMAP_URL): Google deprecated it in 2023 and it no longer works. For genuinely important new content, the best acceleration is internal linking: get the new URL linked from a high-authority page that Googlebot visits frequently. Sitemap submission alone on a low-authority domain without internal links often means a 2-3 week wait anyway.

Robots.txt Questions

I accidentally blocked Googlebot with robots.txt. How bad is it?

Depends on how long it was live. If it was hours, probably fine — Google re-crawls frequently enough that the damage is minimal and recovery is fast once you fix it. Days or weeks is more serious; indexed pages may start dropping as Googlebot can't re-confirm them. Fix the robots.txt, submit your sitemap, and request indexing for your most important URLs through Search Console. Recovery usually happens within days to a few weeks, not months, assuming the underlying content quality is intact.
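
To keep this from happening again, a pre-deploy check can parse the robots.txt you're about to ship and fail if Googlebot would be blocked from URLs that must stay crawlable. Below is a sketch using Python's `urllib.robotparser`. Note that the standard-library parser does simple prefix matching and doesn't implement Google's `*` and `$` wildcards. That's fine for catching a stray `Disallow: /`, but not for auditing wildcard rules.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocks_googlebot(robots_txt: str, urls: Iterable[str]) -> List[str]:
    """Return the URLs this robots.txt would stop Googlebot from crawling.

    Uses the stdlib parser: prefix matching only, no wildcard support.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch("Googlebot", url)]
```

In CI you'd call it with a short, hypothetical list of must-crawl URLs (homepage, top categories) and fail the build if the returned list is non-empty.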

Can robots.txt block specific parameters?

You can use wildcards in the Disallow path to catch parameter patterns, like Disallow: /*?sort=. Googlebot supports this syntax. But robots.txt is a blunt instrument: it blocks crawling, not indexing (a blocked page can still be indexed via external links). Search Console's URL Parameters tool used to handle this, but Google retired it in 2022, so canonical tags on the parameter URLs are now the more surgical option. For critical duplicate URL suppression, noindex is more reliable than robots.txt alone, with one catch: Googlebot has to crawl the page to see the noindex, so don't also block that URL in robots.txt.
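
To see exactly what a pattern like `/*?sort=` catches, here's a sketch of Google's documented matching rules in Python. In those rules, `*` matches any sequence of characters, a trailing `$` anchors the end, and matching starts at the beginning of the path. The sketch deliberately ignores `Allow` rules and Google's longest-match precedence, so treat it as a way to test patterns, not a full robots.txt parser.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    Everything else is literal, and re.match anchors at the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_patterns):
    """True if any Disallow pattern matches (Allow rules and precedence ignored)."""
    return any(
        robots_pattern_to_regex(p).match(path_and_query)
        for p in disallow_patterns
        if p  # an empty Disallow value blocks nothing
    )
```

One thing it surfaces: `/*?sort=` only matches when `sort` is the first parameter, so `/shoes?color=red&sort=price` slips through. Adding a second `/*&sort=` rule closes that gap. `/*sort=` also closes it, at the cost of matching any URL containing `sort=`.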

Should robots.txt disallow my staging site?

Your staging site should not be publicly accessible in the first place — password protect it at the server level. Robots.txt on staging is a defense-in-depth layer, not a primary protection. If your staging site is publicly accessible with no auth and only robots.txt blocking Googlebot, you're one misconfigured directive away from your test content appearing in search results. This happens more often than developers expect.

One More Thing

The tools that make this manageable: Google's Rich Results Test for schema validation (always test both the live URL and the code snippet), Screaming Frog for bulk sitemap auditing and for comparing the URLs in your sitemaps with what's actually crawlable, and Search Console's Page indexing report (formerly Index Coverage) for the big picture on what's indexed vs. what's being ignored. These three tools together answer 90% of the questions in this post without needing to guess.

Schema markup, sitemaps, and robots.txt are not glamorous. They're the plumbing that makes the rest of SEO work correctly. Get them right once, set up monitoring so you catch regressions, and move on to the parts of SEO that actually require creative thinking.