- 50,000: maximum URLs per sitemap file (use a sitemap index above this limit)
- 48 hours: typical time for Google to process a newly submitted sitemap
- 43%: share of pages Google discovers via sitemaps rather than internal links (Google data)
- 1: a single Sitemap: declaration in robots.txt is all Googlebot needs to find your sitemap
What Is an XML Sitemap and Why Does It Matter for SEO?
An XML sitemap is a structured file that lists every important URL on your website: essentially a roadmap you provide to search engine crawlers. It tells Google, Bing, and other search engines which pages exist on your site, when they were last updated, and, implicitly, which ones you consider most important.
Google discovers pages in two primary ways: following links from other pages it already knows about, and reading sitemap files you explicitly provide. For well-established sites with strong internal linking, Google finds most content through link discovery. But for new sites, pages with few internal links, content updated frequently, or large e-commerce catalogues with thousands of product pages, sitemaps are essential for ensuring complete crawl coverage.
The SEO impact of a well-maintained sitemap is twofold: it ensures your pages get discovered and indexed, and it helps Google allocate crawl budget efficiently by signalling which URLs are canonical, current, and important. An outdated or error-filled sitemap, on the other hand, can waste crawl budget on dead URLs and send conflicting signals about your site’s content.
DOES: Helps Google discover URLs it might not find through links alone.
DOES: Provides metadata (lastmod date) that helps Google prioritise recrawling updated content.
DOES: Supports image, video, and news discovery through specialised sitemap types.
DOES NOT: Guarantee indexing. Google decides whether to index each URL independently.
DOES NOT: Override robots.txt or noindex directives. Blocked pages stay blocked.
DOES NOT: Directly improve rankings. Sitemaps are a discovery and crawl efficiency tool, not a ranking signal.
Section 1: The 5 Types of XML Sitemaps
There is not just one kind of sitemap. Different content types require different sitemap formats. Understanding which types apply to your site is the first step to comprehensive sitemap coverage:
- XML Sitemap (primary type): lists all important URLs for Google to crawl and index. Standard format supported by all major search engines.
- Image Sitemap (media discovery): lists image URLs for Google Images indexing. Essential for e-commerce and photography sites with JS-loaded images.
- Video Sitemap (video indexing): provides video metadata (title, description, thumbnail, duration) to Google for video rich results.
- News Sitemap (news publishers): required for Google News inclusion. Covers articles published in the last 2 days. Separate from the main sitemap.
- Sitemap Index (large sites): a sitemap of sitemaps. Required when you have 50,000+ URLs or split by content type (pages, posts, products).
Sitemap Type | File Name Convention | Max URLs | Who Needs It | Key Requirement |
|---|---|---|---|---|
XML Sitemap (Standard) | sitemap.xml or sitemap_index.xml | 50,000 per file | Every website | Must contain only canonical, indexable 200-status URLs |
Sitemap Index | sitemap_index.xml | Unlimited (links to individual sitemaps) | Sites with 50,000+ URLs or multiple content types | Each child sitemap must also be under 50,000 URLs and 50MB |
Image Sitemap | sitemap-images.xml or embedded in main sitemap | 1,000 images per URL entry | E-commerce, photography, media-rich sites | Must declare image sitemap namespace in XML header |
Video Sitemap | sitemap-videos.xml | 50,000 video entries | Sites with video content pages | thumbnail_loc, title, and description are required fields |
News Sitemap | news-sitemap.xml | Articles from last 2 days only | Google News publishers | publication_date must be within 48 hours. Cannot include older content. |
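The news sitemap row above has the strictest requirements of any type. As a sketch of what Google's news sitemap extension expects (the domain, publication name, and headline below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/news/article-slug/</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <!-- Must be within the last 48 hours -->
      <news:publication_date>2026-03-19T09:00:00+00:00</news:publication_date>
      <news:title>Example Article Headline</news:title>
    </news:news>
  </url>
</urlset>
```

Note the dedicated news namespace declaration on the urlset tag: without it, the news:* tags are invalid XML as far as parsers are concerned.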
Section 2: XML Sitemap Format Code Structure and Required Fields
A valid XML sitemap follows the Sitemap Protocol specification at sitemaps.org. The format is straightforward but the details matter. Here is the anatomy of a correct sitemap file:
Standard XML Sitemap Complete Example
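A minimal, valid standard sitemap following the sitemaps.org protocol looks like this (example.com and the URLs are placeholders; only loc is required per entry):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-03-19</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/seo-guide/</loc>
    <lastmod>2026-03-19</lastmod>
    <!-- changefreq and priority are optional; Google largely ignores them -->
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```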
Sitemap Index File For Large Sites
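A sitemap index file uses sitemapindex instead of urlset, and each entry points to a child sitemap file rather than a page (file names below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-03-19</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>
```

Each child sitemap listed here must itself stay under the 50,000 URL and 50MB limits.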
Image Sitemap For E-Commerce and Media Sites
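Image data is embedded inside regular url entries by declaring Google's image sitemap namespace on the urlset tag. A sketch for a hypothetical product page with two images:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/products/blue-widget/</loc>
    <!-- One image:image block per image on the page -->
    <image:image>
      <image:loc>https://example.com/images/blue-widget-front.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://example.com/images/blue-widget-side.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```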
Section 3: XML Sitemap Fields Complete Reference
Understanding which fields are required, recommended, and optional, and how Google actually uses each, prevents common mistakes like inflating priority values or updating lastmod unnecessarily:
XML Tag | Status | Purpose | Example Value |
|---|---|---|---|
<loc> | Required | The full canonical URL of the page. Must include https:// and exactly match the canonical URL. | https://domain.com/blog/seo-guide/ |
<lastmod> | Recommended | Date page content was last meaningfully changed. ISO 8601 format. Helps Google prioritise recrawling. | 2026-03-19 |
<changefreq> | Optional | Hint to crawlers how often content changes. Google may ignore it. Values: always, hourly, daily, weekly, monthly, yearly, never. | weekly |
<priority> | Optional | Relative priority 0.0–1.0 vs other URLs in your sitemap. Google states it largely ignores this field. Default is 0.5. | 0.8 |
<image:image> | Image sitemaps | Adds image data to URL entry. Includes <image:loc> (image URL), <image:title>, and <image:caption>. | See image sitemap section |
<video:video> | Video sitemaps | Adds video metadata: thumbnail_loc, title, description, content_loc or player_loc, duration. | See video sitemap section |
<news:news> | News sitemaps | Required for Google News: publication name/language, publication_date, title. Only for articles < 2 days old. | See news sitemap section |
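The video row in the table above lists the required fields; assembled into a sketch (URLs, title, and duration are hypothetical, and the namespace is Google's video sitemap extension):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/videos/sitemap-tutorial/</loc>
    <video:video>
      <video:thumbnail_loc>https://example.com/thumbs/sitemap-tutorial.jpg</video:thumbnail_loc>
      <video:title>XML Sitemap Tutorial</video:title>
      <video:description>Step-by-step walkthrough of creating an XML sitemap.</video:description>
      <!-- Either content_loc (the media file) or player_loc is required -->
      <video:content_loc>https://example.com/media/sitemap-tutorial.mp4</video:content_loc>
      <!-- Duration in seconds -->
      <video:duration>600</video:duration>
    </video:video>
  </url>
</urlset>
```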
Google has publicly stated that it largely ignores both <changefreq> and <priority> values.
changefreq: Google determines its own crawl frequency based on actual content change signals, not your declared frequency. Including it does no harm but provides minimal value.
priority: The 0.0–1.0 priority value is relative within your own sitemap — it does not compare your pages to other sites. Google has stated it uses this signal minimally. Do not inflate all pages to 1.0 (meaningless) or spend time optimising values.
Focus instead on:
(1) Keeping <lastmod> accurate for pages that actually change
(2) Ensuring sitemap cleanliness — no errors, no noindex, no redirects
Section 4: What to Include and Exclude from Your Sitemap
The quality of your sitemap matters more than its size. A sitemap containing noindex pages, 404 errors, and redirect chains sends confusing signals to Google and wastes crawl budget. Be selective: your sitemap should be the curated list of pages you want Google to find, crawl, and index:
Include/Exclude | URL Type | Why |
|---|---|---|
INCLUDE | Published blog posts and articles | Core indexed content; primary crawl targets |
INCLUDE | Service and product pages | Commercial pages with the highest business value |
INCLUDE | Category and collection pages | Faceted navigation hubs; include if they have unique content |
INCLUDE | Location and landing pages | Local SEO pages; important for local pack rankings |
INCLUDE | Cornerstone and pillar pages | High-authority pages; prioritise crawl budget here |
INCLUDE | Author and about pages (with content) | Entity pages that support E-E-A-T signals |
EXCLUDE | noindex pages | Never include; contradicts the noindex directive |
EXCLUDE | Paginated pages (page/2, page/3) | Only include if the canonical points to self, not page 1 |
EXCLUDE | URL parameter variations | Filter, sort, and tracking URLs create duplicate content |
EXCLUDE | Thank-you / confirmation pages | No SEO value; private user journey endpoints |
EXCLUDE | 404 and error pages | Including broken URLs wastes crawl budget |
EXCLUDE | Login, account, and checkout pages | Non-indexable by design; exclude to save crawl budget |
EXCLUDE | Thin or duplicate content pages | Including them dilutes the sitemap's quality signal |
EXCLUDE | Staging or development URLs | Should be noindex anyway; never in a production sitemap |
Including a page in your sitemap AND adding noindex to that page sends contradictory signals to Google.
Your sitemap says: “Please index this page.”
Your noindex tag says: “Do not index this page.”
Google will ultimately respect the noindex directive — but the contradiction wastes crawl budget and creates unnecessary confusion.
Rule: If a page has noindex, remove it from the sitemap. If it’s in the sitemap, remove the noindex tag. Never both.
Check: Screaming Frog → Mode: List → paste sitemap URLs → filter by Meta Robots “noindex” to find all violations at once.
Section 5: How to Create an XML Sitemap By Platform
The right sitemap creation method depends entirely on your platform. Here is the recommended approach for every major CMS and development stack:
Platform | Recommended Method | How It Works | Quick Setup |
|---|---|---|---|
WordPress | Yoast SEO (free) or Rank Math (free) | Auto-generated at /sitemap_index.xml. Separates posts, pages, categories, authors. Updates automatically on publish. | Install plugin → enable sitemap in settings → submit /sitemap_index.xml to GSC |
Shopify | Built-in (automatic) | Shopify auto-generates /sitemap.xml covering pages, products, collections, and blogs. No plugin needed. | Submit yourdomain.com/sitemap.xml directly in Google Search Console |
Wix | Built-in (automatic) | Wix auto-generates sitemap accessible at /sitemap.xml. Limited control over included/excluded URLs. | Submit yourdomain.com/sitemap.xml in GSC. Use Wix SEO settings to exclude pages |
Squarespace | Built-in (automatic) | Auto-generated at /sitemap.xml. Includes all published pages. Tag/category pages included by default. | Submit /sitemap.xml in GSC. Use page settings to noindex unwanted pages |
Custom/Static | Generate manually or with tools | Use Screaming Frog (crawl → Sitemaps → Create XML Sitemap) or online generators like XML-Sitemaps.com. | Generate → upload to root → add to robots.txt → submit in GSC |
Next.js | next-sitemap package (npm) | npm package generates sitemap automatically from page routes. Configure in next-sitemap.config.js. | npm install next-sitemap → configure → build generates sitemap.xml |
Drupal | Simple XML Sitemap module | Generates comprehensive sitemap with fine-grained control over included entity types and regeneration intervals. | Install module → configure at /admin/config/search/simplesitemap |
Laravel/PHP | spatie/laravel-sitemap package | Programmatic sitemap generation. Can crawl site or build from database queries. Supports large site chunking. | composer require spatie/laravel-sitemap → schedule generation in Kernel.php |
Manual XML Sitemap Creation Step by Step
For small static sites (under 100 pages) or situations where a CMS approach is not possible, you can create a sitemap manually:
- 1. List all your important URLs. Open a spreadsheet and list every URL you want indexed. Include homepage, service pages, blog posts, and location pages. Exclude everything in the "exclude" list from Section 4.
- 2. Format as XML. Create a new file called sitemap.xml. Start with the XML declaration and the <urlset> opening tag. Add one <url> block per page with, at minimum, a <loc> tag.
- 3. Validate the XML. Paste your sitemap XML at xmlvalidation.com or use the W3C XML validator to confirm no syntax errors. A single malformed character breaks the entire file.
- 4. Upload to site root. Place sitemap.xml at the root of your domain: https://yourdomain.com/sitemap.xml. Not in a subdirectory.
- 5. Add to robots.txt. Add Sitemap: https://yourdomain.com/sitemap.xml to your robots.txt file. This allows all crawlers (not just Google) to discover your sitemap.
- 6. Submit to Google Search Console. Log into GSC > Indexing > Sitemaps > Add a new sitemap. Enter the sitemap path and click Submit.
Section 6: Submitting Your Sitemap to Google Step by Step
Creating a sitemap is only half the job. Google must be explicitly informed about it through Google Search Console for maximum benefit. Simply placing the file at /sitemap.xml is not enough; Google may not discover it for weeks or months without a formal submission.
Step | Action | Detail |
|---|---|---|
1 | Verify your property in GSC | Ensure you have verified ownership of your HTTPS property in Google Search Console. HTTPS and HTTP are separate properties. |
2 | Navigate to Sitemaps | In GSC left sidebar: Indexing > Sitemaps. This shows all previously submitted sitemaps and their status. |
3 | Enter your sitemap URL | In the “Add a new sitemap” field, enter only the path (not full URL). Example: sitemap_index.xml or sitemap.xml. Click Submit. |
4 | Verify submission status | Sitemap should show status “Success”. If it shows an error, click the sitemap URL to see specific error details. |
5 | Check URLs Discovered vs Indexed | After 24–72 hours, check URLs Discovered (total in sitemap) vs URLs Indexed (actually indexed by Google). Large gaps need investigation. |
6 | Monitor for errors regularly | Return to GSC Sitemaps weekly for new sites, monthly for established sites. Sitemap errors can silently block indexing. |
Submitting to Bing Webmaster Tools
Bing does not receive sitemap information from Google Search Console. To ensure Bing indexes your content, submit your sitemap separately at Bing Webmaster Tools (bing.com/webmasters). The process is identical to GSC: verify property ownership → navigate to Sitemaps → enter sitemap URL → Submit.
Bing also supports the IndexNow protocol, a faster URL submission method where you ping Bing directly when you publish or update a page rather than waiting for crawl discovery. WordPress plugins like Rank Math and IndexNow for WordPress support this automatically.
The robots.txt Sitemap Declaration
Adding your sitemap URL to robots.txt ensures every crawler that reads robots.txt (which is all major crawlers) can find your sitemap, independent of GSC submission. This is a simple one-line addition to your robots.txt file:
robots.txt Sitemap Declaration:

User-agent: *
Disallow:

# Declare all sitemaps here
Sitemap: https://futuristicmarketingservices.com/sitemap_index.xml

# For multiple sitemaps, list each separately:
Sitemap: https://futuristicmarketingservices.com/sitemap-posts.xml
Sitemap: https://futuristicmarketingservices.com/sitemap-images.xml
Section 7: Monitoring Sitemaps in Google Search Console
Submitting a sitemap is a one-time action. Monitoring it is an ongoing responsibility. The GSC Sitemaps report is your primary dashboard for understanding how Google is processing your sitemap and identifying issues that need attention.
Understanding the GSC Sitemap Report
GSC Metric | What It Shows | What to Look For | Action if Problem |
|---|---|---|---|
Status | Success, Pending, or Error for each submitted sitemap | “Error” status prevents Google from reading sitemap. “Pending” means not yet processed. | Click the error status to see specific error message and URL of first problematic entry |
Discovered URLs | Total number of URLs Google found in your sitemap | Should match the actual count in your sitemap file. Large discrepancy = parsing error. | Re-open sitemap XML to check for malformed entries or encoding issues |
Indexed URLs | How many of your sitemap URLs Google has actually indexed | Large gap between Discovered and Indexed = Google is finding but not indexing pages. | Use URL Inspection Tool on non-indexed pages to diagnose individual page issues |
Last Read | When Google last crawled and processed the sitemap | If Last Read is weeks or months old, Google may not be seeing your latest sitemap updates. | Confirm the sitemap file itself is updating, then resubmit it in GSC to prompt a re-read |
Coverage errors | URLs in sitemap flagged as “Excluded” by Google | Common: “Page with redirect”, “Alternate page with proper canonical tag”, “Crawled – not indexed” | Each exclusion reason requires a different action; diagnose individually in the Coverage report |
Diagnosing "Discovered Currently Not Indexed" in GSC
“Discovered currently not indexed” is one of the most common and frustrating GSC statuses. It means Google found your URL (from sitemap or links) but has not indexed it yet and is not actively crawling it. Common causes:
- Low-quality content: Google does not believe the page provides enough unique value to index. Common causes: thin content under 500 words, content too similar to other indexed pages, or outright duplicate content. Review content quality first.
- Crawl budget constraints: On large sites, Google may have a limited crawl budget that does not reach all pages. Improve internal linking to important pages, remove low-value pages from the sitemap, and improve site speed to allow Google to crawl more pages per session.
- Sitemap quality signal: If your sitemap contains many low-quality, redirected, or noindex pages, Google lowers its trust in the entire sitemap. Clean up sitemap quality and the indexation rate for good pages typically improves.
- New pages awaiting crawl queue: Newly published pages can take 1–4 weeks to be indexed even after sitemap submission. Use GSC URL Inspection > Request Indexing for priority pages to jump the queue.
Section 8: Sitemap Strategy for Large Sites
E-commerce sites, large blogs, and enterprise websites present unique sitemap challenges. A single sitemap file cannot handle 500,000 product pages, and even if it could, a single undifferentiated list of all URLs makes it harder for Google to understand your content architecture.
Sitemap Index Architecture for Large Sites
The recommended approach for large sites is a sitemap index file that links to separate, type-specific child sitemaps. This architecture has three advantages: it stays within the 50,000 URL per file limit, it allows Google to prioritise crawling specific content types, and it makes sitemap management much easier for your team.
Site Type | Recommended Sitemap Structure | Priority Order |
|---|---|---|
Large Blog(10,000+ posts) | sitemap_index.xml → sitemap-posts-1.xml, sitemap-posts-2.xml (chunked by date) + sitemap-pages.xml + sitemap-images.xml | Prioritise: recent posts → cornerstone content → older posts |
E-Commerce(100,000+ products) | sitemap_index.xml → sitemap-products-[category].xml + sitemap-pages.xml + sitemap-collections.xml + sitemap-images.xml | Prioritise: in-stock products → categories → pages → out-of-stock |
News Publisher | sitemap_index.xml → news-sitemap.xml (last 2 days only) + sitemap-articles.xml (archive) + sitemap-pages.xml | Prioritise: today’s news sitemap → recent archive → pages |
Multi-Location Business | sitemap_index.xml → sitemap-pages.xml + sitemap-locations.xml + sitemap-blog.xml | Prioritise: location pages → service pages → blog posts |
SaaS / Software | sitemap_index.xml → sitemap-pages.xml + sitemap-blog.xml + sitemap-docs.xml + sitemap-changelog.xml | Prioritise: pricing + product → docs → blog → changelog |
Crawl Budget and Sitemaps
Crawl budget is the number of pages Googlebot crawls on your site within a given timeframe. For small sites (under 1,000 pages), crawl budget is rarely a constraint; Google crawls everything. For large sites (100,000+ pages), crawl budget management becomes critical.
- Remove low-value pages from sitemaps. Thin category pages, paginated archive pages, and session-ID URLs consume crawl budget without providing indexing value. Removing them focuses budget on important pages.
- Improve site speed. Google can crawl more pages per day on fast sites. A 3-second TTFB means Google waits 3 seconds before each page; across 100,000 pages, that is 300,000 seconds of wasted crawl time.
- Strengthen internal linking. Sitemaps help discovery, but internal links drive prioritised crawling. Pages with many internal links get crawled more frequently than orphan pages with sitemap entries but no links.
- Use lastmod accurately. Accurate lastmod dates tell Google when pages need recrawling. Sites that update all lastmod dates daily without changing content lose Google's trust in their lastmod signals, leading to less efficient recrawl scheduling.
Section 9: Common Sitemap Errors and How to Fix Them
Error | Cause | Fix |
|---|---|---|
“Could not fetch” error in GSC | Sitemap file inaccessible: wrong file path, server error, or robots.txt blocking Googlebot | Verify the sitemap is accessible at the submitted URL. Check robots.txt does not block Googlebot from /sitemap.xml. |
URLs in sitemap returning 404 | Pages deleted, URLs changed, or CMS removed content that the sitemap still references | Remove deleted URLs from the sitemap. Update changed URLs to the new destination. Check CMS plugin settings to auto-remove deleted posts. |
Redirect URLs in sitemap | Sitemap contains URL A, which 301-redirects to URL B; the sitemap should contain only URL B | Update the sitemap to contain only final destination URLs. Never include redirect source URLs. |
noindex pages in sitemap | Pages with both a sitemap entry and a noindex meta tag: contradictory signals | Remove noindex pages from the sitemap, or remove noindex if the page should be indexed. Never both. |
Wrong URLs (HTTP instead of HTTPS) | Sitemap generated before HTTPS migration, or plugin misconfigured after migration | Update all sitemap URLs to HTTPS. Change the WordPress Site URL in Settings to https://. Regenerate the sitemap. |
XML parsing error / malformed XML | Invalid characters in URLs (unencoded &, spaces), broken XML structure, or an encoding issue | Validate the sitemap at xmlvalidation.com. Encode special characters: & becomes &amp;, space becomes %20. Check the file is UTF-8. |
“Indexed, though blocked by robots.txt” | URL in sitemap but also blocked by robots.txt: contradictory signals | Either unblock the URL in robots.txt and keep it in the sitemap, or keep it blocked and remove it from the sitemap. |
Sitemap over 50,000 URLs or 50MB | Single sitemap file exceeds Google’s limits | Split into multiple sitemap files and create a sitemap index file linking to all child sitemaps. |
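As a concrete illustration of the malformed-XML row above: query-string ampersands must be entity-escaped inside loc, or the whole file fails to parse. The URL below is hypothetical (and parameter URLs normally do not belong in a sitemap at all; this only shows the escaping rule):

```xml
<!-- Broken: a raw & is invalid XML and aborts parsing of the entire file -->
<!-- <loc>https://example.com/search?q=shoes&page=2</loc> -->

<!-- Correct: escape & as &amp; -->
<url>
  <loc>https://example.com/search?q=shoes&amp;page=2</loc>
</url>
```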
Section 10: Complete XML Sitemap Audit Checklist (12 Points)
Use this checklist when creating a new sitemap, auditing an existing site, or diagnosing indexation problems. A clean sitemap is one of the quickest technical SEO wins available on any site.
# | Task | How to Do It | Phase | Done |
|---|---|---|---|---|
1 | All important URLs present | Crawl site with Screaming Frog and compare to sitemap URLs. Any important page missing from sitemap should be added. | Content audit | ☐ |
2 | No noindex pages in sitemap | Screaming Frog → Sitemap → filter by “noindex”. All noindex URLs must be removed from sitemap immediately. | Technical | ☐ |
3 | No 4xx URLs in sitemap | Any 404 or 410 in sitemap wastes crawl budget. Filter Screaming Frog by Response Code. Remove all non-200 URLs. | Technical | ☐ |
4 | No redirect URLs in sitemap | The sitemap should contain final destination URLs only. For 301/302 redirects, include the target URL, never the redirect source. | Technical | ☐ |
5 | URLs match canonical tags | Every URL in sitemap must exactly match its own rel=canonical tag. Mismatches confuse Google about preferred URL. | Canonicalization | ☐ |
6 | No parameter URLs included | Filter, sort, and tracking parameter URLs (?colour=blue, ?utm_source=email) must be excluded. Only clean canonical URLs. | Deduplication | ☐ |
7 | Sitemap under 50,000 URLs | Each sitemap file has a 50,000 URL limit and 50MB uncompressed limit. Use sitemap index file to split large sites. | Size limits | ☐ |
8 | Submitted to Google Search Console | The sitemap must be actively submitted, not just accessible at its URL. Submit in GSC > Indexing > Sitemaps. | Submission | ☐ |
9 | Declared in robots.txt | Add Sitemap: https://yourdomain.com/sitemap.xml to robots.txt. Helps crawlers discover sitemap without GSC submission. | Discovery | ☐ |
10 | lastmod dates are accurate | Only update <lastmod> when content meaningfully changes. Constant false updates reduce Google’s trust in your timestamps. | Metadata | ☐ |
11 | Sitemap index for large sites | Sites with multiple content types (posts, products, pages) should use a sitemap index file linking to separate type-specific sitemaps. | Architecture | ☐ |
12 | Monitor GSC for errors monthly | Check GSC Sitemaps monthly. Investigate any gaps between URLs Discovered and URLs Indexed. Fix coverage errors promptly. | Monitoring | ☐ |
Section 11: XML Sitemap Dos and Don'ts
DO (Sitemap Best Practice) | DON’T (Sitemap Mistake) |
|---|---|
DO include only canonical, indexable URLs | DON’T include noindex, 404, or redirected URLs in sitemap |
DO submit sitemap to Google Search Console | DON’T just publish the sitemap; submit it actively in GSC |
DO declare sitemap URL in robots.txt | DON’T rely on GSC alone; the robots.txt declaration helps all crawlers |
DO use sitemap index for sites with 1,000+ pages | DON’T put 50,000+ URLs in a single sitemap file |
DO update lastmod only when content meaningfully changes | DON’T update lastmod daily to manipulate crawl frequency |
DO monitor GSC for Discovered vs Indexed URL gaps | DON’T ignore sitemap errors in GSC for months at a time |
DO separate sitemaps by content type for large sites | DON’T mix all URL types in one unmanageable sitemap file |
DO use HTTPS URLs consistently throughout sitemap | DON’T mix HTTP and HTTPS URLs in the same sitemap |
Section 12: Best XML Sitemap Tools
Tool | Price | What It Does | Best For |
|---|---|---|---|
Google Search Console | Free | Sitemap submission, monitoring, and error reporting. Shows URLs Discovered vs Indexed. The primary tool for all sitemap management. | Essential for every site |
Screaming Frog SEO Spider | Free / £149/yr | Crawl a site and generate XML sitemaps. Audit existing sitemaps for errors, noindex pages, redirects, and missing canonical matches. | Sitemap generation and auditing |
Yoast SEO (WordPress) | Free / $99/yr | Auto-generates sitemap_index.xml for WordPress. Separates posts, pages, categories. Updates automatically on publish/unpublish. | WordPress; fastest setup |
Rank Math (WordPress) | Free / $59/yr | Generates comprehensive sitemaps with image, video, and news sitemap support. Finer control than Yoast for complex sites. | WordPress; most feature-rich |
XML-Sitemaps.com | Free (500 URLs) / $3.99 | Online sitemap generator. Crawls site and generates XML. Free tier limited to 500 URLs. Good for small static sites. | Small static sites without CMS |
Bing Webmaster Tools | Free | Submit sitemaps to Bing independently. Bing does not receive sitemap data from Google Search Console. Separate submission required. | Submitting to Bing search |
Sitebulb | From $14/mo | Site crawler with detailed sitemap audit reports. Visual sitemap coverage maps. Identifies orphan pages not in sitemap. | Agency-level sitemap auditing |
SE Ranking | From $44/mo | SEO platform with sitemap monitoring. Alerts when sitemap errors occur. Tracks indexation coverage over time. | Ongoing sitemap health monitoring |
Section 13: 4 Critical XML Sitemap Mistakes
Mistake 1: Including Noindex Pages in the Sitemap
The most common technical SEO contradiction is including noindex pages in the XML sitemap. A noindex page tells Google “do not index this” but a sitemap entry says “please find and crawl this.” Google will respect the noindex directive and not index the page, but it still crawls the URL to read the noindex tag, wasting crawl budget.
More importantly, a sitemap full of noindex pages signals to Google that your sitemap is poorly maintained, which can reduce Google's trust in the sitemap as a whole. Audit your sitemap with Screaming Frog by crawling in List mode (paste your sitemap URLs) and filtering by noindex. Remove all violations immediately.
Mistake 2: Not Updating the Sitemap After Publishing New Content
Many WordPress sites use plugins that auto-update sitemaps but custom sites, static sites, and some headless CMS setups require manual sitemap updates. New blog posts, new service pages, and new product pages that are not added to the sitemap may take significantly longer to be discovered and indexed than pages that appear in an up-to-date sitemap.
For high-frequency publishing operations (daily blog posts, frequent product launches), configure automatic sitemap regeneration as part of your publishing workflow. For manual sitemap sites, establish a process to update the sitemap within 24 hours of publishing new content. Request indexing via GSC URL Inspection for high-priority new pages rather than waiting for scheduled crawls.
Mistake 3: Ignoring the Discovered vs Indexed Gap in GSC
One of the most actionable signals in Google Search Console is the gap between “URLs Discovered” (how many URLs Google found in your sitemap) and “URLs Indexed” (how many Google actually indexed). A large gap (say, 10,000 discovered but only 4,000 indexed) indicates that Google is systematically declining to index a large portion of your content.
This is not a sitemap problem to fix; it is a content quality signal. Google is saying these pages do not meet its quality threshold for indexing. Common causes: thin content (pages under 300 words with little unique value), near-duplicate content (product pages with minimal distinguishing copy), or technical issues (pages loading too slowly for Google to fully render). The sitemap fixes the discovery problem; content and technical improvements fix the indexation rate.
Mistake 4: Submitting the Sitemap Once and Never Monitoring
Many site owners submit their sitemap to GSC on launch day and never return to the Sitemaps report. Meanwhile, plugin updates change the sitemap URL structure, migrations break sitemap accessibility, or new content types are added that should have separate sitemaps, all while GSC silently reports errors that nobody reads.
Make the GSC Sitemaps report part of your monthly SEO review. Check status (Success vs Error), review Discovered vs Indexed numbers, look for new coverage errors in the Coverage report filtered by sitemap. Set up GSC email notifications for critical errors. Sitemaps are living documents that require ongoing maintenance, not a one-time setup task.
Section 14: Frequently Asked Questions About XML Sitemaps
Q1: Does an XML sitemap improve Google rankings?
Q2: How do I create an XML sitemap for WordPress?
Q3: What is the difference between a sitemap and robots.txt?
Q4: How often should I update my XML sitemap?
Q5: How many URLs can I put in an XML sitemap?
Q6: Should I include category and tag pages in my sitemap?
Q7: What happens if my sitemap has errors?
Q8: Do I need a sitemap for a small website?
Q9: How do I submit a sitemap to Google?
Q10: What is a sitemap index file and when do I need one?
Q11: Can I have multiple sitemaps for one website?
Q12: Why is Google not indexing pages from my sitemap?
IS YOUR SITEMAP HELPING OR HURTING YOUR INDEXATION?
A clean, well-structured sitemap is the foundation of crawl-efficient technical SEO. Noindex pages in your sitemap, missing important URLs, redirect chains, and GSC errors all silently limit how much of your site Google finds, crawls, and indexes, directly capping your organic traffic ceiling.
Futuristic Marketing Services includes a complete sitemap audit in every technical SEO engagement: identifying every URL that should not be in your sitemap, every important URL that is missing, and every GSC error costing you indexation coverage.
We will crawl your entire site, audit your sitemap against live URL data, identify all noindex contradictions and redirect errors, and deliver a prioritised fix list that improves your indexation coverage immediately.
Visit:
futuristicmarketingservices.com/seo-services
Email:
hello@futuristicmarketingservices.com
Phone:
+91 8518024201