Quality Engineering
August 11, 2023

A Beginner's Guide to Sitemap Testing: Part 1


A sitemap is a file that lists the URLs on a website and describes its content, including pages, videos, images, and other files, along with the relationships between them. It serves as a roadmap that helps search engines navigate and understand the website's content.

There are different types of sitemaps, such as:

  • XML Sitemap
  • Mobile Sitemap
  • Video Sitemap
  • Image Sitemap


XML sitemap

An XML sitemap is a file that lists the pages and other content on a website, along with metadata about each URL, to help search engines understand the site's structure. Google introduced the XML sitemap format in 2005 to streamline crawling and indexing. The metadata for each URL can include:

  • The date of the last update (<lastmod>)
  • How often the page changes (<changefreq>)
  • Its priority relative to other pages on the site (<priority>)
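To make these tags concrete, here is a minimal sketch that builds a one-URL sitemap with Python's standard-library xml.etree.ElementTree. The page URL https://www.example.com/ and the helper name build_sitemap are hypothetical. Keep in mind that Google has stated it ignores <changefreq> and <priority>, although other search engines may still read them.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # write sitemap tags without a prefix

def build_sitemap(entries):
    """Build a <urlset> from dicts holding 'loc' plus optional metadata fields."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        # Optional metadata: last update, change frequency, relative priority
        for field in ("lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{field}").text = str(entry[field])
    # Returns UTF-8 bytes that start with an <?xml ...?> declaration
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

sitemap_bytes = build_sitemap([{
    "loc": "https://www.example.com/",          # hypothetical URL
    "lastmod": date(2023, 8, 11).isoformat(),   # W3C date format: YYYY-MM-DD
    "changefreq": "monthly",
    "priority": "0.8",
}])
print(sitemap_bytes.decode("utf-8"))
```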

Mobile sitemap

A mobile sitemap lists the URLs of a website's pages that are designed for mobile devices. Like an XML sitemap, it may also include metadata about each URL, such as the last updated date, the change frequency, and its priority relative to other pages.

It is important to note that a distinct mobile sitemap is required only if the website includes a feature phone (WAP/WML) version. For the majority of websites, a unified sitemap is adequate for both desktop and mobile URLs.
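For the feature-phone case, Google's mobile sitemap format marks each URL with an empty <mobile:mobile/> tag from the http://www.google.com/schemas/sitemap-mobile/1.0 namespace. The sketch below, assuming hypothetical helper names and a hypothetical m.example.com URL, builds such a file and lists any <url> entries that lack the tag, a quick structural check before submitting it.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MOBILE_NS = "http://www.google.com/schemas/sitemap-mobile/1.0"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("mobile", MOBILE_NS)

def build_mobile_sitemap(locs):
    """Build a <urlset> where every <url> carries the <mobile:mobile/> marker."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in locs:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # The empty tag marks the URL as a feature-phone page
        ET.SubElement(url, f"{{{MOBILE_NS}}}mobile")
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

def urls_missing_mobile_tag(sitemap_bytes):
    """Return the <loc> of every <url> that has no <mobile:mobile/> child."""
    root = ET.fromstring(sitemap_bytes)
    return [
        url.findtext(f"{{{SITEMAP_NS}}}loc")
        for url in root.findall(f"{{{SITEMAP_NS}}}url")
        if url.find(f"{{{MOBILE_NS}}}mobile") is None
    ]

mobile_sitemap = build_mobile_sitemap(["http://m.example.com/article100.html"])
print(mobile_sitemap.decode("utf-8"))
print(urls_missing_mobile_tag(mobile_sitemap))  # []
```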


Video sitemap

A video sitemap is a specialized form of a sitemap that contains additional information about videos hosted on a website. It offers essential metadata about the video content, including the title, description, thumbnail URL, and runtime.

This information helps search engines like Google understand and index video content, making it easier for users to find and play videos directly from search results.

By creating a video sitemap, website owners can improve the visibility of their video content, potentially leading to more traffic, engagement, and conversions.

The namespace that defines the tags for video sitemaps is:

xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
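Here is a sketch of a single video entry, again using ElementTree. It assumes the tags Google documents for video sitemaps: <video:thumbnail_loc>, <video:title>, <video:description>, and <video:content_loc> (Google requires either <video:content_loc> or <video:player_loc>), with <video:duration> giving the runtime in seconds. The URLs, the title, and the video_url_entry helper are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)

def video_url_entry(page, thumbnail, title, description, content, duration_seconds):
    """Build one <url> element carrying a <video:video> block."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page  # page that embeds the video
    video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
    fields = [
        ("thumbnail_loc", thumbnail),
        ("title", title),
        ("description", description),
        ("content_loc", content),             # URL of the media file itself
        ("duration", str(duration_seconds)),  # runtime in whole seconds
    ]
    for tag, value in fields:
        ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = value
    return url

urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
urlset.append(video_url_entry(
    "https://www.example.com/videos/intro.html",
    "https://www.example.com/thumbs/intro.jpg",
    "Sitemap testing intro",
    "A short walkthrough of sitemap testing.",
    "https://www.example.com/media/intro.mp4",
    120,
))
print(ET.tostring(urlset, encoding="unicode"))
```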

Image sitemap

An image sitemap provides search engines with information about the images on a website. This assists search engines in crawling and indexing images, potentially boosting visibility and attracting traffic from image search results.

The image sitemap includes metadata about each image, most importantly its URL (the <image:loc> tag). Google deprecated the older caption, title, geo-location, and license image tags in 2022, so the image URL is the part that matters for Google. Using an image sitemap helps search engines understand a website's content, contributing to better overall search engine optimization.

The following namespace defines the image tags used in an image sitemap to pass metadata about a website's images to search engines like Google:

xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
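Image entries nest one <image:image> block per image inside the page's <url> element, each with an <image:loc> child. As a testing aid, this sketch parses an image sitemap and maps each page to the image URLs it declares; the sample document and the images_by_page helper are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Hypothetical image sitemap with one page and two images
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.example.com/gallery.html</loc>
    <image:image><image:loc>https://www.example.com/images/photo1.jpg</image:loc></image:image>
    <image:image><image:loc>https://www.example.com/images/photo2.jpg</image:loc></image:image>
  </url>
</urlset>"""

def images_by_page(sitemap_bytes):
    """Map each page <loc> to the list of <image:loc> values listed under it."""
    root = ET.fromstring(sitemap_bytes)
    result = {}
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        page = url.findtext(f"{{{SITEMAP_NS}}}loc")
        result[page] = [
            image.findtext(f"{{{IMAGE_NS}}}loc")
            for image in url.findall(f"{{{IMAGE_NS}}}image")
        ]
    return result

print(images_by_page(SAMPLE))
```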


Advantages of sitemaps

SEO ranking

Sitemaps offer a clear overview of a website's structure, simplifying navigation and content discovery. A well-organized sitemap can support a site's ranking by listing its most important pages, ensuring search engines can find and crawl them effectively.


User-friendly navigation

Sitemaps enhance website navigation for both search engine crawlers and human users by presenting an organized, easy-to-follow layout of the website's content.


Improved crawling

Without a sitemap, search engines may miss some pages, especially new pages or pages with few internal links, which can affect the site's rankings. "Improved crawling" means that crawlers discover and index a website more efficiently and accurately when a well-organized sitemap is present, instead of relying solely on following links through the site's architecture.


Notifying about updates

By including metadata in sitemaps, such as the last modification date and the different language versions of a page, you can communicate updates to search engines and give them precise information about a page's content.


Checklist for testing sitemaps

Manually check the location of the sitemap

The simplest and most common way to locate a website's XML sitemap is to check manually. Enter the website URL in the browser followed by likely sitemap paths, starting with:

http://www.{yourwebsiteurl}.com/sitemap.xml

If that URL does not return a sitemap, try these other common locations:

  • /sitemap-index.xml
  • /sitemap.php
  • /sitemap.txt
  • /sitemap.xml.gz
  • /sitemap/
  • /sitemap/sitemap.xml
  • /sitemapindex.xml
  • /sitemap/index.xml
  • /sitemap1.xml
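Checking these paths by hand gets tedious across several sites, so here is a rough sketch that requests each candidate with Python's urllib. The path list mirrors the one above; candidate_urls and probe are hypothetical helper names, and the sketch assumes the server answers HEAD requests (some servers reject them, in which case a GET is needed).

```python
import urllib.error
import urllib.request

# Common sitemap locations, mirroring the manual checklist
COMMON_SITEMAP_PATHS = [
    "/sitemap.xml", "/sitemap-index.xml", "/sitemap.php", "/sitemap.txt",
    "/sitemap.xml.gz", "/sitemap/", "/sitemap/sitemap.xml",
    "/sitemapindex.xml", "/sitemap/index.xml", "/sitemap1.xml",
]

def candidate_urls(base_url):
    """Join each common sitemap path onto the site's base URL."""
    base = base_url.rstrip("/")
    return [base + path for path in COMMON_SITEMAP_PATHS]

def probe(base_url, timeout=10):
    """Return (url, status) pairs; status is None when the request fails outright."""
    results = []
    for url in candidate_urls(base_url):
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                results.append((url, response.status))
        except urllib.error.HTTPError as err:  # 4xx/5xx responses
            results.append((url, err.code))
        except OSError:  # DNS failures, refused connections, timeouts
            results.append((url, None))
    return results
```

For example, `for url, status in probe("https://www.example.com"): print(status, url)` prints one line per candidate. A 200 marks a likely sitemap location, but open the file to confirm it really is a sitemap, since urlopen follows redirects and some sites return 200 for any path.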


Best practices to follow

  • The best practice is to place the sitemap in the root directory, at Baseurl/sitemap.xml. A sitemap can only list URLs in its own directory and below, so a root-level file can cover the whole site.
  • The sitemap protocol format consists of XML tags. Ensure that all data values in a sitemap are entity-escaped and that the file is encoded in UTF-8. This helps search engines interpret the data in the sitemap correctly and index the website accurately.
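Entity escaping matters most for characters such as & in query strings. The sitemaps.org protocol requires &, ', ", < and > to be escaped; here is a minimal sketch using the standard library's xml.sax.saxutils.escape (the helper name escape_sitemap_value is hypothetical).

```python
from xml.sax.saxutils import escape

def escape_sitemap_value(value):
    """Entity-escape a value for a sitemap: escape() covers &, < and >,
    and the extra mapping adds the two quote characters."""
    return escape(value, {"'": "&apos;", '"': "&quot;"})

loc = escape_sitemap_value("https://www.example.com/?a=1&b=2")
print(loc)  # https://www.example.com/?a=1&amp;b=2

entry = f"<url><loc>{loc}</loc></url>"
data = entry.encode("utf-8")  # write the sitemap file as UTF-8 bytes
```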


Guidelines to follow

Format the sitemap correctly so that search engines can read and interpret it, following the steps below:

  • Begin the sitemap with an opening <urlset> tag and end it with a closing </urlset> tag.
  • Specify the namespace (protocol standard) within the <urlset> tag, i.e. xmlns="http://www.sitemaps.org/schemas/sitemap/0.9".
  • Add each URL as a parent XML tag <url>.
  • Include a <loc> child entry for each <url> parent tag to specify the URL.
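These rules can be checked automatically. The sketch below is a minimal structural check, not a full validation against the official XSD schema: it parses the file with ElementTree, confirms the root is <urlset> in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace, and reports any <url> without a <loc>. The check_urlset name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def check_urlset(xml_bytes):
    """Return a list of structural problems (an empty list means the checks passed)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"root is {root.tag!r}, expected <urlset> in the sitemap namespace")
    for position, url in enumerate(root.findall(f"{{{SITEMAP_NS}}}url"), start=1):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")  # None if <loc> is missing
        if not loc or not loc.strip():
            problems.append(f"<url> #{position} has no <loc>")
    return problems

sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><lastmod>2023-08-11</lastmod></url>
</urlset>"""
print(check_urlset(sample))  # ['<url> #2 has no <loc>']
```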


Check Robots.txt file

A robots.txt file is a text file that tells web robots (crawlers) which parts of a website they can or cannot access. It is mainly used to keep crawlers from overloading the site with requests.

Crawlers usually check the robots.txt file first when they visit a site, so it is good practice to add the sitemap's location there with a Sitemap: directive. This tells search engines where to find the pages you want them to discover. To check the setup, enter your site's URL followed by '/robots.txt' in your browser, for example, https://www.qed42.com/robots.txt.
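The Sitemap: directive can also be read programmatically. Python's urllib.robotparser.RobotFileParser exposes it through site_maps() (Python 3.8+); this sketch parses a hypothetical robots.txt offline and also confirms that the sitemap URL itself is not disallowed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
# The sitemap itself should not be blocked for crawlers
print(parser.can_fetch("*", "https://www.example.com/sitemap.xml"))  # True
```

To test a live site instead, call parser.set_url("https://www.qed42.com/robots.txt") followed by parser.read() before querying. Note that site_maps() returns None when the file has no Sitemap: lines.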

For more details, check The Importance of Robots.txt.


Test mobile XML sitemap

To test a mobile XML sitemap, use online tools that check its validity and flag any errors or warnings. One useful tool is Google Search Console, whose Mobile Usability report pinpoints mobile compatibility issues on the site.

Another option is the W3C mobileOK Checker, which tests the pages of the mobile site against industry standards for mobile usability and suggests improvements to their mobile compatibility.


Alternate language directive for XML sitemap

If the website has different language versions, add hreflang annotations to the XML sitemap (an <xhtml:link rel="alternate" hreflang="..."> element for each language version) to indicate how pages in different languages relate to each other. This helps search engines decide which version to show in search results based on the user's language and location, and supports proper indexing for international SEO.
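In the sitemap, each language version gets its own <url> entry, and every entry lists all versions (including itself) as <xhtml:link rel="alternate" hreflang="..." href="..."/> elements from the http://www.w3.org/1999/xhtml namespace. Here is a sketch that generates this pattern, assuming hypothetical English, German, and x-default URLs; build_hreflang_sitemap is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)

def build_hreflang_sitemap(versions):
    """versions maps an hreflang code to the URL of that language version."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in versions.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # Each <url> lists every language version, including itself
        for lang, href in versions.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=href)
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

sitemap_bytes = build_hreflang_sitemap({
    "en": "https://www.example.com/en/page.html",
    "de": "https://www.example.com/de/page.html",
    "x-default": "https://www.example.com/page.html",  # fallback for other languages
})
print(sitemap_bytes.decode("utf-8"))
```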


Pages added to optimize crawling and indexing

The sitemap should not include every page. Including everything can lead to poorly optimized crawling, where crawlers spend their limited crawl resources on low-quality pages. This can delay the crawling and indexing of the site's high-quality pages.

Ensure that the pages included in a sitemap:

  • Respond with a 200 code
  • Are not blocked by robots.txt
  • Don't include a no-index meta robots tag
  • Are the canonical version of the page (not a duplicate whose rel="canonical" points to another URL)
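These checks can be scripted once each page has been fetched. This sketch assumes the status code, the robots.txt verdict, and the HTML are obtained separately (for example with urllib.request and urllib.robotparser). The class and function names are hypothetical, and it only inspects the meta robots tag, not an X-Robots-Tag HTTP header, which can also carry noindex.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class RobotsCanonicalParser(HTMLParser):
    """Collects the meta robots directives and the rel="canonical" href of a page."""

    def __init__(self):
        super().__init__()
        self.robots = ""
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            self.robots = attributes.get("content", "").lower()
        elif tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href")

def sitemap_eligibility_problems(url, status, robots_allowed, html):
    """Return the reasons a URL should not be listed (empty list = eligible)."""
    problems = []
    if status != 200:
        problems.append(f"status {status}")
    if not robots_allowed:
        problems.append("blocked by robots.txt")
    parser = RobotsCanonicalParser()
    parser.feed(html)
    parser.close()
    if "noindex" in parser.robots:
        problems.append("meta robots noindex")
    if parser.canonical:
        canonical = urljoin(url, parser.canonical)  # resolve relative canonicals
        if canonical != url:
            problems.append(f"canonical points to {canonical}")
    return problems

page = ('<html><head><meta name="robots" content="noindex, follow">'
        '<link rel="canonical" href="https://www.example.com/a"></head></html>')
print(sitemap_eligibility_problems("https://www.example.com/a", 200, True, page))
# ['meta robots noindex']
```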


Make sure that the sitemap is split into smaller sitemaps, listed in a sitemap index file, if it contains more than 50,000 URLs or is larger than 50 MB uncompressed. A single sitemap file must not exceed either limit.
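The sitemaps.org protocol caps each sitemap file at 50,000 URLs and 50 MB (52,428,800 bytes) uncompressed; larger sites split their URLs across several files and list those files in a <sitemapindex>. Here is a minimal sketch of the splitting step and the index, assuming a hypothetical sitemap-N.xml naming scheme and hypothetical helper names.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000               # per sitemap file
MAX_BYTES = 50 * 1024 * 1024    # 50 MB (52,428,800 bytes), uncompressed
ET.register_namespace("", SITEMAP_NS)

def chunk(urls, size=MAX_URLS):
    """Split a URL list into consecutive slices of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def within_limits(sitemap_bytes, url_count):
    """True if one sitemap file respects both protocol limits."""
    return url_count <= MAX_URLS and len(sitemap_bytes) <= MAX_BYTES

def build_index(sitemap_locs):
    """Build a <sitemapindex> that points at each child sitemap file."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)

all_urls = [f"https://www.example.com/page-{n}" for n in range(120_001)]
parts = chunk(all_urls)
# Hypothetical naming scheme for the child files
index_bytes = build_index(
    [f"https://www.example.com/sitemap-{i}.xml" for i in range(1, len(parts) + 1)]
)
print([len(part) for part in parts])  # [50000, 50000, 20001]
```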

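Check that the URLs within the sitemap are uniform and complete: every URL should be absolute, including the http:// or https:// protocol, and all URLs should use the same protocol and the same host variant (with or without "www") consistently.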
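A quick way to spot inconsistent URLs is to treat the most common scheme and host combination as the expected one and flag everything else, along with any relative URLs. Here is a minimal sketch with hypothetical sample URLs; inconsistent_urls is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlsplit

def inconsistent_urls(urls):
    """Return relative URLs plus absolute URLs whose scheme/host differ from the majority."""
    parsed = [(url, urlsplit(url)) for url in urls]
    relative = [url for url, parts in parsed if not (parts.scheme and parts.netloc)]
    origins = Counter(
        (parts.scheme, parts.netloc.lower())
        for _, parts in parsed if parts.scheme and parts.netloc
    )
    if not origins:
        return relative
    expected, _ = origins.most_common(1)[0]  # most frequent scheme + host pair
    mismatched = [
        url for url, parts in parsed
        if parts.scheme and parts.netloc and (parts.scheme, parts.netloc.lower()) != expected
    ]
    return relative + mismatched

urls = [
    "https://www.example.com/",
    "https://www.example.com/about",
    "http://www.example.com/blog",   # different protocol
    "https://example.com/contact",   # missing "www"
    "/pricing",                      # relative URL
]
print(inconsistent_urls(urls))
# ['/pricing', 'http://www.example.com/blog', 'https://example.com/contact']
```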

Utilize sitemap extensions for additional media content such as images, videos, and news.


Conclusion

In this post, we've covered the main types of sitemaps: XML, mobile, video, and image sitemaps. We've looked at how they can benefit SEO ranking, improve navigation, and enhance crawling, and we've stressed the importance of correctly formatting and validating sitemaps so that search engines interpret and index the site accurately.

Don't forget to use the checklist above to test your sitemap! It covers testing mobile XML sitemaps and using alternate language directives to enhance crawling and indexing.

In 'A Beginner's Guide to Sitemap Testing: Part 2', we delve into the tools that assist with sitemap testing, including Google Search Console, XML Sitemap Validator, and SEOptimer. We also walk through hands-on examples of testing a sitemap with Screaming Frog.

Remember, a well-tested and properly optimized sitemap is a valuable asset for improving your website's search engine visibility and user experience.
