Over the past few weeks, Google has shared and confirmed several interesting details about sitemaps and how they are processed, mostly through the well-known duo John Mueller and Gary Illyes. Below is a summary of five of the most important and interesting points they made:
In Search Console, you can see more indexed pages than submitted
It’s strange, but in the sitemaps section of Google Search Console you can sometimes see more indexed pages than you submitted through your sitemap files. This can occur if you have submitted more than one sitemap and some URLs appear in more than one of them.
According to John’s tweet, Search Console seems to count submitted URLs as the number of unique URLs across all of your sitemaps, but it counts indexed URLs separately for each sitemap. A URL that appears in two sitemaps is therefore counted as indexed twice.
The same URL in multiple sitemaps is counted separately, which is why you could see something like that. I'd keep URLs in a single sitemap.
— John ☆.o(≧▽≦)o.☆ (@JohnMu) August 16, 2017
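To follow John's advice and keep each URL in a single sitemap, it helps to find the URLs that are listed more than once. The following is a minimal sketch in Python, assuming you already have the XML text of each sitemap at hand; the function names and file names are made up for illustration and are not part of any Google tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace used by the sitemaps.org protocol, in ElementTree's {uri}tag form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_in_sitemap(xml_text):
    """Return the <loc> values found in one sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def find_duplicate_urls(sitemaps):
    """Map every URL listed in more than one sitemap to the sitemaps listing it.

    `sitemaps` is a dict of {sitemap name: sitemap XML text}.
    """
    seen = defaultdict(list)
    for name, xml_text in sitemaps.items():
        # set() so a URL repeated inside the same file is only recorded once
        for url in set(urls_in_sitemap(xml_text)):
            seen[url].append(name)
    return {url: sorted(names) for url, names in seen.items() if len(names) > 1}
```

Any URL returned by find_duplicate_urls is a candidate for removal from all but one of the sitemaps that list it.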
Google ignores the <priority> tag in sitemaps
According to another one of John’s tweets, Google ignores the <priority> tag in sitemaps:
We ignore priority in sitemaps.
— John ☆.o(≧▽≦)o.☆ (@JohnMu) August 17, 2017
This tweet confirms the first part of a 2015 Search Engine Roundtable article about the priority and change frequency tags:
https://www.seroundtable.com/google-priority-change-frequency-xml-sitemap-20273.html
The <changefreq> tag is probably not very useful in sitemaps either, but Google hasn’t confirmed that yet.
Google considers the <loc> and <lastmod> tags two of the most important parts of every sitemap (<lastmod> only if it’s used correctly). To learn how to use and properly format <lastmod>, you can read Gary Illyes’s response on Stack Overflow:
https://stackoverflow.com/questions/31349345/how-to-properly-format-last-modified-lastmod-time-for-xml-sitemaps?stw=2
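As an illustration of a sitemap that carries only <loc> and <lastmod>, here is a minimal Python sketch. It assumes each page comes with a timezone-aware modification time and writes <lastmod> in the W3C Datetime format that the sitemaps.org protocol specifies. The build_sitemap helper is hypothetical and only shows one possible way to do it.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs.

    `last_modified` is assumed to be a timezone-aware datetime.
    <priority> and <changefreq> are left out on purpose.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # W3C Datetime with an explicit offset, e.g. 2017-08-17T09:30:00+00:00
        stamp = modified.astimezone(timezone.utc).isoformat(timespec="seconds")
        ET.SubElement(entry, "lastmod").text = stamp
    return ET.tostring(urlset, encoding="unicode")


pages = [("https://example.com/", datetime(2017, 8, 17, 9, 30, tzinfo=timezone.utc))]
sitemap_xml = build_sitemap(pages)
```

The <lastmod> value should reflect when the page content really changed; stamping every URL with the current time on each build would make the tag useless.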
Google doesn’t support nested sitemaps, use sitemap index instead
If you list the URL of one sitemap inside another sitemap, Google will probably have problems processing it. To point Google to the other sitemaps of your website, use a sitemap index file instead:
We support sitemap index files, but not nested sitemap files (sitemap in sitemap index is OK, sitemap in sitemap not)
— John ☆.o(≧▽≦)o.☆ (@JohnMu) August 13, 2017
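A sitemap index is a separate file type with a <sitemapindex> root that lists the locations of regular sitemaps. The sketch below shows one way to generate such a file in Python; the build_sitemap_index name and the example URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that points at the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        # Each child sitemap gets its own <sitemap><loc>...</loc></sitemap> entry
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(index, encoding="unicode")


index_xml = build_sitemap_index([
    "https://example.com/sitemap-posts.xml",
    "https://example.com/sitemap-pages.xml",
])
```

The files listed in the index should be ordinary sitemaps with a <urlset> root, not further index files.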
Google also doesn’t support nested sitemap indexes
What’s more interesting is how Google handles nested sitemap index files (a sitemap index inside another sitemap index). Google is strict here as well and will probably not be able to process files like these either:
Off-hand, it looks like you have sitemap index files in sitemap index files, which isn't supported.
— John ☆.o(≧▽≦)o.☆ (@JohnMu) June 28, 2017
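A quick way to catch this setup before submitting is to check the root element of every file that an index points to. The following Python sketch assumes you supply your own fetch function that returns the XML text for a URL; both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def root_kind(xml_text):
    """Return 'urlset' or 'sitemapindex' (or the raw tag for anything else)."""
    tag = ET.fromstring(xml_text).tag
    return tag[len(NS):] if tag.startswith(NS) else tag


def nested_indexes(index_xml, fetch):
    """List the children of a sitemap index that are themselves sitemap indexes.

    `fetch` is a caller-supplied function: URL -> XML text.
    """
    if root_kind(index_xml) != "sitemapindex":
        raise ValueError("expected a sitemap index document")
    root = ET.fromstring(index_xml)
    children = [loc.text.strip() for loc in root.iter(NS + "loc") if loc.text]
    return [url for url in children if root_kind(fetch(url)) == "sitemapindex"]
```

An empty result means every child is a regular sitemap; any URL that is returned should be replaced by the sitemaps it lists.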
Submitting a sitemap with NOINDEX URLs can speed up the deindexation process
If you want to force Google to crawl a single URL, you can use Google’s Submit URL tool. But what if you have a lot of these URLs?
Gary Illyes confirmed on Twitter that anything you put into your sitemap will generally be picked up sooner. So if you need to let Google know about a batch of NOINDEX URLs, you can simply add them to your sitemap temporarily:
@nishanthstephen generally anything you put in a sitemap will be picked up sooner
— Gary "鯨理" Illyes (@methode) October 13, 2015