Sitemap & Robots.txt
What are sitemaps and robots.txt?
Two key tools for helping search engines understand and properly index your website are an XML sitemap file and a robots.txt file.
- XML sitemap: An XML document that shows search engines the overall structure and inter-relationship of content on your site.
- Robots.txt: A plain-text file that tells search engine crawlers which folders or files to exclude.
Why they matter
Together, the two files should give a comprehensive and accurate picture of what content you want a search engine to index and how it is organized.
What to do
In short: Use a sitemap that lists what you want indexed and leaves out what you don’t want indexed. Also list the folders or files you don’t want indexed in your robots.txt file.
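One way to check that the two files agree is to test every sitemap URL against the robots.txt rules. The sketch below does this with Python's standard `urllib.robotparser`. The robots.txt content, the URLs, and the `find_conflicts` helper are made-up placeholders for illustration, not recommendations for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Sitemap: https://www.yoursite.com/sitemap.xml
"""

# Hypothetical URLs taken from a sitemap.
SITEMAP_URLS = [
    "https://www.yoursite.com/",
    "https://www.yoursite.com/products/widget",
    "https://www.yoursite.com/admin/login",
]


def find_conflicts(robots_txt, urls, user_agent="*"):
    """Return the URLs that the sitemap lists but robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


conflicts = find_conflicts(ROBOTS_TXT, SITEMAP_URLS)
print(conflicts)
```

Any URL this reports is sending mixed signals: either drop it from the sitemap or remove the matching Disallow rule.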
Additional technical items:
- Content: We recommend listing all and only resolvable HTML pages, excluding parameterized content and archive files.
- Date-based archive pages such as the ones below have little value for organic search:
- https://www.yoursite.com/2010/05/01/
- https://www.yoursite.com/2010/05/07/
- https://www.yoursite.com/2010/05/14/
- Hosting: Host sitemaps on a verified domain.
- Size: Each sitemap should be no larger than 50 MB uncompressed and should contain no more than 50,000 URLs.
- Placement: The sitemap should be placed either at /sitemap.xml or at a location referenced in the robots.txt file.
Multiple sitemaps can be listed in a sitemap index file (at /sitemap_index.xml) for easy parsing by search engines.
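The content, size, and index guidelines above could be combined roughly as follows. This is a minimal sketch using only the Python standard library. The date-archive pattern, the `sitemap-N.xml` file names, and the helper functions are assumptions made for illustration. It enforces the 50,000-URL limit but does not check the 50 MB size limit.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # URL limit stated in the guidelines above

# Assumed pattern for date-based archive paths such as /2010/05/01/.
DATE_ARCHIVE = re.compile(r"^/\d{4}/\d{2}/\d{2}/?$")


def is_indexable(url):
    """Keep plain pages; drop parameterized URLs and date-based archives."""
    parts = urlsplit(url)
    return not parts.query and not DATE_ARCHIVE.match(parts.path)


def build_sitemap(urls):
    """Return one <urlset> document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemaps(urls, base="https://www.yoursite.com",
                   limit=MAX_URLS_PER_SITEMAP):
    """Split the indexable URLs into sitemaps and build an index of them."""
    keep = [u for u in urls if is_indexable(u)]
    chunks = [keep[i:i + limit] for i in range(0, len(keep), limit)]
    # Hypothetical file naming scheme: sitemap-1.xml, sitemap-2.xml, ...
    files = {f"sitemap-{n}.xml": build_sitemap(chunk)
             for n, chunk in enumerate(chunks, 1)}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in files:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base}/{name}"
    files["sitemap_index.xml"] = ET.tostring(index, encoding="unicode")
    return files
```

Calling `build_sitemaps(all_urls)` returns a dictionary of file names and XML strings that can then be written to the web root.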
Localized content & Hreflang tags
What it is
Hreflang tags show the relationships between pages on the same topic for different regions or languages.
The example below shows search engines where the equivalent homepages in different languages are located.
- On the German language homepage
- <link rel="alternate" href="https://yoursite.com/" hreflang="en-us" />
- On the US English language homepage
- <link rel="alternate" href="https://yoursite.com/de/" hreflang="de-de" />
Hreflang tags can also be used across different domains. For example:
- On the German language homepage
- <link rel="alternate" href="https://yoursite.com/" hreflang="en-us" />
- On the US English language homepage
- <link rel="alternate" href="https://yoursite.de/" hreflang="de-de" />
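When a site has more than a couple of locales, writing these tags by hand becomes error-prone. The sketch below generates them from a single mapping, so every page points to its counterparts consistently. The `ALTERNATES` mapping and the `hreflang_tags` helper are hypothetical names used only for illustration.

```python
from html import escape

# Hypothetical mapping of hreflang codes to equivalent pages.
ALTERNATES = {
    "en-us": "https://yoursite.com/",
    "de-de": "https://yoursite.de/",
}


def hreflang_tags(alternates, current):
    """Return <link> tags pointing from the `current` locale to every other one."""
    return [
        f'<link rel="alternate" href="{escape(url, quote=True)}" '
        f'hreflang="{escape(code, quote=True)}" />'
        for code, url in alternates.items()
        if code != current
    ]


# Tags to place in the <head> of the German homepage.
for tag in hreflang_tags(ALTERNATES, current="de-de"):
    print(tag)
```

Because both pages are generated from the same mapping, the German page links to the English one and the English page links back.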
Why it matters
Setting hreflang tags helps search engines serve the right content to users in the right place. It also helps search engines recognize that what might look like near-duplicate content is in fact aimed at audiences in different locations.
What to do
- When hosting content in more than one language, use hreflang tags to denote the relationship between pages across domains or within a single domain.
- Moz.com provides an excellent overview, which is kept up to date.