The X-Robots-Tag is an HTTP response header that tells search engine crawlers how to index and display web resources. Unlike the robots meta tag, which lives in a page's HTML code, this header works for both HTML and non-HTML files like PDFs, images, and videos. Use it to manage search engine behavior at the server level, especially for resources where you cannot add HTML tags.
What is X-Robots-Tag?
The X-Robots-Tag is a de-facto standard used to communicate with search bots and web crawlers. While it is not part of an official specification, it is widely supported by major search engines. Google added support for the X-Robots-Tag directive in July 2007 (Yoast) to give site owners more flexibility in controlling how specific file types appear in search results.
It functions as part of the HTTP response for a URL. When a crawler requests a file, the server sends this header to provide indexing rules before or as the content is being processed.
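For example, the response for a PDF might look like this (a simplified sketch; the exact status line and headers will vary by server):

```http
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow
```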
Why X-Robots-Tag matters
This header provides several advantages for technical SEO and site management:
- Non-HTML control: It is the only way to apply indexing rules like noindex to files such as PDFs, Word documents, or image files that lack a <head> section.
- Crawl budget efficiency: The header provides instructions before the crawler fully processes the content, helping search engines spend less time on unimportant pages.
- Site-wide application: You can set rules for entire directories or specific file extensions across a whole site using server configuration files.
- Granular snippet control: You can specify how much text or video content search engines can use in their results pages.
- Link equity management: Use the nofollow directive to prevent crawlers from following links in specific files, helping you control how link equity is distributed.
How X-Robots-Tag works
The X-Robots-Tag is typically added through your web server's configuration files (such as .htaccess on Apache or .conf on NGINX). It can apply to all crawlers or be targeted to specific user agents by including the bot name before the directive.
The syntax allows multiple rules to be combined in a single header, separated by commas. If a crawler encounters multiple X-Robots-Tag headers, or a conflict between this header and a robots meta tag, it follows the most restrictive rule. For example, if one tag says max-snippet:50 and another says nosnippet, the nosnippet rule wins.
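To make the combination rules concrete, here is a minimal Python sketch of how such header values could be parsed; the heuristics (in particular the VALUED set used to tell a user-agent prefix apart from a directive that takes a value) are simplifying assumptions for illustration, not any search engine's actual logic:

```python
# Directives that themselves take a ":" and a value, so a leading token
# followed by ":" is treated as a user-agent name only when it is not
# one of these. (Assumption made for this sketch.)
VALUED = {"unavailable_after", "max-snippet", "max-image-preview", "max-video-preview"}

def parse_x_robots(header_values):
    """Map each user agent ("*" when unscoped) to the set of directives
    collected across every X-Robots-Tag header value."""
    rules = {}
    for value in header_values:
        agent = "*"
        head, sep, rest = value.partition(":")
        token = head.strip().lower()
        # A leading "botname:" (e.g. "googlebot: nofollow") scopes the
        # rules that follow to that one crawler.
        if sep and "," not in head and token not in VALUED:
            agent, value = token, rest
        directives = {d.strip().lower() for d in value.split(",") if d.strip()}
        rules.setdefault(agent, set()).update(directives)
    return rules

def effective(rules, agent):
    """Directives that apply to a crawler: unscoped rules plus its own."""
    return rules.get("*", set()) | rules.get(agent, set())
```

For headers `X-Robots-Tag: googlebot: nofollow` and `X-Robots-Tag: noindex, max-snippet:50`, `effective(rules, "googlebot")` yields all three directives, modeling how a crawler combines every rule that applies to it.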
Directives and syntax
Standard indexing rules include:
* noindex: Prevents the resource from appearing in search results.
* nofollow: Tells bots not to follow links found in the resource.
* none: Equivalent to both noindex and nofollow.
* noimageindex: Prevents images on the page from being indexed.
* nosnippet: Stops text snippets or video previews from showing in results.
* unavailable_after: [date]: Requests that a resource be removed from search results after a specific time.
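Several of these rules can share one header value, separated by commas; for example:

```http
X-Robots-Tag: noindex, noimageindex, nosnippet
```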
Best practices
Use it for non-HTML resources. If you have sensitive PDF documents or images that should not appear in search results, the X-Robots-Tag is the standard tool for blocking them.
Optimize your crawl budget. On large websites, use the header to provide indexing instructions to crawlers before they load the full content. This ensures bots spend their time on high-value pages.
Combine directives for efficiency. You can set multiple rules in one line, such as X-Robots-Tag: noindex, nofollow, to keep your HTTP headers clean.
Target specific crawlers cautiously. You can specify rules for googlebot while allowing other bots to crawl normally, but remember that search engines use the sum of all negative rules that apply to them.
Verify header implementation. Use tools like Google Search Console's URL Inspection or browser extensions to confirm the server is correctly sending the headers.
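As a quick way to script that check, here is a self-contained Python sketch using only the standard library; the in-process toy server is a stand-in for a real web server that has been configured to send the header:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ToyHandler(BaseHTTPRequestHandler):
    """Stand-in for a configured web server: every response carries an
    X-Robots-Tag header, as Apache or NGINX would after the server
    configuration changes described above."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.end_headers()
        self.wfile.write(b"<html></html>")

    def log_message(self, *args):
        pass  # keep the demo output quiet

def fetch_x_robots(url):
    """Return the X-Robots-Tag response header for a URL, or None."""
    with urllib.request.urlopen(url) as resp:
        return resp.headers.get("X-Robots-Tag")

# Bind to an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), ToyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

header = fetch_x_robots(f"http://127.0.0.1:{server.server_port}/page.html")
print(header)  # noindex, nofollow
server.shutdown()
```

Against a live site you would pass the real URL to fetch_x_robots (or simply run `curl -I` on it) and look for the header in the output.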
Common mistakes
Mistake: Blocking the URL in robots.txt while using an X-Robots-Tag. Fix: Ensure the URL is crawlable. If a bot is blocked by robots.txt, it will never see the HTTP header, meaning it might still index the URL if it finds a backlink elsewhere.
Mistake: Putting robots directives inside the robots.txt file. Fix: Use the X-Robots-Tag instead. Google [officially stopped accepting indexing directives like noindex within the robots.txt file in September 2019] (SE Ranking).
Mistake: Leaving noindex headers on live staging pages.
Fix: Always audit your HTTP headers during a site migration or when moving from staging to production. A forgotten noindex can result in a total loss of organic traffic.
Mistake: Removing URLs from the sitemap immediately after adding a noindex tag.
Fix: Keep noindexed pages in your sitemap until they are fully removed from search indexes so that crawlers find the new directive faster.
Examples
Example scenario: Blocking PDF files
If you want to prevent all PDF documents on an Apache server from being indexed, you would add this to your .htaccess file:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
Example scenario: targeted bot instructions
In NGINX, you can tell Googlebot not to follow links while telling a fictional "BadBot" not to index the page or follow its links:
location / {
add_header X-Robots-Tag "googlebot: nofollow";
add_header X-Robots-Tag "BadBot: noindex, nofollow";
}
Example scenario: Scheduling content removal
To tell search engines to stop showing a page after a specific date (using ISO 8601 or RFC 822 format):
X-Robots-Tag: unavailable_after: 31 Dec 2025 23:59:59 EST
X-Robots-Tag vs Robots Meta Tag
| Feature | Robots Meta Tag | X-Robots-Tag |
|---|---|---|
| Location | HTML <head> section | HTTP response header |
| File Types | HTML only | HTML and non-HTML (PDF, images, etc.) |
| Implementation | Easy (CMS or HTML) | Moderate (Server config) |
| Bulk Editing | Complicated | Easy via server rules |
| Support | Most search engines | Most major search engines |
Rule of Thumb: Use the robots meta tag for simple, page-level HTML changes. Use the X-Robots-Tag for non-HTML files, site-wide rules, or to save crawl budget on large sites.
FAQ
Can I use both a robots meta tag and an X-Robots-Tag on the same page?
Yes, you can use both, but it is usually unnecessary. Search engines will combine the rules from both. If there is a conflict, they will follow the most restrictive instruction. For example, if the meta tag says index but the HTTP header says noindex, the page will not be indexed. It is cleaner to choose one method and use it consistently to avoid confusion during technical audits.
Does X-Robots-Tag help with crawl budget?
Yes, it can be more efficient than a meta tag. Because the X-Robots-Tag is part of the HTTP header, the crawler receives the indexing instructions before or during the initial fetch. On very large sites, providing these instructions at the header level allows the bot to understand its restrictions without having to fully parse the HTML body. This helps search engines manage their resources more effectively when visiting your site.
Will NGINX and Apache both support X-Robots-Tag?
Yes, both major web server types support adding custom headers like X-Robots-Tag. On Apache, you typically use the Header set directive within a .htaccess or httpd.conf file. On NGINX, you use the add_header directive within your server or location blocks. Because these configuration files support regular expressions, you can target specific file extensions or entire directories very easily.
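For instance, an NGINX block like this (the extension list is illustrative) applies the header to every PDF and Word document on the site:

```nginx
location ~* \.(pdf|doc|docx)$ {
    add_header X-Robots-Tag "noindex, nofollow";
}
```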
How do I check if my X-Robots-Tag is working?
You can check the "Response Headers" section of any URL using your browser's Developer Tools (Network tab). Look for the X-Robots-Tag entry. Alternatively, you can use the URL Inspection tool in Google Search Console. It will specifically show if a page is "Excluded by 'noindex' tag" and clarify whether that directive was found in a meta tag or an HTTP header.
Why is my 'noindex' header being ignored?
The most common reason is a conflict with your robots.txt file. If the robots.txt file "disallows" a crawler from visiting a URL, the crawler will never download the page to see the HTTP header. Consequently, if the page was previously indexed, it might stay in the index because the bot cannot "see" the new noindex instruction. You must allow the page in robots.txt for the X-Robots-Tag to be detected.