@jaychia jaychia commented Sep 1, 2025

Changes Made

Adds a noindex tag to all docs that are not /stable/

As per https://aioseo.com/docs/when-to-use-noindex-or-the-robots-txt/

The biggest difference to understand is that if you want search engines to not include content in search results, then you MUST use the NOINDEX tag and you MUST allow search engines to crawl the content. If search engines CANNOT crawl the content then they CANNOT see the NOINDEX meta tag and therefore CANNOT exclude the content from search results.

So if you want content not to be included in search results, then use NOINDEX. If you want to stop search engines crawling a directory on your server because it contains nothing they need to see, then use “Disallow” directive in your robots.txt file.

i.e. we should:

  1. Allow for crawling of all docs (fix robots.txt) NOTE: this is not done in this PR, see below for why
  2. Add a canonical link for everything that is not stable NOTE: this is done already
  3. Add a noindex tag for everything that is not stable

This PR just does (3). This means that starting from 0.6.0, all versioned docs will have the noindex tag. However, it is not yet safe to modify our robots.txt because all docs <0.6.0 still do not contain noindex. We should only modify the robots.txt once we have that guarantee.
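The reasoning above (a crawler has to be allowed to fetch a page before it can see that page's noindex tag) can be sketched with Python's standard `urllib.robotparser`. The robots.txt content, the `docs.example.com` host, and the helper name `crawler_can_see_noindex` are assumptions made up for illustration; they are not the project's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption, not the real file):
# only /en/stable/ may be crawled, every other version is disallowed.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Allow: /en/stable/
Disallow: /en/
"""


def crawler_can_see_noindex(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a crawler may fetch `url`.

    A noindex meta tag lives inside the page, so it is only visible
    to crawlers that robots.txt allows to fetch that page.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://docs.example.com/en/stable/quickstart/",
        "https://docs.example.com/en/v0.5.22/quickstart/",
    ):
        print(url, crawler_can_see_noindex(HYPOTHETICAL_ROBOTS_TXT, url))
```

Under this hypothetical file, a noindex tag on a versioned page would never be read, which is why the robots.txt change and the noindex rollout have to be sequenced.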

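Testing

  • Confirmed that the noindex tag now appears in this build: https://getdaft-docs--5105.org.readthedocs.build/en/5105/quickstart/
  • Previously it did not appear in https://docs.daft.ai/en/v0.5.22/quickstart/
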
@jaychia jaychia requested a review from ccmao1130 as a code owner September 1, 2025 20:47
@github-actions github-actions bot added the docs label Sep 1, 2025
@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR implements SEO optimization for the documentation site by adding a noindex meta tag to all documentation pages that are not the stable version. The implementation involves three key components working together:

  1. MkDocs Hook System (docs/hooks.py): A new hook file that extracts the READTHEDOCS_VERSION environment variable and makes it available in the MkDocs configuration as config.extra.rtd_version. This follows the standard MkDocs plugin pattern for extending functionality.

  2. Template Logic (docs/overrides/main.html): The HTML template is modified to include a conditional Jinja2 block that adds <meta name="robots" content="noindex"> when config.extra.rtd_version != 'stable'. This ensures only non-stable versions get the noindex directive.

  3. Configuration Integration (mkdocs.yml): The hooks system is enabled by adding the hooks file to the MkDocs configuration, allowing the hook to execute during the build process.
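The three components described above might look roughly like the following. This is a minimal sketch, not the contents of the PR's `docs/hooks.py`: `on_config` is the standard MkDocs hook event name and `READTHEDOCS_VERSION` is the variable named in the summary, but the empty-string fallback and the helper `robots_meta` (a Python mirror of the Jinja2 condition in the template) are assumptions added for illustration.

```python
import os


def on_config(config):
    """MkDocs `on_config` hook: expose the Read the Docs version to templates."""
    # Falling back to "" when the variable is unset (e.g. a local build)
    # is an assumption of this sketch, not something the PR states.
    config["extra"]["rtd_version"] = os.environ.get("READTHEDOCS_VERSION", "")
    return config


def robots_meta(rtd_version: str) -> str:
    """Hypothetical mirror of the template condition.

    Emits the noindex meta tag for every version except 'stable'.
    """
    if rtd_version != "stable":
        return '<meta name="robots" content="noindex">'
    return ""
```

With this fallback, a build where `READTHEDOCS_VERSION` is unset would also receive the noindex tag, since an empty string is not equal to `'stable'`.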

This change addresses a common documentation SEO problem: multiple versions of the same documentation compete with each other in search results. With the noindex tag in place, non-stable builds from 0.6.0 onward stay accessible but ask search engines to leave them out of the index, so the stable version is the one meant to appear in results. The tag only takes effect on pages that search engines are allowed to crawl, because a crawler must fetch a page to see its noindex tag. This PR is therefore one step of a phased rollout: docs below 0.6.0 do not yet contain noindex, and robots.txt should only be modified once they do.

Confidence score: 5/5

  • This PR is safe to merge with minimal risk as it implements a well-established SEO best practice with straightforward logic
  • Score reflects simple, isolated changes that follow standard MkDocs patterns and have clear, predictable behavior
  • No files require special attention as all changes are straightforward and well-contained

3 files reviewed, no comments

@jaychia jaychia merged commit 293ff11 into main Sep 1, 2025
30 checks passed
@jaychia jaychia deleted the jay/docs-noindex branch September 1, 2025 20:57
venkateshdb pushed a commit to venkateshdb/Daft that referenced this pull request Sep 6, 2025
## Changes Made

Adds a `noindex` tag to all docs that are not `/stable/`

As per https://aioseo.com/docs/when-to-use-noindex-or-the-robots-txt/

> The biggest difference to understand is that if you want search
engines to not include content in search results, then you MUST use the
NOINDEX tag and you MUST allow search engines to crawl the content. If
search engines CANNOT crawl the content then they CANNOT see the NOINDEX
meta tag and therefore CANNOT exclude the content from search results.
> 
> So if you want content not to be included in search results, then use
NOINDEX. If you want to stop search engines crawling a directory on your
server because it contains nothing they need to see, then use “Disallow”
directive in your robots.txt file.

i.e. we should:

1. Allow for crawling of all docs (fix robots.txt) `NOTE: this is not
done in this PR, see below for why`
2. Add a `canonical` link for everything that is not `stable` `NOTE:
this is done already`
3. Add a `noindex` tag for everything that is not `stable`

This PR just does (3). This means that starting from 0.6.0, all
versioned docs will have the `noindex` tag. However, it is not yet safe
to modify our robots.txt because all docs <0.6.0 still do not contain
`noindex`. We should only modify the robots.txt once we have that
guarantee.

## Testing

* Confirmed that the `noindex` tag now appears in this build:
https://getdaft-docs--5105.org.readthedocs.build/en/5105/quickstart/
* Previously it did not appear in
https://docs.daft.ai/en/v0.5.22/quickstart/
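
A check like the one described in the testing notes could be scripted instead of done by eye. The following is a sketch using only the standard library `html.parser`; the class and function names are hypothetical, and it inspects an HTML string you have already downloaded rather than fetching the pages above.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # The content attribute may hold several comma-separated directives.
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

For example, `has_noindex(page_source)` could be run against the saved HTML of a versioned page and of the stable page to compare the two.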