feat(website): add llms.txt support for LLM-friendly content #13932
Add support for generating llms.txt and .llms.md files for Quarto websites, providing LLM-friendly markdown versions of HTML pages.

Features:
- New `llms-txt: true` option in website config
- Generates .llms.md companion files alongside HTML output
- Creates llms.txt index file linking to all markdown pages
- Converts HTML to clean markdown using Pandoc with Lua filter
- Handles callouts (blockquotes with bold type markers)
- Converts images to markdown syntax
- Converts internal links from .html to .llms.md
- Respects draft settings (excludes drafts from output)
- Cleans listing pages (removes empty links, category badges)
- Matches sitemap behavior for incremental builds

New files:
- src/project/types/website/website-llms.ts
- src/resources/filters/llms/llms.lua

Test coverage:
- Basic file generation
- Content conversion (callouts, code, tables, links)
- Draft handling
- Listing page cleanup

Co-Authored-By: Claude Opus 4.5 <[email protected]>
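The internal link conversion listed above (`.html` targets become `.llms.md`) can be illustrated with a small sketch. The commit message says the conversion happens in a Pandoc Lua filter; the TypeScript below only illustrates the rule. The function name `toLlmsLink` and its handling of external links, anchors, and query strings are assumptions, not code from this PR.

```typescript
// Hypothetical helper: rewrite an internal .html link to its .llms.md companion.
// Assumption: external URLs, protocol-relative URLs, and pure anchors are left untouched.
function toLlmsLink(href: string): string {
  const isExternal = /^[a-z][a-z0-9+.-]*:/i.test(href) || href.startsWith("//");
  if (isExternal || href.startsWith("#")) {
    return href;
  }
  // Split "path.html" from an optional "?query" or "#fragment" suffix.
  const match = href.match(/^([^?#]*)\.html([?#].*)?$/);
  if (!match) {
    return href; // not an .html target (e.g. an image), keep as-is
  }
  return `${match[1]}.llms.md${match[2] ?? ""}`;
}

console.log(toLlmsLink("guide/intro.html#setup"));
console.log(toLlmsLink("https://example.com/page.html"));
```

Keeping the fragment on the rewritten link is a design choice of this sketch; whether the PR's filter preserves fragments is not stated in the description.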
20d8156 to b5a829e
…tests

- Add **/*.llms.md to projectHiddenIgnoreGlob() to prevent cascading renders of llms.txt companion files
- Fix ensureLlmsTxt* test functions to use dirname(htmlFile) instead of treating file path as directory
- Update llms-txt test files to use correct two-element array format for regex matches [matches, no_matches]
- Add render-project: true where needed for llms.txt generation

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…tibility

Use pathWithForwardSlashes() to ensure paths in llms.txt use forward slashes on all platforms. Also adds changelog entry for the llms-txt feature.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
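As a rough illustration of why this matters: on Windows, relative paths are typically built with backslashes, which would produce broken links if written into llms.txt unchanged. A minimal sketch follows; `toForwardSlashes` is a hypothetical stand-in, and the actual `pathWithForwardSlashes()` helper may be implemented differently.

```typescript
// Hypothetical stand-in for a path-normalizing helper:
// replace every backslash separator with a forward slash.
function toForwardSlashes(path: string): string {
  return path.replace(/\\/g, "/");
}

// A Windows-style relative path as it might appear before normalization.
const windowsStyle = "posts\\2024\\intro.llms.md";
console.log(toForwardSlashes(windowsStyle));
```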
Overall, it looks good! I think this is a great way to do it. I had a look at the pkgdown implementation to compare:
For example, definition lists are converted to bullet lists, probably because GFM does not support them
unless activated, but is this really GFM syntax? An example of their output: https://pkgdown.r-lib.org/llms.txt I am thinking, among good ideas:
Just some ideas. I am sure we'll have more feedback once this is tested.
    return {
      name: `File ${llmsFile} exists`,
      verify: (_output: ExecuteOutput[]) => {
        verifyPath(llmsFile);
        return Promise.resolve();
      },
We have fileExists() if we want to refactor and avoid duplication
Lines 228 to 236 in 6c8a9b1
    return {
      name: `File ${llmsFile} does not exist`,
      verify: (_output: ExecuteOutput[]) => {
        verifyNoPath(llmsFile);
        return Promise.resolve();
      },
We have pathDoNotExists() if we want to reuse and avoid code duplication
Lines 238 to 246 in 6c8a9b1
    // Verify the llms.txt index file in a website output directory.
    // Takes the HTML file path and looks for llms.txt in the same directory.
    export const ensureLlmsTxtRegexMatches = (
      htmlFile: string,
      matchesUntyped: (string | RegExp)[],
      noMatchesUntyped?: (string | RegExp)[],
    ): Verify => {
      const llmsTxtPath = join(dirname(htmlFile), "llms.txt");
      return verifyFileRegexMatches(regexChecker, `Inspecting ${llmsTxtPath} for Regex matches`)(llmsTxtPath, matchesUntyped, noMatchesUntyped);
    };
This verify helper is meant to be used only for index.qmd, or another .qmd test whose output lands at the root of the output dir, right?
If we need a verify function that takes the output dir as input, it is just a matter of adding the function as special handling in smoke-all.test.ts, and you could have

    export const ensureLlmsTxtRegexMatches = (
      outputDir: string,
      matchesUntyped: (string | RegExp)[],
      noMatchesUntyped?: (string | RegExp)[],
    ): Verify => {
      const llmsTxtPath = join(outputDir, "llms.txt");
      return verifyFileRegexMatches(regexChecker, `Inspecting ${llmsTxtPath} for Regex matches`)(llmsTxtPath, matchesUntyped, noMatchesUntyped);
    };

But I guess this is just a matter of making sure ensureLlmsTxtRegexMatches() is only used in compatible source documents.
Just a thought while reviewing the new functions in verify.ts
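To make the concern concrete, here is a small sketch of how the two variants resolve the llms.txt path. It assumes forward-slash paths and an output directory named `_site`; `parentDir`, `llmsTxtNextTo`, and `llmsTxtIn` are hypothetical names used for illustration only, and `parentDir` is a simplified stand-in for `dirname()`.

```typescript
// Simplified stand-in for dirname(): everything before the last "/".
const parentDir = (p: string): string => {
  const i = p.lastIndexOf("/");
  return i === -1 ? "." : p.slice(0, i);
};

// Variant from the PR: look for llms.txt next to the rendered HTML file.
const llmsTxtNextTo = (htmlFile: string): string => `${parentDir(htmlFile)}/llms.txt`;

// Variant from the review comment: look for llms.txt in the output directory.
const llmsTxtIn = (outputDir: string): string => `${outputDir}/llms.txt`;

// A root-level page resolves to the site index in both variants...
console.log(llmsTxtNextTo("_site/index.html"), llmsTxtIn("_site"));
// ...but a nested page makes the first variant look inside the subdirectory.
console.log(llmsTxtNextTo("_site/posts/first.html"));
```

Under these assumptions, the htmlFile-based helper only finds the index when the test document renders to the root of the output directory, which is the restriction the comment points out.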
    return {
      name: `File ${llmsTxtPath} exists`,
      verify: (_output: ExecuteOutput[]) => {
        verifyPath(llmsTxtPath);
        return Promise.resolve();
      },
    };
Same - could be fileExists()
    return {
      name: `File ${llmsTxtPath} does not exist`,
      verify: (_output: ExecuteOutput[]) => {
        verifyNoPath(llmsTxtPath);
        return Promise.resolve();
      },
    };
And same could be pathDoNotExists
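The refactor suggested in these comments might look roughly like the sketch below. Everything here is a stand-in so the example runs on its own: the `Verify` shape, the in-memory `existingPaths` set, the bodies of `fileExists()` and `pathDoNotExists()`, the `.html` to `.llms.md` path mapping, and the names `ensureLlmsMdExists` / `ensureLlmsMdDoesNotExist` are all assumptions. The real helpers in the test suite may have different signatures.

```typescript
// Stand-in for the test suite's Verify type (assumed shape).
interface Verify {
  name: string;
  verify: () => Promise<void>;
}

// Hypothetical in-memory "filesystem" replacing real disk checks.
const existingPaths = new Set<string>(["_site/index.html", "_site/index.llms.md"]);

// Stand-ins for the existing helpers mentioned in the review.
const fileExists = (file: string): Verify => ({
  name: `File ${file} exists`,
  verify: () =>
    existingPaths.has(file) ? Promise.resolve() : Promise.reject(new Error(`Missing: ${file}`)),
});
const pathDoNotExists = (path: string): Verify => ({
  name: `Path ${path} does not exist`,
  verify: () =>
    existingPaths.has(path) ? Promise.reject(new Error(`Unexpected: ${path}`)) : Promise.resolve(),
});

// Assumed mapping from a rendered HTML file to its companion file.
const llmsMdPathFor = (htmlFile: string): string => htmlFile.replace(/\.html$/, ".llms.md");

// The llms-specific helpers shrink to one-line wrappers around the shared ones.
const ensureLlmsMdExists = (htmlFile: string): Verify => fileExists(llmsMdPathFor(htmlFile));
const ensureLlmsMdDoesNotExist = (htmlFile: string): Verify =>
  pathDoNotExists(llmsMdPathFor(htmlFile));

console.log(ensureLlmsMdExists("_site/index.html").name);
```

The point of the sketch is only that the path computation stays in the llms-specific wrapper while the existence check itself is delegated, which removes the duplicated `return { name, verify }` bodies.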