AI Markdown / Accept / .md ext / format

So, it's been a while since I've done anything SS-related with content types. I did a doc chat search, but it only listed markdown documents.

This is a completely raw thought from an X post about Vercel, so I apologize that it isn't well thought out. I know SS could do it, but I was thinking maybe implement it more first-class in SS using a convention of the SS team's choosing. PerplexityAI response on implementation.

The goal: if the markdown plugin (or a different plugin for this specific case) is loaded, render a markdown version of the same HTML, using a markdown view template if one exists (someview.cshtml → someview.md), or somehow translate the HTML view to markdown as a default (rough sketch after the list below).

Using the API format/content types:

- `.md` extension
- `?format=md`
- `Accept: text/markdown`
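To make the idea concrete, here's a minimal sketch of that convention as plain ASP.NET Core middleware. This is not a real SS plugin API (an actual implementation would hook into SS's content-type registration), and the `Views` folder convention is just an assumption:

```csharp
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Hypothetical sketch only: a real SS plugin would hook into its
// content-type registration rather than use raw middleware like this.
app.Use(async (context, next) =>
{
    var path = context.Request.Path.Value ?? "/";
    var wantsMarkdown =
        path.EndsWith(".md", StringComparison.OrdinalIgnoreCase)        // .md ext
        || context.Request.Query["format"] == "md"                      // ?format=md
        || context.Request.Headers["Accept"].Any(
               a => a?.Contains("text/markdown") == true);              // Accept header

    if (wantsMarkdown)
    {
        // Convention: someview.cshtml → someview.md living next to it
        var viewName = Path.GetFileNameWithoutExtension(path.TrimEnd('/'));
        var mdPath = Path.Combine(app.Environment.ContentRootPath, "Views", viewName + ".md");
        if (File.Exists(mdPath))
        {
            context.Response.ContentType = "text/markdown";
            await context.Response.SendFileAsync(mdPath);
            return;
        }
        // else: fall through to HTML (or convert the rendered HTML to markdown as a default)
    }
    await next();
});

app.Run();
```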

As outlined in the X post, and by our estimation, this would save a lot of data transfer and tokens if it could be implemented. And if it were done directly in SS (via a plugin) it could be done quickly.

Yes, there is still the question of how to let AIs know about this route, since it isn't a standard yet.

I'm curious about the SS team's thoughts on this as AI becomes more of a consumer of content.

Typically you wouldn't convert HTML to markdown; you would maintain your content in markdown and then generate your HTML website from that.

The proposed standard for serving AI-friendly content to AI Agents is to serve markdown via /llms.txt and /llms-full.txt endpoints.
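For reference, the proposed format (per llmstxt.org) is itself just markdown: an H1 title, a blockquote summary, then H2 sections of links. The URLs below are illustrative only:

```markdown
# Example Project

> One-paragraph summary of what the site is and what the docs cover.

## Docs

- [Getting Started](https://example.org/start.md): install & first steps
- [Configuration](https://example.org/config.md): all settings explained

## Optional

- [Release Notes](https://example.org/releases.md): skippable detail
```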

This is what the razor-ssg and razor-press templates let you do, where all content is maintained in markdown and both include support for llms.txt and llms-full.txt:

Here are some examples of llms.txt and llms-full.txt files generated from the Razor SSG and Razor Press website templates:

Yeah, I'm thinking of existing sites where the content lives in HTML first and most likely didn't use razor-ssg or razor-press. And yeah, I see problems with generating md from HTML. I think the best approach is to have a .md view living next to the .cshtml. I'd expect AI could generate a close-enough template for the .md files, but it would need the SS markdown content type plugin to know how to switch.

Yeah, for most basic sites that is enough. But for large, more diverse sites it would be too big, or give too much content that isn't specific enough. I can see llms.txt being the entry point, pointing to specific categories (llms-space.txt, llms-galaxy.txt, llms-universe.txt) that better align with what the AI is looking for. Then those llms-xyz.txt files can point to the correct .md paths, similar to how a sitemap index file is used today, but for LLMs (rough sketch below).
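Purely hypothetical (the llms.txt proposal doesn't define an index format), but the sitemap-index analogy could look something like:

```markdown
# Example Corp

## Categories

- [Space](https://example.org/llms-space.txt): space-related content
- [Galaxy](https://example.org/llms-galaxy.txt): galaxy-related content
- [Universe](https://example.org/llms-universe.txt): universe-related content
```

Each llms-xyz.txt would then link out to the individual .md paths for that category.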

Don't know why you would go your own route instead of using llms.txt like most sites serving AI-friendly content, incl. Cloudflare, Vercel, Anthropic, etc:

Not sure how many MB of plain text docs you want to make available; all of ServiceStack's docs fit in 1.5MB. It only requires an automated script that scans your docs to generate a plain text file (sketch below).
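A minimal sketch of such a script in C#, where the `docs` folder and output path are assumptions to adjust for your layout:

```csharp
using System.Text;

// Concatenate every markdown doc into a single llms-full.txt
var sb = new StringBuilder();
foreach (var file in Directory.EnumerateFiles("docs", "*.md", SearchOption.AllDirectories)
                              .OrderBy(f => f))
{
    sb.AppendLine($"<!-- {file} -->");      // keep a trace of the source file
    sb.AppendLine(File.ReadAllText(file));
    sb.AppendLine();
}
File.WriteAllText(Path.Combine("wwwroot", "llms-full.txt"), sb.ToString());
Console.WriteLine($"Wrote {sb.Length / 1024} KB");
```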

Or you could maintain several /llms.txt files for different sites, but I don't see why it needs to be more complicated than that.

We'll do some research into the size, but I suspect the llms-full.txt would be gigantic for diverse-content websites. Though I do like the idea that it's only one or two files. The other issue would be keeping it updated as content is added/changed/removed, but maybe a daily refresh would be enough.

Thinking out loud… I wonder how using llms.txt affects (if at all) AIs that use real-time search for their results? Ultimately we want the latest results shown (AI SEO, GEO, AEO), while also using the least amount of data transfer for AI.

No idea, but if the world's largest companies (incl. LLM authors) are adopting a simple standard, I'm personally not looking elsewhere for a more complicated solution.

The llms.txt files in the ServiceStack SSG templates are pre-generated per commit on deployment, so docs.servicestack.net is automatically in sync without any additional effort/maintenance.

So before AI, these websites would do GBs of data transfer per week. With AI we're now doing TBs a week, which is why we're looking into what's possible to reduce the data transfer.

It's unlikely maintaining an llms.txt would reduce AI traffic; it's effectively a problem for every website on the Internet. Many companies are adding a human-verification guard to prevent bot traffic.

I’m personally using a User Agent Blocker to block bot traffic.

Yeah, we've already developed an AI bot limiter and are still seeing how it affects SEO/GEO/AEO traffic. It's still early so we can't draw any conclusions, but it seems to have some effect on users coming from real-time AI searches. Though it's hard to be sure, as AI keeps changing.

FYI: https://x.com/Cloudflare/status/2021955521213800489

> Time to consider not just human visitors, but to treat agents as first-class citizens. Cloudflare’s network now supports real-time content conversion to Markdown at the source using content negotiation headers.
>
> all agents have to do is send “Accept: text/markdown” and we’ll automatically convert the response to Markdown on-the-fly for any enabled site. way fewer tokens. no need to handle it on your own.
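i.e. on the agent side, opting in is just an Accept header. A quick C# illustration (the URL is hypothetical):

```csharp
using System.Net.Http.Headers;

using var client = new HttpClient();
// Ask the server to negotiate markdown instead of HTML
client.DefaultRequestHeaders.Accept.Add(
    new MediaTypeWithQualityHeaderValue("text/markdown"));

// A Cloudflare-enabled site would answer with markdown on-the-fly
var markdown = await client.GetStringAsync("https://example.org/some-page");
Console.WriteLine(markdown); // markdown instead of full HTML = far fewer tokens
```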