The internet was AI-native (25 years ago, not now)

Argument

If you want to build an AI tool that can perform some service, the state of the art is MCP (Model Context Protocol), where the functionality is exposed via a server that defines the actions the AI can perform. Server connections must be configured by the user beforehand. This means discovering and using new functionality requires plugging in new MCP servers, and the running client cannot add new functionality dynamically.

In contrast, plain ol' websites can provide dynamic functionality. They can specify arbitrary new actions in the form of buttons, forms, & links. MCP servers can't do this. They can't say "go to this other MCP server now", or "Now that I've given this response, here's a new set of possible actions you can perform, dynamically provided by the server." This is blocked by the user needing to manually add new MCP connections with predefined protocols.

The thing is... why don't we just make AI, you know, use websites?

There is no good reason why the content of a website can't represent the actions an AI could take just as well as the actions a human could take. Want to provide AI with an action? Put a button in the HTML of your site, and give the AI the ability to click the button and get a new HTML page back with the new state of the program - and we've just built (reused) a universal AI interface!
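To make that concrete, here's a rough sketch of what "reading the actions off a page" could look like, using only Python's standard library. The page content, the URLs, and the names ActionExtractor and extract_actions are all made up for illustration; a real agent would also need to fetch pages and submit forms.

```python
from html.parser import HTMLParser


class ActionExtractor(HTMLParser):
    """Collects links, forms, and standalone buttons as action dicts."""

    def __init__(self):
        super().__init__()
        self.actions = []
        self._form = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and "href" in a:
            self.actions.append({"kind": "link", "target": a["href"]})
        elif tag == "form":
            self._form = {
                "kind": "form",
                "target": a.get("action", ""),
                "method": a.get("method", "get").upper(),
                "fields": [],
            }
            self.actions.append(self._form)
        elif tag in ("input", "select", "textarea") and self._form is not None:
            # Named fields tell the agent what data the form expects.
            if "name" in a:
                self._form["fields"].append(a["name"])
        elif tag == "button" and self._form is None:
            # Buttons inside a form are covered by the form action itself.
            self.actions.append({"kind": "button", "name": a.get("name", "")})

    def handle_endtag(self, tag):
        if tag == "form":
            self._form = None


def extract_actions(html):
    parser = ActionExtractor()
    parser.feed(html)
    return parser.actions


# Hypothetical page: everything the agent can do next is in the markup.
PAGE = """
<h1>Your cart (2 items)</h1>
<a href="/products">Keep shopping</a>
<form action="/cart/coupon" method="post">
  <input name="code">
  <button type="submit">Apply coupon</button>
</form>
<a href="/checkout">Check out</a>
"""

for action in extract_actions(PAGE):
    print(action)
```

The point is that the list of actions comes from the response itself, so the server can change what's possible on every page without the client being reconfigured.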

But this doesn't work. You run into another problem, one created by the direction the modern internet has taken. You see, this would have worked 25 years ago, when HTML websites were simple, with a few links, forms, and buttons, and not much more. But now you have sites where the semantic structure of HTML has taken a back seat to dynamic functionality provided by JavaScript. Using JavaScript to make UIs much more dynamic is the reality of the modern interactive internet, but it makes the HTML much less meaningful. The application state is not just links and buttons - it's links and buttons, plus arbitrary code execution. This isn't a self-describing interface in the same way.
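Here's a small sketch of that difference, again with just the Python standard library. Both pages are invented examples, and DeclaredActionCounter and summarize are hypothetical names. It counts the actions that are declared in the markup versus the script tags whose behavior you can't know without running them.

```python
from html.parser import HTMLParser


class DeclaredActionCounter(HTMLParser):
    """Counts actions visible in the markup, and scripts that hide the rest."""

    def __init__(self):
        super().__init__()
        self.declared = 0  # links with an href, forms with an action
        self.scripts = 0   # script tags: behavior is opaque without executing

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            self.declared += 1
        elif tag == "form" and a.get("action"):
            self.declared += 1
        elif tag == "script":
            self.scripts += 1


def summarize(html):
    counter = DeclaredActionCounter()
    counter.feed(html)
    return counter.declared, counter.scripts


# A page in the older style: the markup lists what you can do.
STATIC_PAGE = """
<a href="/inbox">Inbox</a>
<form action="/search" method="get"><input name="q"></form>
"""

# A page in the script-driven style: the markup is an empty shell.
SCRIPTED_PAGE = """
<div id="root"></div>
<script src="/bundle.js"></script>
"""

print(summarize(STATIC_PAGE))    # (2, 0)
print(summarize(SCRIPTED_PAGE))  # (0, 1)
```

For the second page, an agent reading the HTML learns nothing about what it can do next. It would have to run the script in a browser and inspect whatever gets rendered.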

What's interesting is that Roy Fielding, the CREATOR of modern internet API design (REST), had a vision of "hypermedia as the engine of application state" - where the HTML is self-descriptive of what actions can be performed. Is this not exactly what we want for building reusable AI interfaces? Have we not lost this to time?

Thanks for listening to my ramblings.

I'm interested in HTMX and JSON-LD as possible solutions to this problem. Links for more reading below.