HTML vs Markdown vs SOM: Which Format Should Your AI Agent Use?

Source: DEV Community
Every AI agent that browses the web faces the same question: how do you represent a web page to a language model? The default answer, raw HTML, is expensive and slow. A typical page dumps 30,000+ tokens into your context window, most of it CSS classes and layout divs.

But what are the actual alternatives? And do they work? We ran WebTaskBench, 100 tasks across GPT-4o and Claude Sonnet 4, to find out. The results surprised us.

The Three Representations

When an agent needs to understand a web page, there are three common approaches:

1. Raw HTML

The DOM as-is. Every <div>, every class="sc-1234 flex items-center gap-2", every inline script. This is what most agents send today.

<div class="sc-1234 flex items-center gap-2 px-4 py-2">
  <a href="/about" class="text-blue-500 hover:underline font-medium tracking-tight text-sm">About</a>
  <span class="text-gray-400">|</span>
  <a href="/pricing" class="text-blue-500 hover:underline font-medium tracking-tight tex