Machine Legibility Index

Whose voice is in the room?

MLI measures whether your site can be found, parsed, trusted, and acted on by AI agents — or whether another voice answers in your place.

4 pillars · 12 criteria · scored 0–100 · audit runs in < 1 min

The web has a new class of user

For most of the web's history, websites had one type of user: a human. That changed. Websites are now increasingly visited, read, and acted upon by AI agents — software that navigates interfaces, fills forms, extracts information, and completes tasks on behalf of a person. These agents power AI search, voice assistants, research tools, and agentic applications that don't just answer questions but actually do things: book appointments, compare services, file forms, surface resources.

A human reads visual context and infers meaning from layout. An AI agent cannot. It reads the underlying code — HTML, ARIA attributes, structured data — and if that code doesn't clearly communicate what an element is, what the page is about, or whether the information is current, the agent fails silently or moves on. Sites built only for human perception are, increasingly, hard to surface — or skipped entirely — by the agents that mediate access to them.

This is structural, not stylistic. The same content can be highly legible or barely parseable to an agent, depending on whether it's served as structured data with explicit relationships or as unstructured prose the agent has to interpret. For agents, format is meaning.

By the numbers

The signals an AI agent uses to find and trust a site are concentrated among the best-resourced sites.

94.1%: declare a robots.txt crawler contract — the practice is decades old and near-universal.
4.5%: have extended that contract to AI crawlers specifically — up from 2.6% the year prior.
5×: the gap between top-1,000 sites and the broader web in declaring AI directives (20.9% vs. 4%).

Source: Web Almanac 2025, HTTP Archive — Generative AI chapter. The Web Almanac is a community-run annual survey of millions of pages, with no commercial stake in any finding.
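To make "extending the contract to AI crawlers" concrete, here is a minimal sketch that scans a robots.txt file for User-agent groups addressed to AI crawlers. The token list is an illustrative assumption, not an exhaustive registry, and the function name and sample file are hypothetical.

```python
# Illustrative, non-exhaustive set of AI crawler tokens (an assumption;
# extend it with whichever crawlers matter to you).
AI_CRAWLER_TOKENS = {"gptbot", "claudebot", "ccbot", "google-extended", "perplexitybot"}


def ai_agents_named(robots_txt: str) -> set[str]:
    """Return the AI crawler tokens that appear in User-agent lines."""
    named = set()
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            token = value.strip().lower()
            if token in AI_CRAWLER_TOKENS:
                named.add(token)
    return named


# Hypothetical robots.txt: a general contract plus one AI-specific group.
SAMPLE = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /drafts/
"""

print(ai_agents_named(SAMPLE))  # {'gptbot'}
```

A file that only has the `User-agent: *` group returns an empty set: it has a crawler contract, but no AI-specific one.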

Different sites, different stakes

Agent illegibility plays out two ways. The mechanism is the same — the site that gets surfaced is the site whose claims are machine-readable — but the costs are different for a commercial site than for a public-interest one. MLI measures where any given site stands.

Commercial reach

Brands, retailers, and service businesses whose customers increasingly arrive — or are routed away — through AI-mediated channels. A booking agent that can't read a hotel's date inputs sends users to a competitor. A shopping assistant that can't find Product schema doesn't surface the product. The failure is invisible: no error, no crash, no "agent abandoned this funnel" metric. The site just doesn't get surfaced.

Civic and service access

Consider a Spanish-speaking person who has just received a Notice to Appear in immigration court. They have a short window — sometimes weeks — to find legal help. They ask an AI agent in Spanish what to do. The agent surfaces results: federal agency pages, nonprofit legal aid clinics, restrictionist organizations, notarios advertising services they aren't licensed to provide.

The agent doesn't adjudicate which voice is most authoritative. It surfaces what is most legible.

The community legal aid clinic with strong programs but no LegalService schema, no availableLanguage: "es" declaration, and no machine-readable dates on its rights-and-deadlines page is, to the agent, not in the conversation.

Same content, two surfaces

A service rendered for a human reader, and the structured data underneath. The agent works from the right column.

What a human sees

Casa Esperanza Legal Clinic

Free immigration legal help in Maryland.

Languages: English, Spanish
Cost: Free
Walk-ins: Tuesdays 9am–12pm

What an agent reads

{
  "@context": "https://schema.org",
  "@type": "LegalService",
  "name": "Casa Esperanza Legal Clinic",
  "areaServed": "Maryland",
  "availableLanguage": ["en", "es"],
  "isAccessibleForFree": true,
  "openingHours": "Tu 09:00-12:00",
  "serviceType": "Immigration legal services"
}

Both columns describe the same clinic. The left is rendered text — a visual layout the agent doesn't see. The right is the structured markup that lets an agent answer "where can I get free immigration help in Maryland in Spanish?" by filtering on jurisdiction, cost, and language. Without the right column, the agent has nothing to filter on — even when the left column is on the page.
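The filtering step can be sketched in a few lines. This is an illustration of the idea, not how any particular agent works: the `matches` function and the second record are hypothetical, and the first record is the clinic markup shown above.

```python
import json

# The clinic's structured data, as shown in "What an agent reads".
CLINIC_JSONLD = """
{
  "@context": "https://schema.org",
  "@type": "LegalService",
  "name": "Casa Esperanza Legal Clinic",
  "areaServed": "Maryland",
  "availableLanguage": ["en", "es"],
  "isAccessibleForFree": true,
  "openingHours": "Tu 09:00-12:00",
  "serviceType": "Immigration legal services"
}
"""


def matches(record: dict, *, area: str, language: str, free_only: bool = True) -> bool:
    """Filter on jurisdiction, language, and cost using declared fields only."""
    languages = record.get("availableLanguage", [])
    if isinstance(languages, str):  # a single language may be a plain string
        languages = [languages]
    is_free = record.get("isAccessibleForFree") is True
    return (
        record.get("areaServed") == area
        and language in languages
        and (is_free or not free_only)
    )


clinic = json.loads(CLINIC_JSONLD)
# Hypothetical second site: same services, but nothing declared.
undeclared = {"name": "Another clinic with no structured data"}

print(matches(clinic, area="Maryland", language="es"))      # True
print(matches(undeclared, area="Maryland", language="es"))  # False
```

The second record fails every filter, not because the clinic behind it is worse, but because there is nothing declared to match against.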

What MLI measures

MLI is a structured audit across four dimensions. Each pillar has three criteria, and each criterion is scored 1–5. The 12 criterion scores are combined into a total MLI on a 0–100 scale.
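This page does not spell out how twelve 1–5 scores become a 0–100 total. One plausible way, shown here purely as an assumption, is a linear rescale from the lowest possible sum (12) to the highest (60). The function name and the sample scores are hypothetical; see the methodology for the actual formula.

```python
def mli_total(scores_by_pillar: dict[str, list[int]]) -> float:
    """Rescale twelve 1-5 criterion scores to 0-100 (assumed linear mapping)."""
    scores = [s for pillar in scores_by_pillar.values() for s in pillar]
    if len(scores) != 12 or any(not 1 <= s <= 5 for s in scores):
        raise ValueError("expected 12 criterion scores, each between 1 and 5")
    lowest, highest = 1 * len(scores), 5 * len(scores)  # 12 and 60
    return round(100 * (sum(scores) - lowest) / (highest - lowest), 1)


# Hypothetical audit result: three criterion scores per pillar.
example = {
    "Identity": [4, 3, 5],
    "Reachability": [2, 3, 3],
    "Structure": [4, 4, 2],
    "Currency": [1, 2, 3],
}
print(mli_total(example))  # 50.0
```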

Identity

Who does the site say it is?

Whether the organization, its services, and its authority are declared in machine-readable form, or left for an agent to infer from third-party mentions. Without an Organization declaration, aggregators and review platforms describe you before you do. For a public-interest organization, that means an aggregator's outdated record, or a lookalike, answers in your place. For a business, a Yelp profile answers first.
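A minimal sketch of what an Organization declaration might contain, built with Python's standard library. The helper name and the example.org URLs are placeholders, and the three properties shown are a starting point rather than the set MLI scores against.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Serialize a basic schema.org Organization declaration as JSON-LD."""
    declaration = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # Profiles the organization itself vouches for, so an agent can
        # tell first-party identity from third-party mentions.
        "sameAs": list(same_as),
    }
    return json.dumps(declaration, indent=2, ensure_ascii=False)


# Placeholder values for illustration only.
markup = organization_jsonld(
    "Casa Esperanza Legal Clinic",
    "https://example.org/",
    ["https://example.org/directory-profile"],
)
print(markup)
```

The resulting string would typically be embedded in the page inside a `<script type="application/ld+json">` element.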

Reachability

Can an agent fetch and read the content?

Whether the site declares a contract for AI access, serves content in initial HTML rather than behind JavaScript, and reaches users in their language. A site with Spanish-speaking visitors that doesn't declare hreflang or availableLanguage loses ranking advantage in language-specific agent queries — it's discoverable, but lower in results.
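One of these signals, the hreflang declaration, is easy to check against the initial HTML, meaning the markup a server returns before any JavaScript runs. The sketch below uses only the standard library; the class and function names and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class HreflangCollector(HTMLParser):
    """Collect hreflang values from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.languages = set()

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        if "alternate" in rel and hreflang:
            self.languages.add(hreflang.lower())


def declared_languages(initial_html: str) -> set[str]:
    """Return the languages a page declares in its initial HTML."""
    collector = HreflangCollector()
    collector.feed(initial_html)
    return collector.languages


# Hypothetical page head with English and Spanish alternates.
SAMPLE_HEAD = """
<head>
  <link rel="alternate" hreflang="en" href="https://example.org/en/">
  <link rel="alternate" hreflang="es" href="https://example.org/es/">
</head>
"""
print(sorted(declared_languages(SAMPLE_HEAD)))  # ['en', 'es']
```

If the alternates are injected by client-side script, they never appear in the string this function receives, which is the point of checking the initial HTML.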

Structure

Can the agent understand what it reads?

Once an agent has fetched your content, can it tell what kind of information it is and extract the right pieces? The page needs to declare its type (FAQ, service listing, contact page), hierarchy, and structure. Without those declarations, an agent reads the text but doesn't know whether to extract it as Q&A pairs, location details, or narrative prose.
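A rough sketch of how a page's declared type could be read: collect the JSON-LD blocks embedded in the HTML and list their `@type` values. This assumes the type is declared as JSON-LD in a script element; the names below and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed markup is skipped, as if it were absent


def declared_types(html: str) -> set[str]:
    """Return every @type declared at the top level of the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    types = set()
    for block in collector.blocks:
        for node in block if isinstance(block, list) else [block]:
            declared = node.get("@type") if isinstance(node, dict) else None
            if isinstance(declared, str):
                types.add(declared)
            elif isinstance(declared, list):
                types.update(t for t in declared if isinstance(t, str))
    return types


# Hypothetical FAQ page.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}
</script>
</head><body><h1>Frequently asked questions</h1></body></html>
"""
print(declared_types(SAMPLE_PAGE))  # {'FAQPage'}
```

A page with the same visible questions and answers but no such block returns an empty set, which leaves the extraction strategy to guesswork.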

Currency

Are the site's claims still true?

Whether dates, deadlines, eligibility windows, and service availability are machine-readable. Without them, an agent has to guess whether to surface the information or treat it as stale — and stale guesses get skipped.
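As a sketch of why machine-readable dates matter, the function below classifies a record using two schema.org date properties, `validThrough` and `dateModified`. The one-year staleness threshold, the labels, and the function name are assumptions made for illustration, and dates are assumed to be plain `YYYY-MM-DD` strings.

```python
from datetime import date


def freshness(record: dict, today: date, max_age_days: int = 365) -> str:
    """Classify a record as current, expired, stale, or unknown."""
    valid_through = record.get("validThrough")
    if valid_through:
        # An explicit end date settles the question either way.
        return "expired" if date.fromisoformat(valid_through) < today else "current"
    modified = record.get("dateModified")
    if modified:
        age_days = (today - date.fromisoformat(modified)).days
        return "current" if age_days <= max_age_days else "stale"
    return "unknown"  # nothing declared, so the agent has to guess


today = date(2025, 6, 1)
print(freshness({"dateModified": "2025-01-15"}, today))  # current
print(freshness({"validThrough": "2025-03-01"}, today))  # expired
print(freshness({}, today))                              # unknown
```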

What your score means

Scores translate to four bands, each describing a real position in agent-mediated discovery.

80–100

In the room

Agents can find, parse, trust, and act on this site reliably.

60–79

Partially audible

Agents surface this site sometimes; some claims are illegible.

40–59

Hard to surface

Agents can find the site but cannot reliably parse or act on it.

0–39

Not in the conversation

Agents cannot meaningfully read this site. Another voice answers in its place.
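The four bands map to a simple lookup. A minimal sketch, using the thresholds and labels listed above; the function name is hypothetical.

```python
def mli_band(score: float) -> str:
    """Map a 0-100 MLI score to its band label."""
    if not 0 <= score <= 100:
        raise ValueError("MLI scores range from 0 to 100")
    if score >= 80:
        return "In the room"
    if score >= 60:
        return "Partially audible"
    if score >= 40:
        return "Hard to surface"
    return "Not in the conversation"


print(mli_band(72))  # Partially audible
```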

Ready to audit your site?

Install the MLI Chrome extension and run an audit of any page in seconds. Or read the full methodology to understand what each criterion measures and why.
