What We Learned

2026-04-25
framework design · scoring · UX of audit reports

Curated findings beat deterministic mappings

The original extension mapped score thresholds to priority bands automatically: every criterion scoring 1 became a P1, every 2 a P2, and so on. The output of an audit on a low-scoring site was a wall of 18 findings — overwhelming, undifferentiated, not actionable.
We replaced that with a curated rule: surface the top 5–7 findings, ranked by how much fixing each one would raise the score. The remaining issues still appear in a roll-up, but the headline is a focused fix list. The framework is more honest about what matters, and the report becomes something a reader can act on.
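A minimal Python sketch of the difference. It assumes a 1–4 score scale, per-criterion weights, and an overall score that is a weighted mean. The `Finding` type, the weights, and the criterion names are hypothetical, not the extension's actual data model. "Score impact if fixed" is read here as the points the overall score would gain if a criterion reached the top score.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_SCORE = 4  # assumption: criteria scored 1..4; the framework's real scale may differ


@dataclass(frozen=True)
class Finding:
    criterion: str
    score: int           # current score for this criterion (1 = worst)
    weight: float = 1.0  # assumption: this criterion's weight in the overall score


def deterministic_priority(finding: Finding) -> str:
    """The approach we dropped: the score alone picks the band (1 -> P1, 2 -> P2, ...)."""
    return f"P{finding.score}"


def impact_if_fixed(finding: Finding, all_findings: list[Finding]) -> float:
    """Points the weighted-mean overall score would gain if this criterion reached MAX_SCORE."""
    total_weight = sum(f.weight for f in all_findings)
    return finding.weight * (MAX_SCORE - finding.score) / total_weight


def curate(all_findings: list[Finding], top_n: int = 7) -> tuple[list[Finding], list[Finding]]:
    """Split open findings into a short headline fix list and a roll-up of everything else."""
    open_findings = [f for f in all_findings if f.score < MAX_SCORE]
    ranked = sorted(
        open_findings,
        key=lambda f: impact_if_fixed(f, all_findings),
        reverse=True,
    )
    return ranked[:top_n], ranked[top_n:]


# Hypothetical findings for illustration.
findings = [
    Finding("a", score=1, weight=1.0),
    Finding("b", score=3, weight=1.0),
    Finding("c", score=2, weight=3.0),
]
headline, rollup = curate(findings, top_n=2)
print([f.criterion for f in headline], [f.criterion for f in rollup])  # ['c', 'a'] ['b']
```

With these inputs the deterministic mapping would make `a` the only P1, while curation puts `c` first: its weight means fixing it moves the overall score twice as far. Keeping `top_n` in the 5–7 range is what keeps the headline short; everything else lands in the roll-up.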

Why it matters. Scoring everything is not the same as helping someone act. The transformation from "exhaustive" to "prioritized" is where a diagnostic becomes a tool. This pattern almost certainly generalizes: any audit-shaped tool with a long-tail finding distribution probably wants curation, not deterministic mapping.
2026-04-25
framework design · scoring · trust

Skip-if rules eliminate phantom findings

Early versions of the audit defaulted criteria to "not implemented" when the underlying feature wasn't present on the page. A static landing page without a form would score 1 on form-related criteria, dragging the total down for a "failure" that wasn't real. Reading the report, you'd see a list of fixes that didn't apply to your site — eroding trust in everything else the report said.
The fix: each criterion has an explicit skip-if rule. If the page genuinely doesn't have an actionable form, the action-completion criteria are skipped rather than failed: they are excluded from the pillar average and produce no findings at all.
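One way to express skip-if rules, sketched in Python under stated assumptions: each criterion carries a predicate over features detected on the page, a score below a hypothetical `PASS_SCORE` of 3 produces a finding, and pillar averages are plain means. The criterion names, pillar names, and the `has_actionable_form` feature flag are illustrative, not the audit's real identifiers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

PASS_SCORE = 3  # assumption: scores below this produce a finding


def never_skip(page: dict) -> bool:
    return False


def no_actionable_form(page: dict) -> bool:
    # Hypothetical feature flag set by whatever detects forms on the page.
    return not page.get("has_actionable_form", False)


@dataclass(frozen=True)
class Criterion:
    name: str
    pillar: str
    skip_if: Callable[[dict], bool]  # True means "this criterion doesn't apply here"


def evaluate(
    criteria: list[Criterion], scores: dict[str, int], page: dict
) -> tuple[dict[str, float], list[str], list[str]]:
    """Score a page, leaving skipped criteria out of pillar averages and findings."""
    skipped = [c.name for c in criteria if c.skip_if(page)]
    applicable = [c for c in criteria if c.name not in skipped]
    by_pillar: dict[str, list[int]] = {}
    for c in applicable:
        by_pillar.setdefault(c.pillar, []).append(scores[c.name])
    # A pillar whose criteria are all skipped gets no average rather than a failing one.
    averages = {pillar: sum(v) / len(v) for pillar, v in by_pillar.items()}
    findings = [c.name for c in applicable if scores[c.name] < PASS_SCORE]
    return averages, findings, skipped


criteria = [
    Criterion("form-labels", "action", no_actionable_form),
    Criterion("form-errors", "action", no_actionable_form),
    Criterion("headings", "structure", never_skip),
    Criterion("lang-attribute", "structure", never_skip),
]
scores = {"form-labels": 1, "form-errors": 1, "headings": 4, "lang-attribute": 2}
averages, findings, skipped = evaluate(criteria, scores, {"has_actionable_form": False})
print(averages, findings, skipped)
```

On a static landing page the two form criteria are skipped, so the `action` pillar has no average at all instead of a failing one, and the only finding is the one that actually applies.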

Why it matters. A diagnostic tool's credibility erodes one false positive at a time, and phantom findings are how trust leaks. Conditional criteria, clearly named and transparently applied, let the tool say "this doesn't apply" instead of "this failed." Same information density, dramatically more trustworthy.
2026-04-25
framework design · narrative · positioning

Naming the second audience is what makes the framework worth building

The earliest framing of MLI was "AI agent legibility helps brands sell more." That's true but small — it's an SEO-adjacent argument, and it competes with a thousand existing tools in the same lane. The full claim turned out to be bigger: agent legibility decides whose voice gets surfaced when an AI answers a question. For commercial users, the cost of illegibility is reach. For public-interest users — community legal aid clinics, mutual aid networks, public health departments — the cost is narrative authority and service access. Same mechanism, different axes.
The shift from one consequence-class to two changed everything downstream: the criteria gained a multilingual reach dimension, the source-authority criterion grew teeth (with notarios, unlicensed practitioners who pose as immigration legal help, as the discriminator case), and the readiness bands were renamed toward language that names what's actually at stake.

Why it matters. A framework written for one audience makes the second audience invisible. If the second audience is the one whose stakes are higher, that invisibility is the framework's first failure. Naming both — and being honest about which one is the wedge — expands what the framework can be without diluting it. The commercial case still works; the public-interest case is what the framework is for.
2026-04-28
framework design · precision · agent economics

Absolutist language in criteria imposes token cost; costs become ranking penalties

While auditing methodology language, we found three criteria using absolute claims: "invisible," "unfilterable," "excluded from queries." These words look credible but they're imprecise — sites aren't invisible, they're hard to find in specific contexts. The problem: absolutist language requires agents to exhaustively verify the claim. Precise language ("ranked lower in filtered queries") uses fewer tokens to evaluate.
Our working hypothesis: under token-budget constraints, agents optimizing for cost will prefer sites that are cheaper to evaluate. If that holds, absolutist consequence language doesn't just harm credibility; it becomes a de facto legibility penalty, because sites using absolute claims cost more to evaluate and cost-sensitive agents would deprioritize them.
We revised three criteria: "invisible to language-filtered queries" → "loses ranking advantages in language-specific queries"; "invisible in the long tail" → "excluded from service-type filters and buried in results"; "invisible to filtered queries" → "cannot be filtered on specific criteria."
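The revisions above suggest a simple check that could run over criterion text before it ships. This Python sketch flags the three absolute terms named in this entry and, where one of the revised phrases starts at the match, attaches its replacement. The term list is illustrative rather than exhaustive, and the matching is plain regex, not a judgment about context.

```python
from __future__ import annotations

import re

# The three absolute terms named in this entry; an illustrative, not exhaustive, list.
ABSOLUTE_TERM = re.compile(r"\b(?:invisible|unfilterable|excluded from queries)\b")

# Revisions from this entry: original phrase -> precise replacement.
REVISIONS = {
    "invisible to language-filtered queries": "loses ranking advantages in language-specific queries",
    "invisible in the long tail": "excluded from service-type filters and buried in results",
    "invisible to filtered queries": "cannot be filtered on specific criteria",
}


def lint(criterion_text: str) -> list[tuple[str, str | None]]:
    """Flag absolute terms; attach a known revision when one starts at the match."""
    lowered = criterion_text.lower()
    flags = []
    for match in ABSOLUTE_TERM.finditer(lowered):
        rest = lowered[match.start():]
        suggestion = next(
            (new for old, new in REVISIONS.items() if rest.startswith(old)),
            None,  # flagged, but no stored revision for this phrasing
        )
        flags.append((match.group(0), suggestion))
    return flags


print(lint("Without hreflang, a site is invisible to language-filtered queries."))
print(lint("Service listings are unfilterable."))
```

A flag with no suggestion still goes to a human; the point is to catch the absolute claim before it reaches the methodology.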

Why it matters. The framework advises sites on how to be legible to agents. If the methodology itself uses imprecise language, it both undermines credibility and models bad design. More important: token efficiency may be the signal that determines which sites agents actually evaluate. If the hypothesis holds, "be precise about consequences, not absolute" becomes core MX design guidance.
2026-05-02
methodology · claims precision · research alignment

Language declarations create discovery advantage, not invisibility

The original MLI language included the claim: "A site without explicit language declarations becomes invisible to language-filtered agent queries." During refinement against source citations, this turned out to be overstated. The actual consequence is more nuanced: sites without language declarations rank lower in language-filtered queries and agents must rely on slower content inference. They remain discoverable, but disadvantaged.
The shift: "invisible" → "discovery-disadvantaged." Sites without hreflang or availableLanguage schema are still found by content-analyzing agents; they just lose ranking advantage to competitors with declared languages. For immigrant-serving organizations, this creates a discoverability gap, not absence.
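For reference, the two declarations this entry names look roughly like this. The Python sketch renders `hreflang` alternate links and a schema.org `Organization` whose `ContactPoint` lists `availableLanguage` values as BCP 47 codes. The organization name, URLs, and `contactType` are placeholders, and putting `availableLanguage` on a `ContactPoint` is one reasonable placement among several, not something MLI prescribes.

```python
from __future__ import annotations

import json
from html import escape


def hreflang_links(alternates: dict[str, str], default_url: str) -> list[str]:
    """One <link rel="alternate"> per language version, plus an x-default fallback."""
    links = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in alternates.items()
    ]
    links.append(f'<link rel="alternate" hreflang="x-default" href="{escape(default_url)}">')
    return links


def organization_jsonld(name: str, url: str, languages: list[str]) -> str:
    """schema.org Organization whose ContactPoint declares the languages staff serve in."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "contactPoint": {
            "@type": "ContactPoint",
            "contactType": "customer service",  # placeholder value
            "availableLanguage": languages,     # BCP 47 codes, e.g. "en", "es"
        },
    }
    return json.dumps(data, ensure_ascii=False, indent=2)


# Hypothetical organization and URLs, for illustration only.
for tag in hreflang_links(
    {"en": "https://clinic.example.org/en/", "es": "https://clinic.example.org/es/"},
    default_url="https://clinic.example.org/",
):
    print(tag)
print(organization_jsonld("Example Community Clinic", "https://clinic.example.org/", ["en", "es"]))
```

In practice each language version of a page carries the same full set of `hreflang` links, including one pointing to itself, and the JSON-LD goes in a `<script type="application/ld+json">` element.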

Why it matters. A framework earns credibility by being honest about what it measures. Absolute claims ("invisible", "excluded") break when tested against real agent behavior. Language precision — naming the actual mechanism (slower inference, lower rank) rather than overstating consequence (invisibility) — keeps the framework usable for people building on it. This pattern generalizes: every claim should be verified against primary sources or marked as inference.
2026-05-09
framework design · agent behavior · positioning

MLI operates at inference time, not training time

The intuitive argument for MLI is that AI systems learn from well-structured sites during training, so agent-friendly MX shapes the model itself. That's partially true but not the primary mechanism, and the distinction matters for how the work gets framed.
Training pipelines filter for quality through deduplication, toxicity removal, and perplexity scoring — not through semantic structure or agent-friendliness in the MX sense. A site with perfect MLI compliance but low traffic or poor prose may contribute no more to training signal than an unstructured equivalent.
The mechanism that actually matters is inference-time retrieval: an agent browsing the open web in real time to answer a question. At that moment, whether your content is structured, labeled, and parseable determines whether the agent can read it, extract meaning from it, and cite it.

Why it matters. Framing the work around training data leads to arguments that are hard to test and easy to dismiss. Framing it around inference-time legibility is falsifiable: run queries, observe citations, measure gaps. It also shifts the urgency — training data is historical, inference is happening continuously. Every query run against a public-interest site is an opportunity that either gets captured or missed.
2026-05-09
industry landscape · equity · advocacy

Two parallel tracks determine whether MLI's problem gets solved before it's foreclosed

MLI addresses a legibility problem: public-interest sites are structurally illegible to agents and therefore underrepresented in AI-mediated answers. But a parallel track is running simultaneously that forecloses the legibility question before it's even asked.
Track 1 (Legibility): Agents browse the open web. Can they read your content? MLI addresses this. It's fixable.
Track 2 (Foreclosure): AI companies are signing direct licensing deals with major publishers (News Corp, Associated Press, Financial Times, Reddit, Stack Overflow) to build curated source pools for both training and retrieval. Queries resolved from those pools never reach the open web. The question "is your site legible?" is never asked, because your site was never in the consideration set. Inclusion is determined by brand recognition, traffic volume, and legal risk calculus, not public interest.
These tracks run simultaneously in the same systems. Right now, the long-tail open-web track is large enough that MLI-level legibility makes a real difference. If the ratio shifts — more queries resolved from partnership pools, fewer requiring open-web retrieval — the surface where MLI matters shrinks.

Why it matters. MLI's strategic position is clearest on the open-web track, and defensible as long as that track remains significant. But the foreclosure track is a different kind of problem — one that requires advocacy and policy engagement, not technical standards. Naming both tracks prevents the mistake of solving legibility for organizations that were already excluded at the partnership layer. The organizations most likely to be excluded from partnership deals are exactly the ones MLI exists to help.
2026-05-09
research focus · equity · public interest

Long-tail queries are where MLI has the most leverage and the stakes are highest

Publisher partnerships cover the head of the query distribution well: common health questions (WebMD, Mayo Clinic), breaking news (AP, Reuters), financial data (Bloomberg, Financial Times). The long tail — the vast space of specific, local, and specialized questions — is what agents must browse the open web to answer.
Long-tail queries are disproportionately the questions that matter most to people with the fewest resources: "What are the income limits for the Chicago Housing Authority waitlist?" "Does this specific nonprofit offer emergency rental assistance in this neighborhood?" "What languages does this community health clinic serve?" No publisher partner covers these. When an agent gets asked, it goes to the source. If those sites are illegible — PDFs instead of structured HTML, eligibility criteria buried in unformatted FAQs, no semantic labels on services — the agent can't parse them, gets the wrong answer, or says it doesn't know.
The organizations that answer these questions are typically under-resourced technically, often running sites on outdated CMS platforms with no semantic structure. They serve the most specific, high-stakes human needs. They are unlikely ever to be part of a publisher partnership deal.
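As one illustration of moving eligibility answers out of unformatted FAQs, this Python sketch wraps question/answer pairs in schema.org `FAQPage` markup (`Question`, `acceptedAnswer`, `Answer`). It is a swapped-in example, not a markup pattern MLI specifies; the questions are hypothetical and the bracketed answers are placeholders for an organization's own verified text. Markup makes each pair explicit; it does not help if the answers themselves live only in a PDF.

```python
from __future__ import annotations

import json


def faq_page_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Wrap question/answer pairs in schema.org FAQPage markup."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder content: a real page would use the organization's own verified answers.
markup = faq_page_jsonld([
    ("Who is eligible for emergency rental assistance?", "[eligibility rules as stated by the program]"),
    ("What languages does the clinic serve?", "[languages staff actually provide]"),
])
print(json.dumps(markup, indent=2))
```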

Why it matters. The long-tail overlap between "what agents must browse for" and "what vulnerable populations need answered correctly" is the core equity argument for MLI. It's also where the empirical work is most tractable: run real agent queries, document what gets retrieved, measure the gap between what the sites contain and what agents return. That's a falsifiable, publishable finding.
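A minimal sketch of how that gap could be scored, assuming agent answers have already been collected for each query and the facts each site states have been verified by hand. The `QueryCase` structure and the example facts are hypothetical, and the case-insensitive substring match is deliberately crude; a real study would need human review of each answer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCase:
    query: str
    site_facts: tuple[str, ...]  # facts verified by hand on the organization's own site
    agent_answer: str            # what an agent returned for the query, collected separately


def fact_recall(case: QueryCase) -> tuple[float, list[str]]:
    """Share of site facts found in the answer, plus the facts that are missing.

    Crude on purpose: a case-insensitive substring check stands in for human review.
    """
    answer = case.agent_answer.lower()
    missing = [fact for fact in case.site_facts if fact.lower() not in answer]
    if not case.site_facts:
        return 1.0, missing
    return 1 - len(missing) / len(case.site_facts), missing


def legibility_gap(cases: list[QueryCase]) -> float:
    """1 minus mean recall: 0.0 means agents returned everything the sites state."""
    if not cases:
        raise ValueError("need at least one query case")
    recalls = [fact_recall(case)[0] for case in cases]
    return 1 - sum(recalls) / len(recalls)


# Hypothetical cases for illustration only.
cases = [
    QueryCase(
        "What languages does this community health clinic serve?",
        ("Spanish", "Tagalog"),
        "The clinic offers services in Spanish.",
    ),
    QueryCase(
        "Does this clinic take walk-in patients?",
        ("walk-in",),
        "I don't know.",
    ),
]
print(legibility_gap(cases))  # 0.75
```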
2026-05-09
framing · advocacy · positioning

The work is strongest as a structural accountability argument, not a technology adoption program

The charitable framing of MLI: "Help underserved organizations become legible to AI agents." This is true and actionable, but it positions the work as a technical aid program — something to implement, not something to demand.
The structural framing is stronger and truer: AI companies are making policy decisions about whose voices get heard. Those decisions are currently unaccountable. MLI is a framework for what accountability should look like. Four arguments support this frame: the reciprocity argument (AI companies scraped public-interest content to build training data, then excluded those communities from the retrieval layer); quality-equity alignment (systems drawing from diverse, legible public-interest content are more accurate for the majority of human needs); connection to existing rights frameworks (WCAG and Section 508 gained regulatory teeth through civil rights advocacy — MLI positions itself as the AI-era extension); and the empirical gap (the work is most powerful when it documents current harm, not future risk).

Why it matters. The charitable frame positions MLI as something organizations should want. The structural frame positions it as something AI companies should be required to account for. Both are true; only one moves policy. The braver version of this work names the companies making unaccountable sourcing decisions, documents the decisions, and makes a public case for democratic oversight of the information layer.
2026-05-09
equity argument · claims precision

The equity argument isn't that AI can't assess trust — it's that the signals are unequally distributed

Early framing of the MLI equity case relied on a claim that turned out to be wrong: "AI agents can't evaluate trustworthiness — they can't assess case outcomes, staff credentials, or community reputation." That's falsifiable. AI assistants do draw on reputation signals: reviews, news coverage, third-party mentions, Wikipedia entries. A well-documented organization gets those signals surfaced.
The correct argument is structural: organizations with large web footprints generate trust signals naturally. Community legal aid clinics, mutual aid networks, and public health programs — built on years of referral and relationship, not marketing — often don't. The agent surfaces the most legible organization, not necessarily the most trustworthy one, because the signals that would establish trust don't exist in machine-readable form for under-resourced orgs. Structured data declarations are one of the few trust signals within their control.

Why it matters. "AI can't assess trust" is both wrong and weaker than the real argument. The actual claim — that the web presence required to generate trust signals is itself unequally distributed — is more honest, more defensible, and lands harder. It names a structural problem (who has marketing infrastructure) rather than a technical limitation (what AI can do).
2026-05-09
accessibility · framework design · positioning

MX builds on accessibility's technical foundation — it doesn't descend from the movement

The About page originally described MX as "rooted in accessibility," implying the disability rights movement is MX's founding origin. That's overclaiming. The actual relationship is technical: the patterns that make a site accessible to screen readers — semantic HTML, ARIA attributes, landmark structure, programmatic state — are the same patterns that make a site legible to AI agents. Both require content to be readable by something other than the human eye. The movement's technical wins are MX's foundation; they are not its origin story.
MX extends that foundation into territory accessibility doesn't cover: structured data, schema.org markup, multilingual reach, machine-readable freshness. A site that passes WCAG has done much of the MLI groundwork; a site that fails WCAG will almost always fail an MLI audit for the same structural reasons. But MX goes further, and the further it goes, the more it owes to — and diverges from — the accessibility work it builds on.
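To make the shared substrate concrete, this Python sketch uses the standard-library `html.parser` to collect a few signals that serve screen readers and agent parsers alike: the document `lang`, ARIA landmarks (from `<main>`, `<nav>`, or an explicit `role`), the `<h1>` count, and images with no `alt` attribute. It is illustrative only and falls far short of either a WCAG evaluation or an MLI audit. `header`, `footer`, and `aside` are left out because whether they expose a landmark depends on where they sit.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Elements that always map to a landmark role.
IMPLICIT_LANDMARKS = {"main": "main", "nav": "navigation"}
# WAI-ARIA landmark roles accepted from an explicit role attribute.
LANDMARK_ROLES = {"banner", "complementary", "contentinfo", "form",
                  "main", "navigation", "region", "search"}


class StructureAudit(HTMLParser):
    """Collects a few semantic signals shared by screen readers and agent parsers."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: str | None = None
        self.landmarks: set[str] = set()
        self.h1_count = 0
        self.images_missing_alt = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            self.lang = attributes.get("lang")
        if tag in IMPLICIT_LANDMARKS:
            self.landmarks.add(IMPLICIT_LANDMARKS[tag])
        for role in (attributes.get("role") or "").split():
            if role in LANDMARK_ROLES:
                self.landmarks.add(role)
        if tag == "h1":
            self.h1_count += 1
        if tag == "img" and "alt" not in attributes:  # alt="" (decorative) counts as present
            self.images_missing_alt += 1


def audit_structure(html: str) -> dict:
    parser = StructureAudit()
    parser.feed(html)
    parser.close()
    return {
        "lang": parser.lang,
        "landmarks": sorted(parser.landmarks),
        "h1_count": parser.h1_count,
        "images_missing_alt": parser.images_missing_alt,
    }


page = (
    '<html lang="es"><body><nav><a href="/">Inicio</a></nav>'
    '<main><h1>Servicios</h1><img src="a.png"><img src="b.png" alt=""></main>'
    '<div role="contentinfo">Contacto</div></body></html>'
)
print(audit_structure(page))
```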

Why it matters. Overclaiming lineage creates two problems: it misrepresents what the disability rights movement built (standards for human access, not agent legibility), and it obscures what MX actually contributes (the extension beyond accessibility into machine-readable structure). The right framing is shared substrate plus extension — more honest and more useful for practitioners trying to understand what MLI requires that WCAG compliance alone won't cover.
2026-05-10
methodology · evidence · research

Mechanism claims survive on primary specs; magnitude claims need sources without commercial stake

A pass through our citations applying a stricter source standard — no source with a direct commercial interest in the finding being true — found that almost all of MLI's quantitative magnitude claims failed it. SEO agencies citing AI search growth. Structured-data tool vendors citing structured-data citation lift. Accessibility-audit firms citing nonprofit accessibility rates. The methodological work behind each looked sound; the source didn't.
What survived: mechanistic claims grounded in primary specifications (W3C, schema.org, IETF RFCs, crawler-operator documentation), peer-reviewed work, and government/judicial sources. The Web Almanac — HTTP Archive's annual community-run survey, no product to sell — became the surviving primary source for web-baseline statistics.
What this forced: separating mechanism claims (how AI agents read structured data, why specific service-type schema is filterable, what hreflang declarations do) from magnitude claims (how much more often, what percentage, how many times faster). Mechanism survives on primary specs. Magnitude waits for evidence from sources without commercial stake — which here means waiting for the framework's own proof-of-concept sequence to produce findings.
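The source standard can be written down as a small rule, sketched here in Python. The two claim kinds follow this entry; the source-type labels, the `commercial_stake` flag, and the rule that a magnitude claim needs at least one source without a stake are our simplification, and the example claims are illustrative rather than drawn from MLI's actual citation list.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ClaimKind(Enum):
    MECHANISM = "mechanism"  # how something works
    MAGNITUDE = "magnitude"  # how much, how often, what percentage


# Source types treated as primary for mechanism claims; the labels are our own.
PRIMARY_TYPES = {"spec", "peer_reviewed", "government", "judicial", "crawler_docs"}


@dataclass(frozen=True)
class Source:
    name: str
    source_type: str
    commercial_stake: bool  # would the source benefit if the finding were true?


@dataclass(frozen=True)
class Claim:
    text: str
    kind: ClaimKind
    sources: tuple[Source, ...]


def review(claim: Claim) -> str:
    """Return "keep" if the claim meets the source standard, else "defer"."""
    clean = [s for s in claim.sources if not s.commercial_stake]
    if claim.kind is ClaimKind.MECHANISM:
        supported = any(s.source_type in PRIMARY_TYPES for s in clean)
    else:
        # Magnitude: needs at least one source with no commercial stake.
        supported = bool(clean)
    return "keep" if supported else "defer"


# Illustrative claims only.
hreflang_claim = Claim("hreflang declares alternate-language versions of a page",
                       ClaimKind.MECHANISM, (Source("HTML specification", "spec", False),))
vendor_claim = Claim("Structured data raises AI citation rates", ClaimKind.MAGNITUDE,
                     (Source("Structured-data tool vendor report", "vendor_report", True),))
baseline_claim = Claim("Share of pages that declare a lang attribute", ClaimKind.MAGNITUDE,
                       (Source("HTTP Archive Web Almanac", "community_survey", False),))
print([review(c) for c in (hreflang_claim, vendor_claim, baseline_claim)])
```

A "defer" here is not a rejection; it marks the claim as waiting on the proof-of-concept sequence.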

Why it matters. Frameworks operating in fields dominated by vendor research face this routinely. Most don't confront it: they cite the available evidence even when its provenance is compromised. Confronting it produces a stronger, narrower position: the framework can be definite about how things work and honest that the question of how much is still open. The deferral is useful in its own right: it names exactly the empirical gap the POC sequence is supposed to close.

This log grows as the work does. If you're running MLI audits on public-interest sites and hit something that changed how you work, the format is open — methodology is CC-BY-SA 4.0.
