bet on beliefs, vote on values: futarchy for ai governance

the governance problem inside a modern ai lab looks a lot like the socialist calculation problem mises and hayek wrote about from the 1920s through the 1940s. a handful of people at the top (the policy team, the safety lead, the deployment committee) have to make decisions that depend on dispersed private knowledge they don't have. how capable is the next model? who will abuse the api? what fraction of red-team reports are signal vs. noise? the standard answer is "have more meetings, hire better people." the more serious answer is that you cannot centralize this information by asking, only by pricing.

futarchy — robin hanson's proposal to separate the values question (what do we want?) from the beliefs question (what policies get us there?) by running conditional prediction markets on each proposed policy — is the cleanest mechanism anyone has proposed for this class of problem. its obvious applications are public policy and corporate strategy. its best applications, i think, are about to be inside ai labs, because ai labs have all the features that make hayekian knowledge aggregation hard: technical complexity, high-stakes externalities, information scattered across thousands of employees and millions of users, and a bright line between "what should we do" (values) and "what will happen if we do it" (beliefs).

concretely: a lab that is deciding whether to release a new model capability could run two conditional markets. "if we release, what will our share of harmful-outputs complaints be at 30 days?" and "if we don't release, what will that metric be for the closest competitor?" employees, researchers, and external forecasters could trade both. the spread prices the expected externality of the release. the values question — is that externality worth the capability gain — is still a human call, made by the people with accountability. but the numerical input to that call is an aggregate of everyone's private information, not the opinion of whoever was loudest in the last meeting.
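a minimal sketch of the arithmetic, assuming both markets are quoted in the same units and that each one is voided (trades refunded) if its condition does not come true. the class name, the field names, and the prices are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConditionalMarket:
    """a market whose trades are refunded unless `condition` comes true."""
    condition: str
    price: float  # market-implied expected value of the metric, given the condition


def release_spread(if_release: ConditionalMarket, if_hold: ConditionalMarket) -> float:
    """market-implied difference in the metric between releasing and not releasing."""
    return if_release.price - if_hold.price


# hypothetical prices: expected share of harmful-output complaints at 30 days
release = ConditionalMarket(condition="we release", price=0.042)
hold = ConditionalMarket(condition="we do not release", price=0.015)

spread = release_spread(release, hold)
print(f"implied externality of release: {spread:+.3f}")
```

the spread is only the beliefs half of the decision. whether a gap of that size is acceptable is the values call, and nothing in the code makes it.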

the appeal of this for ai governance is that it is a mechanism, not a vibe. "responsible scaling" and "red-teaming" are rituals that produce narrative rather than signal. they depend on who you hire, how you weight their opinions, and whether the organization has enough contrarians to push back. they can be captured. they can be theatre. a conditional market on a measurable post-launch metric is much harder to capture because the market has a counterparty, and the counterparty only makes money by being right.

there are real objections. three that i think matter:

selection effects. i wrote about this separately. a conditional market on "if we release x" prices the outcome in worlds where we chose to release, which is systematically different from worlds where we didn't. the decision-rule version of futarchy requires randomization to be theoretically clean. for governance purposes this is usually fine — expose the prices as information, let humans decide — but it is not a free pass, and the literature underrates the cost.
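a toy simulation makes the gap visible. everything here is an assumption for illustration: risk is uniform on [0, 1], harm equals risk, and the lab's private rule is "release only if risk is under 0.3". the conditional average under that rule is what a well-informed "if we release" market would price. the randomized average is what you would get if the release decision were a coin flip.

```python
import random
from statistics import mean


def conditional_harm(n_worlds: int, randomized: bool, seed: int = 0) -> float:
    """average harm across simulated worlds in which the release happened."""
    rng = random.Random(seed)
    harms_given_release = []
    for _ in range(n_worlds):
        risk = rng.random()  # hidden state the lab partly observes
        if randomized:
            release = rng.random() < 0.5  # coin flip, independent of risk
        else:
            release = risk < 0.3  # lab releases only when it thinks risk is low
        if release:
            harms_given_release.append(risk)  # toy assumption: harm equals risk
    return mean(harms_given_release)


selected_price = conditional_harm(100_000, randomized=False)
randomized_price = conditional_harm(100_000, randomized=True)

print(f"price when the lab selects:    {selected_price:.2f}")
print(f"price when release is random:  {randomized_price:.2f}")
```

in this toy setup the selected price sits well below the randomized one, because the market is only ever settled in worlds the lab already judged safe. read as "what happens if we release", the selected price understates the harm.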

insider information. an employee with access to proprietary capability metrics has edge over external traders, which is either a feature (the mechanism aggregates inside information, which is the whole point) or a bug (the lab is paying its own employees to reveal confidential data in exchange for trading profit). you need a disclosure policy. public companies figured this out with 10b5-1 plans. ai labs have not, and every futarchy proposal i have seen waves at it.

low liquidity on niche and long-horizon questions. "will this model cause more than 100 reported jailbreaks in the next quarter" is an obscure enough question that without subsidy you will get a two-person market, and longer horizons make it worse because traders' capital is locked up until resolution. this is fixable with portfolio-level subsidy (see my earlier post), but the fix has to be designed, not assumed.
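to make "designed, not assumed" concrete for a single market: one standard subsidy mechanism is hanson's logarithmic market scoring rule (lmsr), an automated market maker that always quotes a price and whose worst-case loss is capped at b * ln(n) for n outcomes. this is a sketch of that per-market rule, not the portfolio-level scheme mentioned above. the class name and the choice of b = 100 are assumptions.

```python
import math


class LMSRMarket:
    """logarithmic market scoring rule market maker over n mutually exclusive outcomes."""

    def __init__(self, n_outcomes: int, b: float):
        self.b = b  # liquidity parameter: larger b means deeper market, larger subsidy
        self.q = [0.0] * n_outcomes  # net shares sold per outcome

    def _cost(self, q: list[float]) -> float:
        # C(q) = b * ln(sum_i exp(q_i / b)), computed stably by factoring out the max
        m = max(q)
        return m + self.b * math.log(sum(math.exp((x - m) / self.b) for x in q))

    def price(self, i: int) -> float:
        """current price of outcome i, which doubles as its implied probability."""
        m = max(self.q)
        weights = [math.exp((x - m) / self.b) for x in self.q]
        return weights[i] / sum(weights)

    def buy(self, i: int, shares: float) -> float:
        """sell `shares` of outcome i to a trader and return what the trader pays."""
        before = self._cost(self.q)
        self.q[i] += shares
        return self._cost(self.q) - before

    def max_subsidy(self) -> float:
        """worst-case loss to the sponsor, b * ln(n)."""
        return self.b * math.log(len(self.q))


market = LMSRMarket(n_outcomes=2, b=100.0)  # yes/no on the jailbreak question
opening_price = market.price(0)
paid = market.buy(0, 50.0)  # a trader buys 50 "yes" shares
price_after = market.price(0)
budget = market.max_subsidy()
```

the design choice is b. it sets both how far one trade moves the price and how much the lab can lose on the market, so the subsidy budget is known before the first trade.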

a governance system that survives these objections is not a vote. it is more like an internal prediction exchange, with markets on safety-relevant metrics for every major product decision, subsidized by the lab, open to employees under disclosure rules, and open to external forecasters for liquidity. the exchange does not replace the policy team. it replaces the fiction that the policy team has enough information to make well-calibrated calls on its own.

the reason i think this is underrated is that every alternative to futarchy in ai governance requires you to trust someone's judgment: the ceo's, the safety team's, a regulator's, an external auditor's. conditional markets require you to trust only that someone, somewhere, has an incentive to profit off being right. that is a weaker assumption, and in the long run it is the only one that survives contact with a lab making high-stakes decisions on short timelines.

the practical path is boring and probably correct: start with internal-only markets on narrow, measurable, short-horizon metrics. let them run for six months. compare their calibration to internal forecasts made by the usual committees. if the markets are better (and i think they will be, because the published evidence from corporate prediction markets at google, eli lilly, and hp suggests they will be), expand them and start publishing aggregate signals externally. do this before the regulatory environment forces something worse.
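the comparison step can be as simple as a brier score on resolved yes/no questions: the mean squared error between forecast probability and what happened, lower is better. it measures overall accuracy rather than calibration alone, but it is a reasonable first cut. the forecasts and outcomes below are invented for illustration.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    if not forecasts:
        raise ValueError("need at least one resolved question")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# hypothetical data: four resolved questions, 1 = the event happened
outcomes = [1, 0, 1, 0]
market_forecasts = [0.8, 0.2, 0.6, 0.1]     # closing market prices
committee_forecasts = [0.6, 0.5, 0.5, 0.4]  # committee's stated probabilities

market_brier = brier_score(market_forecasts, outcomes)
committee_brier = brier_score(committee_forecasts, outcomes)

print(f"market brier:    {market_brier:.4f}")
print(f"committee brier: {committee_brier:.4f}")
```

four questions prove nothing either way. the point of the six-month run is to collect enough resolved questions that a gap between the two scores is not noise.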


none of this is new. hanson's original papers are from the late 1990s; internal corporate markets have been run for twenty years; decision-market variants have been proposed for everything from fda approvals to central bank policy. what is new is that ai labs now have the stakes and the information-aggregation problem that futarchy was designed for, and none of them are using it.