Vitalik's new article: 'AI engine + human steering wheel', a new paradigm for future governance

Another approach being explored in many use cases is to make a simple mechanism the rules of the game and let AIs be the players.

Title: 'AI as the engine, humans as the steering wheel'

Author: Vitalik Buterin, co-founder of Ethereum

Compilation: Baishui, Golden Finance

If you ask people what they like about democratic structures, whether in government, the workplace, or blockchain-based DAOs, you often hear the same arguments: they avoid concentration of power, they give users strong guarantees because no single person can unilaterally change the system's direction, and they can make higher-quality decisions by gathering the views and wisdom of many people.

When you ask people what they dislike about democratic structures, they often give the same complaints: ordinary voters are not sophisticated, because each voter has only a tiny chance of influencing the outcome, so few voters put high-quality thought into their decisions; participation is often low (leaving the system vulnerable to attack); or there is de facto centralization, because everyone defaults to trusting and copying the views of a few influential people.

The goal of this article is to explore a paradigm in which AI might let us enjoy the benefits of democratic structures without the downsides: "AI is the engine, humans are the steering wheel." Humans feed only a small amount of information into the system, perhaps only a few hundred inputs, but each one is well considered and of extremely high quality. The AI treats this data as an "objective function" and tirelessly makes a large number of decisions in its best effort to achieve these goals. In particular, this article explores an interesting question: can we do this without putting a single AI at the center, relying instead on a competitive open market in which any AI (or human-machine hybrid) can freely participate?

Table of Contents

  • Why not let an AI be in charge directly?
  • Futarchy
  • Distilling human judgment
  • Deep Funding
  • Increase Privacy
  • Advantages of Engine + Steering Wheel Design

Why not let an AI be in charge directly?

The simplest way to incorporate human preferences into an AI-based mechanism is to create an AI model and have humans feed their preferences into it in some way. There are straightforward ways to do this: you can simply put a text file containing a list of people's instructions into the system prompt. Then you use one of the many "agentic AI frameworks" to give the AI access to the internet, hand it the keys to your organization's assets and social media profiles, and you are done.

After a few iterations, this may be good enough for many use cases, and I fully expect that in the near future we will see many structures involving AI reading a group's instructions (or even reading group chats in real time) and taking action.

However, this structure is not ideal as a governance mechanism for long-lived institutions. One valuable attribute that long-lived institutions should have is credible neutrality. In my post introducing this concept, I listed four valuable properties of credible neutrality:

  • Do not write specific people or specific results into the mechanism
  • Open-source and publicly verifiable execution
  • Keep it simple
  • Don't change often

An LLM (or AI agent) satisfies 0 of these 4. The model inevitably encodes a large number of specific preferences about people and outcomes during training. Sometimes this leads to surprising preferences: for example, a recent study showed that a major LLM values lives in Pakistan more highly than lives in the United States (!!). It can be open-weights, but that is far from open source; we really do not know what demons lurk deep inside the model. It is the opposite of simple: the Kolmogorov complexity of an LLM is billions of bits, roughly the size of all U.S. law (federal + state + local) combined. And because AI is evolving so rapidly, you would have to change it every three months.

For this reason, I favor a different approach, one being explored in many use cases: make a simple mechanism the rules of the game, and let AIs be the players. This is the same insight that makes markets so effective: the rules are a relatively dumb property-rights system, edge cases are adjudicated by a court system that slowly accumulates and adjusts precedent, and all of the intelligence comes from entrepreneurs operating "at the edge".


An individual "player" can be an LLM, a group of LLMs interacting with each other and calling various internet services, various AI + human combinations, and many other constructs; as the mechanism designer, you don't need to know. The ideal goal is a mechanism that runs automatically: if the goal of the mechanism is to select what to fund, it should work as much like Bitcoin or Ethereum block rewards as possible.

The benefits of this approach are:

  • It avoids enshrining any single model in the mechanism; instead, you get an open market made up of many different participants and architectures, each with its own biases. Open models, closed models, agent swarms, human + AI hybrids, bots, infinite monkeys, etc. are all fair game; the mechanism does not discriminate against anyone.
  • The mechanism is open source. Although the players are not, the game is open source, and this is a pattern that is already quite well understood (political parties and markets, for example, work this way).
  • The mechanism is simple, so there are relatively few ways for the mechanism designer to encode their own biases into the design.
  • The mechanism does not change: it can stay the same from now until the singularity, even as the architectures of the underlying participants need to be redesigned every three months.

The goal of the steering mechanism is to faithfully reflect the fundamental goals of the participants. It only needs to provide a small amount of information, but that information should be high quality.

You can think of the mechanism as exploiting the asymmetry between proposing an answer and verifying it, similar to how a Sudoku puzzle is hard to solve but easy to check. You (i) create an open market in which players act as "solvers", and (ii) maintain a human-run mechanism that performs the much simpler task of verifying proposed solutions.
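To make the analogy concrete, here is a toy illustration (my own, not part of any mechanism discussed here) of how cheap verification is compared to solving: checking a proposed Sudoku grid takes a few lines, while producing one is the hard part left to the solvers.

# Toy illustration of the propose/verify asymmetry: verifying a Sudoku solution is trivial.
def is_valid_sudoku(grid):
    """Return True if a 9x9 grid (list of 9 lists of the ints 1-9) is a valid solution."""
    expected = set(range(1, 10))
    rows = grid
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [[grid[3 * br + r][3 * bc + c] for r in range(3) for c in range(3)]
             for br in range(3) for bc in range(3)]
    # Every row, column, and 3x3 box must contain exactly the digits 1 through 9.
    return all(set(group) == expected for group in rows + cols + boxes)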

Futarchy

Futarchy was originally proposed by Robin Hanson under the slogan "vote on values, but bet on beliefs". A voting mechanism selects a set of goals (they can be any goals, but they must be measurable) and combines them into a metric M. When you need to make a decision (for simplicity, assume it is YES/NO), you set up conditional markets: you ask people to bet on (i) whether YES or NO will be chosen, (ii) the value of M if YES is chosen, otherwise zero, and (iii) the value of M if NO is chosen, otherwise zero. With these three variables, you can determine whether the market believes YES or NO is better for the value of M.
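As a rough sketch of how the three markets combine into a decision, assuming for simplicity that prices can be read directly as probabilities and conditional payouts (real futarchy designs handle many more details, and the prices below are hypothetical):

# Toy futarchy decision rule (hypothetical prices, heavily simplified).
# p_yes: market (i), the probability that YES is chosen.
# m_if_yes_price: market (ii), pays the metric M if YES is chosen, else 0.
# m_if_no_price: market (iii), pays M if NO is chosen, else 0.
def futarchy_decision(p_yes, m_if_yes_price, m_if_no_price):
    p_no = 1 - p_yes
    # Dividing out the probability of each branch recovers the market's
    # conditional estimate of M under each decision.
    expected_m_given_yes = m_if_yes_price / p_yes
    expected_m_given_no = m_if_no_price / p_no
    return "YES" if expected_m_given_yes > expected_m_given_no else "NO"

# Example: the market thinks YES is 60% likely, and the conditional prices imply
# M ~ 120 under YES (72 / 0.6) versus M ~ 100 under NO (40 / 0.4), so YES looks better for M.
print(futarchy_decision(0.6, 72.0, 40.0))  # -> "YES"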

A company's stock price (or, for a cryptocurrency, its token price) is the most commonly cited metric because it is easy to understand and measure, but the mechanism can support many kinds of metrics: monthly active users, the median self-reported happiness of some group, some quantifiable measure of decentralization, and so on.

Futarchy was invented before the AI era. However, it fits very naturally into the "complex solver, simple verifier" paradigm described in the previous section, and the traders in futarchy can also be AIs (or human + AI combinations). The role of the "solvers" (prediction-market traders) is to determine how each proposed plan will affect the value of a future metric. This is hard. If the solvers are right, they make money; if they are wrong, they lose money. The verifiers (those who vote on the metrics, adjust the metrics if they notice them being gamed or becoming obsolete, and determine the actual value of the metrics at some future time) only need to answer the much simpler question: "What is the current value of this metric?"

Distilling human judgment

Distilled human judgment is a class of mechanisms that works as follows. There is a very large number (think: 1 million) of questions that need to be answered. Natural examples include:

  • How much credit should each person in this list receive for their contributions to a project or task?
  • Which of these comments violates the rules of the social media platform (or sub-community)?
  • Which of these given Ethereum addresses represent real and unique individuals?
  • Which of these physical objects contribute positively or negatively to the aesthetics of their environment?

You have a team that can answer these questions, but only at the cost of spending a lot of effort on each answer. You ask the team to answer only a small number of the questions (for example, if the full list has 1 million items, the team might answer only 100 of them). You can even ask the team indirect questions: instead of "What percentage of the total credit should Alice receive?", ask "Should Alice or Bob receive more credit, and by how many times?" When designing the jury mechanism, you can reuse time-tested real-world mechanisms such as grant committees, courts (determining the value of damages), appraisals, and so on. Of course, jury participants can themselves use novel AI research tools to help them find answers.

Then, you allow anyone to submit a list of numerical answers to the entire set of questions (for example, an estimate of how much credit each participant in the whole list should receive). Participants are encouraged to use AI to complete this task, but they can use any technique: AI, human-machine hybrids, AIs that can access internet search and autonomously hire other human or AI workers, cybernetically enhanced monkeys, and so on.

Once both the full-list providers and the jurors have submitted their answers, the full lists are checked against the jury's answers, and some combination of the full lists that is most compatible with the jury's answers is taken as the final answer.

The distilled human judgment mechanism is different from futarchy, but there are some important similarities:

  • In futarchy, the "solvers" make predictions, and the "ground truth" against which their predictions are judged (and used to reward or punish them) is an oracle, run by a jury, that outputs the values of the metric.
  • In distilled human judgment, the "solvers" provide answers to a huge number of questions, and the "ground truth" against which their predictions are judged is the high-quality answers that the jury provides to a small subset of those questions.

For a toy example of distilled human judgment applied to credit assignment, see the Python code here. The script asks you to act as the jury, and includes several full lists of AI-generated (and human-generated) answers pre-included in the code. The mechanism identifies the linear combination of the full lists that best fits the jury's answers. In this case, the winning combination is 0.199 * Claude's answers + 0.801 * Deepseek's answers; this combination matches the jury's answers better than any single model does. These coefficients are also the rewards given to the submitters.

In this example (assigning credit for "defeating Sauron"), the "humans as the steering wheel" aspect shows up in two ways. First, high-quality human judgment is applied to each individual question, although this still uses the jury as "technocratic" evaluators of performance. Second, there is an implicit voting mechanism that decides whether "defeating Sauron" is even the right goal (as opposed to, say, trying to ally with Sauron, or handing over all territory east of some key river as a peace concession). There are other distilled-human-judgment use cases where the jury's task is more directly value-laden: for example, imagine a decentralized social media platform (or sub-community) where the jury's job is to label randomly selected forum posts as compliant or non-compliant with community rules.

Within the distilled human judgment paradigm, there are some open questions:

  • How do you sample? The role of the full-list submitters is to provide large numbers of answers; the role of the jurors is to provide high-quality answers. We need to select jurors, and select questions for the jurors, in such a way that a model's ability to match the jurors' answers is maximally indicative of its overall performance. Some considerations include:
  • The balance between expertise and bias: skilled jurors are typically specialized in their own fields, so letting them choose what to rate gives you higher-quality input. On the other hand, too much choice can lead to bias (jurors favoring content connected to them) or to weaknesses in sampling (some content systematically going unrated).
  • Anti-Goodhart: there will be content that tries to "game" the AI mechanisms, for example contributors generating lots of code that looks impressive but is useless. The hope is that the jury can detect this, but static AI models will not, unless their developers work hard to catch it. One possible way to capture this behavior is to add a challenge mechanism through which individuals can flag such attempts, guaranteeing a jury judgment on them (and thereby incentivizing AI developers to make sure such cases are caught correctly). If the jury agrees, the challenger is rewarded; if the jury disagrees, the challenger pays a fine.
  • What scoring function do you use? One idea currently being used in the deep funding pilot is to ask the jurors "Should A or B receive more credit, and by how many times?" The scoring function is score(x) = sum((log(x[B]) - log(x[A]) - log(juror_ratio)) ** 2 for (A, B, juror_ratio) in jury_answers): in other words, for each jury answer, it asks how far the ratio in the full list is from the ratio the jurors gave, and adds a penalty proportional to the square of that distance (in log space). This is to show that the design space of scoring functions is very rich, and that the choice of scoring function is tied to the choice of questions you ask the jurors.
  • How do you reward the full-list submitters? Ideally, you want to frequently give non-zero rewards to multiple participants to avoid a monopolizing mechanism, but you also want the following property: participants cannot increase their rewards by submitting the same (or slightly modified) set of answers many times. One promising approach is to directly compute the linear combination (with non-negative coefficients summing to 1) of the full lists that best fits the jury's answers, and use those same coefficients to split the rewards (see the sketch after this list). There may be other methods as well.
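Here is a minimal sketch of this "best linear combination" idea, using the log-ratio scoring function quoted above. The data, model names, and the use of scipy are my own illustration, not the pilot's actual code; the fitted coefficients double as the reward shares for each submitter.

# Fit non-negative coefficients summing to 1 over several full lists so that the
# combined list best matches the jury's pairwise ratio answers.
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: two full lists of credit scores and a few jury comparisons.
full_lists = {
    "model_a": {"alice": 10.0, "bob": 5.0, "carol": 1.0},
    "model_b": {"alice": 6.0, "bob": 6.0, "carol": 3.0},
}
# Each jury answer (A, B, ratio) means: the jurors think B should receive
# `ratio` times as much credit as A.
jury_answers = [("alice", "bob", 0.5), ("bob", "carol", 0.4)]

names = list(full_lists)
people = list(next(iter(full_lists.values())))

def combined(coeffs):
    """Credit for each person under a weighted mix of the submitted lists."""
    return {p: sum(c * full_lists[n][p] for c, n in zip(coeffs, names)) for p in people}

def score(coeffs):
    # The scoring function from the text: squared distance of log-ratios.
    x = combined(coeffs)
    return sum((np.log(x[B]) - np.log(x[A]) - np.log(r)) ** 2 for A, B, r in jury_answers)

result = minimize(
    score,
    x0=np.full(len(names), 1.0 / len(names)),
    bounds=[(0, 1)] * len(names),
    constraints=[{"type": "eq", "fun": lambda c: c.sum() - 1}],
)
# The fitted coefficients are both the mixing weights and the reward shares.
print({n: float(c) for n, c in zip(names, result.x.round(3))})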

In general, the goal is to take a human judgment mechanism that is known to work, is minimally biased, and has stood the test of time (for example, think of how the adversarial structure of the court system includes the two disputing parties, who have lots of information but are biased, and the judge, who has little information but is perhaps unbiased), and then use an open AI market as a reasonably high-fidelity and very low-cost predictor of that mechanism (this is similar to how "distillation" works for large language models).

Deep Funding

Deep funding is the application of distilled human judgment to the problem of filling in the weights on the edges of a graph, where each edge answers the question "What percentage of the credit for X belongs to Y?"

The simplest way to explain it is with an example:

Output of the two-tier deep funding example: The origins of Ethereum's ideas. Check out the Python code here.

The goal here is to allocate credit for the philosophical contributions behind Ethereum. Let's walk through the example:

  • The simulated deep funding round shown here gives 20.5% credit to the cypherpunk movement and 9.2% to technological progressivism.
  • At each node, you ask: to what extent is this an original contribution (and thus deserving of credit itself), and to what extent is it a recombination of upstream influences? For the cypherpunk movement, the answer is 40% new and 60% inherited.
  • Then you can look at the effects upstream of these nodes: libertarianism and anarchism receive 17.3% of the credit for the cypherpunk movement, while Swiss direct democracy receives only 5%.
  • Note, however, that libertarianism and anarchism also inspired Bitcoin's monetary philosophy, and so influenced Ethereum's philosophy through two paths.
  • To compute the total contribution share of libertarianism and anarchism to Ethereum, you multiply the edge weights along each path and then add the paths together: 0.205 * 0.6 * 0.173 + 0.195 * 0.648 * 0.201 ~= 0.0466. So if you want to donate $100 to reward everyone who contributed to Ethereum's philosophy, then according to this simulated deep funding round, libertarians and anarchists would receive $4.66 (the sketch after this list reproduces this path arithmetic).
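As a small sketch, the path arithmetic above can be reproduced mechanically. The edge weights below are taken from the example; the node names and helper function are just my own illustration.

# Reproduce the "multiply along each path, then sum over paths" calculation.
edges = {
    # (downstream, upstream): share of downstream's credit flowing to that upstream node
    ("ethereum", "cypherpunk"): 0.205,
    ("ethereum", "btc_monetary_philosophy"): 0.195,
    ("cypherpunk", "libertarianism_anarchism"): 0.6 * 0.173,              # 60% inherited, 17.3% of that
    ("btc_monetary_philosophy", "libertarianism_anarchism"): 0.648 * 0.201,
}

def total_share(graph, source, target):
    """Sum, over all paths from source to target, the product of edge weights."""
    if source == target:
        return 1.0
    return sum(w * total_share(graph, mid, target)
               for (frm, mid), w in graph.items() if frm == source)

share = total_share(edges, "ethereum", "libertarianism_anarchism")
print(round(share, 4))        # ~0.0467 (the text's ~0.0466, up to rounding)
print(round(100 * share, 2))  # ~$4.67 out of a $100 reward pool (the ~$4.66 above)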

This method is designed for domains where work builds on previous work and the structure is highly legible. Academia (think: citation graphs) and open-source software (think: library dependencies and forks) are two natural examples.

The goal of a well-functioning deep funding system is to create and maintain a global graph, where any funder who wants to support a particular project can send funds to the address representing that node, and the funds automatically propagate to its dependencies according to the weights on the graph's edges (and recursively to their dependencies, and so on).
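To make the propagation concrete, here is a rough sketch, with made-up node names and shares rather than any actual protocol's logic, of how a donation sent to one node might be split among its dependencies according to edge weights:

# Each node keeps its "original contribution" share and passes the rest to its
# dependencies in proportion to the edge weights, recursively.
def propagate(node, amount, self_share, deps, payouts):
    """self_share[node]: fraction kept by the node; deps[node]: {dependency: edge_weight}."""
    payouts[node] = payouts.get(node, 0.0) + amount * self_share[node]
    remainder = amount * (1 - self_share[node])
    for dep, weight in deps.get(node, {}).items():
        propagate(dep, remainder * weight, self_share, deps, payouts)

# Hypothetical mini-graph: project A keeps 50% and splits the rest 70/30 between libraries B and C.
self_share = {"A": 0.5, "B": 1.0, "C": 1.0}
deps = {"A": {"B": 0.7, "C": 0.3}}
payouts = {}
propagate("A", 100.0, self_share, deps, payouts)
print(payouts)  # {'A': 50.0, 'B': 35.0, 'C': 15.0}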

You could imagine a decentralized protocol issuing its tokens through a built-in deep funding mechanism: decentralized governance within the protocol selects a jury, the jury operates the deep funding mechanism, and the protocol automatically issues tokens and deposits them into the node corresponding to itself. In this way, the protocol programmatically rewards all of its direct and indirect contributors, reminiscent of how Bitcoin or Ethereum block rewards reward one specific type of contributor (miners). By influencing the edge weights, the jury can continuously define what kinds of contributions it values. This mechanism could serve as a decentralized and long-term-sustainable alternative to mining, token sales, or one-time airdrops.

Increase Privacy

Often, making correct judgments on the kinds of questions in the examples above requires access to private information: an organization's internal chat logs, information submitted confidentially by community members, and so on. One benefit of "just using a single AI", especially in smaller-scale settings, is that it is easier to let a single AI access the information than to make the information public to everyone.

To allow distilled human judgment or deep funding to work in these scenarios, we can try to use cryptographic techniques to give AIs secure access to private information. The idea is to use Multi-Party Computation (MPC), Fully Homomorphic Encryption (FHE), Trusted Execution Environments (TEE), or similar mechanisms to make private information available, but only to mechanisms whose sole output is a "full list submission" that goes directly into the mechanism.

If you do this, you have to restrict the set of participating mechanisms to AI models (not humans or AI + human combinations, since you cannot let humans see the data), and specifically to models running on some particular substrate (e.g. MPC, FHE, trusted hardware). A major research direction is finding practical versions of this that are efficient and meaningful enough in the near term.

Advantages of Engine + Steering Wheel Design

Designs like this have many promising benefits. By far the most important is that they allow DAOs to be built in which human voters control the direction without being bogged down by too many decisions. They strike a middle ground where each person does not have to make N decisions, yet has more power than simply making one decision (which is how delegation usually works), and in a way that better elicits rich preferences that are hard to express directly.

In addition, such mechanisms appear to have an incentive-smoothing property. What I mean by "incentive smoothing" here is the combination of two factors:

  • Diffusion: no single action taken by the voting mechanism has a disproportionate impact on the interests of any single participant.
  • Confusion: the connection between voting decisions and how they affect participants' interests is more complex and harder to compute.

Confusion and diffusion here are terms taken from cryptography, where they are key properties of the security of ciphers and hash functions.

A good example of incentive smoothing in today's real world is the rule of law: instead of the top of government regularly taking actions of the form "$200 million to Alice's company", "$100 million to Bob's company", it passes rules designed to apply evenly to a large number of participants, which are then interpreted by a separate group of actors. When this approach works, the benefit is that it greatly reduces the gains from bribery and other forms of corruption. When it is violated, which happens often in practice, these problems quickly become greatly magnified.

AI is clearly going to be an important part of the future, and it will inevitably become an important part of future governance. However, involving AI in governance carries obvious risks: AI is biased, it can be deliberately corrupted during training, and AI technology is evolving so fast that "putting AI in charge" may in practice mean "putting whoever upgrades the AI in charge". Distilled human judgment offers an alternative path forward, allowing us to harness the power of AI in an open, free-market way while keeping democracy under human control.

Special thanks to Devansh Mehta, Davide Crapis, and Julian Zawistowski for their feedback and review, as well as Tina Zhen, Shaw Walters, and others for their discussions.
