What does 'AI alignment' actually mean, and why is it contested?

AI alignment refers to training an AI system to behave in ways consistent with human values — avoiding harmful outputs, being helpful, refusing dangerous requests. It's contested because 'human values' aren't uniform. Decisions about what counts as harmful, what counts as helpful, and whose sensitivities matter are embedded into models through training data curation and human feedback processes, often without public deliberation about whose values are being encoded.

Who controls what goes into AI training data?

Training data decisions are made primarily by the organizations building frontier models — OpenAI, Google DeepMind, Meta AI, Anthropic, and a small number of others. According to the 2024 Stanford AI Index Report, more than 80% of significant AI model releases came from US- or China-based organizations. The filtering choices that determine which web content enters training, which languages get prioritized, and which sources are excluded are made internally, with limited external transparency.

What is RLHF and how does it shape AI behavior?

Reinforcement Learning from Human Feedback (RLHF) is a training process in which human raters evaluate model outputs and indicate which responses are better. The model learns to produce outputs that rate highly. Because these rater judgments encode assumptions about what constitutes harm and helpfulness, the demographics, working conditions, and supervision of those raters have a direct effect on the values embedded in the finished model.

What is the EU AI Act and does it address the governance problem?

The EU AI Act entered into force in August 2024, with its obligations phasing in over the following years, and is the first major binding regulatory framework for AI globally. It categorizes AI systems by risk level and imposes transparency and testing requirements on high-risk applications. It represents meaningful progress over purely voluntary frameworks, but it still relies substantially on self-reporting and faces the structural challenge that the organizations best positioned to shape its implementation are the organizations being regulated.

Why does AI development concentration matter for governance?

When a small number of organizations produce most frontier AI systems, the value judgments embedded in those systems — about what's harmful, what's helpful, whose cultural context gets modeled accurately — become de facto policy for billions of users. Those organizations answer primarily to investors and, secondarily, to regulators. The users whose intellectual lives are shaped by these systems have limited direct accountability mechanisms. Concentration makes the governance problem structurally harder to solve.

The Hidden Politics of Who Controls AI Training

The word "alignment" does enormous work in AI discussions. When researchers say a model is aligned, they mean it behaves in accordance with human values — it doesn't produce harmful outputs, it steers away from dangerous territory, it tends toward what's considered good.

But here's the question I keep returning to: aligned to whose values? Helpful according to whose definition? Harmful by whose measure?

That question is not technical. It's political. And the people answering it — usually implicitly, through thousands of small decisions about training data and feedback labels and reward functions — are exercising a kind of power that most governance frameworks haven't caught up to yet. In my view, this is one of the more consequential structural patterns of our moment, and it deserves the same scrutiny we'd give any other institution claiming that kind of authority.

The Decision Hidden Inside "Alignment"

When an AI company builds a model, they make choices. Which text goes into training? Which outputs get rewarded during fine-tuning? Which responses should the system refuse, and on what grounds? Each of these decisions encodes a value judgment — about what's true, what's harmful, what deserves amplification.

The usual framing goes: these are safety decisions. Technical choices made by well-intentioned engineers to prevent the model from causing harm. And that framing is partly right. The engineers involved are genuinely trying to prevent bad outcomes.

But the framing also does something else. It takes a political question — whose values should govern a technology used by billions of people — and relocates it inside a domain called "safety," where it can be answered by a small technical team without the kind of public deliberation that usually governs decisions of that scale. What gets called a safety decision is, underneath, a question about contested values: What counts as misinformation? Is it harmful to explain how certain drugs interact? Whose religious sensitivities matter, and which ones can be overridden?

These are not questions with correct technical answers. They're questions that democracies typically resolve through some combination of law, cultural negotiation, and ongoing argument. In AI development, they get resolved on a product roadmap.

The people making the most consequential decisions about what AI systems believe, how they reason, and what they refuse are not regulators, not elected representatives, not accountable to the users whose intellectual lives they're shaping. They're employees of a handful of private companies, working under competitive pressure, making choices that function like policy but face none of the scrutiny that policy normally receives.

Who Actually Filters the Training Data

Most large language models are trained on a version of the internet, but a filtered one. Common Crawl, a nonprofit that scrapes and archives web content, provides the backbone for many training datasets. Raw web data isn't what goes into training, though. It gets filtered: certain sources get prioritized, certain content gets removed, certain languages get oversampled or undersampled. The filtering decisions are where values quietly get baked in.

According to analyses of major LLM pretraining datasets, English-language content represents an overwhelming share of most training corpora. English is already the largest single language in raw Common Crawl snapshots, and filtering pushes its share higher still: some estimates put English above 90% of the text in English-centric training sets built from that data. The downstream consequence is that these models understand English-speaking cultural contexts more fluently than others. Not because their creators intended linguistic bias, but because the filtering process systematically prioritized certain sources, and those sources reflected a certain geography of the internet.
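To make that mechanism concrete, here is a minimal Python sketch with entirely invented numbers. The language shares, the pass rates, and the function name `shares_after_filter` are all assumptions made for illustration, not measurements of Common Crawl or of any lab's pipeline. It shows only the arithmetic: a quality filter that keeps more of one language than the others can turn a plurality into a near-monopoly.

```python
# Toy illustration only: the shares and pass rates below are invented,
# not measurements of Common Crawl or of any real training corpus.

def shares_after_filter(raw_shares, pass_rates):
    """Return each language's share of the corpus after filtering.

    raw_shares: fraction of the raw crawl per language (sums to 1).
    pass_rates: fraction of each language's documents the filter keeps.
    """
    kept = {lang: raw_shares[lang] * pass_rates[lang] for lang in raw_shares}
    total = sum(kept.values())
    return {lang: amount / total for lang, amount in kept.items()}


# Hypothetical raw crawl: English is the largest language but not a majority.
raw = {"en": 0.45, "es": 0.15, "hi": 0.10, "sw": 0.05, "other": 0.25}

# Hypothetical filter built around English-language quality signals,
# so it keeps far more English text than anything else.
keep = {"en": 0.60, "es": 0.10, "hi": 0.05, "sw": 0.02, "other": 0.05}

filtered = shares_after_filter(raw, keep)
for lang, share in sorted(filtered.items(), key=lambda kv: -kv[1]):
    print(f"{lang:>5}: raw {raw[lang]:.0%} -> filtered {share:.1%}")
```

With these made-up inputs, English goes from 45% of the raw crawl to roughly 89% of the filtered corpus. Nobody in this toy pipeline decided to exclude Swahili or Hindi. The skew falls out of a filter that was tuned on one language and applied to all of them.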

According to the 2024 Stanford AI Index Report, more than 80% of significant AI model releases came from organizations based in the United States or China. The rest of the world mostly uses models trained on someone else's implicit value system, built by labs that were never asked to justify their assumptions to any public outside their own walls.

This isn't a conspiracy. In my view, it's more interesting than a conspiracy: it's a structural pattern. It shows up everywhere powerful institutions form around new technology. The people closest to the production process make the consequential decisions, because they're the only ones who understand the machinery well enough to make them at all. The church controlled which texts got copied because it ran the scriptoria. Major publishers controlled which ideas reached mass audiences because they owned the presses. The AI labs control training data because they're the ones who can process it at scale.

The question isn't whether this is malicious. The question is whether "non-malicious but unaccountable" is an acceptable governance posture for something this consequential.

The RLHF Loop and Its Hidden Assumptions

After pretraining, most frontier models go through a refinement process called Reinforcement Learning from Human Feedback, or RLHF. Human raters evaluate model outputs, indicate which ones are better, and the model learns to produce more of what gets rated highly. In theory, this is how the model learns to be genuinely helpful and appropriately cautious. In practice, the rater pool is not a representative sample of humanity.
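A small sketch may help show why the composition of the rater pool matters. The Python below fits a Bradley-Terry preference model, a standard way to turn pairwise "this one is better" judgments into scalar rewards, using plain gradient ascent. Everything specific here is an assumption for illustration: the two response labels, the two rater pools, their vote counts, and the helper name `fit_rewards`. Real RLHF pipelines train a neural reward model over prompts and responses and then optimize the language model against it; this covers only the preference-fitting step, reduced to two candidate responses.

```python
import math


def fit_rewards(comparisons, items, steps=2000, lr=0.1):
    """Fit one scalar reward per item from pairwise preferences.

    comparisons: list of (winner, loser) pairs chosen by raters.
    Uses the Bradley-Terry model, P(a beats b) = sigmoid(r[a] - r[b]),
    fitted by plain gradient ascent on the log-likelihood.
    """
    reward = {item: 0.0 for item in items}
    for _ in range(steps):
        grad = {item: 0.0 for item in items}
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(reward[winner] - reward[loser])))
            # Gradient of log(p_win) is (1 - p_win) for the winner
            # and -(1 - p_win) for the loser.
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for item in items:
            reward[item] += lr * grad[item] / len(comparisons)
    return reward


# Two hypothetical responses to the same sensitive question.
items = ["detailed_answer", "refusal"]

# Hypothetical pool A: 8 of 10 raters prefer the detailed answer.
pool_a = [("detailed_answer", "refusal")] * 8 + [("refusal", "detailed_answer")] * 2
# Hypothetical pool B: 7 of 10 raters prefer the refusal.
pool_b = [("refusal", "detailed_answer")] * 7 + [("detailed_answer", "refusal")] * 3

rewards_a = fit_rewards(pool_a, items)
rewards_b = fit_rewards(pool_b, items)

for name, rewards in (("pool A", rewards_a), ("pool B", rewards_b)):
    best = max(rewards, key=rewards.get)
    print(f"{name}: reward model prefers {best!r}")
```

The fitting code is identical for both pools, and the two responses are identical too. The only thing that changes is who was asked, and that alone flips which response the reward signal pushes the model toward.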

A January 2023 investigation by TIME reported that contractors in Kenya doing data-labeling work for OpenAI through the outsourcing firm Sama were earning less than $2 per hour. They reviewed content that included graphic descriptions of violence and abuse, often without adequate mental health support, and under conditions structured to prioritize throughput over careful judgment.

There are two problems here. The first is the obvious one about labor conditions. The second is structural and less discussed: the decisions these contractors were making — is this output harmful? is this response better than that one? — were encoding judgments about what constitutes harm and helpfulness into the model at a fundamental level. Those judgments were made under time pressure, by a labor pool selected primarily for cost efficiency, supervised by a team whose accountability ran to a private company's product roadmap.

That's not really an alignment process. It's an outsourcing arrangement that produces alignment-shaped outputs.

I'm not saying the people involved were doing it carelessly. I'm saying the structure doesn't generate the kind of accountability we'd want for decisions of this magnitude. The raters who shape what a model considers harmful face none of the review we'd apply to, say, a court determining what constitutes harmful speech. But the scale of their decisions is, arguably, far larger.

How Different Governance Models Handle Control

| Governance Model | Who Decides | Accountability Level | Transparency Requirements | Real-World Example |
| --- | --- | --- | --- | --- |
| Self-regulatory (current US default) | AI companies | Low | Voluntary | OpenAI usage policies, industry safety pledges |
| National regulatory | Elected governments + agencies | Medium | Mandated for high-risk systems | EU AI Act (in force Aug 2024), China generative AI rules |
| International body | Multinational negotiation | Low–Medium | Variable by agreement | UN Advisory Body on AI, proposed CERN-for-AI concepts |
| Open-source governance | Distributed community | Medium–High | High by design | Meta Llama releases, EleutherAI, Mistral AI |
| Public compute model | Democratic institutions | High (in theory) | Mandated | Proposed only; no major example yet operational |

Each model answers the "who decides" question differently. The self-regulatory model says the companies should answer to themselves. The national model says governments should set the rules. The international model says no single nation should hold unilateral authority. The open-source model says distributing control provides better accountability than regulation does. The public compute model says the infrastructure itself should be a public resource.

None of these is purely correct. What's worth noticing is that the current default — self-regulation, with voluntary commitments and periodic congressional theater — is the model that most concentrates the control questions in the hands of the companies building the systems. That may or may not be the right answer. But it should be a deliberate choice, not just the result of governance frameworks lagging behind the pace of deployment.

Regulatory Capture in Real Time

Governments are now trying to regulate AI. The EU AI Act, which entered into force in August 2024 with obligations phasing in over the following years, is the first major binding regulatory framework for AI systems globally. It categorizes AI by risk level and imposes requirements on high-risk applications, including some transparency obligations around training data and testing. The United States has taken a more fragmented approach: executive orders, agency guidance, and voluntary safety commitments from major labs, without binding federal legislation as of mid-2026.

Both approaches face the same structural problem: the entities best positioned to shape AI regulation are the entities being regulated. The major labs have the technical expertise, the government advisory relationships, the revolving door with academic institutions. The result is a regulatory environment where the rules are substantially written by the people they're supposed to constrain.

This is the classic pattern of regulatory capture, and it's not unique to AI. Pharmaceuticals, financial services, telecommunications — in each case, the industry shapes the rules because the industry holds the knowledge and the access. What's different with AI is the pace. Those other capture patterns played out over decades, which allowed at least some course correction. AI capabilities are advancing on a timeline measured in months. The window for establishing governance structures that genuinely constrain industry behavior — rather than ratify it — is narrow, and it may be narrowing faster than the governance process can move.

The first governance frameworks that actually stick after a major technology emerges tend to look less like proactive wisdom and more like a response to visible harm. The pharmaceutical safety regime we have now wasn't built on foresight. It was built on thalidomide and sulfanilamide and a series of public disasters that made the absence of oversight impossible to defend. I expect something analogous is coming with AI. That's not a comfortable prediction, but I think it's more honest than pretending the voluntary-commitment model is adequate.

The Concentration Problem

Five organizations account for the majority of frontier AI development. This isn't a controversial claim — it's a structural reality that flows from the economics of training large models. Training a frontier model costs hundreds of millions of dollars. That capital requirement concentrates the field structurally, before regulation even enters the picture. According to the 2024 Stanford AI Index, the United States alone accounted for roughly 40% of global AI investment in 2023, with a handful of organizations capturing the preponderance of that.

When five organizations make decisions about what AI should know, what it should say, and what it should refuse, those decisions function as policy for a large fraction of humanity's interaction with machine intelligence. But the five organizations answer primarily to their investors, secondarily to regulators, and only incidentally to the users whose intellectual lives they're shaping.

One useful comparison: the printing press was transformative partly because it democratized knowledge production. Before Gutenberg, knowledge was controlled by monasteries and royal courts. After Gutenberg, it gradually became something more distributed, more contested, more dynamic — a process that took about 200 years and a Reformation to achieve. What we're watching with AI might be moving in the opposite direction. The production of intelligence itself is concentrating. The cost structure rewards scale. The regulatory environment favors incumbents. The network effects are severe.

I don't think this trajectory is inevitable. But I do think it's the current direction, and reversing it requires governance choices that haven't yet been made.

What Healthier Governance Actually Requires

The problem isn't that we lack proposals. There are serious arguments for mandatory model cards with training data disclosure, for algorithmic impact assessments before deployment, for independent auditing of high-risk systems, for public compute resources that would let non-commercial researchers train competitive models. The AI governance literature is crowded with thoughtful frameworks.

What those frameworks consistently underestimate is the structural incentive problem. The companies with the most influence over AI policy are the companies with the most to lose from genuine accountability measures. They'll participate in governance conversations — often enthusiastically — because participating is better than not participating when rules are being written. That participation is not the same as advocating for rules that would actually constrain them.

Healthier governance probably requires something the current moment is reluctant to build: institutional capacity that is genuinely independent of the industry. Not advisory boards with industry representation. Not voluntary safety pledges. Not executive orders that rely on company self-reporting. Independent technical auditors with actual access, funded by sources with no stake in the outcome, reporting to accountable public bodies.

That's a higher bar than anything currently in place. Whether we get there before or after a sufficiently visible harm makes the case — that, I think, is the open question.

The Pattern Worth Watching

The control patterns in AI governance aren't novel. They're the same patterns that appear whenever a new technology concentrates power in the hands of those who understand it best. What's different is the scale and the speed, and the particular nature of what's being controlled — not just what people can publish, not just what they can broadcast, but the intellectual infrastructure through which they reason, research, and make sense of the world.

Who decides what the machine learns is, in the end, a question about who holds the controls over that infrastructure. It deserves the same scrutiny we'd give any institution claiming that kind of authority over that many people. The fact that it hasn't received that scrutiny yet is itself a pattern worth noticing.

Last updated: 2026-06-23