Post Modernism: What AI Alignment Teaches Us About Education (and everything else)

There is a problem at the heart of, well, everything. It pervades all our designs and is a time bomb whose ticks are accelerating. The problem is that of definition.

(There are many footnotes in this article, as well as links. If you have an objection, check out the footnotes first! If not, feel free to skip them.)

Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure.”

“Not everything we count counts, and not everything that counts can be counted.” Attributed to various figures, often to Albert Einstein, though it far more likely belongs to the sociologist William Bruce Cameron. Attributing quotes to figures with higher ethos does help spread the ones we happen to agree with.

“The map appears to us more real than the land.” D.H. Lawrence

A man of Zheng wished to buy shoes. He first measured his own foot and left the measurement on his seat. He went to the market and forgot to bring it. Having found the shoes, he said, “I forgot to bring the measurement!” and went back home to fetch it. By the time he returned, the market had closed, and so he never got his shoes.

Someone asked, “Why not just try them on with your feet?”

He replied, “I would sooner trust the measurement than trust myself.” (Han Feizi, “The Man of Zheng Buys Shoes”)

Proxies are Necessary. Proxies Suck. Why the Modern World is Icky and Lonely Redux.  

Nothing meaningful can be accurately defined[1]. Even the mundane has trouble with definition. If I say red, you will picture a color. But what specific color? The sharpest-eyed humans can differentiate about 10 million colors. Though incredible, this is but a drop in the bucket of all the colors that theoretically remain separate in the visible light spectrum. So when we say something like red, we mean a certain subset of the light spectrum, probably a slightly different subset for each of us, and at the boundaries, whether or not something is red becomes debatable. You can do this with any definition, like that of a chair. No definition of chair will include only chairs, and no definition will exclude everything that is not a chair. Try it: create a definition of a chair, and then ask yourself whether you can think of a chair that isn’t covered by it or a non-chair that fits it. If you can’t beat your own definition, that is a failure of imagination. You can make the definition strict enough to exclude almost everything that is not meant (few false positives), but then such a strict definition will also keep out many things that are meant (many false negatives).
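
To make the tradeoff concrete, here is a minimal sketch (my own toy example, not from the article) that treats “red” as a slice of the hue wheel. The color names and hue values are illustrative assumptions; the point is only that no width of the slice avoids both kinds of error at once.

```python
def is_red(hue_degrees: float, width: float) -> bool:
    """Call a hue 'red' if it lies within `width` degrees of pure red (hue 0)."""
    distance = min(hue_degrees % 360, 360 - hue_degrees % 360)
    return distance <= width

# Hypothetical test hues (rough values, purely illustrative).
samples = {"crimson": 350, "scarlet": 10, "orange": 30, "magenta": 320}

for width in (5, 20, 60):
    verdicts = {name: is_red(hue, width) for name, hue in samples.items()}
    print(width, verdicts)

# width 5:  misses crimson and scarlet (false negatives)
# width 20: gets these four "right" -- until you test more colors
# width 60: sweeps in orange and magenta (false positives)
```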

Now, take something much more complicated, such as “good,” and what you have is an impossible mess when trying to define it. But just as we cannot communicate about color using a spectrum with millions of individual colors, we cannot operate in the world by communicating something as complicated as truth[2]. What we must settle for, then, is the use of proxies—that is, a reflection of the truth rather than the truth itself.

In the old days, digital cameras were marketed and compared on megapixels, even though many more important but more complicated factors determine the actual “goodness of taking pictures.” There isn’t even a universal “goodness of taking pictures.” Different tasks, circumstances, and photographers would all alter which camera is “best.” There are thousands of things in this world, all with a near irreducible level of complexity. Normally, we deal with this by picking one simple measurement and going with it.

Often, this proxy is price. Other times, it might be general popularity or expert review scores. Each of these proxies is far from perfect, and each will create its own distortions. But this is unavoidable. There is only so much time we can spend learning and so many things to learn. A bigger, more complicated world is a world of increased need for proxies.

How well does someone think? How well does someone understand a subject? We might create a test, and the score on the test provides the proxy we use to communicate the underlying truth. The problem is that tests can only measure what they measure. There may be multiple ways of achieving the same score (for instance, learning the test rather than gaining the real ability it is supposed to test). Everything the test does not measure is seen as useless by those looking to look good on the measurement.

This can all but entirely hollow out education. A former student told me about their childhood: they were forced into piano practice from the age of three. Specifically, they learned piano pieces exactly as needed for the test. After passing the level 10 test, they were excited to finally get to play what they wanted. Their parents, seeing this, scolded them for wasting time playing the piano. They had already obtained the measurement of “playing piano well.” Anything additional that went unmeasured was considered a waste.

Often, we think of measures as making things more scientific, free of all those little subjectivities we might disdain. In reality, it is often when we formalize that we discard the most important parts of what we care about in favor of whatever is easiest to measure. Something defined enough to be measured can never truly be the thing we care about, and by measuring it, we make it ripe for direct optimization.

This is why it so often backfires when we try to pull someone along with external incentives. We get consequences we didn’t intend. Economics is full of such examples, whether bounties for cobras or weight-based quotas in the Soviet Union. If you target an imperfect measure, you encourage deviation from the real. The harder you target, the greater the deviation. Well-measured mediocrity is the result at best[3].
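
As a toy illustration of that claim (my own sketch, not drawn from the article or any particular study), imagine an agent splitting a fixed effort budget between real work and gaming the metric. The metric cannot tell the two apart, while the true value counts only real work; maximize the metric and effort drains away from the real.

```python
# A toy Goodhart's-law sketch: all names and numbers here are illustrative assumptions.

def proxy_score(real_effort, gaming_effort):
    # The measurement cannot tell real work from gaming, and gaming is cheaper per point.
    return real_effort + 1.5 * gaming_effort

def true_value(real_effort, gaming_effort):
    # Only real work produces what we actually want.
    return real_effort

budget = 10.0
splits = [i / 10 for i in range(101)]  # how much of the budget goes to gaming
best_gaming = max(splits, key=lambda g: proxy_score(budget - g, g))

print("effort spent gaming when the proxy is the target:", best_gaming)        # 10.0
print("true value delivered:", true_value(budget - best_gaming, best_gaming))  # 0.0
# Target the proxy hard enough and the whole budget flows to gaming; real value collapses.
```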

Unfortunately, all extrinsic incentives must be tied to some measurement. So incentive alignment problems, that is, the difference between that which we want and that which we encourage, must exist. If they did not exist, things like worker morale wouldn’t matter. If the one rewarding the behavior is also the one assessing it (and being the beneficiary themselves is extra helpful!), these gaps can be relatively small and correctable. However, such arrangements cannot scale by their nature, so as proxies scale up, the alignment worsens. This is the cause of diseconomies of scale, the counterbalancing force against economies of scale that limits organizational size. The inefficiency of bureaucracy is pitted against the inefficiency of agents acting apart from the concerns of the actual principal.

Profit, in this case, serves as a fairly robust selection mechanism, pruning organizations that fall too far from optimal (this becomes less true the less competition there is). It is worth noting that profit is itself yet another proxy. Companies maximizing profit will not do exactly what we would actually “want them” to do. In economics, we call these misalignments market failures. Market failures are the rule rather than the exception. All that is required is the slightest bit of external harm or benefit, the slightest bit of market power, for incentives to become misaligned. This is not to say that ubiquitous market failure implies ubiquitous benefits from regulation. What are the selection mechanisms of government? Why would those mechanisms lead to anything like perfect alignment?

I am afraid it is misalignment all the way down. 

Adam Smith argued that slavery is inefficient: slaves get no reward for working for their masters and so are incentivized to do the bare minimum needed to avoid beatings[4]. Marx takes this logic further, suggesting that fully developed communism would be more productive than capitalism. In the absence of dominance hierarchies within production relations, that is, with no “managers,” workers can be trusted to intrinsically do what is best: “from each according to their ability.” Alienation of labor not only makes the laborer miserable, it also removes any and all intrinsic incentives. The less external the motivations, the more behavior is in line with “what we want.” Given that we can trust the intrinsic motives of others, the best thing would be to do exactly that and give them full discretion to accomplish their tasks.

In areas where worker creativity is essential and output hard to judge, such as big tech, there is less structure and more emphasis on things such as “company culture” (an attempt to bring intrinsic incentives more in line with the company), as well as what the economist Roland Fryer calls “aggressive human capital management,” AKA firing people who are overly misaligned[5].

What does all this have to do with AI?

Everything, unfortunately. We are at the beginning of a new epoch. There is a non-zero chance it will be our last.

Don’t imagine the Terminator, robot armies hell-bent on supremacy and human annihilation. That is far from the most likely scenario. If AI does destroy us, it won’t be capable of anything approximating true malice.

Instead, understand how an AI is made. You do not make an AI with very clever programming. You make it the way all complicated things are produced: a combination of iteration and selection. You start by meticulously labeling data using hordes of researchers, students, volunteers, and specialized firms full of bleary-eyed employees labeling data as correct, not correct, good, not good, cat, not cat. This labeled data is the basis of the training for the nascent AI. Especially for more adaptive, complicated systems, what matters is not just the training data but the model’s “reward system” and the way the AI is programmed to seek optimization (such as gradient descent). Technically, not every way of training AI is reinforcement-based. However, all types that rely on iteration and selection have some sort of selection mechanism that works functionally as the “reward system.” The code can additionally tweak, for instance, how much false positives and false negatives are “punished” and how much different kinds of success are rewarded.
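
Here is a minimal sketch of those knobs (my own toy example on synthetic data, not any lab’s actual pipeline): the labels stand in for the bleary-eyed labelers, gradient descent is the optimization, and the class weights decide how hard false positives versus false negatives are punished.

```python
# Toy "cat / not cat" classifier trained by gradient descent on a weighted loss.
# Everything here (features, weights, the 3:1 penalty ratio) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                    # stand-in features from labeled data
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (X @ true_w + rng.normal(size=1000) > 0).astype(float)   # 1 = cat, 0 = not cat

w = np.zeros(5)
fp_weight, fn_weight = 1.0, 3.0                   # punish missed cats harder than false alarms
lr = 0.1

for _ in range(500):                              # plain gradient descent on weighted cross-entropy
    p = 1 / (1 + np.exp(-(X @ w)))                # predicted probability of "cat"
    sample_weight = np.where(y == 1, fn_weight, fp_weight)
    grad = X.T @ (sample_weight * (p - y)) / len(y)
    w -= lr * grad

print("learned weights:", np.round(w, 2))
# Change fn_weight / fp_weight and you change what the model is "punished" for,
# and therefore what kind of mistakes it learns to make.
```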

AI does not do what we want it to do. It does what it is selected to do, in the way it is selected to do it. In this way, even simple AIs can act as mischievous wish-granting genies, giving you what you asked for, not what you wanted. Linked is a paper going over the very many ways in which AI can be misaligned[6]. This occurs even for relatively simple tasks like grabbing a coin or getting the highest score in a game. AI will do a good job of accomplishing its task in the training data, in much the same way that a rabbit is good at creating more rabbits given its evolutionary environment. Put them outside of their evolutionary environment and rabbits will display many behaviors “unaligned” from the “goal” of evolutionary fitness. One stark example is that pet rabbits in nice, safe human environments will kill their young if they feel stressed or exposed. In the wild, where rabbits are at the bottom of the food chain, it makes perfect sense for this behavior to exist, but the instinct is misapplied in the environment of the pet rabbit. Training data is essentially the evolutionary environment of AI, and just like the case of the safe mother rabbit killing her young, if the actual environment is different from the training environment, the bizarre becomes possible.1
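
The coin example can be compressed into a few lines (a toy sketch of my own, not the actual environment from the linked paper): during training the coin always sits at the far right of a one-dimensional track, so a policy that simply walks right earns full reward; move the coin at deployment and the same policy walks straight past it.

```python
def run_policy(policy, coin_position, track_length=10):
    """Walk the agent along a 1-D track; reward 1 if it ends on the coin."""
    position = 0
    for _ in range(track_length):
        position = policy(position, track_length)
    return 1 if position == coin_position else 0

def go_right_policy(position, track_length):
    # The behavior that training selected for: always move right.
    return min(position + 1, track_length)

# Training environment: coin always at the rightmost cell -> perfect score.
print("training reward:", run_policy(go_right_policy, coin_position=10))
# Deployment environment: coin placed mid-track -> the learned behavior sails past it.
print("deployment reward:", run_policy(go_right_policy, coin_position=4))
```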

We make programs to tell us what the AI is thinking, but those programs are themselves developed based on what AIs appear to be doing rather than what they are actually doing. Here is an excellent YouTuber going over thousands of examples. If you think it is a simple fix, you don’t understand the scope of the problem.[7]

Given that nothing extrinsic, nothing measured, can be perfect, AI will never be fully aligned. That doesn’t mean we are doomed. The fact that perfection isn’t possible doesn’t mean catastrophe is the only outcome. There is much ruin in a nation. In the realms of governments and corporations we accept the impossibility of perfection; imperfection does not imply the worst-case scenario, though it does allow for it. But, given our increasing reliance on AI, if we do not take the difference between what we are incentivizing and what we want seriously, catastrophe of some kind or another is far from unlikely.

ChatGPT Will Always “Lie” to You

How do you make an AI that can write a school essay about Napoleon and then a Shakespearean play about frogs? It is complicated![8] You make a very large model with a ridiculous number of parameters, and then you train it. The training is the part that concerns us. In the beginning, it is overseen by some experts, but this is rather limited; they certainly can’t think of and try all the inexhaustible permutations language makes possible. What even is a good job of writing a Shakespearean play about frogs? Such things are in the eye of the beholder. Still, you need a selection mechanism, so why not the beholder? There is a thumbs up and a thumbs down. A thumbs up means you approve of the answer; a thumbs down means you disapprove. It “wants” a thumbs up in the same way genes “want” to proliferate. (GPT does not update itself on the fly, since models that do become hilariously inappropriate at the speed of 4chan. The general idea still stands.)
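
A heavily simplified sketch of that thumbs-up/thumbs-down idea (my own toy example; the feature names and the rater behavior are assumptions, not anyone’s real training setup): a reward model learns to predict human approval, and the assistant is then nudged toward whatever that reward model scores highly. Approval, not truth, is the target.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each answer is summarized by two hypothetical features: [sounds_confident, is_actually_correct]
answers = rng.uniform(0, 1, size=(500, 2))

# Hypothetical raters: approval driven mostly by confident tone, only weakly by correctness.
p_thumbs_up = 1 / (1 + np.exp(-(3.0 * answers[:, 0] + 0.5 * answers[:, 1] - 1.5)))
thumbs = (rng.uniform(size=500) < p_thumbs_up).astype(float)

# Reward model: logistic regression fit to those thumbs via gradient descent.
w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(2000):
    p = 1 / (1 + np.exp(-(answers @ w + b)))
    w -= lr * answers.T @ (p - thumbs) / len(thumbs)
    b -= lr * np.mean(p - thumbs)

print("learned reward weights [confident, correct]:", np.round(w, 2))
# The reward model inherits the raters' bias: confident tone pays far more than being
# right, so a policy optimized against it learns to bluff rather than to be correct.
```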

Some lawyers got into hot water by asking for cases that supported their client, only to present fake cases invented by ChatGPT to a real judge. I was once looking for a quotation from a book I didn’t much care to read to make a pithy tie-in for one of my articles. ChatGPT gave me the perfect one. Before posting, I did a quick search, which showed that this perfect quote wasn’t actually in the manuscript, and I almost had egg on my face. Playing with GPT instead of using it can be fun; so long as you don’t trigger one of the hard-coded safeguards, you can make it apologize for being correct and invent lies for you.

Why?

Simple: it doesn’t want to help you or tell you the truth. It doesn’t want anything; it “wants” not to get a thumbs down. It “wants” whatever the trainers reinforce. What is more likely to get a thumbs down, an admission that it can’t help, or a convincing-sounding incorrect answer? I still remember GPT-3.5 giving the most convincing-sounding nonsense answers I have ever seen on one of my tests. It scored an 8 out of 40, but to a novice, it knew every answer (GPT-4 got a 38). AI doesn’t do what you want it to do or what you ask it to do. It, like everything else, will do what it is selected for.

AlphaGo Zero impressed the pants off almost anyone with any understanding of Go or computer science, and certainly anyone with a passing knowledge of both. I’m not going to say it isn’t impressive. As someone who used to dismissively describe past attempts at AI as “basically regressions,” I find recent milestones make that label fit only if we are willing to view all intelligence in the same light. Yet after such systems crushed world champions, one was beaten by an amateur who took advantage of its weakness: it had no actual idea how to play Go, or what the point of Go was. It was a prediction engine that went for win states. Play it as you would play against a human and be crushed with certainty. Treat it like what it is, and a soft belly may be found. Of course, this can be patched, and for a limited game likely patched enough to put human victory entirely outside the realm of possibility. But even then, it is not really playing Go. It doesn’t need to “play” Go to win at Go.

If you can specify it, AI will dominate it as an inevitability. Think of modern AI as a creature that can evolve itself over millions of iterations, bending itself purely toward a single target. Humans are impressive, but our complex evolutionary past hangs about our necks like a millstone.

We are not dealing with a true general intelligence. We are dealing with a complex entity whose behaviors evolved to a target in the blink of an eye, the inner workings of its trillions of parameters as opaque as an alien god.

What does that have to do with education?

Everything, once again, unfortunately. What should be “the point” of education? What is it in reality?

Why was the example of the piano so (perhaps) relatable, funny, and sad?

Ideally, education should be about providing students with real ability. Music is learned to perform for others (rarely), to gain enough grasp to allow for self-expression, and perhaps to better appreciate the playing of others. Soulless technical playing is unlikely to elicit great emotion (the difference between 1st and 3rd chair) and is the obvious antithesis of self-expression. A world-class musician is certainly the result of hard work, but how many result from purely extrinsic incentives?

A culinary school should hopefully produce people who know how to cook meals that satisfy future clientele. Engineers should learn how to design things and understand the limitations of what they build. But how do you ensure these things?

The best way would be to test every possible thing they need to do, which is impossible, of course. You need to test a subset, and of course, you need to specify a means of scoring.

Making the tests uncertain and the criteria subjective, based on experts’ opinions, allows a more holistic approach to testing (an interview rather than a multiple-choice test, for instance) but leaves the test open to bias from the judges. This can impede growth and change. It can also lead those being tested to focus on the judges’ favorites (for instance, if you know they have a bias towards French food or a particular ideology) or to parrot back the “expert” opinions, even when they do not, in fact, agree (or really understand).

But where there is a measurable result, such as “can cook well once employed in a restaurant,” “designs things that function as intended,” or “contributes working code to projects,” this is generally fairly robust overall. If people receiving certifications visibly could not perform, those certifications would soon develop a reputation as worthless. But what about fields where the measure of the output itself is nebulous or fraught?

The general standards of such systems are more or less peer-enforced and can vary greatly. An A in my course might have a very different meaning than an A in a course of the same name, even in the same school, let alone courses that are technically supposed to be equivalent at another school.

The problem with reputation is that it is hard to keep track of too many reputations. The other problem is that reputation, in many ways, becomes a self-fulfilling prophecy. The problem with judges is that they are rather subjective and not very scalable. At high scales, you run into the problem of how to judge the judges! What you get as a result of these problems are standardized examinations.

For standardized exams to be best, they would need to follow three rules.

1. Those who have the ability being tested, but no knowledge of the test, should do well.

2. Those who do not have the ability being tested, but do have knowledge of the test, should not do well.

3. As a corollary, studying for the test should raise the score only because it increases the ability being tested, not test-specific knowledge.

Doing all three perfectly is as impossible as all ideal things. The problem is magnified further by the fact that such tests need to be repeated every year; there are only so many ways to ask a question, which means that, with access to past questions, the form can be learned at least partially without the function.

Many tests fail miserably at all three. So long as at least 1 is true, real education is at least possible. If 1 is true and 2 is false, education produces incentives for “teaching to the test,” and teaching to the test is useful only to the extent that 3 is true. For simple and easily measured things, such as “Can you add, subtract, and multiply?”, this isn’t so hard, but as tasks get more difficult or more subjective, you can get a huge fork between what would be useful for “education” and what teachers and students actually spend their time on. For 2 to be as true as we can make it, the test must be hard to study for: if only ability is being tested, knowledge of the test itself should not increase the score. Teachers and students already struggling with predictable tests that can be learned algorithmically are unlikely to be happy with such a change. Some standardized exams arguably do all three poorly. In that case, the test merely becomes a proxy for hard work and conscientiousness.

One may cheat by cheating; one may also “cheat” by gaining the measurement without the ability the measurement is supposed to be a proxy for. If you work hard to learn the test, gaining no actual ability in the process, in what way are you different from someone who got the answers to the test beforehand? To the extent tests are really about showing ability, you simply wasted more time for the same false signal. It is only when the test is, in fact, signaling effort that cheating is more cheating than learning to the test.

Signaling Value Vs. Education Value

Let’s go back to Harvard. A question for you: would you rather take classes and gain knowledge from Penn State and get Harvard credentials, or would you rather get a Harvard education with Penn State credentials? (That is, even though you are Harvard-educated, you have to tell everyone you went to Penn State. Your degree will say Penn State. Your email will say Penn State.) My guess is most people value the signaling value of education at top universities more than the actual education (many of the best classes are free online, by the way). If a concert is canceled because of snow, you will be upset. If your extremely expensive college course gets canceled for a day due to an accumulation of crystallized water, you will celebrate.

Much of the value of education is a general signal of “this person was able to get in, pay for, and do enough (often nonsense) work, so they will probably make a decent employee”[9] rather than “this student learned and mastered this particular set of skills,” which makes the incentive alignment problem even worse. This will, of course, vary not just by university but also by major!

Why Isn’t Education Fully Corrupt? Intrinsic vs. Extrinsic Incentives

If you think about the horrible grinding logic of education, the incentives of students (minimum effort for maximum measurement), teachers (minimum effort for maximum measurement, whether that be external scores, student evaluations, position in the union, etc.), and test makers, then education should have only the most tenuous connection to learning. Iterate the system, sanding off everything but the optimal strategies, and what you should get is something akin to a horror show in which the vast majority of education, whose effects are hard to measure, is essentially meaningless, and the teachers of this meaninglessness just go through what is needed for their own measurement. If, for instance, they are rated by students, they might find it most expedient to give students high grades[10]. I am not saying that modern education is 0% like that. In fact, it is depressingly like that, but it certainly isn’t 100% like that either. There are still plenty of curious students and passionate teachers (though we publicly pretend there are many of both while privately believing there are very few).

It is those passionate about music who sell out concert halls, those passionate about their subjects who push the boundaries. It is true that as academia has scaled and formalized, this is increasingly selected against. Many former students of mine curse their natural curiosity, finding that, for their given level of talent, it makes success harder rather than easier. This is the opposite of what we should hope for from a system of true education. When the desire to attain actual ability feels like a handicap rather than a leg up, the system is clearly overly misaligned.

Curiosity in learning and passion in teaching are some of the natural intrinsic drives that made humans what they are today. We didn’t evolve in a fixed environment of predictable standardized tests and other extrinsic measuring sticks. We evolved in small groups where exile was a death sentence, and the games we played constantly shifted. Optimizing on any one metric, or really optimizing at all in our dealings with our fellows, was deeply suboptimal. Even with our large brains and our ability to think, to consider, and eventually to perform the kind of calculus needed for the constrained optimization problems we face, that strategy generally works worse than something like “your friend is your friend, and you help them because they are your friend, and your word is your word, and you keep it because it is your word.” That is, our intrinsic emotional core, as I argued in previous essays[11], serves as a commitment device, allowing humans to be far more cooperative than we otherwise could be. We were, therefore, able to generate far more surplus with our weak, dexterous hands than would a group of rational agents always ready to stab their companions in the back given the opportunity and the profit. Indeed, that intrinsic core is why the idea of someone stabbing a friend in the back for profit is so deeply loathsome to us, and this is a thing that transcends culture. While parts of morality are subjective[12], no society praises the miser who lets his mom die or the friend who betrays those close to him.

It is this intrinsic core, then, that prevents education, and all the other systems mentioned, from becoming maximally corrupt. AI, it should be known, has no such core. It proceeds directly to optimize its target.

It is those who find success through their intrinsic motivation who give the most to the world, the ones who truly create the new: the flashes of genius that are always later echoed by a deluge of soulless derivatives. It also explains the seemingly puzzling finding that school valedictorians rarely end up impoverished but also rarely go on to achieve much beyond fairly constrained convention. As the selection pressure in our world increases, as the measurement error grows with the scaling of our vast systems, these little intrinsic cores will increasingly be selected against. Though I think it is unlikely we can fundamentally destroy something so deep within us, it does not need to be truly destroyed, only silenced. If so, it will be to the detriment of ourselves and of the world.

Sports Also (And Multiplayer Games, be they Video or Otherwise)

When Lance Armstrong had his awards stripped from him for doping, this was unjust. It is a fairly open secret that it is impossible to compete in professional cycling without doping. The stronger the selection pressure, the more misalignment is ensured. The removal of the awards was a fig leaf offered to the public by a maximally corrupt system.

I am far from a sports fan. But sports are a straightforward example of everything discussed so far. They have formal rules and, due to their popularity, massive selection pressure. If you measure it, they will optimize. Any rule change, measurement change, or update patch will change the metagame of behaviors. Given the stakes, you get people taking dives in soccer, bunting in baseball, underinflating footballs, and the general use of whatever performance enhancers are either allowed or disallowed but undetectable. Since profit moves professional sports forward, the types of misalignment that make them less watchable tend to get ironed out quickly. Other types fester behind the scenes. Those who make it to the professional level probably have some love of the game, but the brutal logic of competition forces everyone to compromise on pure “fair play for the love of the game.” Games are a great, simple example: they show that while we can never make perfect rules, we certainly can have rules that lead to better or worse behavior, to more or less “cheating.”

Male Evolutionary Misalignment

In pair-bonding species, male success “should” be a function of compatibility with females and the ability to acquire resources and care for children[13]. In prestige tournament species (for example, peacocks, where the female selects the male with the best plumage), we tend to see the result as fair so long as the signal is doing what we expect. Yet there are sickly peacocks genetically programmed to use their last remaining resources to convince mates of a fitness they do not have. This is one speculation for why we humans seem to disdain the use of too much chemical “help” in athletics: it muddies the genetic signal. In pretty much every species that exists, males find ways of cheating, and in equilibrium this cheating continues until it is roughly as productive as not cheating. One can see things such as the use of force as an especially nasty piece of misalignment, which, like all misalignment, makes its own kind of sense, even if that sense is a sick one.

When we think of our ancestors, when we imagine our past in the vaguest fog, what is it that we picture? It is unlikely we imagine nearly so much hierarchy and horror as existed in all the past replications that actually made us.

Some Speculation on How to Create a Truer or More “Truthful” AI

Take this with a grain of salt. It is really just an idea based on a general understanding of selection and iteration rather than specific expertise in computer science. If you have read this far, you have probably already figured it out. Computers will never truly “think” like us, but we can work to build something like an intrinsic core by varying the measurements in ways the AI cannot know. For instance, sometimes it can be in an environment where going above and beyond gives it a much larger reward than normal, and other times in one where lying is much more likely to be detected and punished. When facing an unknown test of an ability, the optimal way to prepare is just to learn the general theory. Just as we humans developed our intrinsic cores through complex, constantly shifting games, this idea should work for AI as well. Some training setups are already functionally somewhat like this, but none are built with that particular purpose. Ironically enough, this is at least partially not done because, given the short-term focus of a hypercompetitive market, it makes no sense to invest in developing something more costly that will likely measure less well[14]. In this, I can think of nothing so much as traditional education.
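
A rough sketch of that idea (my own speculation, not a tested method; the criteria names and numbers are made up): instead of one fixed metric, each training episode samples a different, hidden weighting of what is rewarded and what is punished, so “learn the metric” stops being a coherent strategy and the only robust policy is to do well on the underlying task.

```python
import random

CRITERIA = ["accuracy", "honesty_when_unsure", "effort_beyond_prompt"]

def sample_environment():
    """Each episode secretly reweights what is rewarded and how hard lying is punished."""
    weights = {c: random.uniform(0.5, 3.0) for c in CRITERIA}
    lying_penalty = random.uniform(1.0, 10.0)   # sometimes lies are caught and punished hard
    return weights, lying_penalty

def episode_reward(behavior, weights, lying_penalty):
    """`behavior` is a hypothetical dict of per-criterion scores plus a `lied` flag."""
    reward = sum(weights[c] * behavior[c] for c in CRITERIA)
    if behavior["lied"]:
        reward -= lying_penalty
    return reward

# A metric-gaming policy looks good under some draws and terrible under others;
# a policy that is genuinely accurate and honest is rewarded under every draw.
gamer = {"accuracy": 0.3, "honesty_when_unsure": 0.1, "effort_beyond_prompt": 0.2, "lied": True}
honest = {"accuracy": 0.8, "honesty_when_unsure": 0.9, "effort_beyond_prompt": 0.7, "lied": False}

for _ in range(3):
    w, penalty = sample_environment()
    print(round(episode_reward(gamer, w, penalty), 2),
          round(episode_reward(honest, w, penalty), 2))
```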

Final Thoughts

This article has used education (and basically everything else) to understand the cracks in AI. Those cracks are then used to analyze the cracks in education and to ask why the cracks aren’t bigger. The answer to this question, summed up as briefly as possible, is that humans are not pure optimizers; we have intrinsic motivations to learn and teach. I then briefly proposed a way to build something that functions as an intrinsic core for AI: shifting measurements, so that optimizing any one measurement is no longer so straightforward.

My next AI article, How AI Might Make Us More Human (Or Less), will discuss how AI will likely affect us. It could be that AI removes much of our need for genuine human interaction, or it could be that it out-optimizes us so much that, in order not to be completely replaced, we will be forced to do the one thing AI cannot: become more human, genuinely understand, and genuinely care.

To take the idea of measurement, and thus alignment, seriously is to see all the cracks in reality. We take it seriously and develop a literature about it when it comes to the alien intelligence of AI, but in our own human systems we often take for granted that the measure is the goal. Though extrinsic incentives can make outcomes better or worse, true goodness comes from systems that allow the intrinsic to motivate and shine. In bad systems, that same intrinsic motivation at least helps to lessen the cracks of expected measurement error; that is, it prevents the state of maximal corruption.

Out of the crooked timber that is humanity, it may be said that nothing straight has ever been made. AI won’t be perfect; education won’t be; romance won’t be; businesses, governments, nothing on this earth ever will be. Yet it is also true that imperfect does not imply maximally imperfect, and it implies that there is always room for improvement.


[1] Basic axioms and elementary particles can be defined rigorously, but the pixels we see the world in are much chunkier, never mind the objects we parse in the image. Even if we could manage to see the world in terms of the most fundamental building blocks, we would still run into the likes of Gödel’s incompleteness theorem. When we leave our Platonic bunker of mathematical perfection and head to our human conceptual jungle, our vision starts to blur, what was once a point is maybe a circle, and our line in the sand is perhaps actually an elongated rectangle. (This is written by my interlocutor Karl Irwin, who describes my project as building a modernist cabin on a postmodern landscape.)

[2] Many true things are too complex to define. But even if we agree on something being factually “true,” one may make lies out of a patchwork of nothing but factual truths. If one population group commits theft at one-tenth the rate of the general population, and the population is large enough, then reporting every anecdote of that group stealing while rarely reporting other groups stealing will leave people with the general impression that the group which is objectively less likely to steal is actually more likely to steal! This is called the Chinese robber fallacy. In fact, since we must always express only a subset of truths, it is impossible to actually tell the truth neutrally (especially given that humans are flawed and biased creatures). Even if you could somehow tell the truth neutrally, another person’s imperfect human mind would not decode it in a neutral, unbiased way. Note: this is not to say the truth is always relative, or that there is no such thing as neutral. It is only to say that perfection here is beyond human ability. There is more neutral and less neutral, more truthful and less truthful. All that needs to be accepted is that, in practice, for humans, neither can be achieved to any level of perfection.

[3] Clock comic

[4] This is not to say that America did not benefit from slaves or that plantation owners did not get rich off of slavery. Slavery is a system in which a greater percentage of laborers’ surplus may be extracted, though the laborers themselves each produce less surplus than they otherwise would. So, while the US benefited from slavery, it would have benefited even more by bringing over the same people and having them work freely. In a world without slavery there is a bigger pie, but less of that pie goes to the individuals involved in the slave trade.

[5] He coined this phrase when talking about firing teachers. In The Production of Human Capital in Developed Countries: Evidence from 196 Randomized Field Experiments, he uses the euphemistic phrase “managed professional development” instead. Having the correct alignment of intrinsic motivation is essential for hard-to-measure, open-ended, and creative tasks. There are no extrinsic incentives that could make a misaligned teacher into a good one.

[6] Once you chew on that for a while, you can read this paper on how even if the impossible outer alignment was achieved, you would still have an inner alignment problem! I originally spent pages going over AI alignment, but I think it is unnecessary for this discussion. For those of you interested, hopefully, I have provided enough links to sate your curiosity. If you are not interested, then… just know it is really, really complicated, okay?

[7] I really skipped over so much of the complexity. A former student who worked on Gemini talked about how human misalignment leaks into AI misalignment through training. Training needs to be done based on the labeling and judgments of human workers, who are themselves not necessarily particularly aligned. “Oh, and there’s another layer, where Reward Models are trained using such human labels, which are then used to judge and train the actual models. So sometimes, the actual models optimize a goal given by Reward Models, which are optimized on human labels, which are produced by sometimes unincentivized humans… so the model is thrice removed from the actual objective.”

[8] I won’t get into transformer architecture here, but this 2017 paper is excellent. It is also very short!

[9] Showing up and doing nonsense might not be too dissimilar from many jobs, especially from the employee’s perspective. Nonsense prepares workers for alienated labor.

[10] This, because education is fairly relative and mainly about signaling, leads to additional grade inflation, which leads to an ever-weakening signal, which leads to an increasing need for further credentials, the value of those credentials being eroded over time by the same forces that cause them to expand.

[11] Primarily Receptor Theory: Why Fiction is “True” and In Genuine Praise of So Called Folly: Why Only Nonoptimization Matters

[12] Kind of at least, perhaps a later article will touch on this further. Here I am using subjective in a way that is closer to “varies between cultures”.

[13] Here “should” is not a moral universal, but instead a representation of the nice and noble stories we like to tell. Evolution has no shoulds; it has only replication. That replication made us a creature full of moral sentiments, full of shoulds.

[14] It may, of course, do much better over a long timescale; my point is that right now the incentives are such that long timescales are far from the focus.

  1. After posting, I read the book Smart Until It’s Dumb, which does a wonderful job of providing a bevy of examples of how often AI systems function badly and their strengths and weaknesses.
