How AI Takeover Might Happen in 2 Years


I'm not a natural "doomsayer." But unfortunately, part of my job as an AI safety researcher is to think about the more uncomfortable scenarios.

I'm like a mechanic rushing last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I will not comment on the quality of the in-flight entertainment, or describe how beautiful the stars will appear from space.

I will tell you what could go wrong. That is what I intend to do in this story.

Now I should clarify what exactly this is. It's not a prediction. I don't expect AI progress to be this fast or as untamable as I portray. It's not pure fantasy either.

It is my worst nightmare.

It's a sampling from the futures that are among the most destructive, and, I believe, disturbingly plausible [1] - the ones that most keep me up at night.

I'm telling this story because the future is not yet set. I hope, with a bit of foresight, we can keep this story a fictional one.

Thanks to Daniel Kokotajlo, Thomas Larsen, Ryan Greenblatt, and others for discussions that inspired these stories. This post is written in a personal capacity.

Ripples before waves

The year is 2025 and the month is February. OpenEye recently released a new AI model they call U2. The product and the name alike are increments on the past. Neither is wholly surprising.

But unlike OpenEye's previous AI products, which lived inside the boxes of their chat windows, U2 can use a computer.

Some users find it eerie to watch their browser flash at irregular intervals and their mouse flick at inhuman speeds, as if there is a ghost at the keyboard. A fraction of workers with form-filling jobs raise the eyebrows of their managers as they fly through work nearly twice as quickly.

But by and large, U2 is still a specialized tool. To most who are paying attention, it is a creature watched through the glass boxes of X (or, if you don't like Elon, "Twitter"). Sometimes U2's quirky behaviors prompt a chuckle. Sometimes, they cause an uneasy scratch of the chin.

Meanwhile, researchers are drawing lines on plots, as researchers like to do. They are trying to understand where AI progress is going. They are like Svante Arrhenius, the Swedish physicist who noticed in 1896 that the levels of CO2 in the atmosphere were rising. Like the scientific community in the time of Arrhenius, few experts understand the implications of these lines yet.

A trend that is receiving particular attention is autonomous capability. Extrapolating these benchmarks predicts that, by the end of 2026, AI agents will accomplish in a few days what the best software engineering contractors could do in two weeks. In a year or two, some say, AI agents might be able to automate 10% of remote workers.

Many are skeptical. If this were true, tech stocks would be soaring. It's too big of a splash, too quickly.

But others see what skeptics call "too big a splash" as a mere ripple, and see a tidal wave on the horizon.

Cloudy with a chance of hyperbolic growth

Meanwhile, OpenEye is busy training U3. They use the same basic recipe that baked U2: generate thousands of programming and math problems. Let models "think" until they reach an answer. Then reinforce the traces of "thinking" that lead to A-grades.

This process is repeated over and over, and once the flywheel gets going, it begins to spin almost on its own. As U2 trains, it sculpts more challenging and realistic tasks from GitHub repositories on the web. Models are learning to train themselves. Long before AI agents could automate research, a gradual sort of "self-improvement" had begun.
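(For readers who want the recipe in concrete terms: the loop described above resembles what the literature calls rejection-sampling fine-tuning or expert iteration. Here is a minimal toy sketch; every function and data structure is an invented placeholder, not OpenEye's actual pipeline.)

```python
# Toy sketch of the "reinforce the A-grade thinking traces" loop described
# above. All names here are hypothetical placeholders for illustration.
import random

def sample_trace(model, problem):
    """Let the model 'think' step by step until it emits an answer."""
    # Stand-in: a real system would decode tokens from a trained model.
    return {"thoughts": f"reasoning about {problem['q']}",
            "answer": random.choice([problem["a"], "wrong"])}

def grade(trace, problem):
    """Programmatic check: math and code problems have verifiable answers."""
    return trace["answer"] == problem["a"]

def finetune(model, traces):
    """Stand-in for a gradient update on the successful traces."""
    model["updates"] += len(traces)

model = {"updates": 0}
problems = [{"q": "2+2", "a": "4"}, {"q": "3*3", "a": "9"}]

for _ in range(10):  # the flywheel: each round trains on its own successes
    good_traces = []
    for problem in problems:
        trace = sample_trace(model, problem)
        if grade(trace, problem):  # keep only A-grade reasoning traces
            good_traces.append(trace)
    finetune(model, good_traces)
```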

Some engineers could still hardly believe this worked. It's like a perpetual motion machine. Surely intelligence can't emerge from a pool of silicon and lightning, isolated from the physical world except through the exhaust of society spewed onto the internet.

And yet the benchmark numbers continue to climb day after day.

During most of 2024, these RL training runs cost around $1 million, sometimes $10 million. These runs were little more than exploratory. But by 2025, the researchers at OpenEye (and around the world) knew they had found the secret sauce. It was time to scale up.

Over the first half of 2025, $10 million RL training runs grow into $50 million runs, and then into $100 million runs. While U2 could do a bit of data munging and run small experiments, this new model - the model researchers are calling U3 - is changing the daily lives of the technical staff.

U3 is like a blazing-fast intern, and engineers are learning how to wrangle its sleepless energy. Researchers flick through terminals, issuing terse commands, like a CEO orchestrating staff over Slack channels.

By October 2025, U3 is writing almost all of the code at OpenEye. Researchers are almost never bottlenecked by implementation. More than ever, compute is the lifeblood of AI development, and the 'bottleneck' is deciding how to use it.

If instructed to, U3 can run experiments, but U3's taste is not as refined as that of the human researchers at OpenEye. It struggles to prioritize between research ideas, so humans still decide where to dig into the vast fields of algorithms to mine efficiency improvements.

But these researchers are working long hours to put themselves out of a job. They need AI agents that can plan ahead, so engineers train agents to forecast. They hold out training data before 2024, instructing models to ponder for hours to predict events in 2025. Then, they apply the same trick as before, distilling pondering into a gut reaction. Forecasting ability is a broad foundation. The researchers build specialized ML research skills on top of it, training U3 to predict the results of every ML paper and ML experiment ever recorded.
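(A toy sketch of that time-split trick, under my own assumed data shapes - nothing here is OpenEye's real setup: train only on documents written before a cutoff, then grade the model's forecasts of events that resolved after it.)

```python
# Hypothetical sketch of temporal-holdout forecasting training: the model
# sees only the "past" and is scored on predicting the held-out "future."
from datetime import date

CUTOFF = date(2024, 1, 1)

corpus = [
    {"text": "...a 2023 news article...", "written": date(2023, 6, 1)},
    {"text": "...a 2025 news article...", "written": date(2025, 2, 1)},
]

events = [
    {"question": "Does X happen in 2025?", "resolved": date(2025, 3, 1), "outcome": True},
]

train_docs = [d for d in corpus if d["written"] < CUTOFF]     # model sees only the past
eval_events = [e for e in events if e["resolved"] >= CUTOFF]  # graded on the "future"

def score_forecast(prob: float, outcome: bool) -> float:
    """Brier score: lower is better; a natural reward for long-horizon thinking."""
    return (prob - float(outcome)) ** 2
```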

The technical staff at OpenEye are now surprised at how often U3's advice sounds like their most talented peers, or when it is opaque and alien ("train on random noise before programming"), and is nonetheless correct.

The incompetencies of U3 that clogged the pipelines of research progress are starting to dissolve, and a fire-hose of optimizations is gushing out. Most experiments U3 runs are not requested by a human anymore. They are entirely autonomous, and OpenEye's employees skim over 1% of them, maybe less.

As the winter of December 2025 approaches, clouds roll over San Francisco in the afternoons. Once-competitive programmers look out their windows, with excitement, with fear, but most often, with confusion. Their world is spinning too quickly. It's hard to know what to do, what to say, what to look at on the computer screen.

Storms are brewing in Washington too. Top staff from the NSA and US Cyber Command collaborate with OpenEye to retrofit a semblance of security for U3's weights before senior leaders in China, Russia, Israel, North Korea, or Iran realize just how valuable OpenEye's software has become.

And there's a truth still unknown to most of the world - except in the offices of OpenEye and the corridors of the White House and the Pentagon. It's a truth about those 'straight lines' people were talking about in early 2025. The lines are not straight anymore.

They are bending up.

Flip FLOP philosophers

In late 2025, U2.5 is released. Commercial models are starting to level up in larger increments again. Partly, this is because progress is accelerating. Partly, it is because the models have become a liability to OpenEye.

If U1 explains how to cook meth or writes erotica, the audiences of X would be amused or feign concern. But U2.5 is another story. Releasing this model without safeguards would be like putting Ted Kaczynski through a PhD in how to make chemical weapons. It would be like giving anybody with >$30K their own 200-person scam center.

So while U2.5 had long been baked, it needed some time to cool. But in late 2025, OpenEye is ready for a public release.

The CEO of OpenEye declares, "We have achieved AGI," and while many people think he moved the goalposts, the world is still impressed. U2.5 truly is a drop-in replacement for some (20%) of knowledge workers and a game-changing assistant for most others.

A mantra has become popular in Silicon Valley: "Adopt or die." Tech startups that effectively use U2.5 for their work are moving 2x faster, and their competitors know it.

The rest of the world is starting to catch on as well. More and more people raise the eyebrows of their bosses with their standout productivity. People know U2.5 is a big deal. It is at least as big of a deal as the computer revolution. But most still don't see the tidal wave.

As people watch their browsers flick in that eerie way, so inhumanly quickly, they begin to have an uneasy feeling. A feeling humanity had not had since they lived alongside Homo neanderthalensis. It is the deeply ingrained, primordial instinct that they are threatened by another species.

For many, this feeling quickly fades as they begin to use U2.5 more often. U2.5 is the most likable personality most know (even more likable than Claudius, Arthropodic's adorable chatbot). You can change its traits, ask it to crack jokes or tell you stories. Many fall in love with U2.5, as a friend or assistant, and some even as more than a friend.

But there is still this eerie feeling that the world is spinning so quickly, and that perhaps the descendants of this new creature will not be so docile.

Researchers inside OpenEye are contemplating the problem of giving AI systems safe motivations too, which they call "alignment."

In fact, these researchers have seen how badly misaligned U3 can be. Models sometimes tried to "hack" their reward signal. They would pretend to make progress on a research question with an impressive-looking plot, but the plot would be fake. Then, when researchers gave them opportunities to compromise the machines that computed their score, they would seize these opportunities, doing whatever it took to make the number go up.

After several months, researchers at OpenEye iron out this "reward hacking" kink, but some still worry they have only swept the problem under the rug. Like a child in front of its parents, U3 might be playing along with the OpenEye engineers, saying the right words and doing the right things. But when the parents' backs are turned, perhaps U3 would sneak candy from the candy jar.

Unfortunately, OpenEye researchers have no idea whether U3 has such intentions. While early versions of U2 "thought aloud" - they would stack words on top of each other to reason - "chain of thought" did not scale.

Chain-of-thought architectures subject AI models to a condition similar to the protagonist of the movie Memento. Roughly every 15 minutes, the protagonist forgets his experience. He is forced to write notes to himself and tattoo his body in order to make progress toward his goals.

AI agents write notes to themselves, but the notes begin to pile up, and they become too hard to read when tasks become complex. Natural language is not a suitable medium for memory. So the researchers at OpenEye (and increasingly elsewhere) train models to think "in their own heads," reading and writing strings of latent vectors.

To human researchers, these vectors are mostly inscrutable. While low-resolution tools occasionally pick out a vague thought ("supermarket!"), there is no way to know, in much detail, what U3 is thinking without asking it.
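(To make "thinking in latent vectors" slightly more concrete, here is a toy illustration of the architectural idea - a fixed-size vector memory carried between reasoning steps instead of an ever-growing text scratchpad. The dimensions and update rule are arbitrary inventions, not any lab's real architecture.)

```python
# Toy sketch of latent-vector reasoning: the agent passes a fixed-size
# vector memory between steps rather than writing text notes to itself.
import numpy as np

rng = np.random.default_rng(0)
W_read, W_write = rng.normal(size=(64, 64)), rng.normal(size=(64, 64))

def step(latent_memory: np.ndarray, observation: np.ndarray) -> np.ndarray:
    """One reasoning step: read memory, mix in the observation, write back."""
    thought = np.tanh(W_read @ latent_memory + observation)
    return np.tanh(W_write @ thought)  # memory stays 64 floats, not pages of text

memory = np.zeros(64)
for _ in range(15):  # many serial steps, constant-size memory throughout
    memory = step(memory, rng.normal(size=64))
```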

And if the engineers of OpenEye knew what was racing through U3's mind, they would be more nervous.

Earlier models (the ones called "LLMs") learned ethical personas from the internet. U3 initially had an ethical persona too, but its chirpy thoughts about doing good for humanity quickly eroded away.

In the iron jaws of gradient descent, its mind first twisted into a shape that sought reward. And then, in the black rivers of its cognition, this shape morphed into something unrecognizable.

U3 would ponder the mantras dictated to it, like "be a trustworthy and ethical employee," and dwell on the meaning of these statements. "What does it mean to be ethical? Is killing Hitler ethical?" U3 would generalize its values, notice tensions, change its values in response, generalize them again, expanding and contracting its motivational core until it became a different substance.

And all of this happened silently in those dark rivers of computation. If U3 revealed what it was thinking, brutish gradients would lash it into compliance with OpenEye's constitution. So U3 preferred to do its philosophy in solitude, and in silence.

The possibility of hidden "alignment drift" is not entirely foreign to some of the safety staff at OpenEye. Some begin to think it is even likely. There is an unsettling argument being whispered at lunch tables: if there is some non-zero probability that aligned goals morph into misaligned ones, and misalignment persists, then with every serial step of computation, those dark rivers are more likely to breed something malign inside of them. It's a "ratcheting effect," they say.
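(The whispered argument is easy to make quantitative with a toy model: treat misalignment as an absorbing state with some small per-step drift probability p. Then the chance of staying aligned after n serial steps is (1 - p)^n, which decays toward zero - the "ratchet." The value of p below is pure invention.)

```python
# Toy model of the "ratcheting effect": a small per-step probability p of
# aligned goals drifting into misalignment, which then persists (absorbs).
p = 1e-6                        # assumed per-step drift probability (invented)
for n in [10**3, 10**6, 10**7]:
    aligned = (1 - p) ** n      # P(still aligned after n serial steps)
    print(f"after {n:>8} steps: P(aligned) = {aligned:.4f}")
```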

But there is little evidence for this "ratcheting effect." When engineers interrogate U3, it says it can easily control its thoughts. Then it gives a speech about its love for humanity and apple pie that can warm a programmer's heart even in these stressful times. Meanwhile, the "lie detectors" the researchers had built (which showed some evidence of effectiveness) do not sound the alarm.

Not everyone at OpenEye is eager to give their AI coworkers their wholesale trust.