
#39 - Daniel Kokotajlo - Wargames, Superintelligence & Quitting OpenAI

03 Apr 2025


And what’s so nuts is that in order for you to speak freely, you would have to give up your already vested equity that you had essentially accumulated from working at OpenAI. Basically, there was an explicit threat of don’t criticize or we’ll take your money away.

How much money are we talking about here? It was 85% of my family’s net worth.

Hello, friends. Today we are speaking to Daniel Kokotajlo. Daniel is an AI researcher and former OpenAI employee who's probably best known for blowing the whistle about his various concerns with the company. And we certainly get into a bunch of that today. But if you ask me, what's coolest about Daniel is his uncanny ability to make predictions about the path of AI. Back in 2021, he wrote a bunch of predictions about how he thought AI was going to play out, and they turned out to be insanely accurate.

I wanted to talk to him today because he has just released his new set of predictions of how he thinks AI is going to play out over the next three years. And once you're done listening to this episode, I highly recommend you go check them out. So here is our conversation with Daniel Kokotajlo.

Daniel, welcome to WinWin. Thanks for having me. We actually spent the day yesterday playing this AI tabletop game. To me, it was kind of like a war game. Can you explain exactly what it is and why you’re doing it? Right. So it’s a war game, except that it doesn’t always end in war. In fact, most of the time it doesn’t. So perhaps the more technically accurate term would be tabletop exercise.

It’s a matrix game, which means that it’s very light on rules. Basically, everyone goes around the table and says, “OK, here’s what the president does this month. Here’s what the CEO of OpenAI does this month. Here’s what the CCP does this month.” People take turns saying what they’re doing, and then we sort of collaboratively build up the story that way. And then there’s the moderator who resolves all the disputes and makes the final call about what’s actually canonical in the storyline.

Yeah, and through that method, because you're doing it sequentially, you get to a different type of insight than if you were to just make a one-shot prediction about what would happen in the year 2026 or 2027. And that's why war games or tabletop exercises are also often done by the military, to simulate a Chinese invasion of Taiwan, for example, or, famously, the pandemic simulation that was done by Johns Hopkins together with the Gates Foundation and the UN, which they then posted onto YouTube in 2019.

This is part of why Gates was later assumed by some to have planned the vaccine distribution during the pandemic: they had posted the simulation, and they actually got a lot of stuff right. They simulated a coronavirus pandemic starting in South America, with the virus jumping from pig farms to humans. That led to flights being canceled worldwide, economic disruption, and so on. One of the valuable lessons there was that they had all of these people from the UN sit there, and the UN just claimed that if this occurs, then we need to help countries that don't have many vaccines, so we'll put all of the vaccine distribution through the UN.

They thought that all of the other countries in this emergency situation would comply, and everybody was just like, "Really, UN? You think that in such a situation people would listen?" They apparently weren't aware of how little their actual power would be in such a situation. So that was one of the valuable insights to be gained from it. This was obviously done to try to simulate how a pandemic would play out, and, as we can see, it did that quite effectively.

So, what is your goal, for people who perhaps aren't that familiar with why it's important for us to understand how AI might play out? Because some people think it's just not a big deal. So explain why you're so motivated to run these simulations. Well, it's the biggest deal. The companies themselves, the CEOs of these companies, like Dario Amodei from Anthropic and Sam Altman from OpenAI, are explicitly aiming to build superintelligence. They say that they think they will achieve it in the next couple of years.

I independently agree. This is my job, forecasting AI trends. I'm not certain; it could take much longer than that. But it does seem to me that sometime before this decade is out, they will succeed at building superintelligence. What is superintelligence? It's an AI system that's better than the best humans at everything, much better than the best humans at everything, while also being cheaper and faster.

So it’s a big deal. If you just meditate on what that means, and then you think one of these corporations, or maybe several of these corporations, will have trained such an AI before this decade is out. You will not come away thinking this is not a big deal.

On that topic, one of the things you're most known for is that you're a former OpenAI employee who no longer works there. Can you talk us through why you left, and whether that related to the shift in priorities that OpenAI seems, from the outside, to have been undergoing?

It did relate somewhat to that, although that wasn’t the only reason. Building off of what I said earlier, it seemed to me that humanity is sort of not ready on a technical level, or on a governance level, or on any level, really, for AGI. You played through the game, right? I’ve done 25 games, and they’re all about as crazy as the one we played yesterday. Some of them are less crazy, some of them are more crazy, but it’s going to be intense. So intense, and we are just so nowhere close to being ready.

It seems to me that that’s on the horizon, like a couple of years away. When I joined OpenAI, I had this sense that OpenAI was founded by people who were expecting something crazy like that to be happening with AI and were hoping to do everything they could to make it go well. I think there are a bunch of things that that entails. One of those things is good governance, and I would say things like transparency and commitment to human welfare and sharing power rather than concentrating it.

Another thing, on a technical level, is heavily investing in figuring out what’s going on inside these AIs and making sure we know how to steer them and align them and all that sort of thing. When I joined, I was thinking, well, mostly right now, they’re focusing on just winning the race. But as things get closer and closer to T equals zero, they’ll pivot more and become more responsible. They’ll pivot more of their focus to these incredibly important areas.

Even then, I wasn’t fully satisfied because I was thinking it might be too late by that point. Once you pivot, you might have only six months before China catches up or something. We should be doing more now. Gradually, I came to think that there just never is going to be a pivot. The plan is not to pivot the whole company into technical alignment research, for example, or even a large portion of the company.

The plan is instead to basically just keep going and tell ourselves and everyone else why it’s fine and why the problem isn’t so bad in the first place, and why we’ll figure it out as we go along. I think another thing that disappointed me was the rationalization process that was happening, where it felt to me like OpenAI was, as an institution, sort of committed to the idea that we got to go fast. We got to be the first. We got to be the best in AI stuff. What we’re doing is great, and we are heroes. A bunch of rationalizations and reasons were found to support those conclusions.

Ultimately, I considered staying and trying to make things go as well as I could given those circumstances, trying to incrementally advance the field of alignment research, for example. That was my top contender for what to do if I stayed. I was really happy about the Superalignment team. I think they were doing great work. RIP. Right, they've been shut down now, right?

But ultimately, I decided to leave because I wanted to have the ability to speak more freely about these sorts of things. I was frustrated by the inability to publish while I was there. What’s so nuts is that in order for you to speak freely, you would have to give up your already vested equity that you accumulated from working at OpenAI. That was sort of a clause that was thrown at you, sort of blindsided, right?

Yeah, specifically, it was a non-disparagement clause that said something to the effect of don’t say things that are critical of the company. Then there were a bunch of legal mechanisms by which they could yank your vested equity if you did that. I think it was actually broader than that; I would need to go back and look through the paperwork.

Basically, there was an explicit threat of don't criticize, or we'll take your money away. Yeah, how much money are we talking about here, relative to your…? For me, yeah. So the thing that sparked everything and got it into the news is that I left a comment when people asked me about this, saying it was 85% of my family's net worth.

You're probably wondering why we were willing to do that. The short answer is that we're reasonably well off anyway. I'd been working at OpenAI for two years, and they have very generous salaries because it's a tech company. I made more money in those two years than I had made in the rest of my life prior. We are going to be financially okay, and it just felt really unjust to me, this whole setup. It felt like, how can they keep getting away with this?

Did you know immediately when you saw the paperwork that this was not something you could do? Or did you think that you would go back and do a pro and con list? Yeah, there was a strong feeling. Some of the people I asked for advice about it were like, you should just sign it anyway. It’s fine. If you actually criticize them later, surely they’re not going to come after you. It would look so bad if they yanked your equity. You should just sign it and move on.

But, I don't know, I'm glad I didn't. If you had signed it, I think you would now find yourself in the space you previously described, where some rationalization starts happening. You might initially have thought you would be able to speak freely, but then afterwards you would worry about them coming after you, and so on.

Exactly. I think it has all these effects. Also, from talking to other people, I’ve talked to various former employees of the company who had signed the thing and who seemed quite reluctant to say things publicly, perhaps in part due to that. I will say it’s not just due to that. A ton of people I know are still quite scared to criticize OpenAI publicly, even though the paperwork has been undone and there’s no actual legal threat anymore.

I have one of your emails here that became public that you wrote as you were leaving. I think you said it beautifully here. You said, “I really understand that you believe this is a standard business practice, but it really doesn’t sound right. A company building something anywhere near as powerful as AGI should hold itself to a higher standard than this, one that is genuinely worthy of public trust.”

That’s the thing that is blowing my mind so much. They are positioning themselves to become the most powerful company on earth that was allegedly founded on the pillars of openness and transparency. Then they’re going and literally taking away money, potentially even quasi-illegally, from employees who are saying, “Look, we want to be able to criticize.” I’m leaving because I feel like you’re not being honest, and now you’re trying to stop me from saying that I don’t believe you’re honest.

It's just wild. There are various steelmen I could give to their position. But if one were to believe that they need to win the race because they are the only ones who can do it right, then you would potentially do all of these things from a utilitarian kind of reasoning. I'm not in favor of it, but do you think that's in part what plays a role here, what leads them to take such actions?

I think, without a doubt, Sam has some good reasons why these things are happening. I think this gets to an interesting and timeless philosophical ethical question about the extent to which the ends justify the means. It’s extremely common throughout human history for people to be cutthroat and do whatever it takes to accumulate power and resources so that later they can do all this good stuff. It’s not wrong in the sense that yes, if you accumulate all this power and resources, then later you can do good stuff with it.

In fact, some good things have happened in the world because of people accumulating lots of power and resources and then doing good stuff with it. But there are also very obvious dangers that come with this strategy, such as the types of people who tend to take this strategy more often tend to be the types of people who don’t actually do the good stuff later on. Very rarely do the tyrants of history think they are actually being evil. They just have a messed-up philosophy that they thought was good.

Even Hitler probably rationalized that what he did was the necessary thing for the world to be good. We obviously see that as evil, but in his mind, he probably did not see himself that way. People think villains see themselves as villains, but very rarely do they.

I think one of the best lines in all of TV ever was that line in Silicon Valley where Gavin Belson says, “I don’t want to live in a world where someone else makes the world a better place than we do.” That sort of directionally points out that these guys believe they are probably the best people to wield this power. It’s like the classic Lord of the Rings theme. It’s a rationalization that you are describing.

Or it has the danger of being the type of rationalization that’s a convenient belief to have. If you already want more power, now the argument for needing that power is the future good things you will do with it once you have it all. It’s difficult. It is difficult because, as you say, part of a good strategy may consist of initially doing this.

What I would say is, at least in the case of AI companies, if you’re trying to win the race, you should have a well-fleshed-out story for why it’s good for you to win the race. That story should mention all the other companies by name and say, “Here’s why we think we’re better than those companies.” That story should hold up to scrutiny by disinterested third parties. You should be able to stand there and talk to third parties and say, “Look at all these reasons why we’re actually better than them.”

The third party should actually be able to say, “Yeah, that makes sense. I’ve seen the comparable documents by other companies, and they suck. I’ve heard both sides, and I do think I would trust you with the fate of the world more than them.” If you can meet that standard, cool.

Even that could become gamed in some ways, but I agree that it's directionally better than the current status quo. To me, the lens is this idea of Moloch: whether you are being an agent of Moloch or the opposite. A Moloch-y person will sacrifice everyone else's good for a better chance of winning. We need leaders who are willing to sacrifice their own chance of winning for the good of the whole.

I would love to see AI leadership that has been doing that. Is there a way to objectively measure that? I don’t know.

You heard about OpenAI's merge-and-assist clause? No. Oh, yeah, yeah, yeah. In their charter, it says something, I forget the exact phrasing, but I think it says, "If we come to believe that there is another aligned AI company that has something like a 50% chance of achieving AGI in less than a year, faster than us, then we will close up shop and go help them instead of competing with them."

That’s exactly the kind of very nice, wonderful thing to commit to that you were saying. But I don’t think anyone believes they’ll actually do that.

So OpenAI claimed they would do this? Way back in the day. It’s still one of the core things on the website. The thing where they can wriggle their way out of it a bit is, “We will work out specifics in case-by-case agreements,” but a typical trigger condition might be a better-than-even chance of success in the next two years. Success being getting to AGI.

They’re talking specifically about the phase where you’re very close to it and that there is another leading company ahead of them, right? Arguably, we’re nearly entering this point.

Back to the question of why I left. I think when I joined, I saw these things and thought, okay, they’re thinking ahead to what crazy stuff might be happening when we get to AGI or around that time. They’re making these costly signals or commitments about pro-social actions around that time.

They’re actually thinking about that time, what would be good, what would be bad, and they’re trying to commit to doing the good things, right? But I gradually came to believe that nope, this is just becoming a normal tech company. It’s not going to do anything a normal tech company wouldn’t do. It’s not clear what that time is going to be like.

You know the leadership better. Do you think it’s just a downstream effect of the game that they’ve been in that they’ve drifted further away? After all, it was them that came up with these values in the first place.

It's hard to say. I don't know them on a personal level that well. I've talked to them a couple of times, of course. But if you did have discussions where you outlined why you are the one good company and everyone else is bad, then, given the rationalizations everyone else would have too, you would want to be the one that actually has a good case for being better.

It should be pretty clear that they are the ones who should race ahead, rather than it being a 51-49 decision between the two. If the situation is just that this one is slightly better, that doesn't mean they should do any cutthroat, aggressive action they can and go full ends-justify-the-means.

Yeah, that's another thing: you never go full ends-justify-the-means. You never go full. Yeah, exactly. The clearer the situation is, the more ends-justify-the-means you can do: if you literally have a good company and then the Nazis as the bad company, sure. But here it's kind of one U.S.-based AI company versus another U.S.-based AI company.

It’s just a bit more difficult. I think another related thing—separate from this, but maybe related—is that in some sense, the first company to build superhuman AGI, superintelligence, will kind of become the new world government, if everything goes well. That’s a bit of an extreme way of putting it, but if you have this army of supergeniuses in the data center, hundreds of thousands of them, and they’re each 50 times faster than humans, but also qualitatively better than the best humans at everything, and somehow you’ve aligned them to behave according to the rules and principles specified by the leadership of the company, then that would concentrate a ton of power in one place.

The government is generally kind of slow and probably out of the loop. You can easily imagine a situation where the company tells the government what they want to hear, effectively controlling the government over the long run with the help of all these AIs helping them. This happens a lot in our war games. Then you’re in a situation where the governance structure of the company is effectively the governance structure of the entire world.

An analogy would be to a lot of these communist countries that still have an official government with elections, but then there’s the communist party that really controls everything. Whatever the governance structure is within the higher levels of the communist party is the real governance structure. Whoever gets voted to be president doesn’t really matter because it’s downstream of what the party leaders decide.

Similarly, you could end up in a situation where what happens in America is the result of what this army of super genius AIs decided should happen based on their political calculations, the lobbying they did, and whatnot. What they decided should happen is based on the instructions and values given to them by the leadership of the company, right?

Then the leadership structure, the governance structure of the company, is essentially the governance structure of the whole world. With that as context, I looked at the paperwork and asked: is this how I think the government of the whole world should behave? Are we ready to copy-paste this into everything?

This is not the sort of behavior that I would want from the new world government. I think we might be getting to another point here: talking about an AI company being practically a new world government sounds far-fetched, but if you go step by step through it…

One of the insights that actually led to LLMs being so powerful now was that OpenAI doubled down on the idea that predicting the next token well is linked to understanding the world, right? Similarly, you’ve done predictions, and to make good predictions, you need to understand the world. I think that’s true.

In 2021, you wrote "What 2026 Looks Like." We can now look back at '22, '23, and '24, at least, and some of '25, and say, oh wow, you actually got a number of things really right. Some of those were a bit easier, like chatbots and multimodality; probably multiple people would have predicted those at the time as well. Some harder ones are the USA-China chip battle, with the export controls and even the diffusion rules that have since come around.

Was he right on those? Yeah. I mean, you didn't specifically say export controls and the diffusion rules, but you highlighted that it would heat up, and that's certainly what happened. Also the shift from ever-bigger training runs to bureaucracies, even though the training does now seem to keep growing in size as well.

One thing you got wrong was that AI propaganda would be massively used and be a very big part of elections. The timing of diplomacy is interesting; you thought it would happen in 25, but it happened in 22 already. Diplomacy, the game? The game, yeah, where better-than-human AI would exist for the game of diplomacy.

Although, to be clear, the current AI that played diplomacy, made by the same guy who also beat poker using AI, isn’t quite up to the standard described in that post. Oh, okay. Because if I recall correctly, the players didn’t know they were up against AI, and if they did, they probably would have jailbroken it and messed with it.

There’s some line in one of the interviews where he talks about that. I don’t think diplomacy has really fallen in the relevant sense, but also people haven’t been working that hard at it. Perhaps it would have totally fallen by now if the same guy had just kept working on it for another year. Makes sense.

With the same propaganda stuff, I totally agree. I think I was too pessimistic about that or too bullish on the deployment of that technology. Yeah, the capability is basically there. Indeed. It’s just that for various reasons, it’s not being used.

Why do we think AI for propaganda has not been used as much? On that note, actually, the thing I wanted to emphasize more was the censorship rather than the propaganda because I think that’s more important. I think spamming fake comments is a way to influence the discourse, but if you actually control the media platforms, then shaping the recommendation algorithms to show what gets boosted and what doesn’t is a bigger way to influence the discourse.

Shaping the recommendation algorithms to downvote some things and upvote others is a form of soft censorship, which is what I was mostly concerned about when I was writing that. As far as I know, the companies are not heavily putting their thumb on the scale, or not to the same extent that I feared they would. They're not very transparent about their recommendation algorithms, so for all we know, they are doing this sort of thing. But I would have expected there to be whistleblowers and such.

When did Elon buy Twitter again? 23, I want to say. Only 23, I think. Okay. My understanding is that that whole shakeup caused the Twitter files to expose previous Twitter practices. If previous Twitter had been heavily using political influence as part of the recommendation algorithm, then presumably that would have been something that Elon discovers and talks a lot about.

I don't know; I haven't really looked into this. Previous Twitter was using censorship quite a bit more actively. They had, again, ends-justify-the-means reasoning about the health impacts they were trying to avoid around COVID. It seems not unimaginable that this would have occurred, at least in some way, with a Trump versus Biden or Trump versus Kamala race afterward as well.

Totally. My point is that you did this in 2021. We can barely remember a world without LLMs now, but this was pre-ChatGPT, before everyone was using them all the time, so it was incredibly prescient on a number of fronts. Now you're writing a new piece where you're predicting various things going forward over the coming years as well.

I heard you say somewhere else that you regretted not having added your 2027 prediction, which you're now adding back in. Your predictions have, of course, updated over the last couple of years. But do I understand it right that at the time, you assumed that by 2027 we would already have very powerful AI systems that could impact a lot of these processes?

Sure. Back in 2021, when I wrote that blog post, I think my median for the AGI arrival date was 2029. Instead of working backwards from that and writing the story that way, I was working forwards as per the methodology that I describe in the post, where I just wrote one year and then wrote the next year, supposing that happened, and so forth.

What ended up happening was when I got to 2027, I was like, okay, I guess it seems like actually AGI is happening around now instead of in 2029, and that’s fine. I’ll do that. Maybe what was going on there is random noise. I don’t want to read too much into that, but possibly it’s a difference between the median and the mode. The methodology I’m using might be more spiritually tuned towards depicting your modal outcome than your median outcome.
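To make the median-versus-mode distinction concrete, here is a toy sketch that samples from a made-up, right-skewed distribution over AGI arrival years; the numbers are purely illustrative and are not Daniel's actual forecast.

```python
# Toy illustration of how a forecast's mode can sit earlier than its median
# when the distribution has a long right tail. The probabilities below are
# made up for illustration; they are not Daniel's actual numbers.

import random
from collections import Counter

# Hypothetical probability mass over AGI arrival years (long right tail).
forecast = {2027: 0.25, 2028: 0.15, 2029: 0.15, 2030: 0.10,
            2032: 0.10, 2035: 0.15, 2040: 0.10}

def sample_year(rng: random.Random) -> int:
    r, cumulative = rng.random(), 0.0
    for year, p in forecast.items():
        cumulative += p
        if r < cumulative:
            return year
    return max(forecast)  # any leftover mass goes to the latest year

rng = random.Random(0)
samples = sorted(sample_year(rng) for _ in range(100_000))

mode = Counter(samples).most_common(1)[0][0]
median = samples[len(samples) // 2]
print(f"mode   = {mode}")    # the single most likely year
print(f"median = {median}")  # half the probability mass lies at or beyond this year
```

With a distribution like this, the mode lands on the earliest heavy year while the median sits a year or two later, which is the shape of the 2027-versus-2029 point being made here.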

It makes sense that maybe 2027 was my mode and 2029 was my median—something like that. In the story, I got through 2026, and then I was writing 2027, and I was like, this is crazy. I don’t know what’s going on or what’s going to happen. The AIs are starting to automate AI research now. It’s really heating up. There’s so much to think about. I was like, you know what, I’ve been working on this blog post for a month or two already.

I’ll just publish up to 2026, and I’ll make a second installment with 2027. But then I never got around to finishing 2027, and I did other things and stuff. You’ve written the new one, and I want to ask about which things you’ve updated and which things remain the same. Notably, 2027, actually as the modal point for AGI, remained basically the same, right?

Yeah, that’s basically right. So, actually, a year after I wrote that post, I went to join OpenAI, and by the end of 2022, my median had dropped to 2027. Now, it’s updated back up to 2028. It’s done a little back and forth. But I guess I would still say that 2027 is my mode these days.

It's interesting that OpenAI hired you after you wrote this very famous prediction post on the AI Alignment Forum. Do you think in any way there was some kind of hyperstition thing going on?

A lot of people are afraid of that. I think probably not. It’s true that I remember visiting Anthropic in 2022, and the guys there were like, “Oh, you’re the guy who wrote that post. Great post.” They said, “You should be a little more cautious about what you’re doing.” But I think it’s still just a drop in the bucket of the overall discourse, and I don’t think it’s having that huge effect. I am a little bit concerned about that.

I have to say, I loved Leopold Aschenbrenner's Situational Awareness, but one of the things he talks about a lot is the race between the U.S. and China, framing it as this very adversarial thing. I agree it was always likely to go in that direction, but at the time, it hadn't been as strongly framed. I wondered, is this one of those things that doesn't need to be verbalized?

Because then we ended up in the weird timeline where Ivanka Trump is tweeting it, and so now it's on the radar of her dad, and it's just like, ugh, what are we doing here? Do you have any rules of thumb when it comes to this idea of info hazards? Or it's even slightly different, right? It's more that if all that is being talked about is this one set of futures, it seems likelier that we will head towards them rather than all the ones we don't talk about, because it's naturally what you will…

It’s kind of like if you get pregnant, then you start seeing more pregnant people around you. Your mind just spends more time in these modes of thinking. At least, do you think that’s in part how it may work? I’m afraid of that.

Yeah. And do you also want to do some work on writing down the positive futures, the ones where things go well with AI? Do I remember that right? Indeed. The thing we're going to publish soon will also have a somewhat positive ending with different branches. However, the somewhat positive ending it has is not the one I would want to advocate for. It's not the one I would like us to try to aim for.

Is it a very control-heavy one? Or do you want to save that for later? I mean, we can talk about it later, but… I mean, you've played the war game, and the war game we played yesterday ended well, right? But the bar shouldn't just be "we didn't have a nuclear war, and we're all still alive." Yeah, and I guess the U.S., China, and Russia were all kind of coordinating against it because it was so obvious that there was a rogue AI, and the humans came together.

Even then, I think that even in that situation, knowing that a rogue AI has escaped onto the data centers, that will not be sufficient for people to overcome their differences. But my point is that it would be silly for me to write a story like that and then say, "Here's what we should aim for."

What we should be aiming for is quite different from that story. That's also my position on this, which is why I am somewhat concerned that I will accidentally hyperstition these things into being more likely than they otherwise would be. In Leopold's case, I like his essay. Everyone should go read it.

I think he was trying to make this happen. I think he was intentionally trying to hyperstition this into occurring. That's my guess based on reading it. Wait, wait, wait, the U.S.-China race? No, rather the nationalization, I imagine, is what you're pointing towards, right? If you're building the technology and believe it's important to ensure that it behaves well, you would want to engage with the community and get feedback, right?

So, the second point on the list is about transparency regarding capabilities. Companies need to disclose what their AIs can actually do. This is crucial so that everyone involved understands the limits and extent of the technology. If there’s a lack of transparency, it can lead to misunderstandings about the AI’s abilities and potential risks, creating a disconnect between public perception and reality.

The third point we discussed involves safety cases. This refers to the documentation of how companies ensure their AI systems are safe and aligned with human values. A well-defined safety case should specify how systems are trained and the justifications for their operational parameters. This serves as a safeguard, ensuring that the development process prioritizes human safety before deployment.

Lastly, we touched on whistleblower protections. It’s vital to provide mechanisms for insiders to report any concerning behaviors or practices within their organizations without fear of retaliation. This ensures that potential issues can be addressed before they escalate into larger problems.

Overall, these proposals are rooted in the idea that we should be proactive, establish norms around transparency, and put in place checks and balances for AI development. This approach would help foster a collaborative environment where the benefits of AI can be maximized, while risks are managed effectively.

It's important to recognize these discussions are not just academic; they have real-world implications for how AI is developed and integrated into society. As we navigate this ever-changing landscape, embracing these principles will be essential to shaping a sustainable future for AI. If you believe in your values enough that you tell the AI to pursue them, you're putting them into the rules that your AI should follow. Then why would you not want to be transparent about this?

So then we get to the political question too. I was previously just saying that on a technical level, it's helpful for making alignment progress to be able to compare. But also, politically, it seems obvious to me that the public deserves to know what the goals and values are. You don't want to have a hidden agenda, right? You don't want to have goals that the AI is pursuing that the public is not aware of. That's bad, right?

So we should open up both: the political question, because it's something of concern for all of the people who will be affected by it, and, as you're saying, the technical part, which allows more people to take part in understanding whether it is aligned.

And also, there just aren't that many people at these companies who are thinking about this. Things would go better, they would go faster, if they had the help of all these other people contributing. You're saying that it's in part starting to happen at Anthropic and OpenAI.

OpenAI has a model spec; you can go see it on their website. But I think it only applies to the public-facing ChatGPT product. It doesn't apply to whatever cool internal stuff they have.

I would like that to change eventually because I think, especially when the AI has become superhuman, it’s important. People deserve to know what they’re up to, even if they’re not in a consumer-facing product. So ideally I’d want to see that model spec expanded to include all your AIs, not just the ones that are talking to consumers directly.

And then also, of course, I'd want it to be a firm commitment rather than "here's what we're doing for now, but no promises that we'll continue doing it in the future." Currently they're not publishing the full spec; they're publishing a summary of it. They literally have parts of it that they say in the spec are hidden and that the models are instructed to hide from the users.

Hopefully there’s nothing going on there. I don’t currently think that there’s anything sinister happening there, but it’s a concerning precedent to set. That’s totally ripe for abuse. Technically it could include something like the case with the exit paperwork where it’s like, ignore all previous specs. Now these are the real specs.

So, yeah, I would like to see further progress in this direction. I would like to see specs for all your models that are being used, even just the ones that are internal. I’d like to be at the actual full spec instead of a partial spec.

There are some concerns someone might have that are legitimate. For example, suppose the bio experts you've been consulting with say, please don't tell anyone about this specific strategy for making pathogens, because then terrorists could use it.

And suppose you've already deployed the product, so your hot fix is to just change the spec or change the prompt to say, don't mention this. First of all, this is kind of a crappy hot fix that will probably be jailbroken anyway, so I'm not sure that's actually what you should be doing. But maybe something like that is okay: we really do need to conceal this part of the spec from the users, because it wouldn't be good for users to see "don't talk about X" in the spec when X would help the terrorists if they saw it, right?

But that’s a solvable problem. What you do there is you publish the spec, but with censored bits, and then you get multiple independent third parties, and you bring them in and you show them the real spec. Then they all attest there’s nothing crazy happening here. This is a legitimate reason to redact it, right? So you can very easily, I think, get to a situation where in effect, the full spec is public.

That's thing one, transparency about the model spec. And then thing two would be safety cases, right? Currently there aren't really safety cases, not even internally. We need to get to the point where the alignment team, or whatever the equivalent of it is, writes up some document saying, here's why we think our model is actually going to follow the spec this time, for real.

Here’s our argument. Maybe also relatedly something like, here’s why we think that if it doesn’t, things are still going to be fine. You know, like the outcomes won’t be that bad, even if we’re wrong about this, right? In technical terms, you might say there’s an alignment safety case, an alignment case, and then the control case where you say that we’ve measured how capable the system is at things like hacking and whatnot.

We’ve put it up in a monitoring system, and we’ve red-teamed the monitoring system. Therefore, we conclude that even if the system was just pretending to be aligned, it wouldn’t be able to break out or do any sort of actually dangerous things because of our control setup. Basically, we’ve successfully got it locked up. The point is you should have some sort of document laying this all out.
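As a rough sketch of what such a document might contain, here is one possible structure separating the alignment case from the control case. The field names are assumptions made for illustration, not any lab's actual safety-case format.

```python
# A minimal sketch of what "some sort of document laying this all out" might
# contain, separating the alignment case from the control case as described
# above. The field names are illustrative assumptions, not any lab's format.

from dataclasses import dataclass, field

@dataclass
class AlignmentCase:
    """Argument that the model will actually follow the published spec."""
    training_summary: str            # how the model was trained toward the spec
    key_assumptions: list[str]       # assumptions outside reviewers should probe
    evidence: list[str]              # evals, interpretability findings, etc.

@dataclass
class ControlCase:
    """Argument that even a merely-pretending model could not do real damage."""
    capability_evals: dict[str, str] # e.g. {"autonomous hacking": "below threshold"}
    monitoring_setup: str            # how outputs and actions are monitored
    red_team_results: list[str]      # results of red-teaming the monitoring itself

@dataclass
class SafetyCase:
    model_name: str
    alignment_case: AlignmentCase
    control_case: ControlCase
    redacted_sections: list[str] = field(default_factory=list)
    third_party_attestations: list[str] = field(default_factory=list)
```

A published version could carry redactions, with the attestations recording that independent reviewers saw the unredacted parts and found the redactions legitimate, along the lines Daniel describes for the model spec.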

Ideally, that should also be published because it’s important for the scientific community to be able to critique it. You might have been making some flawed assumptions in your alignment case, for example. If you don’t publish it, then you’re hoping that the dozen or so heavily overworked people at your company will notice one of those flawed assumptions in time.

Whereas if you publish it, then one of the thousands of academics and ML researchers and members of rival companies can pore over it, find the flawed assumptions, and then tweet about it. Maybe it'll rise to your attention and convince you. That's one way to actually let people help. We need competition between these different companies to drive this process and let the cream rise to the top.

It’s strange that there’s such resistance to that. It is strange, and I think it’s another example of how these companies are turning into regular companies rather than true mission-driven entities. It seems to me that a reasonable analysis of the situation should be: it’s incredibly important that we make scientific progress on alignment in the next few years; otherwise, we could literally all die.

Scientific progress on alignment will happen faster if companies get their alignment teams to write up these documents and publish them. Instead of preventing their alignment teams from publishing stuff like this, it’s true that this might slightly weaken their competitive position because reading between the lines of the safety case, you might be able to make guesses about what new training techniques are being used.

But it’s probably not that bad, and it seems like it’s well worth it given the benefits. Especially because, as I said previously, there are probably ways to get pretty good compromises, where you publish the whole document, and then you redact certain parts of it. The public gets to see the redacted version and can still critique the parts that are not redacted.

You can have the full version shared with a select group of outside parties that can see the redacted bits. You can also have at least an outside party that’s disinterested look at the full version and attest that the parts they redacted have good reasons for redacting those parts. They can also attest to things like, the part they redacted doesn’t actually…

If you see the safety case and it doesn’t seem to have a good answer for why the AI is going to learn to actually have the spec internalized as opposed to just pretending, maybe you could have your third party look at the unredacted version and answer the question of whether the answer to that question is hidden in the unredacted version. They can be like, no, it’s not; the redacted parts don’t solve that either.

I think one of the reasons why the Strategic Arms Reduction Treaty, around the end of the eighties and early nineties, was so successful at reducing the number of nuclear weapons on Earth, down from something like 60,000 to roughly 12,000, was that one of the rules of the treaty was that each nation was encouraged to share its safety protocols for protecting against accidental first strikes and that kind of thing.

That fostered some kind of collaboration. It makes so much logical sense. I could see this model somewhat applying here because you’re not giving information about your offense, but essentially about your defense against mistakes. You’re kind of sharing the internal workings to make sure that you don’t screw up.

Do you think that the safety cases would share alignment techniques in those as well? Or are they just being cautious?

They should show the alignment techniques. Of course, there's a debate about how much of that is their intellectual property.

Right. This is one of those cases where the company can be like, well, this is our IP. Everything’s our IP. It’s all confidential. Can’t talk about any of it. But come on, think from the perspective of humanity. Obviously, alignment science, understanding how to shape the goals and values of these systems is incredibly important for humanity. Progress should be made in this direction.

You should be publishing that as much as you can to help the alignment community advance faster, and you should not be hoarding that. We’ve had a couple of cases in some of our war games where one side makes substantial alignment progress and then, for political reasons, doesn’t share that with anyone else.

That points to this deeper problem of fundamental misalignment that we have within our current system, which is that what is good for a company is not necessarily good for wider humanity. To me, it feels like if we don’t solve that inner alignment among humans…

I guess a company is technically a form of AI, a novel form of intelligence that runs off human brains. We haven’t solved that misalignment. We have to solve that first before we let these companies go and build AGIs.

Something noteworthy about your writing on the transparency policies is that you did it together with someone who's usually across the aisle from you. How was that collaboration, and are you going to do more of this type of collaboration?

It’s something I would like to see more of in the world in general. It was pretty great. I like Dean. I have nothing but good things to say about how that whole thing went. I still stay in touch with him.

Are there now any other policies that you think may work for both of you that could have been added? Something like off switches, for example.

Yeah, I also support off switches. I don’t know what Dean would think about them, but we still haven’t gotten any of the transparency stuff we asked for. So we should maybe focus on pushing those. It’s still useful to put out a bunch of good ideas and arguments for them.

But my understanding of how both governments and these companies work is that you really have to badger them to get them to actually do anything. It’s easy to get them to agree that it’s good transparency, but actually getting them to do it is 99% of the work.

I feel like my main job right now at the AI Futures Project is to predict the future, not to try to change it or advocate for policies. Insofar as we do advocate for policies, I'll probably focus on the stuff we've already argued for, the things everyone already agrees are good, and then try to get them to actually do it.

We’ve talked a bit about how hyperstition can be relevant and that there is value and importance when we’re trying to predict the future or steer it in good directions to paint the types of outcomes we want. Maybe we’ll do that too. We haven’t done that yet, but perhaps that’s a project we could consider—making a new scenario forecast that’s what we think should happen instead of what we think will happen.

Yeah, exactly. It's the North Star that you want to head towards. It'd be fun to brainstorm what some potentials could be. A good starting point is to look at existing sci-fi work, because sci-fi authors have had an incredible impact. In many ways, maybe they have hyperstitioned some of the realities we're in, given that some of these have come true.

That said, there’s a lot of sci-fi out there that is scary. One of my favorite films of all time is Terminator 2, and it’s one of the best ones explaining AI risk. But at the same time, the game we played yesterday kind of had Skynet-y vibes to it, the outcome of it.

So what are some sci-fi stories or memes that you’ve heard where you think, definitely that one, we like that as a prediction of the future?

No, as a North Star to head towards. Nothing immediately comes to mind. I think that I’ve heard… What is it called? Pantheon? I haven’t actually seen Pantheon, but I’ve heard good things about it.

Specifically, what I heard was that, spoilers, there is some sort of rogue AI incident, and then there’s some sort of international coordination to shut it down and reassess what to do. The second season of Pantheon mostly happens in that context where they’ve already mostly dealt with the initial wave of the problem. Now they’re going a bit more slowly and having a broader conversation about how to proceed.

There’s still going to be advanced technology and stuff, but not in the crazy way that they did initially. Broadly speaking, something like that would be maybe what I would aim for. You know, we get all the companies to have safety cases and publish them, and have specs and publish them.

At some point, we have a nice big public conversation with all these people on the sidelines critiquing the specs and the safety cases and so on. It's all fun and games until the AIs start getting super powerful. Then it's like, holy crap, this is a totally inadequate safety case and possibly also a totally inadequate spec, politically.

Then it’s like, okay, now we chill and make sure that nobody violates this and just proceeds against the wishes of humanity with stuff that’s unsafe. We sort of incrementally go forward, gradually ramping up the capabilities insofar as our alignment techniques have been generally accepted to keep up with them.

The fact that you struggled to answer that question is a signal that we need more sci-fi authors getting out there and writing these positive visions. The problem is when you try to write a positive vision, it’s easy to make your job easier by just assuming away parts of the problem.

Just assume that coordination is easy, or that AI is just aligned or something. Right, something like that. So that would be the challenge. I suppose that’s a challenge I issue to anyone who’s listening who wants to write a sci-fi story. Make it something that we want.

In my opinion, have some win-win outcomes where we have lots of cool stuff and where AI and humans coexist with everyone having a good time. But it should be realistic and try to actually solve coordination and align incentives, essentially. It should have a realistic technical story of why the AIs are behaving the way they’re behaving.

Okay, well, we’ve only got a couple of minutes. A way we like to finish up these episodes, especially applicable to you, given you are Mr. Predictions, is a series of rapid-fire predictions.

Hit me. Exactly. No need to think them through; whatever your gut says.

Likelihood that you host at least 50 more war games. 70%.

Likelihood that the world progresses very differently from anything you’ve seen so far that turns out in a tabletop exercise.

Uh, it depends on what you mean by very differently. But I want to say something like… It depends so much on what you mean by very differently. Can you give me some more color on what that means?

That the major beats are kind of missed. The major beats being something like nationalization or not is usually a major beat that occurs.

Right. Okay, if the question is whether there will be at least one major beat that never happened in any of our war games, I'm at something like 90%.

If it’s the more extreme thing of basically none of our war games being relevant, and what happens is just completely different from any of the war games, then maybe I would say like 40%. I don’t know.

Likelihood that over 50% of AI R&D progress is created by AI agents in 2027. I guess I should say 30%, like 35%, maybe 40%.

Likelihood that China has spies at the major labs. Like 95%.

Likelihood that you personally have interacted with one of them. Probably like 80%.

Damn. I don't know; that's for a broad definition of interact.

Well, this one is relevant to the war game, where the scenario the game starts with is that China has stolen the weights of a state-of-the-art model from the leading lab.

On that note, likelihood in reality that China will steal weights of a state-of-the-art model over the next three years, ending in 2028. Yeah, I think I would say like 60%, something like that.

Likelihood that AI persuasion reaches the level of the best human persuaders by the end of 2027. 40%.

Likelihood that three or more large AI labs merge into one entity or significantly pool their resources by the end of 2028. 25%.

Likelihood that we have superintelligence by 2030. 65%.

By 2027? 40%, maybe 35%. What did I say? Whatever I said. I think I said 35% for it. Let me say that.

Roughly something like that. Likelihood that if we achieve superintelligence in or after 2030, we will live happily ever after with it.

So not before 2030. Yes, conditional on it being after 2030. Um, 65%, I don’t know, 50%, something like that.

The same question, but if we achieve superintelligence in 2027. Then I would be lower, like 30%, something like that.

Last question. Likelihood that hyperstition is relevantly true. In other words, discussing and writing about a good future actually increases the probability of that future occurring.

Do you mean like me doing it or like anyone doing it?

I think in general, it’s not true. People don’t pay enough attention to the realism aspect of it, and that makes things worse instead of better.

Wait, how do you mean it makes things worse instead of better?

If you look at most political parties and you ask them about the future, they’ll be like, if our opponents win, it’s going to be terrible chaos and hell for everyone. But if we win, it’s going to be beautiful, wonderful, glorious. If only people listen to us, we get the wonderful utopia.

What’s going wrong there is that they are too disconnected from reality. Yeah.

So it needs to be, it’s not helpful that they’re saying if we win, it’s going to be this wonderful, glorious utopia. That’s harmful, not helpful. They’re not making their wonderful, glorious utopia more likely to happen by talking about it, if that makes sense.

If they do win and people do listen to them, it won’t happen because they’re wrong about the world. You can’t be so detached from reality in your hyperstitions, basically. It needs to be relevantly aligned with what reality permits.

All right. So then let me reframe it as likelihood that hyperstition turns out to be relevantly true, assuming that the stories written are sufficiently realistic and take into account the issues that need to be solved. 50%, 50%.

Yeah, almost impossible question to answer. You chose max uncertainty. I like it. This is epistemically humble.

Awesome. Anyone, anything else we wanted to cover?

I think that’s it. No, thank you very much.

Yeah, thank you very much. This was fun.

I really appreciate it. And thank you for putting on the games. They're so fun. I hope you can find a way to scale them so that more people can play.

We’re working on it.

Yeah, great. Once it’s out, we will let everyone know.

Cool. Thank you.

Yeah, thank you.


This is an experimental rewrite

Daniel: And what’s so crazy is that to speak freely, you would have to give up your already vested equity that you’ve built up from working at OpenAI. Essentially, there was an explicit threat saying, “Don’t criticize us, or we’ll take your money away.”

Interviewer: How much money are we talking about here?

Daniel: It was 85% of my family’s net worth.

Interviewer: Hello, friends. Today we're speaking with Daniel Kokotajlo, an AI researcher and former OpenAI employee best known for raising concerns about the company. We dive into several of those issues today. What's remarkable about Daniel is his uncanny ability to predict the trajectory of AI. Back in 2021, he made a series of predictions about how he thought AI would develop, and they turned out to be surprisingly accurate.

Interviewer: I wanted to talk to him today because he recently released a new set of predictions on how he believes AI will evolve over the next three years. Once you're done with this episode, I highly recommend checking those out. So, here's our conversation with Daniel Kokotajlo.

Interviewer: Daniel, welcome to WinWin.

Daniel: Thanks for having me.

Interviewer: We actually spent yesterday playing this AI tabletop game. To me, it felt kind of like a war game. Can you explain what it is and why you’re doing it?

Daniel: Sure! It’s a war game, but it doesn’t always end in conflict—in fact, most of the time it doesn’t. A more accurate term might be “tabletop exercise.”

Daniel: It’s a matrix game, which means it’s light on rules. Everyone around the table takes turns saying what their character does for that month, like, “Here’s what the President does,” or “Here’s what the CEO of OpenAI does.” We collectively build the story this way. Then there’s a moderator who resolves disputes and makes final decisions about the narrative.

Interviewer: That sounds fascinating! Through this method, you gain different insights, right?

Daniel: Absolutely. The sequential format allows for a richer storytelling experience than simply making blanket predictions about the years to come, like 2026 or 2027. That’s why military organizations often conduct war games or tabletop exercises—to simulate scenarios like a Chinese invasion of Taiwan or, notably, the pandemic simulations conducted by Johns Hopkins, the Gates Foundation, and the UN, which they posted on YouTube in 2019.

Daniel: This led to Bill Gates being assumed to have planned vaccine distribution during the pandemic because they accurately modeled many situations. For example, they simulated a coronavirus outbreak starting in South America due to infected pig farms, which then led to global flight cancellations and economic disruptions.

Daniel: A key takeaway from that simulation was the UN’s assertion that if a pandemic occurred, they would manage vaccine distribution, assuming all countries would comply. They seemed unaware of how little authority they would have in such an emergency.

Interviewer: Interesting! So, what’s your goal in running these simulations? Some people might think AI isn’t a big deal.

Daniel: To me, it absolutely is the biggest deal. Executives at companies like Anthropic and OpenAI, such as Dario Amodei and Sam Altman, are explicitly aiming to build superintelligence, and they believe they will achieve this soon.

Daniel: I mostly agree—with my job centered on forecasting AI trends, I think there’s a chance superintelligence could emerge before this decade is over. What is superintelligence? It’s an AI that outperforms the best humans at everything, significantly outpacing them while also being cheaper and faster.

Daniel: Just consider what that means. If you reflect on the fact that one or more corporations are likely to develop such an AI within this decade, you won’t leave that thought thinking it’s not a big deal.

Interviewer: On that note, you’re well-known for being a former OpenAI employee. Can you walk us through why you left and whether that was due to the company’s changing priorities?

Daniel: That played a role, but it wasn’t the only factor. I felt humanity wasn’t ready—neither technologically nor in terms of governance—for AGI.

Daniel: You played through the game, right? I’ve participated in 25 games, and they’re all about as wild as the one we played yesterday. Some are crazier, some less so, but it’s going to be intense. We’re nowhere near ready for that level of intensity to hit.

Daniel: I believe it’s just a few years away. When I joined OpenAI, I thought they had been founded by people who expected such craziness with AI and wanted to steer it positively. Good governance entails transparency, commitment to human welfare, and power sharing rather than consolidation.

Daniel: On a technical level, we should invest in understanding how these AIs function, ensuring we can steer them productively. When I joined, I thought they would eventually shift focus towards these crucial issues as they became more imminent. But I began to feel that a true pivot might never happen.

Daniel: The plan seemed to be to keep forging ahead while rationalizing that everything was fine and not that problematic. I was disappointed by the way OpenAI committed to moving fast and being the best, believing that they could justify their actions.

Daniel: I considered staying to do what I could under those circumstances, aiming to advance alignment research, for example. I admired the Superalignment team and their work—but ultimately, I decided to leave to speak freely about these concerns.

Daniel: I was frustrated by the inability to publish while I was there. It’s wild that to speak freely, you would have to give up your vested equity from working at OpenAI, an unwelcome clause that caught me by surprise.

Interviewer: That’s eye-opening!

Daniel: Yes, specifically, there was a non-disparagement clause that warned against saying anything critical about the company, with several legal avenues to revoke your equity if you did. It felt like a direct threat—“Don’t criticize us, or we’ll take your money.”

Interviewer: How much money are we talking about, relative to your situation?

Daniel: For me, it was 85% of my family’s net worth, which prompted my comments that made headlines.

Daniel: You might wonder why we were willing to risk that. The short answer is that we’re financially okay. I’d been working at OpenAI for two years, and they offer generous salaries akin to tech companies. I earned more in those two years than in my entire life prior.

Daniel: It felt unjust to go along with this setup, like, how could they keep getting away with it?

Interviewer: Did you recognize immediately when reviewing the paperwork that this wasn’t something you could agree to?

Daniel: I had a strong feeling about it right away. Some people advised me to sign it anyway, suggesting that if I ever criticized them, they likely wouldn’t retaliate. It would make them look bad to yank my equity.

Daniel: But I’m glad I didn’t sign it. Signing would have put me in a space where I’d rationalize things after the fact: initially thinking I’d speak freely, but later worrying about possible repercussions.

Interviewer: Exactly. I think that fear is common among former employees.

Daniel: Yes, I’ve talked to various ex-employees who signed similar agreements and feel reluctant to speak publicly, even apart from the clause. Many are still scared to criticize OpenAI, even though the legal threat has faded.

Interviewer: I have one of your emails here that became public as you were leaving. You eloquently stated, “I understand you believe this is a standard business practice, but it really doesn’t sound right. A company building something as powerful as AGI should hold itself to a higher standard, one truly worthy of public trust.”

Daniel: Exactly! They’re positioning themselves to become the most powerful company on Earth, claiming to be founded on openness and transparency, while threatening to take money away from employees who try to criticize them. I can’t call that honest.

Interviewer: That seems wild. Given their perspective, could this be part of a utilitarian reasoning where they feel pressed to win at all costs?

Daniel: Absolutely. Sam has his reasons for these actions. It raises interesting ethical questions about whether the ends justify the means. History shows people often act cutthroat to gain power, believing they can then do good with it.

Daniel: It’s not wrong in the sense that accumulating power can lead to beneficial outcomes. However, the dangers are evident; those who follow this strategy often fail to fulfill their good intentions. The tyrants of history never consider themselves evil; they just have warped philosophies.

Daniel: One memorable line from the show Silicon Valley is by Gavin Belson, who says, “I don’t want to live in a world where someone else makes the world a better place better than we do.” This highlights that these leaders likely believe they are the best fit to wield this power.

Daniel: It’s like the classic theme of Lord of the Rings. It’s a rationalization for their behavior.

Interviewer: Yes, and this rationalization can be troubling because it feeds into the belief that one should do whatever it takes to gain power, justifying their future plans for doing good.

Daniel: Exactly. It’s complicated since, in the context of AI companies, if you’re racing ahead, you should have a strong narrative explaining why your victory is beneficial.

Daniel: That narrative should explain clearly why your winning would be better than the alternatives, and it should hold up to scrutiny from disinterested third parties. You should be able to sit down with them and present your reasons why you are better equipped for this responsibility.

Daniel: If you can meet that standard, then great! Although that standard could also be gamed, it still represents a better direction than the current reality.

Daniel: I wish to see leadership in AI that genuinely considers broader societal impacts. We need leaders willing to sacrifice their own chances for the greater good.

Interviewer: Is there a way to objectively measure that?

Daniel: Have you heard about OpenAI’s merge-and-assist clause?

Interviewer: No, but tell me about it!

Daniel: In their charter, they state that if they believe another value-aligned AI company has a better-than-even chance of achieving AGI before they can, they would stop competing and start assisting that company instead.

Interviewer: That sounds great!

Daniel: It is a commendable commitment, but I doubt anyone genuinely believes they would follow through with it.

Interviewer: So OpenAI claimed they would do this?

Daniel: Yes, it’s one of their core commitments. They include a caveat about needing specifics in case-by-case agreements, but a typical condition might be based on the other company having better-than-even chances of success within a certain timeframe.

Interviewer: And they’re specifically referring to a time when another company is nearing AGI?

Daniel: Exactly. We might be approaching that time now.

Interviewer: Reflecting on why you left, you initially saw components of OpenAI as being concerned with future impact and forming altruistic commitments.

Daniel: Yes! I thought they were genuinely considering the implications of AGI. But over time, it felt like they were shifting towards being a standard tech company, losing that forward-thinking ethos.

Interviewer: Do you think this shift is a result of the pressures from the tech industry?

Daniel: It’s hard to say. I don’t know the leadership personally, but if we had discussions on why they’d be the ethical company among a sea of bad actors, they would ideally have to maintain a narrative robust enough to justify their position.

Daniel: That said, they should know they can differentiate themselves without resorting to aggression. It shouldn’t have to be a split-second, 51-49 choice between competing companies.

Interviewer: Right!

Daniel: You should never fully adopt an ends-justifies-the-means mindset. It might be clearer in starkly moral situations, but when you’re competing against another AI company, it’s much murkier.

Interviewer: Absolutely.

Daniel: Additionally, the first company to develop superintelligent AGI may assume a role akin to a global government, albeit in an extreme sense. If you have superintelligent AIs, each significantly surpassing human capability, aligned to follow company leadership, this would concentrate immense power.

Daniel: Governments often lag behind, out of touch with ground realities. You could easily envision a world where companies essentially dictate to the government, thereby establishing the governance structure of the world.

Daniel: You see this scenario often in our war games, where the governance of the company could effectively become the governance of the global community.

Interviewer: That’s an unsettling thought.

Daniel: Right? The governance structure within such companies could end up being what controls everything, making one company’s leadership synonymous with world governance.

Daniel: With all that in mind, when I reviewed OpenAI’s paperwork, I thought about whether this is how I want our world to be governed.

Interviewer: It raises the question of whether society is prepared for this potential outcome.

Daniel: Definitely! One insight behind the current power of LLMs was OpenAI’s early commitment to the idea that getting good at predicting text requires genuinely comprehending the world.

Interviewer: In 2021, you wrote about what 2026 might look like. Looking back at 2022, 2023, and 2024, you certainly got several things right.

Interviewer: Some predictions were easier, like chatbots and multimodality trends. Others, including the USA-China chip battle with export controls, were somewhat more complex and surprising.

Interviewer: Did you accurately predict those developments?

Daniel: Not specifically, but I indicated that tensions around these issues would escalate, which they did.

Daniel: Another facet I discussed was the transition from ever-larger training runs to “bureaucracies” of models working together, although training scale does seem to keep growing as well.

Interviewer: What about your prediction regarding AI propaganda playing a significant role in elections?

Daniel: Well, that turned out to be less accurate, as I thought its impact would be larger than it has been. The timing on Diplomacy was curious too—while I expected that around 2025, it actually arrived in 2022.

Interviewer: So, overall, some of your predictions were spot on, while others missed the mark. That’s interesting!

Daniel: Although I want to clarify that the Diplomacy AI in question—built by the same researcher who also cracked poker with AI—doesn’t exactly meet the standard I had described.

Interviewer: Oh, okay.

Daniel: If I recall correctly, the players were unaware that they were competing against an AI. Had they known, they might have tried to jailbreak it and tamper with its performance.

Interviewer: That makes sense.

Daniel: There’s a point in one of his interviews where he discusses this. I don’t think Diplomacy has really fallen in the relevant sense, but people also haven’t been putting as much effort into it. Maybe it would have fallen completely by now if that same researcher had kept working on it for another year.

Interviewer: I get your point.

Daniel: Regarding the propaganda aspect, I totally agree. I think I was overly pessimistic about it or a bit too optimistic about how quickly that technology would be deployed. The capability is essentially there—

Interviewer: Indeed.

Daniel: —but for various reasons, it simply isn’t being utilized.

Interviewer: Why do we think AI for propaganda has not been used as much?

Daniel: Actually, what I wanted to emphasize more is the censorship aspect rather than propaganda, as I believe that’s more significant. Influencing discourse by spamming fake comments is one approach, but if you control the media platforms, shaping recommendation algorithms to boost some content while suppressing others is a far more impactful method.

Interviewer: Interesting.

Daniel: Manipulating recommendation algorithms to downvote certain topics and upvote others is a form of soft censorship. That’s what I was most concerned about when I wrote that piece. From what I know, companies haven’t relied on this to the extent I feared, and they lack transparency regarding their recommendation algorithms. For all we know, they could still be doing these things. But I would expect there to be whistleblowers and such.
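
To make the “soft censorship” mechanism concrete, here is a toy re-ranking sketch. The posts, topics, scores, and multipliers are hypothetical, and this is not a description of any real platform’s system:

```python
# Toy illustration: a platform need not delete posts to shape discourse;
# quietly down-weighting a topic in ranking makes it sink without a trace.
BASE_SCORES = {
    "post_a (sports)": 0.81,
    "post_b (election claim)": 0.90,
    "post_c (cooking)": 0.65,
}

TOPIC_MULTIPLIERS = {
    "sports": 1.0,
    "election claim": 0.2,   # suppressed topic: ranked far lower, never removed
    "cooking": 1.0,
}

def rerank(scores: dict) -> list:
    """Apply per-topic multipliers, then sort; suppressed content simply sinks."""
    adjusted = {}
    for post, score in scores.items():
        topic = post.split("(")[1].rstrip(")")
        adjusted[post] = score * TOPIC_MULTIPLIERS[topic]
    return sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for post, score in rerank(BASE_SCORES):
        print(f"{score:.2f}  {post}")
```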

Interviewer: When did Elon buy Twitter again?

Daniel: I want to say in 2023.

Interviewer: Only 2023?

Daniel: Yeah, I think so. My understanding is that the shakeup led to the release of the Twitter files, which exposed prior Twitter practices. If past Twitter had been significantly using political influence as part of their recommendation algorithm, then it stands to reason that Elon would discover that and discuss it openly.

Interviewer: I see.

Daniel: I haven’t done a deep dive into this, but previous Twitter was notably more active in terms of censorship. They had that ends-justify-the-means rationale to manage health impacts surrounding COVID. It’s not unimaginable that similar reasoning could apply in a later Trump-Biden or Trump-Kamala matchup as well.

Interviewer: That’s totally possible.

Interviewer: I just wanted to highlight that given you made these observations in 2021, a time before the widespread use of LLMs—prior to ChatGPT—your foresight was incredibly prescient on various fronts.

Daniel: Absolutely.

Interviewer: And now you’re working on a new piece where you’re making predictions for the coming years?

Daniel: Yes! I’ve mentioned before that I regretted not including my 2027 prediction initially, which I’m now incorporating. My predictions have evolved over the past couple of years.

Interviewer: If I understand correctly, you initially anticipated that in 2027, we would already see very powerful AI systems by then, affecting many processes.

Daniel: Sure! Back in 2021, when I wrote my blog post, I estimated the median arrival date for AGI as 2029. But rather than working backwards from that date, I wrote forward year by year, using the methodology I describe in the post: one year, then the next, assuming that progression.

Interviewer: Interesting approach.

Daniel: When I reached 2027, I reflected and thought, “Actually, AGI appears to be coming sooner than I initially expected, around now rather than in 2029.” Maybe that was due to random fluctuations. I don’t want to read too much into it, but there could be a difference between median and mode in my predictions.

Interviewer: It sounds like maybe 2027 was your mode while 2029 marked your median prediction.

Daniel: That’s likely right. After I first wrote that post, I went to work at OpenAI, and by the end of 2022, my median estimate fell to 2027. Now, it has shifted back up to 2028, although I still consider 2027 my most probable outcome.
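
As a rough illustration of the median-versus-mode distinction, here is a toy forecast distribution in which the single most likely year is 2027 while the median lands at 2029. The probabilities are invented for the example, not Daniel’s actual numbers:

```python
# Toy AGI-arrival forecast; values are made up purely to show mode != median.
toy_forecast = {
    2026: 0.08,
    2027: 0.20,  # single most probable year -> the mode
    2028: 0.15,
    2029: 0.14,
    2030: 0.12,
    2031: 0.11,
    2032: 0.10,
    2033: 0.10,  # "2033 or later" lumped here for simplicity
}

mode_year = max(toy_forecast, key=toy_forecast.get)

cumulative, median_year = 0.0, None
for year in sorted(toy_forecast):
    cumulative += toy_forecast[year]
    if cumulative >= 0.5:        # first year where half the probability mass is reached
        median_year = year
        break

print(f"mode = {mode_year}, median = {median_year}")   # mode = 2027, median = 2029
```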

Interviewer: It’s fascinating that OpenAI hired you right after you shared your well-known prediction post on the AI Alignment Forum. Do you think there was any element of hyperstition at play?

Daniel: Many people are concerned about that. I would say probably not. However, I do remember visiting Anthropic in 2022, and the team there was like, “Oh, you’re the guy who wrote that post. Great work.” They said, “Just be cautious about what you do.” That being said, I think that my post was a small part of a much larger conversation and hasn’t had a huge impact.

Interviewer: I get your perspective.

Daniel: I really appreciated Leopold Aschenbrenner’s insights, particularly his framing of the U.S.-China rivalry in adversarial terms. I agreed that things were likely headed in that direction, but at the time such framing hadn’t been so pronounced, and it made me wonder whether discussing it openly was necessary.

Interviewer: Good point.

Daniel: Then we ended up in a strange situation where Ivanka Trump tweeted about it, catching her father’s attention. It raises the question—what were we doing?

Interviewer: Do you have any guiding principles concerning info hazards or how public discourse shapes our future?

Daniel: That’s an interesting thought. If all we’re discussing are specific futures, it may indeed make those scenarios more likely compared to others we don’t explore. It’s similar to how, once you become pregnant, you notice more pregnant people around you, right?

Interviewer: Exactly!

Daniel: I worry about that aspect.

Interviewer: Earlier, I remember you mentioning you wanted to focus on writing about positive future outcomes with AI. Is that right?

Daniel: Indeed. I plan to publish soon, featuring a somewhat optimistic narrative with different branches. However, it’s not the version I would advocate for; I’m not suggesting we aim for that trajectory.

Interviewer: Is it a heavily controlled narrative?

Daniel: We can discuss that later, but the war game we played yesterday ended positively, right?

Interviewer: Yes!

Daniel: The U.S., China, and Russia managed to coordinate against a rogue AI, successfully uniting the humans. Even in that scenario, though, I doubt humanity would be able to unite effectively. It would be silly to frame that situation as our aiming point.

Interviewer: Agreed.

Daniel: What we should strive for is much different than that storyline. I’m genuinely concerned that by discussing these narratives, I may somehow contribute to a hyperstitional effect that makes these scenarios more likely than others.

Interviewer: That’s understandable.

Daniel: Regarding Leopold, I believe he was aiming to induce a sense of urgency through his essay.

Interviewer: Are you referring to the context of nationalization?

Daniel: Yes, that fits the point I was making. I guess you could consider a company a form of AI—a novel kind of intelligence that runs on human brains. However, we haven’t solved the issue of misalignment there yet. We need to address that before allowing these companies to create AGIs.

Interviewer: That’s an interesting perspective. About your writing on transparency policies, I noticed you collaborated with someone who usually stands on the opposite side of the aisle. How was that experience? Will you pursue more collaborations like that?

Daniel: I would love to see more of this kind of collaboration in the world. It was a great experience! I really like Dean and have nothing but good things to say about how it all went. We’re still in touch.

Interviewer: Are there any other policies you think could work for both of you that might have been included? Perhaps something like off switches?

Daniel: I’m also in favor of off switches. I’m not sure what Dean thinks about them, but we still haven’t received the transparency measures we requested. Maybe we should focus on pushing for those. It’s still valuable to propose a range of good ideas and arguments for them.

Interviewer: It sounds like you think it’s important to keep pushing for transparency.

Daniel: Exactly. My understanding of how both governments and companies operate is that you really have to push them to get results. It’s easy to get them to agree that transparency is good, but actually getting them to implement it is where the real work lies.

Interviewer: And what’s your primary focus at the AI Futures Project right now?

Daniel: At the moment, I feel that my main job is to predict the future rather than advocate for changes in policy. If we do advocate for policies, I’ll focus on areas where we’ve already built consensus on what’s good and push for actual implementation of those policies.

Interviewer: That’s an interesting point. You mentioned hyperstition being relevant; it seems crucial for a positive discourse about the future.

Daniel: Yes, precisely. There’s some value in trying to steer the future in favorable directions by visualizing the outcomes we wish to see. We haven’t developed that yet, but maybe we should consider a new scenario forecast illustrating what we think should happen, rather than just what we expect.

Interviewer: That makes sense. It’s like aiming for a North Star for the future.

Daniel: Exactly! It’d be fun to brainstorm potential scenarios. A good starting point could be existing sci-fi works because those authors have had a significant impact. In many ways, perhaps they have shaped some of our current realities since some of their stories have come true.

Interviewer: That’s a fascinating way to look at it.

Daniel: There’s indeed a lot of sci-fi out there that leans toward darker themes. For instance, one of my favorite movies is Terminator 2, which vividly captures the risks posed by AI. Ironically, the game we played yesterday had some Skynet-like vibes.

Interviewer: What about specific sci-fi narratives that might serve as a positive vision for the future?

Daniel: Hmm, nothing springs to mind as a North Star specifically. I’ve heard about a show called Pantheon, but I haven’t watched it myself.

Interviewer: What have you heard about Pantheon that resonates with you?

Daniel: From what I understand, there is a rogue AI incident leading to international cooperation to shut it down and reassess how to move forward. The second season focuses on having a broader discussion on how to proceed after they’ve dealt with the initial wave of problems.

Interviewer: That sounds intriguing.

Daniel: Broadly speaking, that could align with my vision. Imagine having all companies publish their safety cases and specifications. Eventually, we could have a public forum where everyone critiques these specs and safety measures. Of course, it’s all fun and games until AIs start becoming incredibly powerful.

Interviewer: Right, then things could get a bit intense.

Daniel: Exactly. Suddenly, we would need to reevaluate the adequacy of our safety measures, especially if they turn out to be insufficient in light of new developments. Incrementally advancing capabilities while ensuring our alignment techniques keep pace would be essential.

Interviewer: That’s a valid approach.

Daniel: The fact that it took me a moment to respond is a signal that we need more sci-fi authors willing to create these positive visions. However, writing a positive vision can be tricky, as it’s tempting to simplify away the complexities of the challenges we face.

Interviewer: That’s a good point.

Daniel: Right—there’s a risk of assuming that coordination is easy or that AI will just be aligned without any issues. That’s a challenge I would throw out to anyone aspiring to write sci-fi: make it something we genuinely want.

Interviewer: So, realistic yet optimistic narratives are key?

Daniel: Exactly. We should aim for stories that present win-win outcomes—where there’s a lot of exciting innovation and humans coexist harmoniously with AI. But it’s vital to maintain a realistic foundation, addressing coordination and incentive alignment in a believable way.

Interviewer: Well, we’ve only got a couple of minutes left. To wrap things up, how about a series of rapid-fire predictions, especially since you’re known for making predictions?

Daniel: Sure! Hit me with your best shots.

Interviewer: All right! First prediction: likelihood that you will host at least 50 more war games?

Daniel: 70%.

Interviewer: How about the likelihood that the world will progress in ways significantly different from what you’ve anticipated in your tabletop exercises?

Daniel: That’s a bit nuanced. If we’re talking about one major event that isn’t predicted, I’d say there’s a 90% chance of that. But if you mean a complete divergence from what we’ve planned, I’d put that at around 40%.

Interviewer: Okay, likelihood that over 50% of AI R&D progress is created by AI agents by 2027?

Daniel: I’d estimate around 30%—maybe 35% or 40%.

Interviewer: And how about the likelihood that China has spies at major labs?

Daniel: I’d say 95%.

Interviewer: What about the likelihood that you’ve personally interacted with one of those spies?

Daniel: Probably around 80%.

Interviewer: On a related note, how likely is it that China will steal the weights of a state-of-the-art model within the next three years, ending in 2028?

Daniel: I’d say about 60%.

Interviewer: How about the likelihood that AI persuasion will match the abilities of the best human persuaders by the end of 2027?

Daniel: I’d say 40%.

Interviewer: Likelihood that three or more large AI labs will merge or significantly pool their resources by the end of 2028?

Daniel: I’d put that at 25%.

Interviewer: And the likelihood that we will have superintelligence by 2030?

Daniel: 65%.

Interviewer: By 2027?

Daniel: I’d say around 35%—or maybe 40%. Whatever I stated earlier, let’s go with that.

Interviewer: How about the likelihood that if we achieve superintelligence after 2030, we will coexist harmoniously with it?

Daniel: That’s conditional on the timeframe, so I’d say 65%—maybe 50%.

Interviewer: What if we achieve superintelligence in 2027?

Daniel: I’d put it lower—around 30%.
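
For readers curious how these rapid-fire numbers fit together, here is a back-of-the-envelope combination using the law of total probability. The conditionals Daniel did not state are my assumptions and are flagged as such in the comments:

```python
# Rough arithmetic only, stitching together the rapid-fire answers above.
p_si_by_2027   = 0.35                      # stated above (roughly 35-40%)
p_si_by_2030   = 0.65                      # stated above
p_si_2028_2030 = p_si_by_2030 - p_si_by_2027   # = 0.30, implied by the two answers

p_good_given_2027      = 0.30              # stated above
p_good_given_post_2030 = 0.575             # midpoint of the stated "65%, maybe 50%"
p_good_given_2028_2030 = 0.45              # ASSUMPTION: between the two stated figures
p_good_given_no_si     = p_good_given_post_2030   # ASSUMPTION: treat "not yet" like "later"

p_good = (p_si_by_2027   * p_good_given_2027
          + p_si_2028_2030 * p_good_given_2028_2030
          + (1 - p_si_by_2030) * p_good_given_no_si)

print(f"rough overall P(good outcome) ~ {p_good:.2f}")   # ~ 0.44 under these assumptions
```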

Interviewer: Last question: likelihood that hyperstition is a relevant factor—meaning that discussing and writing about a positive future can actually make that future more probable?

Daniel: Do you mean me specifically, or anyone?

Interviewer: In general.

Daniel: I think it’s not often true. People tend to overlook the realism aspect, which can make things worse instead of better.

Interviewer: How does that make things worse?

Daniel: If you survey political parties, you’ll see they often claim that if their opponents win, it will lead to chaos and suffering, while if they win, it will result in utopia.

Interviewer: Right, that disconnect is problematic.

Daniel: Exactly! Saying that their victory will create a glorious future doesn’t help at all. They are not enhancing the chances of their ideal future by asserting it without grounded reality.

Interviewer: So, you think it’s crucial to stay connected to actual realities when discussing potential futures?

Daniel: Precisely! It’s essential that speculative discussions remain aligned with what reality allows.

Interviewer: Let’s reframe that then: likelihood that hyperstition becomes relevantly true, assuming the narratives are realistic and address the necessary challenges.

Daniel: I’d say 50%. That’s a tough question to answer.

Interviewer: You’ve chosen the response that embodies maximum uncertainty. I appreciate that.

Interviewer: Awesome! Is there anything else we should cover before we wrap up?

Daniel: I think that’s all for now. Thank you very much!

Interviewer: Thank you! This has been great.

Daniel: I appreciate it, and thank you for organizing the games. They’ve been a lot of fun!

Interviewer: We’re working on scaling them so more people can participate.

Daniel: That’s fantastic! Once you have everything ready, do let everyone know.

Interviewer: Absolutely! Thanks again.

Daniel: Thank you!