Kevin Roose: Yeah. So it stands for artificial general intelligence. And you could probably ask a hundred different A.I. researchers, and they would give you a hundred different definitions. Researchers at Google DeepMind just released a paper this month that sort of offers a framework. They have six levels, ranging from Level 0, which is no A.I. at all, all the way up to Level 5, which is superhuman. And they suggest that currently ChatGPT, Bard, LLaMA are all at Level 1, which is sort of equal to, or slightly better than, an unskilled human. Would you agree with that?
Sam Altman: I think the thing that matters is the curve and the rate of progress. And there’s not going to be some milestone that we all agree, like, OK, we’ve passed it and now it’s called AGI. I think most of the world just cares whether this thing is useful to them or not. And we currently have systems that are somewhat useful, clearly. And whether we want to say it’s a Level 1 or 2, I don’t know.
But people use it a lot, and they really love it. There are huge weaknesses in the current systems. I’m a little embarrassed by GPTs, but people still like them, and that’s good. It’s nice to do useful stuff for people. So, yeah, call it a Level 1. Doesn’t bother me at all.
Kevin Roose: What are today’s A.I. systems useful and not useful for doing?
Sam Altman: I would say the main thing they’re bad at is reasoning. And a lot of the valuable human things require some degree of complex reasoning. They’re good at a lot of other things — like, GPT-4 is vastly superhuman in terms of its world knowledge. It knows more than any human has ever known. On the other hand, again, sometimes it totally makes stuff up in a way that a human would not. But, you know, if you’re using it to be a coder, for example, it can hugely increase your productivity. And there’s value there even though it has all of these other weak points. If you are a student, you can learn a lot more than you could without using this tool. Value there, too.
Kevin Roose: Right now, I think what’s holding back a lot of people, companies, and organizations is that it can be unreliable — it can make things up; it can give wrong answers. Which is fine if you’re doing creative writing assignments, but not if you’re a hospital or a law firm or something else with big stakes. So how do we solve this problem of reliability? And do you think we’ll ever get to the sort of low fault tolerance that is needed for these really high-stakes applications?