Episode Transcript
[00:00:00] Speaker A: Welcome to ID the Future. I'm Andrew McDermott. Today's episode comes to us from our sister podcast, Mind Matters News, a production of the Discovery Institute's Walter Bradley Center for Natural and Artificial Intelligence.
You can learn more about the show and access other episodes at mindmatters.ai.
[00:00:24] Speaker B: Greetings and welcome to Mind Matters News. I'm your Turing Testable host, Robert J. Marks.
The Turing Test, proposed by Alan Turing in 1950, is a method for assessing a machine's intelligence by evaluating whether it can imitate human conversation so convincingly that a human judge cannot reliably distinguish it from another human being.
In the test, an interrogator communicates with both a human and a machine, usually by text, so the voice doesn't give it away, and tries to identify which is which.
If the judge cannot tell them apart, the machine is said to have demonstrated intelligence. Our guest today is Georgios Mappouras, who says the Turing Test is not enough to measure intelligence, and I agree with him. There are some AI researchers who think that while modern AI can simulate human-like conversation impressively, this does not mean it has actually passed the Turing Test in a rigorous, sustained, and meaningful way. I don't agree with this. I think the Turing Test has been passed by large language models such as Grok and ChatGPT, at least on a rudimentary basis. I think the Lovelace Test, first proposed by Selmer Bringsjord, is a good test of creativity, and creativity is a component of intelligence.
Our guest today proposes a Turing Test 2.0 that is a more rigorous test of the intelligence of AI. The shortcomings of the original Turing Test, such as its reliance on deception, imitation, and shallow conversation, are one reason new proposals aim at setting clearer standards for detecting true intelligence. A link to his paper, entitled "Turing Test 2.0: The General Intelligence Threshold," is provided in the podcast notes.
Our guest today is Dr. Georgios Mappouras, and he's the one proposing this Turing Test 2.0.
I have his permission to call him George. So that's what we're going to do for the remainder of the podcast. And George, you're welcome to call me Bob.
He was born in Greece, he grew up in Cyprus, and he is a graduate in electrical and computer engineering from the National Technical University of Athens, which of course is in Greece. After graduation in 2014, he moved to the US to pursue a PhD in computer architecture at Duke University.
Really great school. He received his PhD a few years ago, and he is currently working in Silicon Valley, including at Oracle.
And this is where technically gifted people go to get high paying jobs and maybe to get rich.
George, welcome.
[00:03:07] Speaker A: Hi, Bob. Thank you for having me.
[00:03:08] Speaker B: I want to get something out of the way first. Not to do with the topic, but we were talking before about what it's like to live in Silicon Valley. Everybody's heard of Silicon Valley, but they don't know very much about it. First of all, I understand it's a beautiful place to live, is that right?
[00:03:23] Speaker A: Yeah, especially for me, coming from Cyprus, being used to the Mediterranean climate, the nice weather, sunny every day. Yeah, this is the perfect place if you want that type of weather.
[00:03:36] Speaker B: And what about it culturally? I think you mentioned there are lots of venues there that you can go to, and it's really rich, and it's a good place to get employment. Like, if you work for one company and you decide to go to another company, it's easy to jump ship.
Is that right?
[00:03:55] Speaker A: Yeah. So I don't want to make it sound like a paradise. Every place has its own drawbacks. But yeah, this is the thing with the Bay Area. Because of the companies, and because a lot of employees come from different places, there's, let's say, an international element to it. So in terms of food or entertainment, it feels like you can find many different cultures.
So especially if you're not an American, I guess.
Yeah, you can find things that you cannot as easily access in other places in the U.S.
[00:04:31] Speaker B: Is there a good Greek community there?
[00:04:34] Speaker A: Not big, I guess.
There are not that many of us, but yeah, a good one. Yeah.
[00:04:41] Speaker B: Okay. Now I was telling George before we started recording that I spent many years in Seattle and I think Seattle is a beautiful place.
I wouldn't care to live there today, but in the summer it's one of the most beautiful places in the world. And the joke in Seattle is that during the winter, it's like living in a car wash. It's terrible. Now, George rolled his eyes. You think the climate is better in the Bay Area, in Silicon Valley?
[00:05:08] Speaker A: Yeah, of course. I've visited Seattle, and I've visited other places in the north of the US too.
[00:05:17] Speaker B: Okay.
[00:05:18] Speaker A: Yeah, I don't think I could permanently live there. Yeah, I don't adjust well to cloudy weather or cold weather. Yeah.
[00:05:26] Speaker B: Oh yeah, I can visit. I like visiting, but it's actually terrible in the winter. It's just gloom. In fact, my son and his wife and two of my grandkids live in the Seattle area. And one of the things that happens is that the sun doesn't shine, they don't get their vitamin D, and they become depressed, and they literally have to buy sun lamps and sit under them and, I believe, take vitamin D in order to get rid of their depression. It's called SAD, and I forget what the acronym stands for, but it's this idea that during the gloomy part of the year you kind of get depressed. It's rough. So, yeah, I don't think you have that in San Francisco.
[00:06:09] Speaker A: No. No, definitely not. Yeah.
[00:06:10] Speaker B: Okay. Have you ever seen Alcatraz, by the way? Do you drive by Alcatraz ever?
[00:06:15] Speaker A: Yeah, I mean, I live in the Bay Area, but I visit San Francisco sometimes.
If you drive by the coast, yeah, you can see it. I haven't visited, but you can see it. I know there is an official cruise, like boats that take you there to see the place, but I haven't done that yet.
[00:06:36] Speaker B: Okay. Yeah. I've always wanted to visit Alcatraz. I've seen it in the movies, and it's very famous.
So let's get down to business. Let's talk about your paper. You wrote a paper, "Turing Test 2.0: The General Intelligence Threshold." And this is a new and, I think, innovative measure of whether AI has become intelligent. You and I don't agree on everything, but I think this is really an innovative paper.
So let me ask you this. What motivated you to rethink the Turing Test?
[00:07:04] Speaker A: Yeah. So the first thing that started me asking that question was probably hearing people talk about AI and, let's say, describing its abilities without providing evidence. It was more like, "I feel like this is going to happen." So I was like, I want to actually find a way to test it. The first thing is, okay, let's see how people test AI right now. How intelligent is it?
Looking at the Turing Test, the first thing that made me, let's say, question it a little bit was that we have this interrogation setup, and we have a human and a machine. I'm like, okay, how do I select the human?
Does any human qualify? That's a good question, because not all humans have the same intelligence, right?
Do I get an average human? Am I going to, let's say, question it about physics, so I get an expert in physics? And then what is the line of questioning? Because different people will come with different lines of questioning.
[00:08:11] Speaker B: Yes.
[00:08:11] Speaker A: And let's say maybe for some humans, the machine deceives me; for some, it does not.
What if I take it even further: what if I think the human is the machine? What does that mean? You see what I mean? So it felt like it had a lot of inconclusive outcomes, which made it feel ambiguous and not well defined.
So this is the first thing that made me feel like this is not enough.
[00:08:41] Speaker B: Yeah, that's a good point. By the way, I have a reverse Turing Test. If I were in a conversation and wondered whether I was talking to a human or a computer, I would ask, what's the cube root of 2416? And if it gave me an immediate answer, I would know it was a computer. So your point is well taken. If a physicist talks to it and asks, what's the equation for, I don't know, Newton's laws or something like that, that's different from a kindergartner talking to it.
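For what it's worth, here is what the machine side of that reverse test amounts to; a minimal Python sketch (only the number 2416 comes from the conversation):

```python
# The reverse Turing Test: a machine can answer instantly,
# while a human interlocutor almost certainly cannot.
x = 2416
cube_root = x ** (1.0 / 3.0)   # effectively instantaneous for a computer
print(f"The cube root of {x} is about {cube_root:.4f}")  # ~13.4184
```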
[00:09:12] Speaker A: I don't know, exactly. And it doesn't really mean that it's not intelligent. Okay, you can tell it's a machine, but that does not mean it's not intelligent because of that test, you see what I mean? It can do intelligent things, but I can still tell it's a machine because of how it talks.
And this is actually my second point that I started thinking about: we test it through natural language, right? I speak to the machine. And my question was, why? Sure, humans do use natural language, but what about other types of communication? What if I tell the two test subjects to draw pictures, and they both draw a picture? Why do we use language? These things feel like they were selected without a rigorous process.
We just said, okay, that's how humans do it. Let's do that.
I think the Turing Test doesn't test whether something is intelligent. It tests how well it can mimic a human.
But not all human attributes are due to intelligence.
[00:10:15] Speaker B: Yeah, that's true. You know, yesterday I gave a talk before the Waco Chamber of Commerce, and I was joined by the CEO of a company called Worlds. They do AI where the learning is done from images.
I guess that most of the fuel for training large language models has been exhausted. And so this guy says, you know, there's all sorts of information in the world if you just know how to extract it from photos and things of that sort. And one of the things he does, George, which I found fascinating, and I'm going to start doing it: usually when I drive or commute, I listen to podcasts. He said he doesn't do that anymore. What he does is he puts on Grok and has a conversation with it. He asks different topical questions, and they go back and forth, and he says he learns a lot. He's become an expert in a lot of areas. So I want to start doing that. I thought that's a good substitute for podcasts.
Yeah. So that's what I want to do in the future. You know, one of the big things today is this idea of artificial general intelligence, AGI.
And I think the definition varies from place to place. Do you have a definition of AGI, artificial general intelligence, that you go by?
[00:11:26] Speaker A: Right. So that's one of the things I didn't like in this, let's say, research area: everybody comes with their own definition, and then we don't really know how to look for it, because everybody has their own definition. Right.
So first of all, I like to drop the artificial, right? I don't care how it's generated.
[00:11:47] Speaker B: Oh, that's interesting.
[00:11:49] Speaker A: Okay. So even "general intelligence" sometimes feels redundant, but let's stick with it. Is it truly intelligent? That's what we really want to know, right? And let's say we call that the human level of intelligence, whatever that means. So whether it's general intelligence or artificial general intelligence, if they reach the same level, it's the same thing. We don't need two terms. So: general intelligence. Right.
So this is the first thing I noticed: it doesn't matter how it's generated, whether by an alien species, by a machine, or by a human; what matters is that level. So now the important thing is to define that level.
So let's call it GI for short, general intelligence.
And that's the first thing that I tried to do: pinpoint it. And in order to pinpoint it, I actually looked at the human race as a system.
So we can look at the human race as a system and ask: what does it do? What is this thing it does that we actually find fascinating?
We can, for example, compare this system with other systems, say another animal, like a dog, right? And we can see how the dog, from the time it started its life on the Earth up to this point, lives. And we can look at humans and how they live, and we can see that humans have a unique thing. At least from the data we can get now, we see that humans started out living, let's say, in caves, right?
Lighting fires.
Now we live in skyscrapers; we travel to the moon. So what we see is that there's an increase in knowledge.
There's an increase in knowledge, there's an increase in information.
And what I notice there is increasing functionality, functionality meaning the things we can do.
And I think that's exactly what actual general intelligence is: this ability to look at your environment and get the information. But that's not enough; you then extract knowledge out of the information. What I mean by knowledge: if you really understand some information, it means you can apply it, so you gain some functionality.
An easy example that I like giving when I talk about this is the anecdote about Newton and the apple falling from the tree.
Everybody sees the apple falling. That's not new information for anybody. We know that if you drop something, it's going to fall; you don't discover that. What Newton did is look at this information and try to get knowledge out of it.
Right.
[00:14:41] Speaker B: I like that example of the apple falling.
[00:14:44] Speaker A: Yeah, I think it's easy to communicate, and that's why I use it. And what he said is that objects attract each other. Right. Even if we know today that that's not exactly true, because of Einstein, he did that. And it's not just that he made an observation; he got some new knowledge from it. What does that mean? He could apply that knowledge, and we can apply it to better describe how the planets move.
So there's some functionality that comes from this understanding.
[00:15:14] Speaker B: Well, let me first of all chime in. I think one of the definitions of artificial general intelligence is having all of the intelligence of all the libraries in the world available to you at the touch, or at the query, of one of these large language models.
The question is whether general intelligence generates what I would call creativity.
Newton's observation of the apple falling from the tree, and it's a myth, but the myth of him looking at the apple falling and being able to be creative and actually extract something from that, I think is something that AGI probably will never do. But I think that's a good test for AGI, above and beyond just having access to all of the world's libraries. Now, you mentioned the idea of functional information and non-functional information.
Could you elaborate on that and the difference between the two? You kind of touched on it, but if you would drill a little bit deeper, that would be good.
[00:16:16] Speaker A: Right, right. So if we go back to this example: the information that if I drop something, it's going to drop on the ground, right? If I hold something high and release it, it drops on the ground. That information exists, right?
But what it actually means, not everybody knows. So this information is non-functional, as in, I cannot get some functionality out of it. I don't know what to do with this information.
But then you can think of it like a chemical reaction, where a truly intelligent entity, a general intelligence, can look at that information and transform this non-functional information into functional information.
And the byproduct of that is new knowledge, new functionality.
So this is what you can think of AGI as: a system that takes non-functional information, some information that I don't know what to do with, and outputs functional information, meaning information that now I know what it means. I know that the reason things drop is because objects attract each other. And out of this process, the byproduct is new knowledge, new functionality. In this case, it's the functionality of being able to predict how things move in space, for example.
[00:17:36] Speaker B: So I guess I would say that the non-functional information requires creativity. For Newton to extrapolate the Newtonian laws of physics from an apple falling from a tree required a lot of creativity. Newton famously said that what he did was possible because he stood on the shoulders of giants.
And standing on the shoulders of giants is like all of the corpus of material in the world. I think he had a library of something like 2,000 books, and those were his giants. He was probably referring to people like Galileo and people who preceded him. But the creativity was not contained in those books. It was actually the top of the mountain: he got to the top of the mountain and then he added on to it. And I think that's what you mean by the non-functional information. I think that does require creativity. Would you agree?
[00:18:37] Speaker A: I agree, yeah. I think it's a good definition for creativity, right? Because even terms like that, creativity, even the term information, a lot of times have some ambiguity inside them. What do we mean by creativity? So that's why, in this paper, I try to define those things.
But rather than reusing terms like, let's say, creativity, AGI, new knowledge, all of them, in the end I try to find a description that fits all these things. And I think that's what I try to do with this non-functional and functional information: the same information, the same data, may be non-functional for somebody, because they cannot interpret it, they don't know what to do with it, while at the same time, for somebody else, it might be functional information, because they know how to use it. And this comes back to the question: how do you detect it?
[00:19:32] Speaker B: Yes.
[00:19:32] Speaker A: Let's say we have two humans, right? One has a hammer and knows how to use it, and one has a screwdriver and knows how to use it. And they can teach each other. So when they teach each other, what they do is exchange functional information. They say, look at this tool, this is how you use it. So they exchange functional information.
But then, okay, now both of them know how to use both the hammer and the screwdriver. Is that it?
So after all humans exchange all their ideas, is that it? Is information done? Can we learn anything new?
For example, if that were the case, that would be a system that is not really intelligent, not really a general intelligence.
A truly generally intelligent system would be able to do exactly what you said: reach the top of the mountain and then add something to it. Look at the information and say, can I extract even more knowledge, even more functionality out of it?
[00:20:29] Speaker B: I think that's good. I think it might be difficult.
You know, I'm a proponent of the Lovelace Test, and the question of whether something is new or not is sometimes very difficult. Because what if it comes up with an idea, and this large language model, or this transformer method, whatever you want to call it, has digested all of the prose in the world?
And maybe it comes up with an answer that it learned somewhere, but you don't know about it. And you look at that and say, oh, that's creative, that's kind of new. It might be difficult to determine. I think this is a problem of differentiating non-functional information and creativity even in the Lovelace Test. What do you think about that?
[00:21:11] Speaker A: I agree, I agree. And that's why I call it the general intelligence threshold.
[00:21:14] Speaker B: Okay. Yeah. So define that. I mean, that's the title of your paper. Define what the general intelligence threshold is.
[00:21:22] Speaker A: Exactly. So I don't think intelligence means really inventing something new per se, because, as you said, it might be out there somewhere.
But suppose, for example, you have a system that you taught a specific, let's say, class of functions. So you have a group of functions, a group of knowledge, and maybe you also give it enough information.
You tell it, for example, objects fall from trees, and then you expect it to come up with new knowledge, with a new functionality. So that's exactly why this is a threshold, and this is actually an important part.
Typically, when people try to measure intelligence, they have some kind of levels, right? And they say, okay, where are you on this bar from 0 to 100? Which is very hard to define: where is the threshold? Whereas for me, rather than trying to give you a score, I just have this threshold: can you pass it or not? And this is the main difference. I don't try to give you a score. I'm trying to say, can you produce this thing? Can you showcase that you can do this? That means new functionality; that means general intelligence.
What I ask of a system is this: I give you a specific amount of knowledge, a specific amount of functionality, and I give you enough information, what I call non-functional information, information that you might not know how to use yet, but there's enough there for you to extract new knowledge out of it. So this is exactly the example: imagine Newton having the whole library of physics behind him. He has whatever he needs, all the information, all the experiments he asked for, and he has the current knowledge.
Can it produce something? So let's say we have a system that we train with what Newton knew at that time.
Can it produce the new knowledge? Even if this knowledge is known now, it doesn't matter, if it can show that without looking, right? Without looking at today's library, just by giving it the library of that time, can it produce new knowledge? So this is the difference: it doesn't have to be truly new. The system has to show that it can produce new knowledge. And an example would be: let's say we find an alien civilization, right?
And typically people think about aliens as more advanced than us, so let's assume the opposite. They're less advanced than us. They're still using, I don't know, stone tools, living in caves, whatever, right?
How could we tell if they are truly intelligent, generally intelligent? If we can see this progression, if we can see that there's new functionality. Before, they didn't have fire; now they use fire.
Now they also went from using stone to iron. If they have this progression of new functionalities, that shows general intelligence.
[00:24:42] Speaker B: Okay, this is interesting. There was a movie, I forget the name of it, where we were looking at extraterrestrial life, and the first thing that came from a distant planet kind of elucidated the prime numbers. They transmitted a number of the prime numbers, and all of a sudden we said, whatever is generating this must be intelligent. So that was the first step.
Let me give you some examples of what I think would be good non-functional information, and I think this would be indisputable. And it kind of answers my question of whether or not this is hidden somewhere in the training data of the transformer.
Very recently, there was this 17-year-old homeschooled girl named Hannah Cairo, and she settled something called the Mizohata-Takeuchi conjecture. Actually, she disproved it. But there are a lot of problems like this that are open in mathematics, and you can be assured that nobody anywhere has solved them. And a few years ago, in 2003, Grigori Perelman solved something called the Poincaré conjecture. I think the Mizohata-Takeuchi problem was 40 years old, but the Poincaré conjecture was about a century old. And then Andrew Wiles in 1994 proved Fermat's Last Theorem, which was over 350 years old, and nobody had ever cracked it.
So there are all of these unsolved open problems in math, including the twin prime conjecture, the Collatz conjecture, and the Riemann hypothesis, and any of these, if solved by AI, would, I think, be indisputable proof that AI was creative. And of course, what you would do, like you said, is give it all of the knowledge of mathematics. And if you go into ChatGPT, and I'm sure it's the same with the other large language models, they can do mathematics now; they can solve differential equations and Laplace transforms and do stochastic process analysis. It's really scary. So anyway, that's my proposal of something which is indisputable.
[00:27:00] Speaker A: So yeah, I agree, I agree with that view.
It's always like, because they keep learning, there's always a question mark.
Did somebody else have this proof first?
Did you just find it on the Internet? Right. So that's why, as a test, if you really want to test it: sometimes we ask if it's the model, the software, that is actually intelligent. If it's actually the software, then you don't have to keep training it. You can train it with less data and see if it can produce something we already know. The reason that makes it easier is that we know what knowledge can come from that information. For example, if I give you a book that only talks about, let's say, medicine, and how to do a specific therapy, you should be able to get that knowledge out of that book.
So the point is, if the model is actually intelligent, we don't need to always keep it at the edge of today's science. We can actually go backwards and say, okay, science at some point was here.
Can you produce what science produced after that? Because I know what to expect; I know where you are right now. And of course, disconnected from the Internet.
When I proposed the Turing Test 2.0, this was actually one of the requirements: you provide some knowledge to the system, but then there should be no other external functionality that comes in from anywhere. Because that's how you can really tell that nobody else gave this information to the system.
[00:28:41] Speaker B: Great. You talk in your paper about Searle's Chinese Room, which I think is a smackdown of the ability of computers to understand what they're doing. I think that's the general idea, but you use it in kind of a different way. Could you explain Searle's Chinese Room and how you use it in your argument about your theory?
[00:29:03] Speaker A: Yeah, actually, this is a funny story, because it kind of all came together. I was on YouTube listening to random videos, and I came across John Lennox, and he was describing the argument. It was the first time I was hearing it, and I was like, wow, this is mind-boggling.
[00:29:24] Speaker B: Yeah. John Lennox, for those who don't know, was a professor of mathematics at Oxford University. He has since retired, and right now he's kind of a Christian apologist, but he delves into artificial intelligence quite a bit. So anyway, continue.
[00:29:42] Speaker A: Yeah. So for people who might not be familiar:
The Chinese Room argument is that you have a person who doesn't know how to speak Chinese. You close him in a room where he has no access to the real world, and you give him some specific instructions that explain how to manipulate Chinese characters in order to form responses. And you have people outside who write Chinese characters and slip a note under the door.
The guy inside the room gets the note, follows the algorithm, writes a response back without knowing what he's actually writing, and slips the response back out. And to the people outside the room, it looks like they can communicate, right? And John Lennox was talking about it, and he was explaining why the person is not truly aware of what he's writing; he doesn't know what he's writing. And then I heard the counterargument. So again, listening to podcasts and stuff, I heard the counterargument.
That's why I'm saying everything felt like it clicked together. It's like maybe the YouTube algorithm, I don't know, was showing me what I needed to hear. And the counterargument was: wait a second. Maybe the person inside the room doesn't know, but the whole system, the person in the room, plus the algorithmic instructions for how to manipulate the characters, plus the input and output, maybe that's what understands the language.
And that put me in thought. I was like, wait a second, is that true? It kind of sounds true, but it doesn't satisfy me intellectually.
So I was like, okay, what if somebody slips him a note, and in the note explains to him how to escape?
Would he be able to act upon this information?
That's where the idea came from. Can I actually extract new functionality out of that information?
So the guy in the room might be talking to you, and it seems like he understands Chinese, but he cannot act in a way that shows knowledge of the subject. Because if the note tells him, for example, oh, the password to unlock the door is this thing, just enter it and get out to free yourself, he won't be able to act on it, just because he doesn't really understand Chinese. So that's where the paper came from, actually, through this observation: it's one thing to have information; it's another thing to have knowledge of what the information means. So information can be non-functional.
I have it, I have the Chinese document in front of me, but it's non-functional because I cannot understand it. It becomes functional once I can understand it and get knowledge out of it, and then I know how to use it.
In this case, by opening the door, right? The guy gets out of the room and is free. And this is what inspired me to make this observation.
[00:32:45] Speaker B: Okay, let me ask you the final question, at least for this podcast: do you believe your Turing Test 2.0 has been passed?
And what do you think of the chances of it being passed if your answer to the first question is no?
[00:33:00] Speaker A: Yeah, so I don't think so. Now, in the paper I didn't, let's say, thoroughly test it. It was more like: here is a test. But as much as I tested it, it didn't seem like today's AI can pass it.
And there's an important caveat: a system, in order to pass it, doesn't have to constantly show that it can do this. It can show it just once. Like humans, for example: they can show it once.
Maybe in one aspect. Maybe I'm good at math, but not good at physics.
[00:33:36] Speaker B: You know something, let me interrupt your flow, and then I want you to go back to it. I would call this a flash of genius. That's what Roger Penrose called it, a flash of genius. And it used to be in the United States that in order to get a US patent, you had to have a flash of genius. They no longer have that requirement. But I think the flashes of genius are what you're talking about in terms of the non-functional information.
[00:33:57] Speaker A: Exactly. Yeah. I think that's a very nice term. And you don't have to always show it; you have to show it once, and that's enough to show that you're generally intelligent. And again, in all the tests I did, as much as I could look at it, it does not look like they pass any type of test that falls under the Turing Test 2.0.
And will it ever pass it?
I don't see a path to it, let's say. Currently I don't see a path. How would that happen?
[00:34:31] Speaker B: Could you outline the three rules that define whether a test qualifies under the Turing Test 2.0?
[00:34:38] Speaker A: Right. So in the paper I defined three rules for a test to be a valid Turing Test 2.0. It's like a framework that you can use to generate such tests.
[00:34:50] Speaker B: Right.
[00:34:51] Speaker A: So there are three rules. The first rule is that we will transfer to the system, through training or through hard coding, some amount of functional information. And from what we talked about in the previous episode, this is a group of functionalities: things that we expect the system to be able to do.
The second rule is that we have to give the system information that is non-functional.
What do we mean by that? That the system doesn't know yet how to use it.
[00:35:27] Speaker B: This is what we called before the flash of genius. It has to come up with something it hasn't been trained on.
[00:35:34] Speaker A: Yeah. Some information that is not functional for the system. We have to know that the system has not yet extracted the functionality from that information. So we have two sets. First rule: a set of functional information; it has some functionality. Second rule: a set of non-functional information.
And then the third rule is that no other external training, no external functionality, has been ported into the system.
So after that we can, let's say, try to interact with the system and see if it can come up with new functionality based on the information we gave it. To use the example from the previous episode, the anecdote about the apple falling from the tree: you can extract Newton's theory out of it.
So the idea is, I give you enough information, information that we know is enough to extract the new theory, the new functionality.
Can you showcase that you can do that? And not just showcase it randomly, where maybe you spit out some random thing that happened to be true; after that point, you have to show that you can always reuse this knowledge. An example that's easy to understand: you might have, let's say, an array, and you might know how to find an element in the array. And then you ask the question: what if the array is sorted? The system has to come up with binary search, which is a more efficient way to search an array for a specific element when you know the array is sorted. And then, if I ask it later, the next day or a year after, it has to know how to reuse that knowledge. That knowledge is not lost; it's able to reuse it and say, oh yeah, if the array is sorted, I can do this more efficiently. I learned that, and I can always apply this functionality from now on. So it's not a functionality that it showcases once, randomly; it has to consistently show that, yes, it learned this new functionality and now it can use it.
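To make the example concrete, here is a minimal Python sketch of the binary search George describes (the array values are just illustrative): knowing the array is sorted is the extra information, and halving the search range is the new functionality extracted from it.

```python
def binary_search(sorted_arr, target):
    # Because the array is sorted, comparing the target against the
    # middle element tells us which half of the array can be discarded.
    lo, hi = 0, len(sorted_arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_arr[mid] == target:
            return mid                # found it
        elif sorted_arr[mid] < target:
            lo = mid + 1              # target can only be to the right
        else:
            hi = mid - 1              # target can only be to the left
    return -1                         # not present

print(binary_search([2, 5, 8, 12, 16, 23, 38], 16))  # -> 4
```

Checking each element one by one takes time proportional to the length of the array; exploiting the sorted order cuts that to a logarithmic number of comparisons, which is the "new functionality" the test asks the system to extract, keep, and reuse.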
[00:37:48] Speaker B: So again, I kind of equate the idea of non functional information with a flash of genius. And I think that that's how we can tell that human beings are creative, because we've all had flashes of genius.
We get a solution which pops into our head, and we're not sure where it came from. And history is replete with people who have had flashes of genius. One was Tesla with the brushless motor. He was out walking, and he said that like a flash of lightning it came to him, and he brushed away some dirt and actually drew a schematic in the dirt, the same one that he used when he published the paper. Carl Friedrich Gauss said he woke up one morning with the solution to a problem he had been working on, and he said he had no idea where it came from. And it also happens in the arts. People have flashes of genius, for example, in writing music.
You hear people saying, for example, I don't know where it came from.
I'm too afraid to examine it because I'm afraid it might go away.
So we all have these flashes of genius, this non-functional information which humans have access to, and I think it is a key marker of intelligence. And you mentioned last time, which I think was very interesting and true, that you see no path to achieving this goal. And in fact, the attempts that have been made, AI writing better AI writing better AI, I think have been debunked. I don't think it's conclusive, but there's lots of evidence. There's something called model collapse, wherein if you use AI to write better AI, pretty soon you're not going to get new information, and it kind of becomes a blubbering idiot. But that's the subject of another talk. In your experiments, you gave a bunch of examples. Why did you choose tasks like drawing a clock at, I think it was, 6:30, or generating a hexagonal stop sign?
[00:39:38] Speaker A: Yeah.
[00:39:39] Speaker B: By the way, when I took my driver's test, I missed that one.
The test asked how many sides a stop sign has, and I missed it.
It was back when I was 16, and it was the only one I missed on the test. I thought, well, it's a regular polygon, but I'm not sure, so I guessed wrong. So anyway, tell me about the 6:30 clock and the hexagonal stop sign, and what they reveal about large language models.
[00:40:08] Speaker A: Right, right. So this all, again, came about almost as a coincidence. In the previous episode, we talked about the Chinese Room argument.
[00:40:18] Speaker B: Yes.
[00:40:19] Speaker A: And how it shows that the person inside does not have real knowledge of what he's reading.
And this came about almost like an accident. I was watching videos explaining the Chinese Room argument and the counterarguments, and then I saw a random video where a person was talking about a problem AI has: it cannot accurately draw a clock showing a given time. The interpretation of why is that the images on the Internet, which is typically what AI uses as training data, mass data from the Internet, tend to show clocks and watches in one specific pose, what we call the 10:10 position.
The clock hands typically point at the 10 and the 2.
And the reason they do that is cosmetic. It's considered the best image if you want to showcase a clock or a watch.
[00:41:23] Speaker B: Oh, that's right. You go into a clock store, and if the clocks aren't running, they all have their hands in the same place.
[00:41:29] Speaker A: Yeah. And they were talking about it in a funny way: oh, this is funny, all the images are like that, AI cannot easily produce other images. And I was like, wait a second, that's interesting. They were being very casual about it, but I thought, this is very interesting, because it's something very simple.
So then, looking at what we already discussed, I said, wait a second. Does AI, though, know how a clock works?
Because if you know how a clock works, that's enough information to draw any picture.
So then I went back to the AI and I followed the Turing Test 2.0 rules. I said, okay, can you draw images? Yes, it can draw images. So this is functionality it already has.
Can you draw images of a clock? It can; it gives you very nice images of clocks. It already has this functionality. This is number one, my first rule for a Turing Test 2.0.
So then I looked at the non-functional information, meaning: is there information that, if you really understand it, is enough to draw any clock?
[00:42:42] Speaker B: Oh, that's a great example. Yes.
[00:42:44] Speaker A: So I asked the AI, how does a clock work? It gives a perfect description of how the clock works, with the minutes, the hours, everything. And you can even ask it, can you tell me where the hands of the clock are going to be if it's 6:30? And it will describe it in detail, accurately. It will tell you the hour hand should point a little bit past the six, and the minute hand will point at the six, because 30 minutes is six times five.
So I was like, all right, you have all the information you need. You pretty much gave me the image, but in text form.
Then I ask it to generate the image it just described, and it fails.
It gives you the 10:10 image again, or something close to it.
And that shows that it doesn't have understanding, that it cannot extract the meaning of the things it's giving you.
And if it cannot extract the meaning, how can it get functionality out of it?
Right. And this is where the third rule can break. The third rule of the Turing Test 2.0 says there should be no other external functionality coming in from anywhere. One can think that this problem is actually kind of simple to fix: we just have to balance our data set, the training set, give it more images of different times, label them correctly, train the model, and fix the problem. The problem is easy to fix, yes, but then you're not testing for intelligence anymore, because you broke the third rule: you gave it external information. So that's why it's a good test: because of the nature of the data used for these large models. They go to the Internet and pretty much get what is out there.
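For reference, the "non-functional information" in the clock example boils down to two small formulas; a minimal Python sketch, assuming a standard analog face (the function name is just illustrative):

```python
def hand_angles(hour, minute):
    # The functional content of "knowing how a clock works": the minute
    # hand advances 6 degrees per minute, and the hour hand advances
    # 30 degrees per hour plus 0.5 degrees per minute, measured
    # clockwise from the 12.
    minute_angle = minute * 6.0
    hour_angle = (hour % 12) * 30.0 + minute * 0.5
    return hour_angle, minute_angle

print(hand_angles(6, 30))   # (195.0, 180.0): hour hand halfway between
                            # 6 and 7, minute hand straight down at the 6
print(hand_angles(10, 10))  # (305.0, 60.0): the "cosmetic" 10:10 pose
```

A model that can state this description in text but cannot render the corresponding image is exhibiting exactly the gap between having information and having the functionality it implies.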
[00:44:39] Speaker B: I like this very much.
Here's an idea that popped into my mind.
If all the literature in the world said that the world was flat, but it also knew things like, well, the North Star changes as you travel, and that when ships disappear in the distance, they disappear from the bottom to the top, then it could take that information and say, you know, this idea of a flat earth doesn't line up with these sorts of things. So it's that sort of correlation that will extract knowledge from this non-functional information.
So tell me about the hexagonal stop sign test that you did.
[00:45:14] Speaker A: Right. So after I noticed the behavior with the 6:30 clock, I wanted to make sure first that this wasn't just something random I stumbled upon, but that I could actually reproduce it with other images.
What was unique about the 6:30 case? That we have an image that is very, let's say, dominant in the data set.
So I asked myself, what other images do we have that are dominant like that? And I started thinking about cases. Actually, in the paper I give other examples too, ones I didn't really present in detail, but you can try them out if you want. And one that came to my mind was the stop sign.
So if you look at a stop sign, it has a very specific image, right? It's an octagon, a red sign with white letters.
So I'm like, okay, the AI has seen that image many, many times, but it hasn't seen some specific shapes. For example, what if it's hexagonal? What if I want the same exact sign, but this time in a hexagonal shape?
And I noticed that AI had big difficulty doing that when I tested different models. With ChatGPT, if you insist a lot, it was able to produce the image.
But then if you reset your chat, it would fail again, and you have to try hard to make it produce the correct image again. So it's what I said before: if you really want to pass the Turing Test 2.0, you have to see that when the system extracts some knowledge, that knowledge doesn't go away; it's able to apply it again.
And there are other things like that that you can play with. Like, for example, a triangular flat screen, right?
[00:47:18] Speaker B: Oh, yes.
[00:47:19] Speaker A: Because flat screens typically have a specific shape. Or a driver's license, or a car license plate, in weird shapes. It sometimes depends, because you can think, oh, there are different signs in different shapes.
AI is very good at, let's say, correlation: taking one thing and applying another thing on top of it. So there is a niche there, where you can start looking at these images that are very, very unique.
[00:47:53] Speaker B: This is very interesting, because as a watcher of sitcoms, it was either on The Office or Arrested Development where they came up with this. I think it was The Office; they came up with this idea for a triangular screen.
So I'm wondering if that exists. But if it did exist, certainly the large language models didn't catch on to it. You know, my experience, George, is that a lot of these large language models go and put band-aids on things that didn't work; stuff that didn't work last year kind of works this year. A classic example: I was informed of this and I checked it out, and it was true. It goes: Tom's mother has three kids, Snap, Crackle, and... Now, if you know about Rice Krispies, their mascots are Snap, Crackle, and Pop.
So, yeah, the model answered Pop as the last one. But of course it was Tom's mother, so it was Tom. It should have been Snap, Crackle, and Tom. This was something that was answered incorrectly a long time ago, but it was corrected. Somebody went in, and I don't think it was retraining; I think it was a human being that went in and corrected it. I suspect that if they do identify the problems you're talking about, the people at OpenAI or Grok are going to come in and put a band-aid on it and fix it, but it's going to be a human intervention. It isn't an epiphany of the AI, is it?
[00:49:20] Speaker A: Exactly. This actually breaks the third rule of the Turing Test 2.0, because what happens is that external sources, in this case humans, come in and give this functionality to the AI. And that's why it's important to preserve the third rule. That's why these tests, yeah, they're nice, but very likely they're going to go away as we add more functionality to AI systems.
[00:49:45] Speaker B: Yeah. And I think a lot of that is being done by human beings behind the scenes.
[00:49:49] Speaker A: Exactly.
[00:49:49] Speaker B: They go in and they kind of tune out all the errors. In fact, if you go to some of these large language models and they make a mistake, you can say, you made a mistake, blah, blah, blah. And they will come back and go, oh, you're exactly right, and they'll fix it. And the next time you won't see that same mistake again.
So it is apparently self-correcting. I hope that it verifies before it corrects, but it does seem to be self-correcting. And I think that humans are that way too. One of the things that I tried a long time ago was telling ChatGPT not to do something. I said, draw a picture of Times Square with no pink elephants, or no pink rhinoceroses, I think it was. And I said, no pink rhinoceroses, I don't want them. It didn't know how to not do that, so it gave me a picture of Times Square with a pink rhinoceros in it. And I tried it on a bunch of other things. But they have fixed that. I retried it just a few months ago, and I said no pink rhinoceroses, and it got it right. So somebody behind the scenes is dinking with this stuff, making it more and more capable.
So that's interesting.
So let me ask you about this. You talk about, I think it's Chollet, C-H-O-L-L-E-T. How would that be pronounced? Do you know Chollet's definition of intelligence? François Chollet.
[00:51:10] Speaker A: Oh yeah. I don't know how to pronounce his name.
[00:51:13] Speaker B: Okay, let's go with, let's go with American.
[00:51:17] Speaker A: Yeah, but Collet.
[00:51:19] Speaker B: Collet, okay.
[00:51:20] Speaker A: I don't know.
[00:51:21] Speaker B: Yeah, I'm in Texas. We have this road here called P-O-G-U-E, which of course is French and should be "posh," but everybody calls it Pogue. It's Pogue Road. And there's another one, B-O-S-Q-U-E, which should be "Basque," I believe, but everybody calls it Bosque.
[00:51:38] Speaker A: So you're asking the best person to judge accents.
[00:51:41] Speaker B: Okay, so we'll call it Chollet's definition, and we apologize to François. I've got his first name: François Chollet. He was an employee of Google when he published this, and he proposed a definition of intelligence. You talk about that in your paper. Could you unpack it? I found this interesting, and I confess I had not heard of this definition of intelligence. I kind of like it.
[00:52:09] Speaker A: Yeah. Actually, when I first started digging around, I came across this paper, and I found it very interesting. It actually comes with a whole benchmark, as in, a whole benchmark that you can actually apply to a model to see how well it does. So the difference there is that Chollet comes with a specific test, which a lot of people do, but his test is a little bit different from other researchers'. He says, I have this type of test that is kind of unique, meaning it's kind of like playing a game that you haven't really seen before.
[00:52:44] Speaker B: Okay. Yeah.
[00:52:45] Speaker A: And these games, you can actually try. If somebody Googles Chollet's work, they can go to the website. They can even play the games themselves.
It pops up in your web browser, and you can actually take the test yourself.
And it's pretty much what I would call an IQ test. If anybody's familiar with IQ tests, they will probably get an idea of what I mean: you have some patterns, and then you have to predict the next one. Rather than random numbers or whatever, it's, let's say, adjusted for computers. So you have, pretty much, a grid, and each grid has some squares inside, and the squares have colors and positions in the grid. And it gives you, let's say, three different variations, where the squares in the grid, from one variation to another, change either in color or in position. And what you have to do is detect the pattern and predict the fourth grid. So that's why I'm saying, if anybody has ever solved an IQ test, it's very similar. It's like an IQ test adjusted for computers. That's the description I would give.
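As a toy illustration of the kind of grid puzzle being described (the grids and the hidden rule below are made up for illustration; they are not actual tasks from Chollet's benchmark):

```python
# Toy puzzle in the spirit described above. Each example pairs an input
# grid with an output grid; the integers stand for colors, 0 is empty.
# The hidden rule here: shift every row one cell to the right, wrapping.
examples = [
    ([[1, 0, 0],
      [0, 2, 0]],
     [[0, 1, 0],
      [0, 0, 2]]),
    ([[0, 5, 0],
      [7, 0, 0]],
     [[0, 0, 5],
      [0, 7, 0]]),
]

def hidden_rule(grid):
    # The transformation a solver is supposed to *infer* from the
    # examples rather than be told.
    return [row[-1:] + row[:-1] for row in grid]

# Sanity check: the rule is consistent with the worked examples.
for example_input, example_output in examples:
    assert hidden_rule(example_input) == example_output

# A solver that truly "got it" predicts the output for a fresh input:
test_input = [[3, 0, 0],
              [0, 0, 4]]
print(hidden_rule(test_input))  # [[0, 3, 0], [4, 0, 0]]
```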
And it's very nice because you can test a computer and actually get a result, in a way. That's how it feels.
And a lot of other people did other tests. To be honest, the reason I like Chollet's is that other tests might be something like, for example, people propose: can the AI get inside a typical house, find the coffee machine, and make coffee?
Okay, good. But why is that intelligent?
[00:54:24] Speaker B: Right.
[00:54:24] Speaker A: Chollet's is a little bit closer to intelligence, because you're trying to reason and find something that requires you to do what we call thinking. Right.
What I don't like about it, though, and it's the major thing that I think is common to almost all prior work in this field, is that in order to figure out if it's clever enough, if it reaches general intelligence, we compare it to human results.
And I'm like, okay, again, which human? The average human? Should we get the smartest?
[00:55:00] Speaker B: A guy off the street? A kindergarten kid?
[00:55:02] Speaker A: Yeah, exactly. And even a human that is not smart in this way might be smart in another way. I've seen it with my friends and family: somebody might not be as good at math but very good at physics; maybe not good at physics but very good at painting; maybe not painting, but some sport. And you can see the intelligence there. You can see how they innovate.
So that's why I don't like these very specific, very tailored tests: because, okay, it's a specific type of intelligence. And the other thing, because I've taken tests like that before, IQ tests for different exams and such:
If you take the test the first time, you're probably not going to do so well.
The second time you're going to do better, the third time even better. Am I becoming smarter?
[00:55:51] Speaker B: Sure.
[00:55:52] Speaker A: Or am I just training and learning the patterns?
You see what I mean? So if I'm learning the patterns, then sure, AI can do that. We know it can do that. But that's not what we're seeking to find out. If it's about learning patterns, then the more AI trains on this type of test, the better it will do, eventually probably better than humans.
But recognizing patterns is not real intelligence. Because if every time you take the test you can do a little bit better, then it's just training.
This is my biggest, let's say, objection to this type of test. It's a very nice test.
But like the original Turing Test, I think it doesn't really measure intelligence. And that's my biggest, let's say, critique: why do we test machines on things we are good at?
Right, and not on things the machine is good at? For example, there was a time when machines were not good at doing high-precision math. They could only do low precision. We actually used to use humans for high precision. Then they became better than us.
[00:57:06] Speaker B: And they were known as computers, I think.
[00:57:08] Speaker A: Right, exactly. Yeah. So why don't we use that as the test? The computer is very good at it. Right. Or what if I wanted the same action repeated precisely over and over again?
Humans are terrible at it. Machines are great at it.
That's exactly how I came up with the Turing Test 2.0. In order to come up with a test, I first had to define: what does it mean to be intelligent? What does it mean to be general?
What is this thing we're looking for? And as we discussed in the previous episode, I came up with this idea of what we call creativity, and I defined it as being able to extract new knowledge out of existing information, information you already have. So it's not that you need to go get new information you don't have; I already gave you that information. Can you extract new knowledge from it? And what do I mean by knowledge? It means: can you apply what you learned? Did you get new functionality out of it? And this is actually how good teachers test students in an exam. What makes a good exam is showing that what the students learned, they can apply in a different setting.
You can see that, okay, you learned, let's say, the Fourier transform. Okay, here is a problem. I'm not going to tell you that the Fourier transform is needed; you have to think about it: oh, here, this is a good solution, can I apply it? If you applied it correctly, it means you have knowledge of it.
[00:58:30] Speaker B: Right?
[00:58:31] Speaker A: So this is what it means to extract new knowledge out of it. And I like that because you can actually see how some teachers teach. Some teachers, when they teach, don't tell you the solution; they try to guide the students to the solution, and they will tell you, think about this. So, for example, we talked about binary search, which works on a sorted array: when you binary search for an element in a sorted array, you can do it faster than simply checking each element individually.
Maybe the students don't know about binary search, and the teacher tries to teach them and says, what about if the array is sorted? And the students don't know what to do. Then the teacher tells them, what if I pick a random element in the array? What can I learn from it? Then the students will be like, oh, a random element. Oh, because it's sorted, what I'm looking for will either be to the right or to the left of this element. And then the teacher can kind of lead them along.
So they can extract it themselves. And that's what I'm trying to identify as intelligence: if I give you just enough information, can you extract the knowledge, the new functionality? Then you can consistently apply this new functionality from that point on; you can apply it to similar problems.
[00:59:55] Speaker B: I like that. That's very good.
Yeah, Chollet's test, I thought, was very intriguing. I think in the early days of AI, one of the examples was this: if you specifically taught an AI to play, for example, checkers on an 8x8 checkerboard (and this was pointed out, I think, first by Gary Smith at Pomona), then unless you programmed that generality into the checkers-playing system, if you gave it a 6x6 checkerboard instead of an 8x8 checkerboard, it would have no idea how to generalize to it.
So that's an example also.
Okay, very interesting. By the way, you mentioned Fourier, so I now feel justified in referencing my book, Handbook of Fourier Analysis and Its Applications, in the podcast notes for those who want to learn about Fourier transforms. I never hesitate to advertise my books, so I will go ahead and do that. I would point out that George has mentioned that his test has not been passed; he doesn't see a path to it. I'm a proponent of the Lovelace Test. That hasn't been passed either. And I think the Chollet test that we just discussed hasn't been passed yet. In fact, I read, and I don't know if this is true or not, that Chollet deliberately designed his test so that current AI approaches like deep learning and reinforcement learning would fail.
So it's kind of interesting that none of these measures of intelligence has yet been met. But if you believe Ray Kurzweil, the singularity is right around the corner. And it's been right around the corner for over 20 years. So we'll see what happens.
I have been a big proponent of the Lovelace Test proposed by Selmer Bringsjord of Rensselaer Polytechnic Institute, who says that a computer will be creative if and when its output is beyond the explanation or intent of the original programmer. Now, this is becoming more and more difficult, because you might get an output from your large language model that you didn't understand.
And you say, wow, this is creative, and it was not in my intent. But then, in order to actually claim that, you would have to go back and look at the entire corpus of material that was used to train that large language model and make sure it was not in that corpus. So we've talked about George's model of AI and intelligence. Do you want to contrast that, George, with the Lovelace Test?
[01:02:26] Speaker A: Yeah. So one thing we talked about in the previous episodes, right, was what you referred to as the flash of genius, right?
[01:02:35] Speaker B: Yes.
[01:02:36] Speaker A: And the Lovelace Test comes very close to the same definition as the one I give, but I think there are some fundamental differences.
And the first one: I think we have to give AI the chance to pass a test. So although the definitions are very similar, what I don't like about the Lovelace Test, or what I would disagree with, let's say, is this. Let's say I do design an AI that is truly intelligent. Right. So that's my...
Or at least that's what I claim. Then I cannot be surprised by its output. So it's kind of a contradiction in terms: how can I be surprised if that was my intent?
[01:03:19] Speaker B: Well, I would maintain, George, that surprise is not the same thing as creativity. I've written a lot of programs, and I'm sometimes surprised at the output. But I can usually go back and explain why that surprise happened. It's something I programmed it to do. I might ask it to search through a billion different possibilities for a solution, and it comes out with one, and I say, wow, I wouldn't have expected that; I'm surprised. But then I go back and look at the details and I find that it's right. So I don't know if I would equate surprise with creativity. So go ahead. That's my two cents.
[01:03:54] Speaker A: Yeah, I agree with that. What I'm saying, though, is that in the Lovelace Test, if it's truly creative, I won't be able to explain how it came to that result.
[01:04:03] Speaker B: Right?
[01:04:03] Speaker A: Right. But if I designed it like that, then I wouldn't be able to explain it anyway. You see what I mean? So even if I intentionally design it to be creative, then if there is such a path, I'll be able to say, here, it did this because I designed it like that. So I feel like that's where the contradiction comes in.
But moreover, the problem is: what does it mean to not be able to explain? So, for example, in the previous episodes we gave the anecdote of Newton observing the apple falling and reaching the ground, right, and from that extracting new knowledge: that objects attract each other.
So how much information can I give before I say, oh yeah, I can see how it came to this result: it looked at this information and it extracted some pattern? Right? Because that's the problem. You said before that I have to go back and check if that information was already in the data set. So how much information am I allowed to give you? This is not well defined in the Lovelace Test, whereas in my proposal it is defined, because I define what that means.
It means new functionality. So I can check what functions a system can do, and I can find one it cannot.
And then I can check, do you have information that you can extract this new functionality out of?
So, for example, we talked about the 6:30 example: the clock that points at 6:30, where the AI can describe it to you precisely in words, but it cannot generate an image, because it cannot extract the knowledge from the information it actually provides to you. It describes where the hands of the clock should be, but it cannot extract the functionality. And it's not that I disagree with the Lovelace Test; it's that I feel things are not well defined.
Right. So that's why I felt the urge to say: we really have to define precisely what creativity means, what general intelligence means.
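The clock example amounts to two small calculations. Here is a sketch of the functionality a system would have to extract from the verbal description; the function name and the angle convention are illustrative assumptions, not anything from the paper:

```python
def clock_hand_angles(hour, minute):
    # Angles in degrees, measured clockwise from 12 o'clock.
    minute_angle = minute * 6.0                      # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # 360 / 12 hours, plus drift within the hour
    return hour_angle, minute_angle

# At 6:30 the minute hand points straight down (180 degrees) and the
# hour hand sits halfway between the 6 and the 7 (195 degrees).
print(clock_hand_angles(6, 30))  # (195.0, 180.0)
```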
[01:06:17] Speaker B: Let me ask you: you used the phrase, suppose that we wrote software that was designed to be creative.
Is that possible? How would we do that? How would we design an AI to be creative? I have no idea how to do that. Is it possible?
[01:06:30] Speaker A: I agree that I don't see a path to it, but I want to leave this open. I don't want to be too restrictive, because then the counterargument will be: your definition is such as to make it impossible. So I want to avoid that trap, which we sometimes fall into because we have our own preconceived biases. For example, I don't see a path to AGI, but I want to leave an open door where, yes, if it's there, I will be able to detect it.
[01:07:03] Speaker B: You'll be able to detect it. Okay, okay, that's fair enough.
What about applying it to other things? We actually talked about the idea of the flash of genius. And I think that for AI to be creative, that's indeed what we're looking for in all of these tests for intelligence. We're looking for a flash of genius.
And I think that applies to human beings. Roger Penrose, in his book The Emperor's New Mind, points out a number of different places where this flash of genius has occurred. You and I, and I think anybody creative, have had this flash of genius, where you actually come up with a solution, and it was nothing like what you were thinking about, and you go, oh my gosh, where did that come from? So flashes of genius do indeed occur. And with these different models of intelligence, I think it's important that we apply them to other things. Now, you mentioned that your Turing Test 2.0 can be applied to other things. Could you unpack that a little bit?
[01:07:56] Speaker A: Yeah. So theoretically at least, right?
That's why I really like that specific definition of functionality, emerging functionality, let's say generating functionality, because it can clearly be applied to any system. And what I mean by that: one way I came up with this idea, as we talked about in the previous episodes, is to look at humans as a system and see how their technology evolved, right? They advanced. We can look at other animals, and at least with the data we have today, we don't see that. We see that other animals live as they always lived. An example would be a pet dog: you teach it tricks, and it learns the tricks, but then it won't learn anything else. It will still live in the same way. It won't take these tricks and try to build upon them.
A similar example: you might have seen videos where people teach sign language to apes and try to communicate with them. Okay, very nice, very interesting. It shows some aspects of intelligence. But is it general intelligence? If I take this ape back to nature, will it use this communication to build a society? Will there be an advancement? Will we see them use new technology, do some innovative things, new functionality? That's why functionality is important.
And the same thing applies to anything else. For example, one thing we mentioned is that humans typically have different levels of intelligence in different areas, like art, sports, science. And you can see that. I like the example of the NBA: look at how basketball is played in the NBA today, right?
[01:09:44] Speaker B: Oh, NBA, okay.
[01:09:45] Speaker A: Yes, the NBA. It's very different.
If you look at the 60s and what happened since, people started rethinking the game and saying, wait a second, maybe I don't need as much physical contact. Maybe I can shoot.
I can shoot from the three-point line or even further back, and statistically, even if I miss more, because it's worth more points it gives me a bigger chance to win. And people arrived at new functionality, new play styles that were not there before.
We see athletes do new moves to trick their opponents.
This is new functionality. And the game evolves. If you compare a game back then with a game now, they're very different. Same with music, same with, let's say, painting. We can see new trends coming up.
And I think this is exactly the new functionality.
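The arithmetic behind that three-point rethinking is simple expected value. A tiny sketch; the shooting percentages here are illustrative assumptions, not league data:

```python
# Expected points per attempt: a lower-percentage three can beat a
# higher-percentage two.
two_point_pct = 0.50        # hypothetical two-point field-goal percentage
three_point_pct = 0.36      # hypothetical three-point percentage

print(2 * two_point_pct)    # 1.00 expected points per two-point attempt
print(3 * three_point_pct)  # 1.08 expected points per three-point attempt
```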
Sometimes people might think we see that with AI, but what we see in AI is a combination of what already exists. That's why in the Turing Test 2.0 I propose specific rules that you have to apply: the system has some functionality; it has some information that it hasn't extracted functionality out of yet; and then no new information comes into the system.
Can it extract new functionality out of that? So, for example, you could apply it to the painting world and ask: what were the different methods people used to paint pictures? Maybe we go back to, I don't know, the 800s, and we only train the model on images of that type. Or you can go even further back, say the Stone Age, when you only have the pictures that primitive humans made on walls. And you say, okay, I'll only give you these images, and maybe all the information humans had at that time. Can you come up with more innovative images?
[01:11:49] Speaker B: Yes.
[01:11:50] Speaker A: So this is the idea of new functionality. You can observe it anywhere you have some type of intelligence.
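To make the rules George just listed concrete, here is a rough sketch of the test protocol as code. Every class and method name here is a hypothetical illustration, not an interface from the paper:

```python
class CandidateSystem:
    """A stand-in for the system under test."""

    def __init__(self, functions, information):
        self.functions = set(functions)       # functionality it already has
        self.information = set(information)   # information not yet turned into functionality
        self.sealed = False

    def seal(self):
        self.sealed = True                    # rule: no new information may enter

    def can_do(self, function):
        return function in self.functions


def run_turing_test_2(system, target_function):
    # Rule 1: the system starts with some existing functionality.
    assert system.functions, "system must start with some functionality"
    # Rule 2: the target functionality is absent, but information from which
    # it could be extracted is already inside the system.
    assert not system.can_do(target_function)
    assert system.information, "system must hold unextracted information"
    # Rule 3: seal the system so it runs with no external input.
    system.seal()
    # ... let the system run; it passes only if the new functionality
    # emerges from the information it already held.
    return system.can_do(target_function)
```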
[01:11:56] Speaker B: I think part of the intelligence you're talking about is the creative dimension.
I think that animals probably have a degree of creativity. You hear about the experiments where they put a monkey in a little room, hang bananas, and put a stick there, and the monkey figures out how to use the stick to knock the bananas down. That's probably a low level of creativity, but I think the higher level of creativity is probably limited to human beings and possibly our Creator.
What blows my mind is that everything that we have now, from computers to cars and this great lifestyle that we live, is made with stuff that was here on the Earth 3,000 years ago.
[01:12:37] Speaker A: Exactly.
[01:12:37] Speaker B: And oh my gosh, you know, we have taken that, we have used our creativity to form the society that we have now. And this takes human creativity, and this human creativity is just awesome in what it can do. And now we're seeing its application to computers and artificial intelligence. It's just astonishing. All of the silicon, man, it was lying around on beaches, I think. Right, exactly.
And it was all there. We just figured out how to use it. And we made cars and computers and houses and skyscrapers and boats and explosives and all sorts of things. So what's that?
[01:13:14] Speaker A: Good and bad applications. Good and bad.
[01:13:16] Speaker B: Exactly.
I mean, it was there. But I do believe it does have its sort of limitations.
[01:13:23] Speaker A: I like what you said: we learned how to use it. Right? It was always there. The silicon always had the same properties.
[01:13:29] Speaker B: That's right.
[01:13:30] Speaker A: We learned what that means. What is the functionality we can extract out of it.
[01:13:35] Speaker B: Yes.
[01:13:35] Speaker A: And that's the point.
[01:13:37] Speaker B: Yeah. And that goes back to the simple example you gave about Newton seeing the apple fall from the tree, that old story, and how he was able to use creativity to do great things.
I don't think computers will ever be able to do that. And I would agree with you; I think you state it more safely: you don't see a path to it. I definitely do not see a path to it. So let me bring up a topic that I didn't put down in the notes for you, but I'd be interested in your take on it. There was this recent article, I believe it was in Forbes, and I'll try to find it and put it in the podcast notes, that said we shouldn't be studying large language models, we should be studying puppy dogs. And why should we be studying puppy dogs? Well, we should be studying brains.
It does turn out that there's something that comes from the brain. And I believe, in fact, I co-edited a book called Minding the Brain, that the mind is more than the brain, that we're more than computers made out of meat.
The inability of AI to create, I think is evidence of that. It isn't a proof, but I think it's certainly evidence.
And so they were making the point that maybe in the future we could actually grow brains.
And right now they take pigs and they grow, like, pancreases and livers and things. I don't know the details, but I asked a guy, could you grow a human brain on a pig? And he says, yeah, we could. And I just wonder what we could do with a computer with a human brain made out of meat, whether that would be creative or not. I don't think, theologically or philosophically, it would have a soul or a spirit at all. But is there something we could tap into there? Do you have any thoughts on that? Have you ever thought about that?
[01:15:23] Speaker A: Yeah, so lots of thoughts.
Very interesting subject. When we think about it, I like to break it down, right? And that's one aspect, again, of the Turing Test 2.0: let's break it down. What does it mean to be intelligent? We set this criterion of new functionality, okay? Then there are two paths.
Either this thing can actually be done algorithmically or it cannot. There are only two possible outcomes. That's why I like to define it precisely, so I can break it down like that. And then, if it can be done algorithmically, eventually we will do it.
It's just a matter of time, you see. I mean, if it's an algorithm, maybe not now, maybe 100 years from now, it doesn't matter. It will be done, right? We can even randomly change things until we figure it out.
[01:16:22] Speaker B: Maybe.
[01:16:23] Speaker A: Okay, so it's the same thing with the brain.
If it's an algorithm, then the brain itself should be sufficient.
And if you put it in any other system, it should be sufficient too. And if it's not... so let's go to the other case: okay, we examine this case. What if it's not?
Then I think this is fundamentally groundbreaking, because if algorithms cannot do it, then there's no path to it.
Sometimes we think about algorithms, and especially with AI, as something mysterious. But what is an algorithm, right? AI, first of all, is algorithms. Sometimes people forget that.
[01:17:08] Speaker B: Oh, in fact, if I could interject really quick, this is a point that Roger Penrose makes in his book The Emperor's New Mind. He says computers are limited to algorithmic things. And we know, shoot, back in the 30s, Alan Turing showed the halting problem was non-algorithmic. And now we know a bunch of things which are non-algorithmic. There are a bunch of problems that computers will never solve.
And that's where the human mind comes in, I believe.
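For the curious, Turing's argument can be sketched in a few lines of Python. The `halts` function here is a hypothetical oracle, and the point of the sketch is precisely that it cannot be written:

```python
def halts(program, argument):
    # Suppose, for contradiction, this returned True exactly when
    # program(argument) eventually terminates.
    raise NotImplementedError("no algorithm can decide this for all inputs")

def diagonal(program):
    if halts(program, program):
        while True:   # loop forever if the oracle says we would halt
            pass
    # otherwise, halt immediately

# diagonal(diagonal) would halt if and only if it does not halt,
# a contradiction, so `halts` cannot exist as an algorithm.
```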
[01:17:31] Speaker A: Exactly. So if we think about the algorithm, what is an algorithm? If somebody Googles the term, you can find a definition.
It's not super precise, but you can find a definition: a process with a finite number of steps that comes to an outcome. So you put in an input, run a finite number of processing steps, and get an answer. The finiteness is very important, because if it's not finite, then you never get an answer, right?
So if it's finite like that, as we said, then all there is is this algorithm, and you can map it onto different things, different Turing machines. The Turing machine might be a brain, it might be a modern computer, it might be anything we can define as a Turing machine. However, if it's not an algorithm, then that's a big, big issue. It means there's something else.
Some people might disagree on what that something is, but yeah, then definitely the issue is not the algorithm.
But in order to get to either/or...
That's what I don't like when people talk about these things. In order to get to either/or, we have to first define what we're looking for and then prove its computability.
If we can prove it, for or against, that means you have to actually prove it mathematically. Until then, the best thing we have is testing and seeing how close something comes to passing the test. If somebody passes the test, that is proof enough that it exists.
If nobody passes the test, we are still in the we-don't-know state. Maybe somebody passes in the future, maybe not.
Yeah. Which I think creates this interesting debate.
[01:19:16] Speaker B: Okay, back to your paper. You propose advanced versions of your tests, like the single discipline test and the generational test. How might these overcome weaknesses in simpler examples?
[01:19:29] Speaker A: Yeah. So in the last episode, we talked about the tests that I show off in the paper. They're nice, intuitive tests, like the clock, where the models cannot produce an image of a clock showing a specific time, only the hands at 10 and 2, because that's all the images they have saved.
[01:19:48] Speaker B: Yeah, yeah.
[01:19:49] Speaker A: And as you said, people can come and fix these things: just give it a better data set.
[01:19:55] Speaker B: Right.
[01:19:55] Speaker A: You balance the data set. These things can be fixed, so they can go away.
So this is one of the problems. The other problem is that sometimes it's very hard to figure out, as you said, what was in that training data set, in order to know whether this is a flash of genius and new knowledge, or something that was already in the data set. And that's why I proposed two other tests. I don't run them, but they're out there if somebody wants to use them.
The single discipline test is what we talked about. The idea actually came from a podcast I mentioned before, where Sam Altman was on a show called Huge Conversations.
[01:20:37] Speaker B: By the way, if you could send me the link to that, I'd be happy to put it in the podcast notes for people who are interested. Thank you.
[01:20:44] Speaker A: And the interviewer asked him: let's assume I take your new model, I think it was GPT-5, and I train it only with physics from Newton's time. Only Newton, right? And Newton had a specific theory about how gravity works that today we know is wrong.
Einstein told us Newton's theory of gravity is wrong. Here's how it's done.
Will GPT-5, having that knowledge, be able to gradually come to Einstein's conclusion?
Sam's answer was a bit evasive, but he said, pretty much, that sometimes what we need is more data.
But that's why I like the Turing Test 2.0.
I'll give you the data. I just don't tell you how to interpret it.
That's the idea. So I don't want you to go all the way to Einstein's theory. I'll give you a smaller step, let's say a more relaxed version: invent something new after Newton. Anything. Anything that came after that point. So I trained you with Newton's-time physics, only physics, so it's easier to check the results. That's the single discipline test: only physics. And then I want you to figure out the next thing, whatever tiny new invention was made next. We kind of touched on this a bit in the previous episode, if I'm not mistaken, where we talked about images.
[01:22:15] Speaker B: Yes.
[01:22:16] Speaker A: And you can do the same there. I'll train you on the images of the primitive humans.
It's all, let's say, images on stone.
Right.
And then I'll see: can you come up with a new type of picture? Right.
Realism or something else. Right.
The same thing with any other aspect of intelligence. It can be sports, it can be science, it can be medicine, anything. So this is the single discipline test: I go back in history to a specific time where I know what all the information, all the knowledge, was at the time. I give all that knowledge, all that functionality, to the system, plus the information we had at that time.
Because there's always a frontier. For example, currently we have some specific amount of knowledge. We're at the frontier of knowledge.
[01:23:05] Speaker B: Right.
[01:23:05] Speaker A: And one of the things we have to remember is that some of the things we know today are wrong. Some functionality, we have it wrong.
If I go back in time, I can check that. At that frontier, there were things we knew back then that were wrong, and we corrected them. Can AI do that? Can it give us that tiny leap? And it's easier to observe, because we know what the next steps were, so it's easier to see what we're looking for.
But even that test is very hard. What does it mean to have all the knowledge of that time?
Sometimes it's very hard to define that. It's very hard to make sure you didn't cheat, even accidentally, because you have a huge amount of data.
Even accidentally, you might train your AI with data you're not supposed to. So that's why the generational test comes next. The generational test is the same idea as with humans: we have one generation of humans teaching the next, younger generation; the younger generation comes up, invents new things, teaches the next, and it keeps going. This is evident when we look at humans and how everything evolved: technology, music, art, sports.
We invent new things.
So that is what I would call the ultimate test.
If an AI, using only the data it can generate, AI-generated data, can train the next generation of AI, and that eventually comes up with new functionality that wasn't there before, being able to solve a problem it wasn't able to solve before,
then we would have something passing the Turing Test 2.0.
Actually, the funny part: as soon as I thought of that, I went to search for it. I thought, no way, somebody must have thought of that already. And what was fun is that people actually use this, not for the purpose of figuring out whether a model is intelligent, but rather to accelerate training: rather than me having to give you data, let AI give you the data. And what they observe is that, over generations of training,
not only do the models not become better, they actually become worse.
[01:25:17] Speaker B: In fact, I think this phenomenon, I don't know if you've heard the term, is called model collapse: exactly where they have tried to use AI to train better AI to train better AI, and after a few generations it becomes a blubbering idiot. So I think it's evidence, I don't think it's proof, but I think it's evidence, that AI doesn't have the ability to write better AI. And this goes back to the idea that it can't be creative.
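A toy illustration of that degradation (not a reproduction of the published model-collapse experiments): each generation fits a Gaussian to samples drawn from the previous generation's fit, with no fresh real data ever entering the loop:

```python
import numpy as np

rng = np.random.default_rng(0)
real_data = rng.normal(loc=0.0, scale=1.0, size=200)  # generation 0: the only real data

mu, sigma = real_data.mean(), real_data.std()
for generation in range(1, 11):
    synthetic = rng.normal(mu, sigma, size=200)    # train only on generated data
    mu, sigma = synthetic.mean(), synthetic.std()  # refit on the synthetic samples
    print(f"generation {generation}: sigma = {sigma:.3f}")

# With no new information entering the loop, sigma drifts and tends to
# shrink across generations: a toy version of model collapse.
```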
[01:25:42] Speaker A: Exactly. That's exactly my point. And it's good to go into the nuances of this, because there are workarounds out there. For example, people say, okay, I'll train you with generated data, but I'm also going to introduce some real data from humans in order to avoid the model collapse.
But although this solves, or at least mitigates, the problem, it breaks the third rule of the Turing Test 2.0, because you introduce functionality externally. And I want to mention that I didn't come up with this rule in order to counter these arguments. I came up with the rules first, and then I saw these fixes and thought, oh yeah, of course: in order to avoid this problem, you break the rules of the test.
And I find that very interesting.
[01:26:30] Speaker B: Very good.
Let me end by asking you a question that a lot of people are talking about. There are people who believe, of course, that this superintelligence is going to emerge, including Ray Kurzweil of Google and Yuval Noah Harari. The late Stephen Hawking believed in this, but his Nobel Prize-winning colleague Roger Penrose did not.
What are going to be the societal and ethical implications if one day AI demonstrably crosses your general intelligence threshold? What's going to be the impact?
It's kind of scary, isn't it?
[01:27:06] Speaker A: It's definitely, I want to say, weird to even think about, because what does that mean?
I guess, first of all, what would it mean, right? It would mean that our type of creativity is just an algorithm. A lot of people already believe that. I think that's one reason people believe we're going to get there: because if we're doing it, and we're just an algorithm, then a machine can do it too.
[01:27:33] Speaker B: Well, I tell people, these people believe that we're computers made out of meat, and therefore, if we can create, then certainly, if we replicate ourselves in silicon, it can create also. And I think that's a vacuous model, right?
[01:27:47] Speaker A: But it's logical. It's a logical extrapolation.
[01:27:50] Speaker B: Well, it's logical... depending on your ideology.
[01:27:54] Speaker A: Yeah, I agree, but what I'm saying is it's kind of logical extrapolation.
And the thing is, if this is really true, then you have a machine... We have to think about how humans run, right? Humans use these neurons that need, let's say, milliseconds to communicate. What is a millisecond for a computer? It's ages. Computers are talking about nanoseconds. And the bandwidth of a computer: how much information can it take in?
The latency from the time I ask it to do something until it gives me a response.
So if you truly have an algorithm that reaches that level, and it is so much superior in hardware, because it's not just software, it's also hardware, then this is going to be a thing we cannot even imagine.
We might try to imagine it, but it might even be the case that we cannot understand it anymore.
You know what I mean? And it might even be less useful in the end because.
[01:28:55] Speaker B: Less useful?
[01:28:56] Speaker A: Less useful because imagine if you go to a kid that is in the first grade and you try to teach it general relativity.
[01:29:05] Speaker B: Yes.
[01:29:05] Speaker A: What is it gonna learn? Nothing. You see what I mean? So it has to be at least gradual. It's gonna be our teacher, so it has to be a good teacher. If it's not a good teacher, then it's not gonna be very useful. You see what I mean?
[01:29:21] Speaker B: Yep.
[01:29:21] Speaker A: But, yeah, it's definitely very hard to predict what's going to happen. That's just my take.
Yeah. But I think all this talk, though, as you said, rests on this assumption: is this possible?
And currently I don't think we have at least a mathematical proof that this is possible.
It's more like a belief or a
[01:29:48] Speaker B: hunch, I would say. A faith.
[01:29:52] Speaker A: It's a faith, yeah.
[01:29:53] Speaker B: Yeah. So, yeah, there's this number called Chaitin's number, which is astonishing, from algorithmic information theory. If you knew this one number, you could prove or disprove all of the open mathematical problems that can be disproved with a single counterexample. It's an astonishing number, and Chaitin proved that it exists. C has a Chaitin number, as does Python. But you can also prove that it's unknowable. It might be that there are some things which are unknowable, some things that are true just because they're true, and are beyond computing or breaking down.
So it could be that. That's another answer to this. I don't know.
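For reference, Chaitin's halting probability for a prefix-free universal machine $U$ is usually written as

$$
\Omega_U \;=\; \sum_{p\,:\,U(p)\ \text{halts}} 2^{-|p|},
$$

where $|p|$ is the length of program $p$ in bits. The value depends on the machine, which is why C and Python each have their own $\Omega$, and although $\Omega_U$ is a well-defined real number, its binary digits are algorithmically random and not computable.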
[01:30:38] Speaker A: Yeah.
[01:30:39] Speaker B: Any final thoughts?
[01:30:41] Speaker A: Yeah. I think the question now is: what comes after this? If you have a good definition of general intelligence, then I think the next step is exactly what we discussed just now: look at it and see, can we map it to an algorithm or not?
[01:30:58] Speaker B: Yes.
I believe if you define AGI as climbing that mountain and standing on top of it, then currently the large language models get you up that mountain quicker than anything else. I mean, all of the information that you have in Grok or ChatGPT is available in the US Library of Congress; it would just take you a long time to extract it, to get to the top of that mountain. But ChatGPT gets you there immediately.
[01:31:28] Speaker A: Yeah. So one more point I wanted to add, because we talked about what the vision would be if this thing actually becomes intelligent. One thing we always have to remember is the opposite: what if it's not?
Right.
Because if it's not, there's the other problem with AI. Typically, when people think about AI, they're like, oh, the doom that is going to come: if it gets superintelligent, it will destroy us all. But we don't think about the other side of the coin: if it's not truly intelligent, there's the fear of a false prophet. And what do I mean by that? We've had this in our society multiple times. For example, we had the anchorman, the newsman: whatever the newsman says is true, because look at him, he's on the TV, dressed nicely, he knows what he's talking about. Then we had "scientists say": people are more likely to believe something if scientists say it. Then, after "scientists say," we had "I Googled it." And now we have "AI said it."
So this false prophet is an actual problem, especially when people exaggerate its abilities.
And then we might have this problem where we keep relying on AI that gives us data we already know.
And then we keep putting out data that is generated by AI, and AI uses this data to train again, because it's on the Internet. The more data is generated by AI, the more the data that AI trains on is artificially generated. Then we have this closed loop where no new information, no new knowledge, is produced. And then we have the same problem of stagnation of information, of model collapse, where we don't produce anything new, because we rely on something we assume is truly generally intelligent, but it's not. And I think that's the other aspect of it we have to be careful about. As long as they don't pass the test, we have to remember that all they produce now is what we know today, the frontier of knowledge. In other words, if the frontier of knowledge today were Galileo's time, AI would insist that the Earth is not moving,
that it's at the center of the universe and everything revolves around it. You have to remember that, because otherwise it becomes this problem of "but AI said it." Yeah, but AI says what we think right now. That's why, if you see AI playing chess, it plays like an average person, because that's the average knowledge. It doesn't play like an expert.
Right. And I think that's very important to remember.
[01:34:14] Speaker B: Wow. Well, thank you, George. I think this has been just a fascinating conversation.
We've been talking to Dr. Yorgos Maporis, known to his American friends as George, about his interesting new paper, "Turing Test 2.0: The General Intelligence Threshold." Again, we're going to put a link to that in the podcast notes so that you can read it, as well as make available links to other literature. And George, if you could send me the URL to that podcast, we can include that too.
[01:34:47] Speaker A: I will do.
[01:34:47] Speaker B: And so thank you again.
This has been a wonderful time. And until next time on Mind Matters News, be of good cheer.
This has been Mind Matters News with your host, Robert J.
[01:35:09] Speaker A: Marks.
[01:35:10] Speaker B: Explore more at MindMatters.AI.
That's MindMatters.AI.
[01:35:18] Speaker A: Mind Matters News is directed and edited by Austin Egbert. The opinions expressed on this program are solely those of the speaker.
[01:35:28] Speaker B: Mind Matters News is produced and copyrighted by the Walter Bradley Center for Natural
[01:35:34] Speaker A: and Artificial Intelligence at Discovery Institute.