Larry Sanger on Wikipedia, AI, and Preserving Human Knowledge

[00:00:05] Speaker B: ID the Future, a podcast about evolution and intelligent design. Welcome to ID the Future. I'm Andrew McDiarmid, and today I'm again sharing hosting duties with Nathan Jacobson, our Director of Brand and Media at Discovery Institute. We're excited to continue our conversation with Larry Sanger, well known for his role as co-founder of the online encyclopedia Wikipedia. Sanger has developed a number of educational and reference sites over the years, and he's currently president of the Knowledge Standards Foundation, a nonprofit defining tech standards for encyclopedias. Sanger is also a longtime philosopher with a PhD in philosophy from Ohio State University. Larry, welcome back to ID the Future.
[00:00:53] Speaker A: Hi. Well, it's good to be back.
[00:00:56] Speaker B: Well, in part one of this conversation, available in a separate episode, we discussed your journey as a skeptical philosopher and your decision to return to Christianity. We also talked about some of the arguments for intelligent design that have impressed you along the way, including the work of Dr. Stephen Meyer, William Dembski, and Michael Behe. On this episode, we wanted to discuss your work with Wikipedia as well as the challenges that Wikipedia has had and still has today in presenting information about important or controversial ideas both accurately and fairly. We also want to talk to you about your current efforts to decentralize and preserve the world's knowledge. No easy task, but I think you're onto something with your work there. We'd also like to pick your brain about the future of online encyclopedias and how to navigate the Internet of today as well as tomorrow. So let's jump right in.
[00:01:47] Speaker C: First, Larry, I have a personal question. I have noticed, following you on X, that from time to time you post these lovely photo homages to the state of Ohio, from the Licking River to Mount Pleasant and Lancaster.
So what is it that you love about your neck of the woods in Ohio?
[00:02:08] Speaker A: What do I love about Ohio? Well, I mean, it's hard to explain. It's certainly grown on me. I didn't appreciate it when I first moved to Ohio State over 30 years ago. But the rolling hills are too rugged to be farmed, not rugged enough to be called mountains. They're very walkable, and it's not like parts to the west of us in the Midwest, where it's, you know, flat as a pancake and there aren't a lot of differences in views. There are just a lot of lovely trails that I like to walk and drive around to get to. So that's just the landscape, though. I also like the people of Ohio. They're pretty genuine and unassuming, which I appreciate.
[00:03:24] Speaker C: Do you find that getting away like that is a place for philosophical reflection? Or is it more an alternative to that?
[00:03:34] Speaker A: Oh, I listen to a lot of audiobooks on my way to the walks, and then when I'm walking, I pray and have my sort of dialogues, imagined dialogues with God.
[00:03:50] Speaker C: So yes, that's beautiful. Larry, in your talk at Discovery Institute's annual 2024 COSM Technology Summit, you sang an old refrain. You said, "We desperately need to support open networks and open software that will preserve our knowledge in many independent copies, digitally signed, following open standards and exchanged according to open protocols." You used the word "desperate." What is the desperate scenario that you envision that we need to avoid?
[00:04:30] Speaker A: Well, I guess there's a couple, and they both seem very unlikely, but we worry about them, don't we? So one of them is like nuclear holocaust or some other civilizational collapse for whatever reason, so that, you know, the power grid becomes spotty, the Internet becomes either spotty or nonexistent.
And then, you know, for the people who can keep their laptops running and powered, the great books of Western civilization can fit onto one of these things. I'm holding up a ZWIBook flash drive. The Knowledge Standards Foundation sells these, and it has almost 70,000 books, all of the public domain books in Project Gutenberg as of last June. The idea is somebody might be able to whip out one of these bad boys and have access to really a fairly sizable library's worth of books up until the 1920s. The other nightmare scenario is where we, either in the United States or the entire world, find ourselves under the thumb of a horrifically totalitarian system in which what we read is carefully watched and where we need to have our own untraceable local copy of things. If we can have offline access to the great books, that could be very useful, maybe even essential to educating our children and preserving Western civilization. I'm not saying this is very likely, but yeah, we do worry about that, don't we? So it's worth having. I actually think it's worth having simply as a sort of time capsule. Also, the archival value is great. Now, that's just talking about these flash drives. What's possible is that we make text versions of all of the currently PDF-only books that are in Internet Archive. There are, I am told, about one and a half million books in Internet Archive that are actually public domain, and a lot of those have not had reliable OCR (optical character recognition) done on them, which means you can't just copy the text and make a really nice usable offline copy. And in fact, if you want to back it up, it takes like 500 terabytes, which is like 250 times the size of your laptop hard drive. And that's if it's fairly big. So yeah, that's too much.
So if you simply reduce all of those books to text, which nobody has yet done, which I'm amazed at, then you could actually put them on a two-terabyte hard drive, you know, an SSD, and somebody could sell those things for a little more than the price of the SSD, maybe like $400 or something like that. And wouldn't that be something? Then you'd have not just the size of a reasonable local library, but something closer to a giant university library. Right. And if there are many, many copies of that, then Western civilization's not going anywhere.
[00:09:27] Speaker C: I do wonder if, behind the scenes, all of the training datasets for the LLMs have something approaching that, but it's not available in the same way that archive.org makes it available.
[00:09:45] Speaker A: Right. And also they don't expose that. I mean, they can look it up online, but what if it's not online? Yeah, they have been trained on it, but that's very, very different from actually making it available. Anyone who has actually used an LLM to do research, to actually get feedback on serious academic research, understands that there are huge limitations there.
[00:10:17] Speaker C: Yes. And your essay on that is excellent. We're going to get to that in just a moment. But first, like so many others, I have a tale of woe: trying to contribute constructively to Wikipedia, losing days of my life trying to justify a small edit, rebuffed by a phalanx of regulars who reversed even the simplest, most verifiable correction that I was trying to make in a hostile entry. In my experience, Britannica seems to have retained more of an objective and neutral approach. So I'm curious what led you, in the beginning of Wikipedia, to institute the neutral point of view policy, and whether you think that there's any hope of returning to that on a platform that is user-generated, like a wiki.
[00:11:16] Speaker A: Yeah, I'm going to be talking a lot more about this later this year, but I can say this now, giving sort of the background. So, first of all, I have a personal animus against bias that goes back to my childhood, like reading old encyclopedias and saying, "But they're just coming out and saying that socialism was well-meaning and so forth. This is not supposed to be in an encyclopedia or in a textbook." I was just offended. So that's sort of reflected in Wikipedia's policy. But there's another reason for it, and that is that Wikipedia began as a wiki, and wikis have a history that goes back five or six years before Wikipedia. The initial wiki was called WikiWikiWeb. And the people who thought about how it works had this conceptual scheme about wikis. In the original wikis, there were two different modes of editing a page: discussion mode and document mode. In discussion mode, you're just talking things out. It's like a discussion thread. In document mode, somebody says, "Well, I think that this discussion is maybe not totally over, but we're not treading new ground, and I'm just going to summarize everything in a document." And so that's what they do. And actually the notion of a Wikipedia article is kind of a riff on that, making the document mode into an encyclopedia article. But one thing that was actually part of the original notion of document mode in the original wikis is something that they called a consensus. And this required that people enable other people who are working on the same text to basically have their say. So an article doesn't just express one point of view, it expresses multiple points of view. And you make compensatory adjustments to your view to be respectful to other views. So the neutrality policy then was actually partly a social thing.
It's like, we're going to try to achieve that same sort of consensus. This was the initial dream, but it was just a dream. And the way to achieve that consensus is to be really, really committed to neutrality. You know, allow other people to have their say. The article must not say in its own voice what only one party says, and so forth. So the neutrality policy was meant as a tool by which radically different people could actually compromise and come to a consensus text. Wikipedia does not work that way anymore at all.
[00:15:18] Speaker C: So you've got a new solution to this problem. Your efforts at the Encyclosphere propose and make real a different remedy: decentralization. As you say, any centralized repository will always be vulnerable to ideological capture or corporate capture. In your approach, encyclopedia articles can be retrieved and ranked from many sources rather than attempting to achieve that single oracle of truth. And I noticed in this a kind of parallel to issues with consensus science, or "the science," trademarked, which, you know, because of funding and the ruling party, tends to be captured. Maybe you can tease out your philosophy of decentralization as a remedy to the challenges of Wikipedia, the failures of Wikipedia, and what you're working on at the Encyclosphere.
[00:16:24] Speaker A: Well, as long as there are gatekeepers that insist on using Wikipedia rather than a broader collection of resources, this isn't really a solution. But the idea is that we make a common data standard for all of the encyclopedias. We make all of the free encyclopedias available on a common network that doesn't have any one center. So anyone can contribute to it and have their own windows into the Encyclosphere, so to speak. Right now we have two big ones, EncycloSearch and EncycloReader. EncycloSearch focuses on basically any free-to-read encyclopedia, not necessarily free to distribute.
So it's got about 65 encyclopedias in the database, and all of the articles that are free to distribute are in the form of a ZWI file. ZWI means "zipped wiki." So it's a zip file. It's got the text itself, which is HTML for the most part, but you can have other formats, and it's got images, or at least scaled-down versions of images, and metadata. And the whole thing is signed digitally, which means that you can be sure that it comes from a particular server and was signed on a particular date, which might or might not put your mind at ease about tampering by AI, for example. So that's one of the advantages of having signed articles. Also, as you were summarizing there, having all of these articles in a single database (not to say on a single server, but a single database that can be distributed in many different arbitrary locations) and according to a single stable data standard means you can build rating systems on top of that. We haven't done that. We've done everything else. The rating system is a little bit more difficult, and in order to make it happen you actually have to motivate people to care about the ratings and so forth, which is actually a pretty tall order. But that's the vision. And if Elon Musk were to ask for my advice on how to do this, I would say, okay, just start putting money into the Encyclosphere and really, really get behind the idea of a bunch of different nodes, as we call them, of the network, so people can publish articles to the Encyclosphere from many different independent locations, and also then ratings of articles. And then we can use the articles themselves and the ratings to slice and dice the encyclopedic data and of course train AIs based on them. That's great. But there's one thing. I know you're not going to ask me this question, because most people don't think to ask me this question.
But it's an important question: what really is the point of an encyclopedia when there are LLMs that can give people answers a lot faster? I don't use encyclopedias as much anymore. I just ask an LLM, and I ask it for sources if I really want the sources. But the answer is, even if nobody reads the articles, LLMs summarize the information at different levels of granularity. Essentially you can see that in how they answer questions. If you ask it a really in-depth question, it's clearly looking at data that is contained deep in books. Whereas if you're asking a general question, it's giving you stuff from news articles and encyclopedia articles. Well, it's really important that human beings with human judgment write the encyclopedia articles. In other words, human beings still have to be the arbiters of the summaries of what is known. That's going to continue to be the case.
[00:21:36] Speaker C: And yes, this is actually a question I was going to ask, Larry. One of my favorite essays of yours is "Why Encyclopedias Are Still Important." Drawing conclusions about what is generally believed requires human judgment, you say. And the irreplaceable and essential role that humans play in developing new knowledge, which you lay out in that piece, I think is really well articulated. So I highly commend that piece.
[00:22:15] Speaker A: I was underestimating you there.
[00:22:18] Speaker C: All right, Andrew, you have a question?
[00:22:19] Speaker B: Yeah, yeah. Along the same lines, actually in the same article, you make an important point, Larry, about the nature of organic living intelligence and how it differs from machine intelligence. Now, I've studied this topic for several years and I'm intensely interested in it. Many of us have heard or used the phrase "reinventing the wheel," and in most cases it refers to something we ought to avoid. After all, reinventing the wheel might be considered inefficient or wasteful.
But you argue that reinventing the wheel actually reflects our originality and that we're not just the sum total of the material inputs that happen to us in our life. Tell us what you mean by that. Give us a little more detail.
[00:23:01] Speaker A: Oh, boy. Yeah, you're pulling out an essay that I barely remember myself. So. No, it's fairly straightforward. Right. The way that we teach students so that they really understand a topic (this is very true in philosophy especially) is by having them walk through the issues themselves and come to a really fine-grained understanding of the logical or rhetorical back and forth. And yeah, even if you're just summarizing what you read and literally spitting it back, you know, in summary, it is the proven way to create a proper understanding of what's going on. So there's the educational value of it. And I actually think that the history of philosophy and other fields, especially in the humanities, just shows this, especially in religion. It's maybe even more true in theology than in philosophy. People are constantly reinventing the wheel in these fields, and sometimes it's annoying. It's like, oh, well, that again. But the interesting thing is that one can discover new aspects of old ideas by putting them into a new light or into a new context. And that's important. You know, we do make certain kinds of discoveries by doing that.
[00:25:19] Speaker B: Yeah, yeah, I appreciated your insight, and you did pretty well for having written it a while ago and forgotten about it. But, you know, part one of our conversation discussed your intellectual journey from skeptical philosopher to Christianity. And we're on the topic of technology in part two here. But I did want to mention an argument for God that you made from technology and shared in a blog post. Hopefully it wasn't written too long ago. Let me set it up this way.
I recently enjoyed an article in the Wall Street Journal by Gerard Baker called "Faith, Freedom and the Long Thread of Technology." In it he notes that historically, some conclude that magic gave way to religious belief, which itself eventually gave way to the scientific method and empiricism as we accumulated scientific knowledge. But Baker observes that recently our scientific and technological progress has enabled rather than eroded religious faith. He points to different examples happening in the culture right now. And he quotes G.K. Chesterton's detective hero, Father Brown, explaining how he captured a thief: "I caught him with an unseen hook and an invisible line, which is long enough to let him wander to the ends of the world and still to bring him back with a twitch upon the thread." Baker writes that perhaps the whole long march of science, all that secular accumulation of knowledge, is simply the unreeling of that long, seemingly limitless thread, and that eventually we all feel the twitch. Now, you felt the twitch back in 2017 as you considered the technological marvels all around us today, or at that time. And you wrote this: "If it's conceivable that a billion-year-old super being could bring about the existence of a universe indistinguishable from this one, then it ought to be conceivable that God exists." I thought that was a pretty smart way to extrapolate from our technological world and at least, you know, offer up the plausibility of the existence of God. What would you say to that argument? Was that a strong one for you at the time?
[00:27:26] Speaker A: Yes. Well, I didn't really regard it so much as an argument for the existence of God as for the conceivability of God.
And it's a strong argument there because, you know, as a grad student, I sometimes didn't even want to call myself an agnostic, let alone an atheist or theist, because I thought, well, we don't really even have a notion of what God is, because we have no notion of what it would mean for a mind to bring things into existence. And that is the core notion of a creator. I have no experience of that. And if I have no experience of that, then I have no way of conceiving of what God is. So therefore I don't even know what it means to say that God exists. Now, it was actually a response to that old argument of my own that led me to write that piece that you're referring to. And I think it is a pretty good argument, because basically we can conceive of it right now, just by looking at what AI is capable of and combining that with, you know, the stunningly detailed 3D worlds of billion-dollar games and gaming systems. It's pretty amazing, actually. And then just extrapolate: suppose there were such a thing as a singularity, which might be upon us in a few years according to some people, and then imagine another billion years of development after that. Then sure, you might be able to press a button on a machine. Or no, you just have a thought, right? And things are so wired, in some way we don't understand now, that a planet pops into existence, or whatever, a whole universe. Now, one needn't actually opine on the actual possibility of such a thing occurring. We don't have any way of knowing if it's actually scientifically or naturalistically possible, but we can conceive of it. And if we can conceive of that, then we are conceiving of the action of a creator. If we're conceiving of the action of a creator, then we are conceiving of a mind that creates things. Well, not quite out of nothing, because that's not part of the thought experiment, but pretty close, right? Pretty close.
It is a mind that brings a universe into existence with a thought. And then all you have to do is say, well, just imagine that such a thing existed necessarily from eternity. And there you have God.
[00:30:56] Speaker C: Provocative.
[00:30:58] Speaker A: It doesn't follow that God exists. All that follows from that is that God is conceivable, which is actually pretty important for some people to admit.
[00:31:09] Speaker C: It has been disputed.
[00:31:10] Speaker A: Yes, it has been disputed.
[00:31:14] Speaker C: Well, Larry, one of the things that I'm impressed with is that the work you're doing at the Knowledge Standards Foundation is so practical and concrete in trying to address the challenges of living in the information age and accessing and evaluating knowledge and information. So why don't you share, just in closing here, what you have available there and where people can find you?
[00:31:45] Speaker A: Sure. So if you want to check out our work, go to encyclosphere.org. So "encyclo," as in encyclopedia, "sphere," like the word, dot org. We have a couple of websites that people can use. There is a Chrome browser plugin. We've got more than two websites, but the two big ones are EncycloSearch and EncycloReader. And if you want to support our work, then please do contribute. We've got a donation page there. If you want, I will actually sign these things and give you a signed and numbered version for 100 bucks. But if you don't need that, then for 50 bucks you can get almost 70,000 books from Project Gutenberg. And they are offline copies. There is a bespoke reader, which is very nice. It works very well. If you're actually doing research with the texts themselves, there are a lot of tools involved. I made the software myself with the help of an LLM, but it's extremely well tested and it seems to work pretty well. And you can also export the books to EPUB format.
So if you want to have access to all of those books from your own hard drive, then just copy it from the flash drive to your hard drive as soon as you get it, and then you'll be able to export to your phone. Everything stays within your own ecosystem, which I'm a big supporter of: own your own data. So this is part of that movement.
[00:34:02] Speaker B: Yeah. And it might seem like a small thing or an unneeded thing, but, you know, what happens if you're the only one with a copy in your immediate vicinity? You know, you've got to start teaching the classics. You've got to share that with those around you. I mean, it's at least feasible these days, right?
[00:34:21] Speaker A: Right, right. And I would also say to the better-heeled watchers or listeners that the Knowledge Standards Foundation is not well funded at this point. We used to have a dozen people working for us, and now I'm the only one. And we really do need funding for a full staff to actually do things like make an AI front end for the Encyclosphere, finish digitizing all the old encyclopedias, and make a two-terabyte version of this that has all of the public domain books that we can get our hands on.
[00:35:17] Speaker B: Yeah. Well, Larry, we really appreciate your work in this arena and we thank you for your time today. It's been awesome chatting with you on these issues.
[00:35:26] Speaker A: All right. Yeah, well, it's been a good interview. I appreciate it.
[00:35:30] Speaker B: Well, if you haven't enjoyed part one of the conversation, please look for that in a separate episode. And in the show notes for today's conversation, we'll include links to Larry's website and to the video of him talking at the Discovery Institute COSM conference last year on the very important matter of preserving our knowledge. Well, for now, this is ID the Future. I'm Andrew McDiarmid.
[00:35:53] Speaker C: And I'm Nate Jacobson.
[00:35:55] Speaker B: We'll see you next time. ID the Future, a podcast about evolution and intelligent design.
