
Drug Discovery (The Derby Mill Series ep 16)

19 Aug 2025

Welcome to the Derby Mill series, intrepid pioneers of the next economy. In this show, we meet with entrepreneurs who are at the frontier of their industries in terms of deploying machine intelligence. We springboard from their current use of AI to brainstorm on where this will go at the limit, and how we might get there.

I’m Ajay Agrawal, co-founder of Intrepid Growth Partners, and my three collaborators are:

  • Rich Sutton of the University of Alberta, who pioneered reinforcement learning
  • Sendhil Mullainathan of MIT, who uses machine learning to better understand human decision making
  • Niamh Gavin, an applied AI scientist working on optimizing foundation models and modalities for novel semiconductors

Rich, Sendhil, and Niamh are all senior advisors at Intrepid Growth Partners.

The domain we’re exploring in this episode is R&D for drug discovery. BenchSci’s mission is to increase the speed and quality of life-saving R&D to improve the health of patients. BenchSci is based in Toronto, they have approximately 400 employees, and they work with 12 of the 20 largest pharmaceutical companies in the world.

BenchSci uses machine intelligence to semi-automate hypothesis generation and experiment design in the drug discovery process. With us today is Liran Belenzon, BenchSci’s CEO and co-founder.

And with that, let’s start the show.

First of all, Liran, this is just super fantastic work, and I want to contextualize it for everyone else. You already know this, but in some sense, I want to contextualize why I think this is so important.

If I just take a zoom back, the thing we know is that drug discovery is pretty important to all humans. We’re going to benefit from it. For some weird reason, every year I seem to care more about it. I’m unclear why, but anyway.

The second fact I’ll just observe is that a lot of drug trials fail, a shocking number given their cost. Liran, I’m not telling you anything you don’t know; this is probably just to contextualize for listeners.

I think that as a result, even modest gains in our ability to do bench science and create better shots on goal will have huge, huge benefits.

A drug trial can cost in the seven to eight digits, which is an insane amount of money to spend. So even small gains in bench science, anything that can shift the probability of success by even one or two percentage points, are just worth a ton of money.

So you’re doing fantastic work, and I just want to emphasize that this work matters even putting aside all the money.

And so the questions I wanted to ask, to help bring out a little bit more of what you’re doing: I think the vision and how things are working right now make total sense in terms of:

  • Helping with experimental design
  • Figuring out how to do the protocol
  • Figuring out even which things to target, which genes, which molecules

I think it would be helpful for me if I had a sense, and for others, we just had a little bit more tangible sense of:

“What is the, for lack of a better word, what is the scope of existing knowledge that you feel you’ve already ingested at high fidelity? And what do you feel is the next layer or the layer after that, that you feel you’re sort of going to, in some sense?”

Just to make it concrete, you could say,

  • “Look, we’ve gotten all the verbal information in PubMed.”
  • “Or we’ve gotten stuff put into, you know, gene networks and we’ve gotten stuff, but we don’t yet have proteins.”
  • Or “If I have, if we have some sense of like, if you’re an amoeba and you’re absorbing information, knowledge that’s out there, what have you already eaten? And what do you plan to eat next?”

For sure.

And maybe before I answer that question, I’ll say what we don’t do, because drug discovery is a very complex and vast space.

  • We don’t do anything around designing clinical trials, patient engagement, recruiting, and so on.
  • We don’t do anything around drug design.

Where we do apply our technology and the problem that we do tackle is unraveling disease biology.

The biology of the diseases we are trying to cure today is significantly more complex than ever before. And while we are very good at generating a lot of data in drug discovery, we are not good at generating knowledge about how disease biology actually works.

That is exactly what we do at BenchSci and why we build Ascend.

Ascend is a science-first, tech-deep, disease biology AI platform that acts as an AI system for the preclinical scientists and key decision makers, empowering them to unravel the complexity of biology at scale. We believe a successful AI solution in R&D must achieve scientific veracity of the data while maintaining a deep understanding of preclinical R&D workflows.

We also understood the value of providing a holistic view of disease biology, which not only included the world’s scientific findings from a public domain, but also internal findings from our customers. This is why we developed a technology to enable the secure integration of pharma’s unstructured proprietary data, combining the two data sets that allowed us to create the most robust data foundation in the world.

Our biology-specific multimodal AI, including specialized vision and NLP algorithms, focuses on analyzing and understanding experiments from both texts and figures. This enables powerful evidence-based generative AI with scientific explainability.

Ascend’s engine is powered by hundreds of millions of external and internal experiments to answer complex biological questions. We also developed capabilities to ingest and process customers’ internal data so we can link them to relevant research and index them to be easily accessible and searchable by scientists.

To ensure the integrity of the data and to give our models scientific judgment, we have a team of more than 100 scientists working directly with our engineers and AI experts. Our platform increases the productivity of pharma’s preclinical R&D pipeline by supporting its core workflows.

From target ID and prioritization, to experimental design and validation, all the way to translational workflows to ensure clinical success, Ascend mirrors how a scientist would understand and extract biomedical data and insights. It empowers scientists to effectively:

  • Pick the right experiment
  • Choose the right target
  • Find the best path to creating drugs that can impact a disease

We are in the midst of one of the most significant technological transformations in history. And the most impactful opportunity for AI and general AI is within drug discovery.

We are excited to be on this mission, working closely with our partners, the world’s top global pharmaceutical companies. This collaboration enables us to bring a tremendous amount of value to preclinical R&D and help bring novel medicine to patients faster.

In there, I would say it’s two dimensional:

  • What’s the data source
  • What are you trying to get out of it?

Which specific use case are you solving, which workflow, which prediction are you helping with? Because even if you look at saying, “oh, we go after scientific publications,” even within there, there are different sections, and each of them has different information that can open up a different use case and a different solution in the world. Some are relevant to what we do, some are not.

So really, we help scientists with three big things around disease biology:

  • Helping them come up with the best idea, the best hypothesis through building specific AI systems around:
    • Target assessment
    • Drug assessment
    • Risk assessments
  • Helping them test as fast as possible:
    • Which protocols should they follow
    • Which experiments should be done or skipped because they’ve been done before
    • Which materials they should use
  • Helping them translate that into the clinic or making sure that’s more successful:
    • Focus on biomarkers, though more work remains to be done here

Our approach was, and this I think is what’s maybe complex in science: it’s one thing to get access to the data you need. The problem with the data is that it has a lot of garbage in it, a lot of inconsistency and inaccuracy, and it has probably the worst dictionary in the world.

So for us, step one was:

“What we really are trying to do is understand everything that has ever been discovered in the most unbiased evidence-based way,”

which basically means experiments that have been done. So let’s focus our technology to understand and predict or classify the results of those experiments, less so around the conclusions of the paper and what scientists came up with, because that can be wrong.

And it’s kind of similar: not focusing on that is why IBM Watson failed, because Watson relied on the conclusion of the doctor versus what was actually in the test results. So imagine the experiments to be those test results. We focus our entire technology on understanding that.

And that’s through multimodal AI, focusing on vision, basically looking at the scientific figures and what happened there, and combining that with NLP, sometimes to increase accuracy (you can see that something happened in both) and sometimes to increase the data scope.

So that’s one. And the second thing: to do that, we got access to scientific publications, preprints, patents, and internal data from pharma, basically casting a vast net over all the documents that contain primary research.

So that was the first thing with it.

And the second thing we did was everything around the ontological knowledge base.

So what are all the known genes, proteins, diseases? What are all their 20 or 30 different names? What are their properties, the relationship among them, and so on and so on.

So imagine that to be kind of like the compass, and the knowledge graph of experiments to be the map. Those are the two data assets that power pretty much every single thing that we do.
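
The “compass and map” idea can be made concrete with a toy sketch. This is purely illustrative and not BenchSci’s actual implementation; all entity names, canonical IDs, and relations below are invented:

```python
# Toy "compass": an ontology mapping an entity's many aliases to one
# canonical ID. Toy "map": experiment-derived triples keyed by those IDs.
ONTOLOGY = {
    "tnf": "GENE:TNF", "tnf-alpha": "GENE:TNF",
    "tumor necrosis factor": "GENE:TNF",
    "ra": "DISEASE:RA", "rheumatoid arthritis": "DISEASE:RA",
}

def canonical(name: str) -> str:
    """Resolve any of an entity's 20 or 30 names to one canonical ID."""
    return ONTOLOGY.get(name.strip().lower(), f"UNKNOWN:{name}")

class ExperimentGraph:
    def __init__(self):
        self.triples = set()  # (subject, relation, object)

    def add_finding(self, subj: str, relation: str, obj: str):
        self.triples.add((canonical(subj), relation, canonical(obj)))

    def related_to(self, entity: str):
        cid = canonical(entity)
        return {(s, r, o) for (s, r, o) in self.triples if cid in (s, o)}

graph = ExperimentGraph()
# Two papers using different names for the same gene and disease
# collapse to a single edge in the graph:
graph.add_finding("TNF-alpha", "upregulated_in", "rheumatoid arthritis")
graph.add_finding("tumor necrosis factor", "upregulated_in", "RA")
assert len(graph.triples) == 1
```

The point of the ontology layer is exactly the “worst dictionary in the world” problem: without name normalization, the same finding reported under different aliases would fragment the graph.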

Sorry to interrupt you.

So just make sure I get the scope of the last one. It’s protein, genes, diseases. That’s the network that you’re sort of building.

And beyond.

And beyond.

Known biomarkers, known risk, drugs, materials, pretty much every single thing that’s needed for the use cases that we mentioned before. Now, it’s ever growing.

So for us, we got the core. And from that core, we understood a few things or many things.

And the other part of the innovation that we do, which I don’t think other people appreciate as much, is actually mapping and understanding how drug discovery is done in pharma. And that’s something that had never really been done before.

And when you understand that, then it kind of goes back full circle to what more I need to understand from the data that I have and which additional data sets do I need?

Because pharma X might say, “oh, I’m also looking at this data set and this is what I’m trying to understand from it.” That goes back to the product roadmap.

Okay. Now we need to look at OMIM data set because company X takes that into account when they do risk assessments.

And from that, here’s what they’re trying to understand. So let’s build a model to do that. Let’s get in the data pipeline and put it on the platform and add it to the assistant.

So when they ask questions, it takes those data sets into account as well.

So in a way, it’s ever growing. And maybe that’s what’s hard compared to a software or SaaS company, where you reach economies of scale but at the same time stop building and just do maintenance. Science just doesn’t work like that.

Hopefully, that answers your question.

No, that’s great. And I think part of it is that’s helpful. And then let me ask a follow-up question.

So let’s start with that first layer that you sort of talked about. I’m a scientist. I’m going to try and come up with a hypothesized target. Eventually, you’ll help me design the experiment and execute it, but I’m just trying to hypothesize the target.

What kind of feedback is happening at that point?

So for example, I can imagine two kinds of feedback:

  • I could imagine you hypothesize the target and I’m like, “that’s not interesting to me. Here’s why.” That could be one form of feedback.
  • Another form of feedback is you hypothesize the target and I run it. Does what happens there feed back into your system?

What is the feedback at that moment that’s happening?

That’s a great question. So maybe I’ll focus for a second on what we do today and what we’ll be doing in the future.

So today, and you asked me this question before: what are you predicting? Really what we’re predicting, and it’s probably more of a classification problem, is understanding what has been researched and what the semantic relationship is around every single bioentity.

So what’s the bioentity? Is it a protein? Is it a disease? Is it a cell or whatever it is? How it’s related to another entity and what it did to it.

And really understanding that at scale and connecting everything is pretty much 70% of what we do.

And then when you’re a scientist and when you ask something, there’s kind of the question, what is novel?

So by my definition, novel is “I taught you something new.” It doesn’t mean it’s new in the world, but it means you don’t know it.

And to expect everybody to read everything and connect everything at the same time, it’s just impossible. So we always teach something new and add a lot of value.

Now, so that basically helps scientists. Maybe it’s a semi-automated way of forming the hypothesis. But basically they can ask the question:

What are all the proteins or the genes that are associated with this disease and how?

And imagine you had an AI system that read everything and connected everything. So that’s what we do now.
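
As a purely hypothetical illustration of that kind of question answering, here is a toy evidence-backed store where every gene-disease association keeps the experiments supporting it; the paper IDs, figures, and relations are all invented:

```python
from collections import defaultdict

# (gene, disease) -> list of supporting evidence records
edges = defaultdict(list)

def record(gene, relation, disease, paper, figure):
    """Store one experiment-level finding with its provenance."""
    edges[(gene, disease)].append(
        {"relation": relation, "paper": paper, "figure": figure})

def associated_genes(disease):
    """Answer: which genes are associated with this disease, and how?
    Returns {gene: [evidence, ...]} so every claim is explainable."""
    return {g: ev for (g, d), ev in edges.items() if d == disease}

# Invented findings, purely for illustration:
record("IL6", "elevated_in", "RA", paper="PMID:111", figure="Fig 2B")
record("TNF", "drives_inflammation_in", "RA", paper="PMID:222", figure="Fig 1A")

answer = associated_genes("RA")
assert set(answer) == {"IL6", "TNF"}
```

The design choice worth noting is that the answer is never a bare assertion: each edge carries the experiments behind it, which is what makes the “evidence-based with scientific explainability” framing possible.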

In terms of what is the feedback loop around that, there’s two ways.

One, we have over 100 scientists who work at our company and basically act as that feedback loop, whether it’s:

  • QA,
  • golden data sets,
  • training sets,
  • actually looking at the data,
  • supporting the prompting and coding,
  • and so on.

And then there’s on the user side where they can basically say, was this valuable or not? Was this correct and incorrect and so on and so on.

Now, the next thing we’re working on with one of our design partners, which is a customer, is the next stage of that. So now that you know everything that has been discovered, can you suggest something completely novel that no one knows? Not just new to you, but new to the world.

And why? Right? Because that’s very, very important. Not just, “hey, this is related because the machine spat it out and who knows why,” but with actual scientific evidence behind it, walking you through the line of reasoning of why this is interesting, why this can be true.

And what we do there or the plan there is to incorporate also internal data from pharma. So then when they actually run those experiments, we can bring that feedback loop into the system to see if it’s correct or not.

Now, you can also argue that even if it’s incorrect, you discover something valuable. That goes back to the knowledge graph in the system. As long as you discover something, even “oh, this is wrong” or “this is not correct,” that’s still valuable.


So last question, I’m going to turn it over to everybody else. I’ll make one comment before.

For those of you who don’t know, Liran, if you don’t know, there’s this guy named Don Swanson, who is fantastic. It’s worth looking him up. He’s your intellectual great-grandfather. He was the first person to do this sort of literature-based discovery. And he did it twice. And he did it basically almost by hand. It was amazing. He’s a physicist at the University of Chicago.

One was magnesium’s effect on migraines, and the other was fish oil’s effect on Raynaud’s patients. He’s an amazing, amazing character to read about anyway.
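
Swanson’s literature-based discovery can be sketched as a simple “ABC” pattern: one literature links A to B, a separate literature links B to C, and the A-C connection itself has never been published. A toy version, with invented term lists echoing his fish oil example:

```python
def abc_candidates(ab_links, bc_links, known_pairs):
    """Propose (a, c, bridge) tuples where some b connects a and c,
    but no paper links a and c directly."""
    candidates = set()
    for a, b1 in ab_links:
        for b2, c in bc_links:
            if b1 == b2 and (a, c) not in known_pairs:
                candidates.add((a, c, b1))
    return candidates

# Literature 1: fish oil affects these intermediate factors.
ab = {("fish oil", "blood viscosity"), ("fish oil", "platelet aggregation")}
# Literature 2: those same factors are implicated in Raynaud's.
bc = {("blood viscosity", "raynauds"), ("platelet aggregation", "raynauds")}
# No paper yet connects fish oil and Raynaud's directly:
known = set()

found = abc_candidates(ab, bc, known)
assert ("fish oil", "raynauds", "blood viscosity") in found
```

Real systems rank such candidates by evidence strength rather than enumerating them, but the bridging structure is the same idea Swanson executed almost by hand.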

But can I just pick up on the last bit that you said, which is great. If I’ve understood you, for the scientist-facing product you actually have two fascinating products. The second one is the easier one to describe first:

  • The hypothesis generation, which is almost a question of:

    Could an algorithm that's read the literature make a probabilistic guess of what would be a good next thing to try?
    

    It has almost an RL-like flavor.
So that’s that. But the first one you have is also fascinating, which is, as you say, that right now every search engine on PubMed is somewhat broken. Because what I really want to do is ask some sort of question, as you said, “tell me everything that’s known around this thing,” and have it come back to me in some digestible way. And that’s not how any search engine or knowledge base works.

So in some sense, you have as much a knowledge interface as you do a hypothesis generator. And both seem fascinating.

I’ll just make that comment and then I’ll hand it over to Rich and Niamh.

Great. So let’s go to Rich and then to Niamh.


Okay. I’m still trying to really understand this, your BenchSci system. I guess there’s a spectrum. At one end, it’s an information retrieval system, like Google search or a more advanced one based on large language models.

So it could just be an interface to the literature, and that’s at one extreme. At the other extreme, it would be the scientist itself: it would propose experiments, ask for them to be done, have automated procedures for doing them, and so forth.

And it would also do the data analysis and write up the paper. The first extreme, you know, seems eminently doable.

And so where are we on the spectrum?


So we’re not a search engine. Basically what we did, and that’s a big chunk of what we do as a company, we take everything that is out there and we understand it.

So basically we go and really focus on understanding the experiments: extracting the bio entities, understanding what they are and what the semantic relationship is between them and other bio entities. We basically read every paper in depth and across papers, and form new knowledge that no one has ever connected in the past.

And then we married that with hundreds of different data sources.

So I should be clear about that.

You said you guys do it. So you don’t have an AI that does it? You have a hundred scientists in your organization and they all try to understand everything?

No, we build different AI models that do this: for entity recognition, for understanding causality between one entity and another. We’ve built a lot of models over the years that can basically identify the bio entities.

And then the semantic relationship between them and other entities, with different classifications for different use cases. Some are for understanding causality between two entities, some help you understand how to design a protocol, some help you select the materials for your experiment. When you’re a scientist and you go and read a paper, there are many use cases you’re serving while you do that.

So understanding that and then teaching a machine how to read it for you within a paper and across everything, and then structure everything on a knowledge graph. That’s one piece. Then the second piece is really bringing it to-
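
A toy stand-in for the pipeline just described: recognize bio-entities in a sentence, then classify the relation between them. Real systems use trained models; this sketch uses a tiny invented lexicon and trigger words purely to show the structure of the task:

```python
# Invented lexicon and triggers; a trained NER/relation model would
# replace both in a real system.
ENTITY_LEXICON = {"p53": "PROTEIN", "mdm2": "PROTEIN", "apoptosis": "PROCESS"}
RELATION_TRIGGERS = {"inhibits": "inhibition", "induces": "induction"}

def extract(sentence):
    """Return one (subject, relation, object) triple, or None."""
    tokens = sentence.lower().replace(".", "").split()
    entities = [(t, ENTITY_LEXICON[t]) for t in tokens if t in ENTITY_LEXICON]
    relation = next((RELATION_TRIGGERS[t] for t in tokens
                     if t in RELATION_TRIGGERS), None)
    if len(entities) == 2 and relation:
        (s, st), (o, ot) = entities
        return {"subject": (s, st), "relation": relation, "object": (o, ot)}
    return None

triple = extract("MDM2 inhibits p53.")
assert triple == {"subject": ("mdm2", "PROTEIN"),
                  "relation": "inhibition",
                  "object": ("p53", "PROTEIN")}
```

The output triples are exactly what would be loaded onto the knowledge graph; running this over every sentence of every paper is the “read every paper in depth and across” step at toy scale.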

Sorry, sorry, Liran, but let me just stop on the first piece. Just so that, Rich, I suspect you may have some clarifications. And I’d rather even just pick one piece and focus on one piece. Like, you know, we don’t need to be comprehensive in this discussion. I’d rather go deeper if necessary.

So, Rich, did that answer your question on that piece or did you want to?

Well, I know there’s work in AI on trying to organize knowledge into, you know, entities and relationships and causal relationships. But it seems extraordinarily ambitious to try to, you know, capture all of science and all the possible knowledge in science and its uncertainty in a knowledge graph. In some kind of automated form, maybe more realistically, what you’ve done is you haven’t done all of the science. You’ve done, you know, some particular aspects and parts of it having to do with things like drug design and some delimited area where you can capture a lot of the knowledge. I guess that’s what you’ve done.

So just for context, how big the company is and how long I’ve been doing this, because, of course, it’s ambitious. If it wasn’t ambitious, I don’t think we would have done it.

But so far, we’ve raised roughly $200 million, we have 350 people in the company, and we’ve been working on this for the past eight years. And we work today with 12 of the top 20 pharmaceutical companies. So it’s not a small team that just started working on this. But your point is very much true, and I’ll never say we do all of science or anything like that.

There’s different aspects, right? And really for us is we don’t do anything around drug design and we don’t do anything around clinical trials. So really, our focus is understanding how disease biology works. There’s probably many approaches to do that. And there are probably many, many different data sources that you can use.

Our approach was focusing on understanding all the wet lab experiments that have ever been done in history and focused on that data source to help us understand how disease biology works. Never would argue that we do every single data source that exists in the world, like maybe omics data and so on. But we bring together hundreds of different data sources that we deemed as important and our customers deemed as important. And then we focus on them to extract different relationships and knowledge to serve specific use cases out there.

So we really start with a problem and go backwards. So for example:

  • What does it mean to do target due diligence? Having an understanding of whether this is a target I should pursue or not when I start a project.
  • What does it mean to do a risk assessment around a project that you’re working on? How would scientists do that? We basically automate that entire process with an end-to-end AI system, actually building a software application for it as well.

That can lead to better efficiency, speed and better outcomes.

So we really focus on those specific use cases and work backwards. It could be target identification, target validation, and so on. But drug discovery is very, very complex; there are many, many aspects of it. So we never argue that we do everything; really, the focus of all our energy is understanding how disease biology works.

There are other areas where you need to apply AI, like drug design, designing the actual clinical trials, and potentially real-world evidence, and so on. We don’t play in those areas.

So you’re not saying that your scientists understand that whole area. You’re saying that somehow your system understands that whole area.

Yeah, that’s right. We had scientists working side by side with our engineers over the past eight years, basically teaching the engineers what they know, so the engineers can write machine learning models to capture what they know. Then the scientists tell the engineers if they got it right or wrong. Then we do it at scale, and the market tells us if we got it right or wrong.

I’m just thinking of, you know, in Texas, they have this big system called Cyc, which is supposed to have captured lots of world knowledge. And they always found it was never finished, because it was just ordinary world knowledge; it went on forever. And I think for the scientific knowledge of biology, it would also go on forever.

A hundred percent. That’s why it’s really expensive to do it.

So for us, we always update the system with new scientific papers, new discoveries, new modalities, new techniques, and new applications. And that’s why we have roughly a hundred scientists working here across every therapeutic area that we cover, exploring what’s unique for that area and how the models need to change between oncology, immunotherapy, and so on.

So that’s why it was really important for us at the DNA of our company to be science and engineering, because at the end of the day, scientists are the gatekeepers to make sure that every single thing we build actually adds value.

“It can be perfect from an engineering perspective and add absolutely zero value in the world. And it can be perfect scientifically, but not scale and not actually serve anyone.”

So that tension and building those things together are obviously very, very crucial. And I can attest to that tension being real. I’ve kind of lived it myself.

So maybe kind of picking up on your theme of where you are and where you’re going, I can recap the value add you already delivered today because as a scientist, I do viscerally appreciate it. And then maybe I’d like to kind of tap into how you aim to resolve these three challenges.

To Rich’s point, biology is a complex dynamical system that will never be resolved. But I love that you not just codify the scientific method, but then look into how to optimize it, right? Be it the:

  • Protocol,
  • Workflow design,
  • Reagent selection,
  • Assay design.

These are huge lifts. On top of it, you address the reproducibility issue that has haunted scientific papers for an age. Because you’ve got that:

  • Audit tracing capability
  • Capturing null results for others to learn from.

And then I liked how you double-clicked on the explainability because it’s not just the what, it’s the why and how that really matters behind the reasoning.

So kind of looking forward, I think there are three things that will always be a bane of our life when it comes to biology.

One is the fact that it’s systems biology, right? You know, as Ajay was kind of saying, it’s a gene to a cell to a tissue to an organ. We need to move past these kinds of predictive states into looking at causal reaction pathways. So I long for a world of multi-scale simulation versus prediction, where you’ve got that mechanistic understanding beyond the Mendelian genotype-phenotype mapping.

Thinking about that, the second bit is overcoming the translation hurdle. Moving from assays to kind of single cell organ or organ-on-a-chip, almost like a next-gen CRO might be in the pipeline as you go vertical.

And then kind of the third bit, which I know that you’re not drug design, but the delivery mechanism really matters too. How do you think about that from remembering to kind of apply the AI?

Yeah, no, it’s a great question. So all those questions are very valid and very much true.

What we’re focusing on right now is the mechanistic aspects of how the disease actually works based on everything that’s ever been discovered. Both internally from pharma and we develop pipelines to actually do the same thing on pharma data as we did on the public domain data. So that’s really where we’re focused right now.

And it’s not all machine learning. So it’s:

  • Machine learning to understand the evidence,
  • Logic to combine different data sources,
  • Stitching these data sources together,
  • Combining the machine learning understanding with different ontological knowledge bases,
  • Stitching those two together,
  • Forming a hypothesis in the process.

Basically the same way scientists would do it as well, forming a hypothesis as a result.

So it very much depends on the specific use case that we’re solving and how the science is actually done, doing that at scale with kind of a super assistant alongside you. As for the clinical data:

So we’re starting to get into that now. That has its own, I guess, massive complexity and depends on how far you want to take it. But for me, it always goes back to day one, which is basically the underlying biology.

Most companies, I think, kind of started with the clinical data, with GWAS studies and so on.

“Here’s a bunch of interesting genes and great. And someone said they’re related to this disease, but no one really knows how.”

So that’s where we will come in on helping with the how.

Now, the more clinical data that we can bring into day one, that will obviously help. But we really focus on:

  • Not just how this manifests itself in clinical trials over patients,
  • But how, if at all, this actually relates to that disease,
  • And how it’s connected on the entity level.

Great. Let’s go back, first to Rich, then to Sendhil, and then to Niamh. And just, if you can, comment on any thoughts that you have.

Let’s focus on two categories:

  • Machine intelligence for hypothesis generation
  • Machine intelligence for experiment design

You’ve heard about this company that’s one of the leaders, perhaps the leader in its category.

So just how you think about using machine intelligence for hypothesis generation, and then secondly, how you think about machine intelligence for experiment design.

So we’ll start with Rich.

First, I want to say maybe I don’t have really great ideas here. But it certainly makes sense to me that for hypothesis generation, you can have fairly structured hypotheses, like which molecule or which characteristics of a drug might be important. You can search the knowledge that you have for good candidates: things that might be tested that haven’t already been tested.

And other than hypothesis generation, you mentioned the design of experiments. Again, many experiment designs are relatively structured and repetitive. You could just have good practices, and a machine could propose a good-practice design. That seems to make sense.

But I have to wonder, wouldn’t the scientists have to have that good practice already?

And Rich, when you think about reward design in either of these categories, whichever one you feel most, you’ve got kind of the most ideas for, what is interesting or special about reward design in either of these topic areas?

So reward design is, of course, an area that I appreciate and enjoy very much. And I’ve been trying to work it into this, but I haven’t really been able to see.

I mean, in some sense, we would like to get reward for discovering knowledge: for proposing experiments that would be new, and for the outcome of the experiment being informative. And then we’d like to use that to train the decisions, if any of that were possible.
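
One hedged way to make “reward the informative experiment” concrete is to score a candidate assay by its expected information gain: the expected drop in entropy over the hypothesis it would test. A sketch with invented probabilities:

```python
import math

def entropy(p):
    """Binary entropy of believing a hypothesis with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def expected_info_gain(prior, p_pos_given_true, p_pos_given_false):
    """Expected entropy reduction from an assay with these likelihoods
    of a positive readout when the hypothesis is true vs. false."""
    p_pos = prior * p_pos_given_true + (1 - prior) * p_pos_given_false
    post_pos = prior * p_pos_given_true / p_pos            # Bayes, positive readout
    post_neg = prior * (1 - p_pos_given_true) / (1 - p_pos)  # Bayes, negative readout
    expected_posterior_entropy = (p_pos * entropy(post_pos)
                                  + (1 - p_pos) * entropy(post_neg))
    return entropy(prior) - expected_posterior_entropy

# A decisive assay earns more reward than a near-uninformative one:
decisive = expected_info_gain(0.5, 0.9, 0.1)
weak = expected_info_gain(0.5, 0.55, 0.45)
assert decisive > weak > 0
```

Using this score as the reward signal is one candidate answer to Rich’s question; it directly pays out for experiments whose outcome would most change what we believe.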

Sendhil, let me turn to you and then to Niamh.

Yeah, let me reframe what I think you guys are doing in a way that might be productive and instructive as we look forward.

I think, Ajay, when you said hypothesis generation, it’s a space that I find very interesting.

What I really like about what you’ve done is it actually starts somewhere much more basic. You alluded to it a little bit where you said,

“We really looked at what pharma does. It’s actually, yes, it ends up with hypothesis generation, but it starts with what is the decision that has to be made? What are the decisions that have to be made that are really pivotal, that have real consequences?”

Where if we can inform those decisions, even modestly, there are huge gains. And it so happens, one of those decisions is the decision as to what to try. And that can be cast as hypothesis generation.

But it’s worth noting that decision has some of the elements that many human decisions do, as hypothesis generation does, which is that people are not so certain and not already so great at it.

Even modest gains, even an algorithm with a pretty good suggestion, can be super valuable. Because in some sense, and I think this is how I understand the difference from other attempts at knowledge graphs: those attempts try to embody the knowledge in a very high-fidelity way that you then have to execute against.

Here, the knowledge is embodied in a high-fidelity way, but what comes out of it doesn’t need to be more certain than the uncertainty every scientist already has anyway.

That is, there’s this cloud of uncertainty that we as humans and scientists and pharma, they all have. As long as this can help move past that cloud, that really is insightful.

So to me, I love that it’s grounded in decisions, and I love that it seems to me like where AI tends to really succeed easily, perversely, is not in those situations where you need it to be perfect.

It’s in those situations where we’re already struggling, where if it can help us do better, okay, 70% accuracy, that’s a home run.

And I think this has those nice features. So I’ll stop there.

Great. Thanks, Sendhil. And Niamh, over to you.

No, I love this. This is my jam. Because for me, AI is less about AGI and more about AI for discovery, exactly as Rich and Sendhil were just saying.

So I’m personally less excited when it can do things that we already can, like drive cars. That’s cool.

But I’d rather focus on helping us discover new things and addressing our blind spots.

And again, to me it leverages the traits of AI against our own limitations: the fact that it can do this huge global sweep versus local search, free of any anthropocentric lens of scientific discipline. And in particular, if it can do open-ended search, to me, that’s phenomenal.

And then on the experimental design front, one thing I’ve been excited about for a while, and have seen results from, is that kind of active learning in a closed-loop system, whereby the model identifies where it’s least certain and then runs experiments in that area rather than confirming the known.

And to me, that’s where you start to really get that augmented assistant with AI.
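The closed loop Niamh describes can be sketched in a few lines. This is a hypothetical toy, not BenchSci’s system: an ensemble of models scores each untested candidate, and the next experiment goes where the ensemble disagrees most (highest prediction variance), rather than where the outcome is already well predicted.

```python
import random
import statistics

# Toy sketch of uncertainty-driven active learning: spend the experiment
# budget on the candidate where an ensemble of predictors disagrees most,
# instead of confirming what is already well predicted.

def ensemble_predict(models, x):
    """Return each model's prediction for candidate condition x."""
    return [m(x) for m in models]

def most_uncertain(candidates, models):
    """Pick the candidate with the highest ensemble disagreement (variance)."""
    return max(candidates,
               key=lambda x: statistics.pvariance(ensemble_predict(models, x)))

# Stand-in 'models': noisy linear guesses in place of trained predictors.
random.seed(0)
models = [lambda x, a=random.uniform(0.5, 1.5): a * x for _ in range(5)]

candidates = [1.0, 2.0, 10.0]  # untested experimental conditions
print(most_uncertain(candidates, models))  # disagreement grows with |x|, so 10.0
```

In a real closed loop, the chosen experiment would be run, its result appended to the training data, the models refit, and the selection repeated; this sketch shows only the selection step.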

Excellent.

And Rich, now that you’ve heard Sendhil and Niamh, anything else that you’d like to add?

Just that it’s a grand ambition, and the space of possibilities is exciting. And I do think someday we will reach the far end, where the whole science and research discovery process can be done by machine, and maybe even by reinforcement learning.

Yeah. Thank you very much.

Okay.

Liran, the last 60 seconds are for you.

Thank you. I really want to echo what Niamh said. Drug discovery projects get stuck because either you have the wrong hypothesis or you’re running the wrong experiment, or both. Because basically you’re stuck either working on the wrong thing or you’re stuck in a maybe.

And really the idea is: how do we help with both? Now, that’s also broad. The hypothesis can be a tiny thing or a big thing, and the same goes for experiments as well.

So for us, it’s been a very pragmatic approach of really asking: can we understand everything that has ever been discovered, organize it, and have scientists leverage that? Because that’s just something that has never, ever been done before. And now that it has been done, how can we leverage it to push the envelope forward?

And that might be less machine learning, less fancy than you would think, but it’s really mimicking the scientist’s mind at scale.

Okay. Even something as simple as saying,

“We know A leads to B. And then there’s another paper that shows B leads to C, and another that C leads to D. A leads to D is a potential hypothesis that just no one has ever tested.”

Now, you can say it’s machine learning, or that it’s just logic. You can call it whatever you want, but that’s really, really powerful.
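That chaining step can be sketched as a tiny graph traversal. A minimal, hypothetical illustration (the relations A through D are made up): links reported directly in papers become edges, and any pair reachable by chaining, but never directly tested, becomes a candidate hypothesis.

```python
from collections import deque

# Directly tested links reported in the (hypothetical) literature:
# paper 1: A -> B, paper 2: B -> C, paper 3: C -> D.
reported = {("A", "B"), ("B", "C"), ("C", "D")}

def implied_links(edges):
    """All X -> Y pairs reachable by chaining reported links (BFS per node)."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    implied = set()
    for start in graph:
        queue, seen = deque(graph[start]), set()
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            implied.add((start, node))
            queue.extend(graph.get(node, []))
    return implied

# Candidate hypotheses: implied by chaining but never directly tested.
candidates = implied_links(reported) - reported
print(sorted(candidates))  # [('A', 'C'), ('A', 'D'), ('B', 'D')]
```

At literature scale the same idea runs over millions of extracted relations, but the core move is exactly this set difference: reachable minus directly observed.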

And I think the greatest power is actually in first understanding, cleaning, harmonizing, and organizing the data. Everything on top of that, I would actually argue, is probably simpler.

And, you know, given just a couple of things that Niamh said, I’m just going to do one more minute, Sendhil.

And this is for you, since you’ve done your own research on hypothesis generation using machine intelligence.

When you see this in the drug discovery setting, and given your own research on hypothesis generation, have you updated any priors or thinking, having heard what Liran’s been building?

I think, first, something Liran mentioned that I really, really liked: just the sheer amount of time and energy that goes into getting that first data set in, and the recognition that turning it into something semantically meaningful is a huge amount of effort.

And I think there’s a business case here, and they’re exploiting it very well.

But I do think that outside of these areas where there are business cases, it makes me realize if we’re going to use AI to truly transform science, it feels like we’re going to need to invest that type of activity in other sciences in a way that I don’t think has been done.

That is a big public good lift, and it is super valuable.

I’m super glad that the private sector is paying for this to happen in the drug discovery cases, but there are many other cases where that’s not going to happen spontaneously.

It’s just worth noting, now that I’ve thought about it, how little was spent even on the core data sets behind the protein folding problem. Like, just look at how little was spent on this. Crazy.

And so there is an asymmetry here, and I would hope foundations, et cetera, take notice.

Excellent.

Okay.

Liran, thank you very much. And Niamh, Rich, and Sendhil.

Excellent, as always. A superb discussion.

Thank you all.

Thanks for having me.

Thank you, guys.

Bye, everyone.

And that’s our show for today.

Thanks to Rich Sutton, Sendhil Mullainathan, and Niamh Gavin.

And a special thanks to BenchSci CEO and co-founder, Liran Belenzon.

Follow us on the Intrepid Substack at insights.intrepidgp.com.

And subscribe on your favorite platforms, including YouTube, Spotify, Apple Podcasts, and more.

Thanks, everyone, for listening.

The views, opinions, and information expressed in this podcast are those of the hosts and guests and do not necessarily reflect the official policy or position of Intrepid Growth Partners. This content is for informational purposes only and should not be considered as financial, investment, or legal advice.