June 29, 2026

Cancer Blood Tests Part 2: The clinical trial

How do you decide whether a clinical trial “worked”? In Part 2 of our Galleri series, we examine the landmark randomized trial of a blood test designed to detect more than 50 cancers. We explore why different outcome measures led to dramatically different headlines, discuss primary versus secondary outcomes, pre-registration, hierarchical testing, and post hoc analyses, and explain why mortality remains the outcome everyone is waiting for. Along the way, we uncover a statistical mystery involving dozens of missing cancers and discover how a little arithmetic can sometimes reveal more than a press release.

Statistical topics

cancer screening
exploratory analyses
hierarchical testing
missing data
multiple testing
outcome measures
post hoc analyses
pre-registration
primary and secondary outcomes
randomized clinical trials
screening tests

Methodologic Morals

“When the simple numbers don't add up, pay attention. The arithmetic may be trying to tell you something.”
“The first question should not be, did it work? It should be, what counts as success?”

References

Giridhar KV, et al. Safety and performance results from PATHFINDER 2, a registrational study of a multi-cancer early detection test in an intended-use population. Presented at the 2026 American Society of Clinical Oncology (ASCO) Annual Meeting. May 2026.
Hubbell E, Clarke CA, Aravanis AM, Berg CD. Modeled Reductions in Late-stage Cancer with a Multi-Cancer Early Detection Test. Cancer Epidemiol Biomarkers Prev. 2021;30(3):460-468. doi:10.1158/1055-9965.EPI-20-1134
Neal RD, Johnson P, Clarke CA, et al. Cell-Free DNA-Based Multi-Cancer Early Detection Test in an Asymptomatic Screening Population (NHS-Galleri): Design of a Pragmatic, Prospective Randomised Controlled Trial. Cancers (Basel). 2022;14(19):4818. Published 2022 Oct 1. doi:10.3390/cancers14194818
ASCO slides: https://grail.com/wp-content/uploads/2026/05/Swanton_ASCO-2026_NHS-Galleri_FINAL-Slides-05.26.2026.pdf
UK registry protocol: https://www.isrctn.com/ISRCTN91431511
Clinicaltrials.gov protocol: https://clinicaltrials.gov/study/NCT05611632

Common biases in cancer screening studies

Cancer screening studies are subject to several well-known biases that can make a screening test appear more effective than it actually is. Three of the most important are:

Lead-time bias: Screening advances the time of diagnosis, making survival from diagnosis appear longer even if the patient's lifespan is unchanged. For example, if a screening test detects a Stage II cancer at age 60 that otherwise would have been diagnosed because of symptoms at age 62, but the patient dies at age 68 regardless, survival from diagnosis appears to increase from 6 years to 8 years even though the patient did not live any longer.

Length bias: Screening preferentially detects slower-growing, less aggressive cancers because they remain detectable for longer than fast-growing cancers. For example, a slow-growing cancer that remains in Stage I for 5 years is much more likely to be found by screening than an aggressive cancer that progresses to symptoms within months. This can make screened patients appear to have better survival simply because screening preferentially found the less aggressive cancers.

Overdiagnosis: Screening detects cancers that would never have caused symptoms or death during a person's lifetime, leading to unnecessary diagnosis and treatment. For example, a screening test may detect a very slow-growing prostate or thyroid cancer in an older adult that would never have become clinically important if it had remained undiscovered.
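
The lead-time bias example above is simple arithmetic, and a minimal sketch makes it explicit. The ages come from the example; the function and variable names are illustrative assumptions, not part of any published analysis.

```python
def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    """Years lived after the date of diagnosis."""
    return age_at_death - age_at_diagnosis


# Ages taken from the lead-time bias example in the text.
age_at_death = 68          # unchanged by screening in this scenario
symptom_dx_age = 62        # diagnosis prompted by symptoms (no screening)
screen_dx_age = 60         # earlier diagnosis prompted by screening

without_screening = survival_from_diagnosis(symptom_dx_age, age_at_death)
with_screening = survival_from_diagnosis(screen_dx_age, age_at_death)
lead_time = symptom_dx_age - screen_dx_age

print(f"Survival from diagnosis without screening: {without_screening} years")
print(f"Survival from diagnosis with screening:    {with_screening} years")
print(f"Lead time: {lead_time} years")
print(f"Extra years of life actually gained: {age_at_death - age_at_death}")
```

The apparent gain in survival equals the lead time exactly, which is why survival from diagnosis can improve even when no one lives longer.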

Kristin and Regina’s online courses:

Demystifying Data: A Modern Approach to Statistical Understanding

Clinical Trials: Design, Strategy, and Analysis

Medical Statistics Certificate Program

Writing in the Sciences

Epidemiology and Clinical Research Graduate Certificate Program

Programs that we teach in:

Epidemiology and Clinical Research Graduate Certificate Program

Find us on:

Kristin - LinkedIn & Twitter/X

Regina - LinkedIn & ReginaNuzzo.com

(00:00) - Intro
(03:39) - The Claim: Not Ready for Primetime
(03:58) - Trial Design: 142,000 Participants
(07:50) - The Primary Outcome Problem
(20:29) - The Primary Endpoint: Complete Miss
(22:14) - Three Arguments for the Defense
(28:29) - Statistical Sleuthing: Missing Cancers
(41:14) - The Stage Shift Argument
(50:30) - Rating the Claim

Normal Curves - Episode 34 Galleri Part 2
[Kristin] (0:00 - 0:09)
No one was really fooled by the spin, because the stock dropped almost 50% the next day, so clearly people read past the headline.

[Regina] (0:10 - 0:14)
That actually renews my faith in people. People can still read, apparently.

[Kristin] (0:20 - 0:29)
Welcome to Normal Curves. This is a podcast for anyone who wants to learn about scientific studies and the statistics behind them. I'm Kristin Sainani.

I'm a professor at Stanford University.

[Regina] (0:30 - 0:35)
And I'm Regina Nuzzo. I'm a professor at Gallaudet University and part-time lecturer at Stanford.

[Kristin] (0:36 - 0:41)
We are not medical doctors. We are PhDs, so nothing in this podcast should be construed as medical advice.

[Regina] (0:41 - 0:46)
Also, this podcast is separate from our day jobs at Stanford and Gallaudet University.

[Kristin] (0:46 - 0:55)
Today we're going to cover part two of our episode on the Grail Galleri liquid biopsy test and whether it's actually ready for primetime.

[Regina] (0:55 - 1:18)
Right, liquid biopsy, also known as multi-cancer early detection blood tests. The dream is that you give some blood, just like you do during an annual physical, and then they tell you if you have cancer anywhere in your body. And that's the dream, at any rate.

But some companies say they are coming close to that.

[Kristin] (1:18 - 1:48)
Right. A company called Grail has a commercially available product called Galleri that Grail says can screen for 50 different cancers with a vial of blood. And they've just concluded a randomized trial where the results were, uh, controversial.

The headlines, Regina, ranged from "Blood test helps reduce late-stage cancer diagnosis" to "Grail's cancer detection test fails in major study."

[Regina] (1:49 - 2:19)
Big difference between those. And all of this is really a big deal because if this works well, it has the potential to change how we detect and treat cancer. And there's a lot to talk about, so we broke up the episode into multiple parts.

In part one, last time, we talked about the biology of how these tests work, the history of Grail, and the non-randomized studies that preceded this randomized trial.

[Kristin] (2:19 - 2:53)
And we talked about how these non-randomized studies weren't sufficient to tell us how beneficial this test actually is. In those studies, Galleri definitely detected cancers, but there was no control group. And without a control group, we can't know how many of those cancers would have been found anyway and whether finding them earlier improved patients' outcomes.

And this is why it's so exciting that Grail ran a huge randomized clinical trial. And that's why the results of that landmark trial have been so highly anticipated.

[Regina] (2:54 - 3:03)
Randomized trials are what give us the strongest evidence. And today, in part two, we're going to dig into the design and results of this trial.

[Kristin] (3:03 - 3:15)
Exactly. And, Regina, Grail has not released the full results, so we do not have all the data. But they presented some data at a big cancer conference in Chicago last month.

[Regina] (3:16 - 3:19)
I bet that conference talk was standing room only.

[Kristin] (3:19 - 3:29)
Oh, I am sure, yes. So today, we're going to dig into the data we do have, and we're going to explain the controversy and the conflicting headlines that those data generated.

[Regina] (3:29 - 3:39)
Right. And since we always start with a claim and rate it at the end, Kristin, you picked one of those competing takes for us to evaluate. Right.

[Kristin] (3:39 - 3:47)
The claim we're rating today is Grail's Galleri Multi-Cancer Detection Test is not ready for primetime.

[Regina] (3:48 - 3:58)
And, Kristin, we promised our listeners some statistical sleuthing and some deeper takes on the data compared to the press coverage that's already out there.

[Kristin] (3:58 - 4:31)
Yes. That's why people need to listen to our podcast for the deeper dive on the data. OK, Regina, let's get into the trial now.

It's called the NHS Galleri Trial because they partnered with the NHS, that's the National Health Service in England, in order to pull off this very large randomized trial. They recruited about 142,000 adults between the ages of 50 and 77 who did not have a cancer diagnosis at the time. Anyone who'd had cancer in the past had to be more than three years post-treatment.

[Regina] (4:32 - 4:40)
142,000 adults, that is a lot. And it's really hard to pull off a trial of that size, so good for them.

[Kristin] (4:40 - 4:52)
Yeah, this trial is a really big deal, and they did some things well. They pre-registered their protocol, at least in one place, in a UK clinical trials registry in 2021 before they started the study.

[Regina] (4:53 - 4:54)
And we love that.

[Kristin] (4:54 - 5:09)
We do. Regina, they also published their protocol in two other places, in a journal and in clinicaltrials.gov, which is the main U.S. registry, but those protocols were published in 2022, which is actually after the trial started.

[Regina] (5:10 - 5:15)
After the trial started, so, Kristin, not technically a pre-registration, but after.

[Kristin] (5:15 - 5:40)
Right, and as we're going to see, there were some inconsistencies between the three versions, but you know, still good for them for trying to be transparent. Okay, now let's talk study design. Participants were randomly assigned to receive either the Galleri blood test or a placebo blood test, about 71,000 in each group.

Neither the participants nor their doctors knew who was in which group.

[Regina] (5:40 - 5:47)
Right, controlled, randomized, and double-blinded, and we like all of that, especially together.

[Kristin] (5:47 - 5:57)
Yes, and people were only unblinded if they had a positive Galleri test, because of course, then they had to undergo additional testing to look for cancer.

[Regina] (5:57 - 6:09)
That makes sense. So, they had a placebo blood test. What does that mean, Kristin?

Did they just pretend to take your blood, and then they put a Band-Aid on your arm because maybe people would notice that?

[Kristin] (6:09 - 6:30)
Right, yeah, no, they drew their blood, but then they just stored the blood, so they never ran the Galleri test on that blood. They did keep the blood, though, because they might go back and test it later. And, Regina, there were three rounds of screening, so everyone got a blood test at baseline.

You actually did not get randomized into the study unless you did that first blood test.

[Regina] (6:30 - 6:34)
Ah, right, good, because that prevents dropouts after randomization.

[Kristin] (6:35 - 6:45)
Exactly. So, that was the baseline blood draw, and then everyone was supposed to come back for another blood test about a year later, and a final one about a year after that.

[Regina] (6:45 - 6:50)
So, people got either three Galleri tests or three placebo tests, right?

[Kristin] (6:51 - 7:33)
Exactly. Participants in both groups also continued to receive their usual cancer screening with their regular doctor. So, for example, if they were due for a colonoscopy or a mammography, they might have gotten that test just through their regular doctor. And the researchers then recorded all cancer diagnoses during follow-up.

Now, one detail that's going to be important later. When they presented their data, they sometimes broke the data up into three screening rounds corresponding to the three years of the study. So, round one were cancers diagnosed after the first, but before the second blood draw.

Round two cancers included those diagnosed between the second and third blood draws, and round three was all cancers diagnosed after the third blood draw.

[Regina] (7:34 - 7:38)
Hmm, that does sound important, but I guess we'll get to talk about that later, yes?

[Kristin] (7:39 - 7:40)
Yes, we're going to see why this is important later.

[Regina] (7:41 - 7:50)
So, you said that the researchers recorded all cancer diagnoses, and what was the primary outcome measure that they analyzed then?

[Kristin] (7:50 - 8:19)
This is really important, Regina. It goes to the heart of the controversy over the results. First of all, the outcome everyone really wants to see is mortality.

Who died and who lived? We want to know whether screening with Galleri, compared to placebo, ultimately saves lives. But the company argued that a mortality trial would take too long, and that in the meantime, people might be missing out on a potentially life-saving technology.

[Regina] (8:20 - 8:39)
Right. Seeing differences in death rates in the two groups might require a much longer study. You have to wait for people to die.

So, if this test does work, it would be good to let everyone know sooner, before people die. Yeah. Right.

So, Kristin, the primary outcome was not death. What was it then?

[Kristin] (8:39 - 8:54)
The primary outcome was the rate of diagnosis of advanced cancers, meaning stage 3 or 4 cancers. And their hypothesis was that the Galleri test would reduce advanced cancers because they would catch cancers before they could become advanced. That's the idea.

[Regina] (8:55 - 9:25)
Okay. So, this is different from all of the Galleri studies we've talked about so far, because there, we wanted to detect all cancers, including advanced ones, but here, we're changing it. Right?

So, we're saying stage 3 or 4 cancers can serve as a kind of proxy for death. Yeah. Because cancers that are caught this late are often deadly, right?

So, now, Grail wants fewer stage 3 and 4 cancers to show up in the Galleri arm, which is backwards from what we had before.

[Kristin] (9:25 - 10:02)
Exactly. Remember, Galleri is supposed to be an early detection test. The idea is that it could help patients by shifting the stage of diagnosis, by catching cancers earlier before they have a chance to become advanced.

So, this means we want to find more stage 1 and 2 cancers, but fewer stage 3 and 4. And Regina, this endpoint was not chosen out of thin air. In 2021, before the trial started, researchers published a modeling study that projected that giving the Galleri test would result in substantial reductions in stage 3 and stage 4 cancers.

[Regina] (10:03 - 10:17)
Okay. So, they put a lot of stock in this modeling study, because they actually used it to design the trial. They chose their primary outcome endpoint based on its results.

For a modeling study, that is a lot of responsibility.

[Kristin] (10:17 - 10:41)
It is, yes. Regina, one other important detail. Their primary outcome was not the rate of diagnosis of all stage 3 and 4 cancers.

Their primary outcome started with just 12 pre-specified cancers: lung, head and neck, colorectal, pancreatic, myeloma, liver/bile duct, stomach, esophageal, anal, lymphoma, ovarian, and bladder cancer.

[Regina] (10:41 - 10:53)
I hope there's not going to be a quiz on that later, because I am not going to remember all 12. That would be quite a memory test. But they only cared about 12 cancers?

Why those? Why not all the others?

[Kristin] (10:53 - 11:50)
Okay, so they didn't completely ignore the others. Their statistical analysis plan used something called a hierarchical testing procedure, and it's actually kind of a neat statistical idea.

[Regina]
Oh, I love that kind of testing.

[Kristin]
These are the things that nerdy people like us get super excited about, Regina. So let me explain how hierarchical testing would work here. First what they do is compare the rate of stage 3 and 4 cancers for just those 12 pre-specified cancers between the two groups, Galleri versus placebo.

If that's not statistically significant, the analysis stops. We're done. Game over.

But if that comparison is statistically significant, then we get to compare a bigger set of cancers between the two groups. Now all stageable cancers except prostate. If that one's not statistically significant, we're done.

If it is statistically significant, we again get to keep going. And now we get to compare all stageable cancers, including prostate cancer, between the groups.
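
The gatekeeping logic described above can be sketched in a few lines. This is a minimal illustration of a fixed-sequence procedure, not the trial's actual statistical analysis plan: the function name, the 0.05 threshold, and the p-values are all made-up assumptions for illustration.

```python
def hierarchical_test(ordered_p_values, alpha=0.05):
    """Fixed-sequence (gatekeeping) testing.

    ordered_p_values: list of (label, p_value) in the pre-specified order.
    Each comparison is tested only if every earlier one was significant.
    alpha=0.05 is an assumed threshold, not taken from the trial protocol.
    """
    results = []
    gate_open = True
    for label, p_value in ordered_p_values:
        if not gate_open:
            results.append((label, "not tested"))
        elif p_value < alpha:
            results.append((label, "significant"))
        else:
            results.append((label, "not significant"))
            gate_open = False  # game over: later comparisons stay locked
    return results


# Hypothetical p-values, NOT trial results.
steps = [
    ("12 pre-specified cancers", 0.60),
    ("all stageable cancers except prostate", 0.01),
    ("all stageable cancers", 0.01),
]

for label, status in hierarchical_test(steps):
    print(f"{label}: {status}")
```

In this made-up scenario the first comparison fails, so the second and third are never tested even though their p-values would have looked impressive on their own. That is the point of the design: each comparison can use the full threshold without inflating the overall false-alarm rate.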

[Regina] (11:50 - 12:40)
So they use this gatekeeping to reduce the chance of getting a false alarm, right, to control the overall type 1 error rate. And Kristin, I'm thinking here of our favorite guy, multiple testing dude. So in this study, left to his own devices, multiple testing dude would flirt with every cancer subgroup in the study until one of them said yes, and we don't want that.

So hierarchical testing slows him down a bit, right? So it's like he's at a big party and he can only flirt with one girl at first. And she's the gatekeeper, I guess, and only if she deems him worthy enough is he introduced to her hot friend.

[Kristin] (12:40 - 12:42)
Oh, I like that analogy.

[Regina] (12:42 - 12:56)
But why bet the farm, Kristin, on those 12 cancers in the first place? Because I thought Galleri was supposed to detect more than, what did you say, 50 different cancers? So why are they focusing on only 12?

[Kristin] (12:56 - 13:30)
That is exactly the question I had. And Regina, in all their marketing, the company hand waves a bit on this one. They say these 12 cancers account for about two thirds of all cancer deaths in the U.S. And this makes it sound like they chose those 12 because they're particularly deadly. But when you dig further, that's not actually how they picked them. Oh, okay. In their protocol, they admit that they chose those 12 because these cancers release more cell-free DNA into circulation and the Galleri test is better able to detect them.

[Regina] (13:30 - 13:43)
Wait a minute. So they're picking Galleri's best-performing cancers.

[Kristin]
Yes.

[Regina]
And they're trying to stack the deck to increase the likelihood of getting a good result. That's, I don't know.

[Kristin] (13:44 - 14:37)
Yes, it's important to understand that the primary analysis focused on a subset of cancers where the test was already known to perform relatively well. Okay, so that's the primary outcome. And then, Regina, there were also a bunch of pre-specified secondary outcomes, and these were things like reduction only in stage four cancers, not both stage three and four, and test performance, safety and harms, and also mortality.

What's tricky, though, Regina, is that these secondary outcomes are a bit shifty.

[Regina]
Shifty? What do you mean by shifty?

[Kristin]
So remember I said there were three different protocols published early on? The secondary outcomes aren't entirely consistent across those three different protocols. And then in 2025, and again in 2026, they updated some of the secondary outcomes in these protocols.

You can see the changes online.

[Regina] (14:38 - 15:07)
And if you update the secondary outcomes after you have the data, that is no longer considered a pre-specified outcome. Kristin, you lost the pre in there.

[Kristin]
Oh, yeah, very much, yes.

[Regina]
That distinction is really important because when you choose an outcome afterwards, you lose that impressiveness factor, right? Because they could have already peeked at the data and figured out how to make the data tell a more favorable story. But that would be cherry picking.

[Kristin] (15:07 - 15:58)
Exactly. That is the worry. And of course, you can see online that they've changed the protocols. They didn't try to hide it.

But, you know, if somebody is just looking at their presentation at the conference, they might not realize that things have been changed from the beginning. So when we talk about the, quote, secondary outcomes, I'm going to point out which ones appeared in the early published protocols and which ones seem to have been added later. But it's not always easy to tell because sometimes the original description was vague.

And so maybe the update is just a clarification and not truly a new outcome. It's fuzzy. Regina, I was able to get some questions to some of the main investigators involved in the NHS Galleri trial, including the lead statistician.

And I am waiting to get clarification about this point and some other important points that we're going to discuss later today. I haven't heard back yet, though, because, you know, apparently we weren't a priority for them.

[Regina] (16:00 - 16:05)
You're saying our podcast was not on the top of their answer list. And I'm shocked.

[Kristin] (16:06 - 16:40)
I know. Like, obviously, we should be priority number one of all the questions that I imagine that they're getting. But maybe if this podcast makes the rounds and makes a splash, though, maybe then we'll hear back.

So I'm going to encourage our listeners to promote this episode because we are raising some important questions and we would like to get the answers.

[Regina]
Kristin, you're clever. That was very cleverly done.

[Kristin]
Slipping in a little, please promote us, send this to your friends. And Regina, when we do hear back from them, I do hope we will eventually hear back from them, that we may have some updates to include in a future episode, and we may have to get a part three out of this one.

[Regina] (16:41 - 16:43)
That would be fun. That would be a first for us.

[Kristin] (16:43 - 16:43)
Yeah.

[Regina] (16:44 - 17:04)
Okay, Kristin, I want to point out here that we're talking about secondary outcomes and not the primary outcome, right? Because adding or changing secondary outcomes after you have the data, it's not good, but it's not as big of a sin, right, as doing that with the primary outcome. It just makes the results more exploratory and less convincing overall.

[Kristin] (17:04 - 17:38)
Yeah, exactly. I want to point out they did not shift their primary outcome, so at least that's good. It's the secondary outcomes that are a little fuzzy.

[Regina]
Okay, so what did they find?

[Kristin]
Well, on February 19th, 2026, they announced the main results in a press release. Regina, they did, however, bury the lede, because the headline of their press release was the following: "Landmark NHS Galleri Trial Demonstrates a Substantial Reduction in Stage 4 Cancer Diagnoses, Increased Stage 1 and 2 Detection of Deadly Cancers, and Fourfold Higher Cancer Detection Rate."

[Regina] (17:38 - 17:53)
Okay, Kristin, I was listening carefully when you talked about primary outcomes and secondary outcomes, and that headline left out that all of these successes they just mentioned were all in the secondary outcomes, not the primary one, right? How convenient.

[Kristin] (17:53 - 18:19)
Very convenient, yes. The headline, Regina, is more positive than it should be because later in the press release, they acknowledged that the primary endpoint of their trial had not been met. Later.

That is the very definition of burying the lede. It is, yes. Regina, no one was really fooled by the spin because the stock dropped almost 50 percent the next day, so clearly people read past the headline.

[Regina] (18:20 - 18:25)
That actually renews my faith in people. People can still read, apparently.

[Kristin] (18:25 - 18:56)
Yes. Despite all we keep hearing about how college students are no longer able to read, at least it's the case that investors know how to read. Now, the problem was that the press release did not provide many details, but the company did promise a detailed presentation at the American Society of Clinical Oncology, the ASCO meeting in Chicago in late May and early June of 2026, which they delivered on.

And so now we have those data, which is why we waited until now to do an episode on this.

[Regina] (18:57 - 19:03)
The ASCO presentation has finally arrived. And Kristin, what have we learned?

[Kristin] (19:03 - 19:13)
Well, we learned two major things. So first, the primary endpoint was a complete miss. Second, the company argues that there are still signals in the data that suggest that the test is ready for primetime.

[Regina] (19:14 - 19:33)
First point is important, right? Because according to what they bet the farm on, the test did not beat the placebo group. But this company is not giving up because of point two, the test might still be worthwhile because of what?

What other things are they digging up and highlighting?

[Kristin] (19:33 - 19:46)
This is where we get into the nuance. And again, we don't have the full data. We have only what the NHS Galleri investigators chose to focus on in their presentation.

But I want to talk about those results in detail now.

[Regina] (19:46 - 20:28)
All right. Excellent. Cannot wait. But first, a break.

Welcome back to Normal Curves. We're talking about the Galleri Multi-Cancer Early Detection Test and the controversial recent results from its randomized controlled trial.

And we were about to dig into the trial results recently presented at the American Society of Clinical Oncology meeting.

[Kristin] (20:29 - 21:18)
And again, this was just a conference presentation. It hasn't been peer reviewed, and it does not give us a comprehensive look at the data. But we're going to work with what we have.

So let's go through now what we do know from this presentation. And Regina, let's start with the easy part, which is the primary outcome. For the primary outcome, they found nothing.

It was a completely null result. Overall, there were 706 stage 3 and 4 cancers from those 12 pre-specified types detected in the Galleri Intervention Group versus 688 in the Placebo Group. This translated to an incidence rate ratio of 1.03 comparing Galleri to Placebo. And it was not statistically significant. And Regina, I'll remind listeners that we talked about incidence rate ratios in the Daylight Savings episode.

[Regina] (21:18 - 22:14)
We did. We did. OK, 1.03, that is a very null result, Kristin. And I noticed that the rate was actually higher in the Galleri Group, which is not what they were hoping for. Yes. Right.

Right. Yeah. OK.

But to unpack that, a rate ratio of 1.03 means the Galleri Screening Group had a 3 percent higher rate of those stage 3 and 4 cancers than the Placebo Group. But 3 percent is so small that statistically speaking, it's like saying there's no meaningful difference between the groups. Exactly.

So, Kristin, that explains the negative headlines, I'm guessing, right? Because the main endpoint of the trial wasn't met. So they didn't even get to unlock those doors, right, past the set of the 12 cancers, those gatekeepers, because there was no clear reduction in the rate of diagnosis for the Galleri Group.

Exactly. But the company argues that this is not the whole story.
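
To make the 1.03 concrete, here is a minimal sketch of how a rate ratio turns into a percent difference. It assumes equal follow-up time in the two arms (plausible with about 71,000 people in each, but an assumption), so the ratio of raw counts stands in for the ratio of rates. The function name is illustrative; the trial's own calculation would have used actual person-time.

```python
def rate_ratio(events_a, events_b, person_time_a=1.0, person_time_b=1.0):
    """Ratio of event rates; with equal person-time it is just the count ratio."""
    return (events_a / person_time_a) / (events_b / person_time_b)


# Stage 3 and 4 cancers among the 12 pre-specified types, as presented.
galleri_events = 706
placebo_events = 688

irr = rate_ratio(galleri_events, placebo_events)
percent_difference = (irr - 1) * 100

print(f"Crude rate ratio: {irr:.2f}")                       # 1.03
print(f"Percent difference vs placebo: {percent_difference:+.0f}%")  # +3%
```

A ratio of exactly 1.00 would mean identical rates in the two arms, so 1.03 sits about as close to "no difference" as a real trial is likely to land.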

[Kristin] (22:14 - 22:41)
And this is where they point to a set of secondary and exploratory results that explain the positive headlines we saw. And Regina, let's tackle three of the most important key arguments that the NHS Galleri investigators made in their presentation. Their first argument is that we should not pay much attention to the cancers diagnosed in Round 1.

Remember, Round 1, those are the cancers diagnosed between the first and second blood draws.

[Regina] (22:41 - 22:43)
OK, that sounds weird. Why should we not pay attention to those? Explain that.

[Kristin] (22:43 - 23:06)
Their argument is that many of the advanced cancers found in Round 1 were already there and already advanced when people entered the study.

So there was no opportunity for Galleri to prevent them. They call this the prevalent round because prevalence refers to the number of people who already have a disease at a given time point.

[Regina] (23:06 - 23:38)
Ah, so I'm picturing this kind of like, OK, you've got a clinical trial where you're testing smoke detectors, but in the trial, some houses are on fire already, right? So when those fires get caught, it's not a sign of failure of the smoke detectors because those houses came into the trial already engulfed in flames, or in this case, I guess patients were already in Stage 3 or 4 cancer, and Galleri didn't even have a chance to prevent them.

[Kristin] (23:38 - 24:17)
Right, that's a great analogy. The researchers are basically saying, you shouldn't hold those Round 1 advanced cancers against us because the test could not have found them before they became advanced. It was already too late.

So Regina, let's look at the numbers that they used to support this argument. In Round 1, the Galleri group actually did worse than the placebo group. They had 250 Stage 3 and 4 cancers compared with 211 in the placebo group.

And remember, that includes just those 12 pre-specified cancers, of course. That corresponds to a rate ratio of 1.19, or a 19% higher rate of advanced cancers in the Galleri group.

[Regina] (24:17 - 24:35)
Ah, so this is actually consistent with the argument that they're making, right? That there were more roaring advanced fires detected in the Galleri group than the placebo group. But that's not Galleri's fault because the infernos were already burning at the start of the study.

[Kristin] (24:35 - 25:18)
Exactly. And we're only installing these smoke detectors in the Galleri arm, so of course we're going to find more fires in the Galleri arm. They then, Regina, point to the later rounds, which they call the incident rounds because incidence is the term that we use for new cases.

They say that in Round 2, there were 179 advanced cancers in the Galleri group versus 189 in the placebo group. So that's actually 10 fewer in the Galleri group, which translated to a 5% rate reduction. In Round 3, it was 214 in Galleri versus 243 in the placebo group, which was 29 fewer cancers in the Galleri group, and that corresponded to a 12% rate reduction.

[Regina] (25:18 - 25:49)
Ah, so now in later rounds, we're seeing a 5% reduction, a 12% reduction. So they're arguing that once you've cleared out those advanced cancers that were already there at baseline, now you're seeing that increased benefit from the screening test. And instead of 19% worse outcome at the beginning, now we're seeing a 5% and 12% better outcome.

And the improvement also looks like it's growing over time.

[Kristin] (25:50 - 26:06)
That pattern supports their argument. And you know, it sounds like a reasonable explanation and the numbers tell a good story. But there are several issues with their argument.

So first of all, none of those differences, the 19%, 5%, and 12%, none of them is statistically significant.
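
A rough back-of-the-envelope check of that claim is possible from the counts alone. The sketch below computes a crude rate ratio for each round with an approximate 95% confidence interval on the log scale (standard error = sqrt(1/a + 1/b) for two Poisson counts). It assumes equal person-time in the two arms, so the crude ratios can differ slightly from the presented ones (for example, about 1.18 here versus the reported 1.19 for Round 1). This is not the investigators' analysis, and the function name is illustrative.

```python
import math


def rate_ratio_ci(events_a, events_b, z=1.96):
    """Crude rate ratio with an approximate 95% CI (equal person-time assumed)."""
    ratio = events_a / events_b
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# (Galleri, placebo) advanced-cancer counts by round, as presented.
rounds = {
    "Round 1": (250, 211),
    "Round 2": (179, 189),
    "Round 3": (214, 243),
}

for name, (galleri, placebo) in rounds.items():
    ratio, lower, upper = rate_ratio_ci(galleri, placebo)
    includes_one = lower < 1 < upper
    print(f"{name}: ratio {ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}, "
          f"includes 1: {includes_one}")
```

Under these assumptions, all three intervals include 1, which matches the statement that none of the round-by-round differences is statistically significant.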

[Regina] (26:07 - 26:18)
Well, that seems important. Yeah, it kind of undermines their argument, because all of this could just be chance fluctuation, and there's nothing really happening here.

[Kristin] (26:18 - 27:10)
Right. And you know, Regina, they might argue, well, but we messed up on choosing the primary outcome. And we didn't properly design our study to be able to find a significant difference in the later rounds.

And that's why we didn't find anything. But that's a post hoc explanation. So we have to take it with a grain of salt.

And Regina, this analysis was not preplanned. I could not find a pre-specified secondary endpoint in any of their initial protocols that said ignore round one and focus only on rounds two and three. Also remember, we talked about that 2021 modeling study.

The authors in that study did distinguish between prevalent and incident rounds. So this is not a new idea. But their modeling actually projected substantial reductions in advanced cancers even after that single prevalent screen.

So this tells me that they were expecting a benefit even in round one.

[Regina] (27:10 - 27:40)
Right, because that modeling study played a role in designing this study, this trial. And that modeling study projected an early benefit. So this is a clue that their reversal is post hoc.

And we do not like that because there's no guarantee that someone didn't peek at the data and cherry pick things to make the whole story look better. Now, of course, we're not saying that this group did that. All of the explanations are reasonable.

Like you said, they could be the full story.

[Kristin] (27:41 - 27:46)
Right. And, you know, maybe they didn't anticipate this result until they saw the data. And maybe it is the right explanation.

[Regina] (27:47 - 28:12)
Yeah, but statisticians get really testy about this. And I can see why that's confusing to some people. But the idea is that pre-specifying all of your analyses and outcomes ahead of time, that is the guardrail that science uses to reassure everyone that there were no shenanigans.

Right. There was no peeking, no post hoc justification going on.

[Kristin] (28:13 - 28:24)
Exactly. It can become really easy to make up a story that seems to fit after you see the data. Right.

But you have to take that with a grain of salt. Regina, I also noticed something strange in the numbers.

[Regina] (28:24 - 28:29)
I feel like some statistical sleuthing is about to enter the room.

[Kristin] (28:29 - 28:49)
Yeah. And actually, Regina, I want you to discover this for yourself. So I'm going to ask it as a quiz and listeners can follow along.

Remember, the total number of advanced cancers was 706 in the Galleri arm and 688 in the placebo arm. Now, Regina, were you paying attention? Did you catch the numbers I gave you for each group for each individual round?

[Regina] (28:49 - 28:51)
Well, I caught them, but then I lost them. I absolutely do not remember.

[Kristin] (28:52 - 29:03)
OK, I'll remind you. Let's start with the Galleri group.

It was 250, 179 and 214. Now I want you to add those up and see what you notice.

[Regina] (29:03 - 29:26)
Add those up in my head? Are you kidding me? OK, wait, let me get out a pencil and paper here.

250, 179, 214. That is 643.

[Kristin]
Correct.

[Regina]
But wait a minute. I thought you said it was 706. 643 is not even close.

Did I add wrong, Kristin?

[Kristin] (29:26 - 29:36)
No, you added right. The issue is that we lost 63 cancers from the Galleri group. Now let's look at the placebo group.

The numbers were 211, 189 and 243.

[Regina] (29:36 - 29:47)
OK, adding those up. OK, cool. 643 also.

That is weird. But wait a minute. That is still less than 688.

[Kristin] (29:47 - 29:52)
Yes, it turns out that we also lost 45 cancers in the placebo group.

[Regina] (29:52 - 30:03)
Wait a minute. So they just had some data. Some cancers just disappear.

Somehow 63 cancers in the Galleri group and 45 in the placebo group went poof? Where did they go?
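
The quiz is easy to reproduce. This short Python sketch uses only the counts quoted in the episode; the variable names are ours.

```python
# Advanced cancer counts quoted in the episode.
reported_totals = {"Galleri": 706, "placebo": 688}
by_round = {
    "Galleri": [250, 179, 214],  # rounds 1, 2, 3
    "placebo": [211, 189, 243],
}

# Cancers in the overall total that do not appear in any round-specific count.
unassigned = {
    arm: reported_totals[arm] - sum(by_round[arm]) for arm in reported_totals
}

for arm in reported_totals:
    print(
        f"{arm}: rounds sum to {sum(by_round[arm])}, "
        f"total is {reported_totals[arm]}, unassigned {unassigned[arm]}"
    )
```

Both arms sum to 643 across the three rounds, leaving 63 cancers unaccounted for in the Galleri arm and 45 in the placebo arm.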

[Kristin] (30:03 - 30:19)
So there was an explanation. A footnote in the slides from the presentation explains that some cancers were not assigned to a screening round because some participants missed their follow-up blood draws.

[Regina] (30:19 - 30:41)
Hmm. Now that is odd. So it sounds like maybe if a participant missed one of their scheduled blood draws but then got diagnosed with cancer later, the investigator said, hey, we don't know where that cancer belongs.

We don't know which round it happened in. So we don't know which round to assign it to. And we're just going to toss it out entirely.

[Kristin] (30:42 - 30:55)
Yes, exactly. That's it. And presumably all round one cancers could be assigned because everyone did have a first blood draw.

So the missing cancers are almost certainly coming from the time periods that correspond to rounds two and three.

[Regina] (30:55 - 31:04)
Hmm. And it just so happened that in the Galleri group, there are 18 more excluded cancers than in the placebo group. How handy.

[Kristin] (31:04 - 31:34)
How handy, yes. And this is really important because the apparent benefit in rounds two and three amounts to 39 fewer advanced cancers in the Galleri group. Remember, there was 10 fewer in round two and 29 fewer in round three.

So if there were 18 more cancers excluded from the Galleri arm in rounds two and three, that matters. 18 is a substantial fraction, almost half of the 39 cancer difference that the investigators are highlighting here.
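
To put a number on "almost half," here is a small Python sketch. The second half is a hypothetical sensitivity check, not something the investigators reported: it assumes every excluded cancer belongs to rounds two and three, as suggested above, and puts them all back.

```python
# Counts quoted in the episode (advanced cancers).
galleri_r2, galleri_r3 = 179, 214
placebo_r2, placebo_r3 = 189, 243
galleri_excluded, placebo_excluded = 63, 45

# Gap highlighted by the investigators in rounds 2 and 3 combined.
reported_gap = (placebo_r2 + placebo_r3) - (galleri_r2 + galleri_r3)
excess_exclusions = galleri_excluded - placebo_excluded
share_of_gap = excess_exclusions / reported_gap

# Hypothetical check: restore every excluded cancer to rounds 2-3.
# Assumption: all excluded cancers came from those rounds.
galleri_restored = galleri_r2 + galleri_r3 + galleri_excluded
placebo_restored = placebo_r2 + placebo_r3 + placebo_excluded
restored_gap = placebo_restored - galleri_restored

print(f"Reported gap: {reported_gap}, excess exclusions: {excess_exclusions}")
print(f"Excess exclusions as a share of the gap: {share_of_gap:.0%}")
print(f"Gap after restoring excluded cancers: {restored_gap}")
```

Under that assumption, the 39-cancer gap shrinks to 21, so how the unassigned cancers are handled could change the size of the apparent benefit considerably.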

[Regina] (31:34 - 31:44)
Hmm. So the decision of what to do with those ambiguous cancers, this is pretty important to the bottom line. It seems a little sus to just disappear them.

[Kristin] (31:44 - 31:47)
Did you say sus, Regina? My kids say sus.

[Regina] (31:48 - 31:51)
No. I'm trying to sound cool. Is it working?

I don't think it's working.

[Kristin] (31:52 - 32:22)
Well, at least I knew what you meant because of my kids. But yes, it is suspect. It's a bit of a red flag, because if you're going to argue that the later rounds show a benefit, I'm going to need a much clearer explanation of exactly which cancers were excluded from those rounds and why.

These exclusions are not just bookkeeping. They could meaningfully affect the story. One other related thing puzzled me, Regina.

Some participants were followed for up to 22 months after their third blood draw, rather than just 12 months.

[Regina] (32:23 - 32:28)
Oh, so the follow-up was longer for some people? Why? Did they enter the study earlier?

What happened?

[Kristin] (32:28 - 33:34)
Yeah, that's exactly it. So we have to remember that recruitment takes time. They didn't enroll 142,000 people on day one of the study, right?

So if you were one of the first people to join the study, you ended up being followed for longer than 12 months after the third blood draw, because the study continued until everyone had at least 12 months of follow-up after that last blood draw. So what I found confusing is that cancers diagnosed up to 22 months after that third blood draw were still counted as round three cancers. And this seems a little inconsistent to me with this whole idea that they're excluding cancers that they couldn't assign to a round, because what I'm guessing is, let's say a participant missed the second blood draw, but they had a cancer diagnosed 16 months after the first blood draw.

I think what they're saying is that they didn't know whether to assign that round one or two, so they excluded it. But they're taking cancers that occurred 16 months after the third blood draw, and they're assigning them to round three. So that would, to me, say, well, why didn't you just assign that other cancer to round one that was 16 months after the first blood draw, right?

It seems a little inconsistent to define it one way for round three, but differently for the other two rounds.

[Regina] (33:35 - 34:04)
Right, right. And statisticians, we want everything to be consistent and decided ahead of time. Right, yes, yes.

And they may have had a perfectly good rationale, right? But it's these kinds of inconsistencies, you know, that make you and me wonder how robust the findings really are. And we need to know whether the conclusions hold up, even if you make slightly different choices, right, about how the cancers are assigned and classified and how long the follow-up needs to be.

[Kristin] (34:05 - 34:54)
They are pointing to this pattern, higher rate in round one and increasingly lower rates in rounds two and three. They're pointing to this pattern and saying, oh, this explains what's happening. But if that pattern is not robust, if it depends on all these tiny little choices that you're making, then that's not a very compelling story.

Yeah. And I don't find this pattern robust for several reasons that we've talked about. So first, again, none of the round-specific results were statistically significant.

So this could just be chance fluctuation. Second, it doesn't appear that this was a pre-specified secondary endpoint, which makes it also more likely to be just a story after the fact. Right.

Third, the analysis excludes a substantial number of cancers from the very screening rounds being used to argue for a benefit, and there are more exclusions in the Galleri group than the placebo group.

[Regina] (34:54 - 34:57)
Yeah, this is not a good look for them.

[Kristin] (34:57 - 35:15)
Right. It just doesn't seem like a strong story. And again, Regina, I've sent these and several other questions to the NHS Galleri investigators, and I do think they're very important questions, and I really hope we hear back.

And if we hear back, we will happily do a follow-up or even a part three episode. So I hope we hear back from them.

[Regina] (35:16 - 35:21)
All right. This is great sleuthing and reporting, Kristin. I hope we get some good stuff.

Fingers crossed.

[Kristin] (35:21 - 35:52)
One other note, Regina, even if this pattern turns out to be real, with more advanced cancers being found in the first round and then increasingly fewer in subsequent rounds, that still doesn't necessarily mean more lives are being saved. Some of these stage 3 and 4 cancers may be relatively slow-growing, and Galleri may be preferentially detecting slow-growing cancers in the first round and thereby preventing them from showing up in later rounds, but it may turn out that diagnosing them a year earlier or so might not meaningfully change the patient's prognosis.

[Regina] (35:52 - 36:03)
Right. That makes sense. And I think it's why cancer specialists, right, really are saying they want to see mortality results, not just the results that depend on which stage was caught when.

[Kristin] (36:04 - 36:39)
Exactly. Okay. So, Regina, that was their first kind of key argument. The second major argument they make in their presentation is they say, let's focus only on stage 4 because stage 4 is the most important.

And this was a pre-specified secondary outcome to look just at the rate of diagnosis of stage 4 cancers, not both 3 and 4. And by the way, Regina, we're still focused on just those 12 pre-specified cancers. Overall, they found 342 stage 4 cancers in the Galleri group versus 397 in the placebo group.

And that's 55 fewer in the Galleri group. The rate ratio was 0.86, and it just made statistical significance.

[Regina] (36:39 - 36:45)
Okay. So, 0.86, that's a 14% reduction in favor of Galleri. Exactly.

[Kristin] (36:45 - 36:58)
I find this argument a little stronger than the previous argument because this was a pre-specified secondary outcome. Also, they have not excluded any cancers, and it made statistical significance even if just barely and even if it's only in a secondary outcome.
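
The stage 4 numbers can be checked the same way. This Python sketch assumes equal person-time in both arms and uses a simple Wald test on the log count ratio; it is our approximation, not the investigators' analysis, so the p-value is only indicative.

```python
import math

# Stage 4 cancers among the 12 pre-specified types, as quoted in the episode.
galleri_stage4, placebo_stage4 = 342, 397

difference = placebo_stage4 - galleri_stage4
rate_ratio = galleri_stage4 / placebo_stage4  # assumes equal person-time

# Wald test on log(rate ratio), treating each count as Poisson.
se_log = math.sqrt(1 / galleri_stage4 + 1 / placebo_stage4)
z = math.log(rate_ratio) / se_log
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(f"{difference} fewer stage 4 cancers, ratio {rate_ratio:.2f}, p = {p_value:.3f}")
```

With these assumptions the ratio rounds to 0.86 and the approximate p-value falls a little below 0.05, consistent with a result that only just reached significance.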

[Regina] (36:59 - 37:32)
Okay. That is not a wholehearted, full-throated endorsement from you, Kristin, but it's something. Yeah.

So, Kristin, you said that this is where they use stage 4 cancers as an outcome. So, that means they're looking to see whether Galleri can catch more cancers as late as stage 3, right, 1, 2, or 3. It looks like maybe it can, but the question is whether catching them as late as stage 3 corresponds to any benefit in survival.

But we can't know that until we look specifically at mortality.

[Kristin] (37:32 - 38:24)
Yes. Without mortality data, we don't know for sure. But, of course, stage 4 is, on average, deadlier than stage 3 because that's the point at which the cancer has spread all over the body.

So, you know, maybe there is something here. Regina, they then go on to break the stage 4 cancers into different screening rounds, but I don't think we should pay much attention to this analysis because, as we've already talked about, those round-specific analyses are a bit, in your words, sus because they omit a lot of cancers. They omit the cancers that they say could not be assigned to a round because the participant had missed one of those later blood draws.

Here's something really interesting, though, in the data, Regina. Remember we said earlier that 63 cancers were excluded from the Galleri group and 45 from the placebo group? Well, it turns out that almost all of those excluded cancers were stage 4, not stage 3.

103 out of the 108 omitted cancers were stage 4 cancers.

[Regina] (38:24 - 38:30)
Whoa. Okay, that is striking. That is a pretty big coincidence if this were just random chance.

[Kristin] (38:30 - 38:55)
Yeah, these exclusions absolutely do not appear to be random. So this doesn't look like it's just, you know, people who happen to skip the next screening visit and then, by chance, they were suddenly diagnosed with stage 4 cancer. Something seems to be preferentially removing stage 4 cancers from the round-specific analyses, and whatever is causing that, it's not balanced between the two arms of the trial.

Something seems to be happening.
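
One rough way to gauge how unusual 103 out of 108 is: compare it with the share of stage 4 among all advanced cancers in the trial. The Python sketch below assumes the stage 4 counts (342 and 397) and the advanced cancer totals (706 and 688) cover the same 12 cancer types, and it treats each excluded cancer as an independent draw. Both are simplifying assumptions of ours, so the tail probability is only an illustration of scale.

```python
from math import comb

# Counts quoted in the episode.
advanced_total = 706 + 688  # stage 3-4 cancers, both arms
stage4_total = 342 + 397    # stage 4 cancers, both arms
p_stage4 = stage4_total / advanced_total  # overall share of stage 4

excluded_total = 63 + 45    # cancers not assigned to any round
excluded_stage4 = 103       # of those, stage 4
observed_share = excluded_stage4 / excluded_total

# Binomial probability of at least 103 stage 4 cancers among 108 exclusions
# if exclusions were a random sample of advanced cancers (an assumption).
tail = sum(
    comb(excluded_total, k) * p_stage4**k * (1 - p_stage4) ** (excluded_total - k)
    for k in range(excluded_stage4, excluded_total + 1)
)

print(f"Stage 4 share overall: {p_stage4:.0%}, among exclusions: {observed_share:.0%}")
print(f"Probability under random exclusion: {tail:.1e}")
```

Roughly half of all advanced cancers were stage 4, compared with about 95% of the excluded ones, and the tail probability under random exclusion is vanishingly small.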

[Regina] (38:55 - 39:41)
I like the way you put that. So I'm trying to picture why would people skipping their appointment be more likely to be diagnosed with stage 4 cancer later on in both the placebo and the Galleri group? Because it's not like skipping the appointment is going to cause stage 4 cancer, right?

But maybe it goes the other way. Is that what you're thinking? Maybe someone has symptoms of cancer.

They go to their doctor 11 months after the first blood draw. Probably takes a few months to get a full diagnosis. Then they find out they have stage 4 cancer, and going to their cancer screening study visit is not high on their priority list, right?

Yes. Maybe that's why we have more stage 4 cancers missing in the screening rounds, yeah?

[Kristin] (39:41 - 40:29)
Yeah, exactly. It takes a while to get a full diagnosis and a stage when you're getting diagnosed with cancer. And if you're already in the midst of a cancer diagnosis, yeah, you're probably not prioritizing this cancer screening study that you're participating in.

So I think that's very plausible. But that, Regina, doesn't explain why there are more excluded cancers in the Galleri arm than the placebo arm because the study is blinded. So these things ought to be balanced.

And, Regina, you know, I did get a quick email back from the NHS Galleri investigators saying that they would get back to us. I'm sure they have a ton going on. And again, we're probably not high on their priority list.

But the fact that they didn't get back to us with a quick, easy answer is a bit of a red flag because it at least indicates to me that there wasn't a simple, obvious explanation. So maybe they didn't notice this and they haven't thought through it carefully before. Right.

[Regina] (40:29 - 41:13)
OK, Kristin, if I'm counting correctly, that was two arguments. And we have one more third key argument that you wanted to bring up. But let's do that after the break.

Welcome back to Normal Curves. Today we're talking about the NHS Galleri randomized controlled trial. And we are about to hear, Kristin, the third key argument that you wanted to highlight.

Tell us.

[Kristin] (41:14 - 41:36)
OK, so their next argument is that Galleri shifted cancers toward earlier stages. So we're still, again, focusing on those 12 pre-specified cancer types. And among those, there were 647 stage one and two cancers caught in the Galleri group compared with 559 in the placebo group.

That's 88 more in the Galleri arm. And it was a 16% rate increase, which was statistically significant.

[Regina] (41:37 - 41:50)
And just a reminder, because I had to remind myself, we want more stage one and two cancers caught here because we want to catch those cancers in the early stage before they can progress. So this sounds pretty good. It does.

[Kristin] (41:50 - 42:26)
And of all the positive findings they presented, this is one of the more compelling. But as always, the details matter. So first, this endpoint is a little murky from a pre-specification standpoint.

It was mentioned in their clinicaltrials.gov protocol, but not in the other two protocols that I talked about earlier. So I'm willing to count it as pre-specified, but maybe with an asterisk. Second, when you break down those 88 extra early stage cancers, the increase was not spread evenly across all cancer types.

More than half of the entire signal came from colorectal cancer alone, 47 of the 88 additional cancers in the Galleri group.

[Regina] (42:26 - 42:30)
Ah, that is interesting because we already have pretty good colon cancer screening.

[Kristin] (42:31 - 42:57)
Right. We already have multiple effective screening approaches for colon cancer. Colonoscopy, stool DNA testing, and there's even an FDA-approved blood test already.

So it's not obvious that this is where a multi-cancer early detection test adds the most unique value. Another 24 of the 88 cancers were blood cancers. And as we discussed earlier with the Pathfinder study, the clinical benefit of finding some of those cancers earlier isn't always obvious.
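
Here is a short Python sketch of how the 88 extra early-stage cancers break down. It uses only the counts quoted in the episode; the "all other types combined" figure is a net remainder that we compute, not a number from the slides, and the percent increase assumes equal person-time in both arms.

```python
# Stage 1-2 cancers among the 12 pre-specified types, as quoted in the episode.
galleri_early, placebo_early = 647, 559

extra = galleri_early - placebo_early
pct_increase = (galleri_early / placebo_early - 1) * 100  # assumes equal person-time

# Extra early-stage cancers in the Galleri arm, by cancer type.
extra_by_type = {"colorectal": 47, "blood cancers": 24}
# Net remainder across every other cancer type (computed here, not reported).
extra_by_type["all other types combined"] = extra - sum(extra_by_type.values())

shares = {name: count / extra for name, count in extra_by_type.items()}

print(f"{extra} extra early-stage cancers, about {pct_increase:.0f}% more")
for name, count in extra_by_type.items():
    print(f"  {name}: {count} ({shares[name]:.0%})")
```

Colorectal and blood cancers together account for 71 of the 88, which leaves a net 17 spread across all the remaining cancer types.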

[Regina] (42:58 - 43:10)
So you're saying Galleri did detect more stage 1 and 2 cancers, which is good, but just how much of a real-world benefit that corresponds to, still unclear. Exactly.

[Kristin] (43:10 - 43:17)
We talked in Part 1 about some of the reasons why an earlier cancer detection doesn't necessarily translate to a better outcome for the patient.

[Regina] (43:18 - 43:42)
And Kristin, in Part 1, we mostly avoided the jargon. Did you notice that? Yeah.

The names of the specific biases that can occur in screening studies, we didn't give it the formal name. We did talk about over-diagnosis, but there are a few other named biases. So for the really nerdy fans out there, we're going to put those in the show notes just for you.

Oh, good idea, Regina.

[Kristin] (43:43 - 44:27)
Regina, I should also note that for four of the 12 cancers, lung, head and neck, bladder, and anus, the placebo group detected more stage 1 and stage 2 cancers than the Galleri group. And Regina, in addition to the presentation given by the NHS Galleri investigators, the conference organizers invited Mark Robeson from Memorial Sloan-Kettering to give a separate presentation about the trial results. He was not involved in the trial, so his role was to provide an outside perspective.

Oh, that's good. Someone with fresh eyes and maybe a little less skin in the game. Exactly.

And one of his slides I thought was particularly helpful. He highlighted the cancers for which he thought early detection was most likely to benefit patients and where the Galleri arm appeared to detect more stage 1 and 2 cancers than the placebo arm.

[Regina] (44:27 - 44:36)
So, this is where the Galleri test had potentially the greatest benefit to people in the population and the strongest signal in the study.

[Kristin] (44:37 - 44:51)
Yes, he highlighted four of the 12 cancers, colon, which we already talked about, and then also esophageal, stomach, and ovarian. Galleri caught 54 of those last three cancers in stage 1 or 2 versus only 31 in the placebo arm.

[Regina] (44:51 - 44:53)
Okay, notable difference there.

[Kristin] (44:53 - 46:08)
Right, and they didn't provide p-values because looking at single cancers is definitely an exploratory analysis, but that's real people whose lives may have been saved. These are cancers where there isn't a good screening test already and where we believe that survival is better in stage 1 or 2 than in stage 3 or 4. Yeah, I find that compelling.

Absolutely. But it's only a small fraction of the cancers detected by Galleri. So, it paints a picture that's somewhat similar to what we saw in the Pathfinder study.

It looks like we are catching some cancers earlier than we might have otherwise, and for a subset of these cancers, there's reason to believe that this may have benefited the patient. But again, without data on morbidity or mortality, we really don't know how many people would benefit from population screening with Galleri or how large those benefits would be. Now, Regina, I want to briefly mention one other analysis.

Up until now, we've just been talking about those 12 pre-specified cancers, but when the researchers looked at test performance, that was a pre-specified secondary outcome, by the way, they did include all cancer types. Test performance is things like sensitivity, specificity, positive predictive value. They were looking at this just in the intervention group, the Galleri group only, and for this analysis, they only counted cancers diagnosed within 12 months of a blood draw.

[Regina] (46:08 - 46:19)
Okay, so they did not include those round three cancers, right, that we talked about from the people who were followed longer where their cancer was diagnosed later, more than 12 months after the third blood draw.

[Kristin] (46:19 - 46:48)
Right, they didn't include those because for test performance analyses, they wanted a fixed 12-month window after each blood draw. So you could say, like, the sensitivity for catching cancers in 12 months or the positive predictive value in 12 months. They needed that time window.

So they looked at 36 months of data here, and there were a total of 3,051 cancers diagnosed specifically in the Galleri arm, and Galleri, the test, detected 937 of these.

It turns out, though, Regina, the test performance metrics were very similar to what they found in the Pathfinder studies, so the sensitivity was 30.7%.

[Regina] (46:48 - 47:20)
Okay, so they are catching about 3 in 10 cancers, right, and like we said with the Pathfinder, this is actually pretty good for a screening test.

[Kristin]
It is, yeah. The specificity was 99.6%.

[Regina]
Okay, I'm going to, again, unpack what all these things mean. So people without cancer correctly tested negative 99.6% of the time, so only 0.4% of false positives.

[Kristin] (47:21 - 47:23)
Yeah, positive predictive value was 52%.

[Regina] (47:24 - 47:41)
Right, meaning if you had a positive test, there's a 52% chance that you do have cancer. Right, and negative predictive value was 98.9%. Meaning with a negative test, you have a 98.9% chance of not being diagnosed with cancer in the next 12 months.
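
The sensitivity figure follows directly from the two counts quoted above, as this Python sketch shows. The last two lines are a back-calculation of ours from the rounded 52% positive predictive value, so they are rough estimates rather than reported numbers.

```python
# Test performance counts quoted in the episode (Galleri arm, all cancer types).
cancers_in_galleri_arm = 3051  # diagnosed within 12 months of a blood draw
detected_by_test = 937         # of those, flagged by the test (true positives)

sensitivity = detected_by_test / cancers_in_galleri_arm
missed = cancers_in_galleri_arm - detected_by_test  # false negatives

specificity = 0.996
false_positive_rate = 1 - specificity

# Back-calculation from the rounded PPV: an estimate, not a reported figure.
ppv = 0.52
implied_positive_tests = detected_by_test / ppv
implied_false_positives = implied_positive_tests - detected_by_test

print(f"Sensitivity: {sensitivity:.1%} ({missed} cancers not detected by the test)")
print(f"False positive rate: {false_positive_rate:.1%}")
print(f"Implied positive tests: about {implied_positive_tests:.0f}")
```

A sensitivity near 31% means roughly seven in ten cancers diagnosed in the Galleri arm within the 12-month windows were not flagged by the test.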

[Kristin] (47:41 - 48:54)
Yeah, and these are all very good metrics, similar to what we saw in Pathfinder. But one detail that I would like to have that we don't have here is, I would like to see those test performance metrics broken down by stage. Like, what was the sensitivity specifically for stage 1 cancers?

We've seen in earlier studies of Galleri that the sensitivity for stage 1 cancers is lower. Another thing that I noticed, and another detail I'd like to have, Regina, is that according to one of their slides, 65 of those 937 Galleri-detected cancers were not actually assigned a stage. 10 were considered unstageable, and 55 had missing stage information.

Now, we don't know how many of those were among the 12 cancers that were used in the primary analysis. It might only be a few. But because all of their arguments rely on this stage-shifting argument that we've been discussing, it's really important to have stage information.

And I would really like to know how many cancers in the placebo arm were also missing stage data, just to make sure that the missingness was balanced. Because if more cancers are being excluded from one group than the other because of missing stage information, that's going to affect the results when we're talking about stage shift as the main argument here.
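
For scale, here is the missing-stage arithmetic as a tiny Python sketch, using only the counts quoted above. The comparable figure for the placebo arm is not given in the episode, so it is left out.

```python
# Galleri-detected cancers and stage information, as quoted in the episode.
galleri_detected = 937
unstageable, missing_stage = 10, 55

no_stage = unstageable + missing_stage
share_without_stage = no_stage / galleri_detected

print(f"{no_stage} of {galleri_detected} detected cancers had no stage "
      f"({share_without_stage:.1%})")
```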

[Regina] (48:54 - 49:10)
Oh, absolutely. That could affect the results and the conclusions, the interpretation. But, of course, there's not a lot of room for details like what you are asking for in a conference presentation.

And, Kristin, that's why you and I like peer-reviewed papers where they have more room.

[Kristin] (49:10 - 49:59)
Yeah, I don't blame the investigators for not including these kinds of details. They only had so much room in the conference presentation, but that's why I have some questions out to them, and hopefully they will report back. Now, Regina, the trial investigators are going to eventually look at the thing we keep asking for, which is mortality data.

Mortality was one of the pre-specified secondary outcomes. To me, the fact that they didn't present early mortality data suggests that they are not seeing anything interesting for mortality yet. But, of course, there may not have been a lot of deaths yet, so the number may be too small, really, to draw any conclusions at this point.

They do plan to continue to follow people for up to six years more to look at mortality endpoints. And I guess, Regina, we're just going to have to wait around for six years to see those results because that may answer a lot of our questions, finally.

[Regina] (50:00 - 50:04)
Six years?

Yes. Okay, hopefully we'll still be doing the podcast in six years.

[Kristin] (50:04 - 50:27)
Yes, and hopefully we'll still look as young. Yeah, six years sounds very far away, and I'm not sure if we're going to make it that far on this podcast. I hope so.

If we're going to make it, though, I think we're going to need a lot more listeners and a lot more people to donate a coffee to us, or we're going to need to get some more sponsors if we're going to make it six more years. I'm hopeful. I'm hopeful.

Pass this around, everyone.

[Regina] (50:28 - 50:29)
Very subtle, Kristin. Yeah, right.

[Kristin] (50:30 - 50:59)
No, no, a little plugging the podcast here. Yeah, yeah, yeah. Okay, Regina, now I think we are actually ready to wrap up this episode.

And rate the strength of evidence for our claim. And how do we rate strength of evidence in this podcast? With our highly scientific, trademarked 1-to-5 smooch rating scale, where one smooch means little to no evidence for the claim, and five smooches means strong evidence for the claim.

And just as a reminder, the claim for today is the Grail Galleri multi-cancer detection test is not ready for prime time.

[Regina] (51:00 - 51:15)
And Kristin, you framed this as a negative. So, that means if we give a high rating, that we don't think that the test is ready for prime time. And a low rating would be good for Grail, meaning we do think the test is ready for general use.

[Kristin] (51:16 - 53:33)
Right, yes, exactly. If we give lots of smooches, that's bad for Grail, and low smooches is good for Grail. Good, yes.

[Regina]
So, what is your take, Kristin?

[Kristin]
Regina, I'm going 4.5 smooches, subject to change, of course, if we ever hear back from the NHS Galleri investigators on our questions. And subject to change in six years when we get the mortality results.

But right now, I don't see good evidence that this test should be implemented on a wide scale. There's no doubt that the Galleri test catches cancers. From this randomized trial, we actually know the counterfactual.

We know that it catches more cancers than if you did not give people the test. And as I said at the beginning of part one, I personally believe that this technology is incredibly powerful and promising, and that somebody is going to get it right. And this is probably how we're going to screen for cancer sometime in the future.

For me, though, if we are talking specifically about the evidence for Grail's Galleri test as it stands now, they have come up short in my book. First of all, they missed their primary endpoint. And they can tell all the stories they want to explain that, and maybe those stories are right.

Maybe they just chose the wrong primary endpoint. Maybe the incident rounds are more informative than the prevalent round. But this whole story they are telling really rests on these round-specific analyses that were post-hoc, not statistically significant, and had these weird non-random exclusions.

And that's just a red flag for me. There is something going on there. There are more of these omissions in the Galleri arm than the placebo arm.

And I'm going to need some clarifications to put any stock in those round-specific analyses. Regina, this is not saying that individual people can't or haven't benefited from the test. So if you're the person for whom this test catches a stage 1 pancreatic cancer or a stage 2 ovarian cancer, this test may indeed have been life-saving.

I believe that there are individual cases out there where this test has saved lives. This is why I find this technology so exciting. But the question for now is whether the benefit is large enough and the evidence is strong enough to justify population-wide screening.

Because population screening isn't judged by whether the test ever helps anyone. It's judged by whether the overall benefits outweigh the costs. And there are costs here that we can't forget, right?

False positives, false negatives have costs. And I just don't think we are there yet with the benefits outweighing the costs. How about you, Regina?

[Regina] (53:33 - 54:50)
I'm going to go with four smooches on this one. Not really ready for primetime in the population, at least with the evidence that we have. And that's not to say that some people would not benefit, like you're saying, Kristin.

But this is the difference between looking at the benefit of a test for an individual person and looking at the benefit for a population. And Kristin, you and I did a very, very rough back of the envelope calculation. And we figured that it's something like one person in maybe several thousand is likely to benefit from getting screened with this test.

And it's hard to know exactly, right? Maybe it's one in one thousand, maybe it's one in three thousand, but it's probably on the order of thousands, one out of thousands, not hundreds or tens, right? So benefiting would not be as rare as winning the lottery, but it's not the same benefit as like quitting smoking or increasing how much you exercise.

So you need to weigh all of this, the cost and the benefits, the financial cost, how much you need to pay to get this test, but also the emotional cost, like you said, of the false positives and false negatives. You need to take all of that into consideration and be prepared for that.

[Kristin] (54:51 - 55:05)
We won't judge you if you want to buy a lottery ticket here with a one in three thousand chance of winning. I understand that. But again, that's an individual decision.

And I think a lot of the experts agree here, Regina, that this is not ready for widespread screening.

[Regina] (55:06 - 55:09)
So how about methodologic morals? What's yours, Kristin?

[Kristin] (55:09 - 55:45)
You know, Regina, there's been a lot of press coverage about this trial, but I've noticed it almost all just quotes the exact numbers given in the conference slides. I'm actually kind of surprised that no one else seems to have noticed what I noticed, which is that the round specific numbers don't add up to the totals and that these exclusions are non-random. This statistical sleuthing I did, literally, it just relied on my ability to add and subtract.

So I think this whole thing highlights the importance of basic arithmetic. So my moral is, when the simple numbers don't add up, pay attention. The arithmetic may be trying to tell you something.

[Regina] (55:45 - 56:54)
Oh, I like that one a lot. But I want to add, Kristin, it's not just arithmetic, because you're bringing this bloodhound sense. You know, healthy suspicion and skepticism and just sniffing this down.

So that, plus being able to add. All right, how about you, Regina? Yeah, I'm going to go with something about how the results are all different here, depending on which outcome you're using, right?

Is it stage three and four cancers or just stage four, stage one and two? And I think it's easy for journalists, when they cover these kinds of trials, to just say, hey, it's significant or hey, it's not significant. But the really interesting story might be behind all of that, which is that it really depends on how you define works.

Does it reduce death? Here, we don't know that yet. Does it reduce stage four cancers?

Yeah, maybe. And here's what it means, right? So journalists have the opportunity to do more.

So here is my moral. The first question should not be, did it work? It should be, what counts as success?

[Kristin] (56:55 - 57:11)
Oh, I love that, Regina, because that really goes to the heart of the controversy around this trial. It's the choice of primary endpoint. How are we defining success?

And I think that is such an important point. And most of the experts out there, I think, are saying, show me the mortality data.

[Regina] (57:12 - 57:19)
Yeah, yeah. So I think this is a great opportunity for journalists, though, to help bring this nuance and this story to a bigger audience.

[Kristin] (57:20 - 57:20)
Yeah, I agree.

[Regina] (57:21 - 57:31)
So, Kristin, this has been super exciting. This is a first for us, a part two out of perhaps two or maybe even three. Follow up in six years.

[Kristin] (57:31 - 57:37)
I think we're going to get a part three on this at some point, Regina. Exactly when, I don't know. But hopefully soon, yes.

[Regina] (57:38 - 57:44)
But thank you. This has been fascinating and eye-opening.

[Kristin]
Thank you, Regina.

And thanks, everyone, for listening.