Diagnostic Testing: Do the stats tell you what you need to know?

Sensitivity, specificity, positive predictive value . . . you’ve probably seen these terms before. Maybe you memorized them for a test. But do you actually know what they mean? In this episode, we take a closer look at how diagnostic tests are evaluated, and how they’re often misinterpreted. From a genetic test for cellulite to a blood test for autism, we explore how “statistically significant” findings can turn into tests that don’t actually help anyone. Along the way we meet the freckle gene, the wanderlust gene, and the infidelity gene.
Statistical topics
- Base Rate
- Bayes Rule
- Case-Control Study
- Matching
- Conditional Probability
- Sensitivity
- Specificity
- Positive Predictive Value
- Prevalence
- Negative Predictive Value
- False Positives and Negatives
- True Positives and Negatives
Methodological morals
- “A biomarker paper is not the same thing as a biomarker test.”
- “If your sample doesn't match the real world, then for all of your positive predictive value needs, call on Bayes' theorem.”
Detailed Show Notes with calculations
References
- Emanuele E, Bertona M, Geroldi D. A multilocus candidate approach identifies ACE and HIF1A as susceptibility genes for cellulite. J Eur Acad Dermatol Venereol. 2010;24:930-5.
- https://genomelink.io/traits/cellulite
- https://www.genexdiagnostics.com/
- Ebstein RP, Novick O, Umansky R, et al. Dopamine D4 receptor (D4DR) exon III polymorphism associated with the human personality trait of Novelty Seeking. Nat Genet. 1996;12:78-80.
- Kluger AN, Siegfried Z, Ebstein RP. A meta-analysis of the association between DRD4 polymorphism and novelty seeking. Mol Psychiatry. 2002;7:712-7.
- He Y, Martin N, Zhu G, Liu Y. Candidate genes for novelty-seeking: a meta-analysis of association studies of DRD4 exon III and COMT Val158Met. Psychiatr Genet. 2018 Dec;28(6):97-109.
- Smith AM, King JJ, West PR, et al. Amino Acid Dysregulation Metabotypes: Potential Biomarkers for Diagnosis and Individualized Treatment for Subtypes of Autism Spectrum Disorder. Biol Psychiatry. 2019;85:345-54.
- Sainani K, Goodman S. Lack of Diagnostic Utility of “Amino Acid Dysregulation Metabotypes.” Biol Psychiatry. 2018;85:e41-e42.
Kristin and Regina’s online courses
- Demystifying Data: A Modern Approach to Statistical Understanding
- Clinical Trials: Design, Strategy, and Analysis
- Medical Statistics Certificate Program
- Writing in the Sciences
- Programs that we teach in:
- Epidemiology and Clinical Research Graduate Certificate Program
Find us on:
Kristin - LinkedIn & Twitter/X
Regina - LinkedIn & ReginaNuzzo.com
- (00:00) - Introduction
- (02:24) - The Cellulite Test
- (06:41) - Understanding Sensitivity and Specificity
- (12:50) - Enter Positive Predictive Value
- (18:40) - Why Base Rates Matter
- (24:06) - More Ridiculous Tests
- (33:30) - The Wanderlust Gene Deep Dive
- (41:27) - The NeuroPoint Autism Test
- (53:34) - Trying to Set the Record Straight
- (01:02:39) - Personal Stories
- (01:05:54) - Wrap-up
[Regina] (0:00 - 0:26)
Look at this whole thing. Okay, they're offering you a test to find out if you or someone else is prone to infidelity. Then they're offering you a test to find out whether you have an icky STD germ right now.
And look over on the left, they are also offering you paternity tests. So, I feel like they have cornered a certain market here, don't you think? There's a narrative arc that is happening.
[Kristin] (0:30 - 0:40)
Welcome to Normal Curves. This is a podcast for anyone who wants to learn about scientific studies and the statistics behind them. I'm Kristin Sainani.
I'm a professor at Stanford University.
[Regina] (0:41 - 0:46)
And I'm Regina Nuzzo. I'm a professor at Gallaudet University and part-time lecturer at Stanford.
[Kristin] (0:47 - 0:51)
We are not medical doctors. We are PhDs, so nothing in this podcast should be construed as medical advice.
[Regina] (0:52 - 0:57)
Also, this podcast is separate from our day jobs at Stanford and Gallaudet University.
[Kristin] (0:57 - 1:07)
Regina, today we're going to do something a little different, kind of similar to our p-value episode, which surprisingly turned out to be one of our most popular episodes.
[Regina] (1:07 - 1:12)
Surprisingly, because it was 73 minutes on p-values.
[Kristin] (1:14 - 1:19)
Yes, 73 minutes on p-values. And somehow we made it fun, believe it or not.
[Regina] (1:20 - 1:23)
Yeah, testing your psychic abilities was one of the highlights, Kristin.
[Kristin] (1:24 - 2:24)
That made it fun. By the way, I have none. Okay, so today I want to tackle another statistical topic that a lot of people have to learn at some point, just like p-values.
And this is the statistics that we use for diagnostic and screening tests. Things like sensitivity, specificity, and positive and negative predictive value. And eventually we will even get to the infamous Bayes rule.
So Regina, this episode focuses on statistical tools rather than a particular scientific topic, which means I had to be a bit creative to come up with a claim for the episode. These statistical tools themselves are not controversial like p-values, but we are going to encounter a bunch of fun case studies involving genetic and biomarker tests for things like cellulite, wanderlust, and infidelity. So I made the claim today about these specific tests.
So here's our claim, many direct-to-consumer genetic and biomarker tests are just useless junk.
[Regina] (2:24 - 2:32)
Wow, that's a fun one. We're not going to get a lot of sponsors from the genetic test industry this way, are we, Kristin?
[Kristin] (2:33 - 2:47)
Probably not if we rate that claim highly. No, Regina. Okay, so our story today starts with a very fun example that I've used in class for years, which is on a genetic test for cellulite.
[Regina] (2:47 - 3:01)
So you are telling me I can get a genetic test that will tell me if I am going to get a cottage cheese butt or maybe already have a cottage cheese butt because I haven't checked in a while.
[Kristin] (3:03 - 3:16)
Yes, and this example, by the way, how did I come across this? I discovered this years ago when I was writing a health column for Allure magazine. It is a beauty magazine, so I often covered things like cellulite.
[Regina] (3:17 - 3:21)
This might be a good place to talk about cellulite, actually, before we get to the stats.
[Kristin] (3:22 - 3:23)
Ooh, good idea, Regina.
[Regina] (3:23 - 4:14)
Okay, so despite the weird-sounding name, cellulite is not actually a disease. I looked it up. It is just about how your skin looks.
It's about the fat deposits on your thighs or butt or hips: when they push through the connective tissue beneath your skin, it bulges out in places. And that's what gives your skin that beautiful orange peel appearance or cottage cheese appearance, which is perfectly normal and harmless, right? But not something that we want.
It's something women grapple with and stress over. Men have hair loss. Women have dimply thighs.
So, Kristin, I started looking online to see if there's anything that you can actually do to treat cellulite or prevent it.
[Kristin] (4:14 - 4:15)
Ooh, what did you find?
[Regina] (4:15 - 4:33)
Yeah, not a whole lot. You can't do much to prevent it other than just generally be healthy, work out, don't smoke. Treatments are weird.
They're things like injecting fillers or carbon dioxide gas under your skin or getting laser treatments.
[Kristin] (4:33 - 4:39)
Oh, interesting. Maybe those treatments sound like a topic for another episode of Normal Curves.
[Regina] (4:39 - 4:41)
They sound a little scary to me.
[Kristin] (4:42 - 5:03)
Yeah, a little suspect. All right. So this was back in like 2010.
I was looking for things to cover for Allure and I came across some headlines about a new genetic test for cellulite. Regina, I did not end up writing about it for Allure for reasons that we are going to see, but I did turn around and use the example for teaching in my statistics class.
[Regina] (5:03 - 5:05)
Ooh, which actually sounds more fun.
[Kristin] (5:05 - 5:59)
I think so. The test was called Cellulite DX and it was sold by a company called DermaGenoma. Interestingly, Regina, you mentioned that men worry about hair loss.
That company also sold a genetic test for hair loss called Hair DX.
[Regina]
Equal opportunity.
[Kristin]
Yes.
So here's how the test worked. You would send them a cheek swab and for about $250, they would analyze your DNA and tell you something about your risk of cellulite. But here is the fun part for nerdy people like us who like numbers, Regina.
In their marketing materials, they tell women how to interpret the test result and here's what they say. If you take this test and you test positive, you have a 70% chance of eventually getting moderate to severe cellulite. And if you take the test and test negative, there's a 50% chance that you won't get moderate to severe cellulite.
[Regina] (5:59 - 6:13)
That is a very particular way to present the numbers. They did that very carefully. But if you stop and think about it, it's not actually making a very compelling case for the test, is it?
[Kristin] (6:13 - 6:14)
No, it's not. Yes.
[Regina] (6:15 - 6:41)
Because they're saying if you test positive, okay, you have a 70% chance of cellulite. But if you test negative, there's a 50% chance of not getting it, which means there's a 50% chance of getting it. So 70%, 50%, no matter what, it feels like they're just betting on most women getting cellulite, right?
The test doesn't really help you distinguish whether you are one of those people or not.
[Kristin] (6:41 - 7:09)
And actually, the numbers are even a little worse if we dig further. In fact, I titled my lecture slides Fun Case Study, Bad Investment. The cellulite DX test was actually developed by some Italian researchers who published a paper in 2010 in the Journal of the European Academy of Dermatology and Venereology.
And that paper was the basis of the cellulite DX test. In that paper, they conducted what's called a case control study.
[Regina] (7:10 - 7:17)
Ah, case control study. I don't think we've talked through what a case control study is in any detail in this podcast before, have we?
[Kristin] (7:17 - 8:22)
I don't think we have. So let me just explain the term. A case control study is a study design where researchers start by finding people who already have the condition or disease of interest.
Those are the cases. And then they find similar people who do not have the condition or the disease. And of course, those are the controls.
This design is great for rare conditions because you go out and you specifically select people who have the condition you are interested in. And that way, you make sure you get enough of those people in your sample. But the trade-off is that the proportion of cases to controls in your sample, it's artificial.
It doesn't reflect the real world. And that means some statistics, as we are going to see, cannot be calculated directly from a case control study. So getting back to the 2010 study, they recruited 200 lean women with moderate to severe cellulite.
Those are our cases. And they recruited 200 lean women with no cellulite at all. And those are the controls.
And the controls were what we call age and BMI matched.
[Regina] (8:23 - 9:17)
Oh, interesting. Matching is such a cool technique. Kristin, how about if I just kind of think out loud about what that would look like here?
Yeah, good. Right, because it's kind of easy here. Okay, so say one of their cases in the study was a 25-year-old who had a lot of cellulite.
And let's say her BMI was 21. So that means they purposely went out to find a control with the same age and same BMI. So a 25-year-old with a BMI of 21 who did not have cellulite.
So kind of like doppelgangers, right? But one with cellulite, one without. And they do this because age and BMI, I'm guessing, both affect cellulite.
So now we can avoid confounding by those variables, right? It's like we have a nicely matched control group.
[Kristin] (9:17 - 9:59)
Right, age and weight affect cellulite. So they want the groups to be similar in those so that they can isolate the effect of genetics. And the groups were indeed nicely balanced.
Both groups had an average age of about 30 and an average BMI of 22, which is fairly lean. Then the researchers analyzed the women's DNA. They looked at 15 different genes that they thought might be related to cellulite.
They found two statistically significant, statistically discernible differences between cases and controls. They only used one of those genes in the cellulite DX test, though. So I'll just talk about that gene, which was the ACE gene, which stands for angiotensin-converting enzyme.
[Regina] (10:00 - 10:05)
Hmm, ACE. Is that like the ACE inhibitors that people take for blood pressure?
[Kristin] (10:06 - 10:11)
That's it. Exactly. Yes.
The ACE gene is part of the system that regulates blood pressure.
[Regina] (10:11 - 10:15)
Okay. How do we get from blood pressure to cellulite, then?
[Kristin] (10:15 - 10:46)
It's not entirely clear, but the authors speculated that changes in blood flow or how much oxygen is reaching the skin, maybe that could affect cellulite formation.
[Regina]
So what did they find about the ACE gene?
[Kristin]
Remember, everyone carries two copies of each gene, one from each parent.
For the ACE gene, they found that 168 women in the case group carried at least one copy of a particular variant of the gene called the D allele.
[Regina] (10:46 - 10:50)
Hmm, D allele, D for dimpled, dimpled butt.
[Kristin] (10:51 - 11:12)
Ah, yes, that's a great way to remember that. Actually, I think D was for deletion, but we're going to remember D for dimpled here. Yes.
Thank you. So 168 cases carried at least one copy of the D, the dimpled allele, and that number was 148 in the control group. And this was a statistically significant difference.
[Regina] (11:13 - 11:23)
But wait a minute. You said 168 versus 148, each out of a group of 200 women. That is not exactly a huge difference.
[Kristin] (11:23 - 11:51)
It is not, which is why we're going to see that this does not make a great predictive test. But the authors of this paper still decided to commercialize it. The cellulite DX test is very simple.
If you carry at least one copy of the D allele, that is a positive test. If you don't, that is a negative test. So now, Regina, let's actually calculate some numbers about test performance and get ready because I am going to quiz you in a moment.
[Regina] (11:52 - 11:53)
I should have had more coffee this morning.
[Kristin] (11:55 - 12:39)
Before we get into the numbers, let me just give a brief overview of the statistics we're going to cover. Sensitivity and specificity tell us about the accuracy of the test. Positive and negative predictive value, on the other hand, instead of asking about how the test behaves, they ask what the result means for you.
Regina, I'm going to rant just for a second here, if you'll permit me. Of course. Because when I teach these concepts in class, students drive me crazy because they want me to give them a formula for these concepts.
Okay, we often put the numbers in a two-by-two table, meaning a table with two rows and two columns, and we label the cells A, B, C, and D, and the students want me to give them a formula like sensitivity is A divided by A plus C.
[Regina] (12:39 - 12:49)
You know, you're just going to confuse yourself that way. You have to think through what sensitivity and specificity are. I think it's the only reliable way to understand them.
[Kristin] (12:50 - 13:15)
Yeah, exactly. You can really mess this up if you rely on formulas, and the fact is you don't need formulas. Regina, quiz time.
Let's start by calculating sensitivity. Sensitivity is how sensitive the test is. Sensitivity asks, out of everyone who actually has the disease or condition, how many does the test correctly identify?
So, Regina, what is the sensitivity for this cellulite DX test?
[Regina] (13:15 - 13:48)
Okay, sensitivity, like you said, is the fraction of the cases that the test actually caught, and here we have 200 cases, so we're just going to look at those 200 cases with cellulite, and you said 168 of them tested positive for the D allele, so sensitivity is 168 divided by 200, which, let me see, divide both by 2, 84 out of 100, 84 percent, which actually sounds pretty good. That's a good sensitivity.
[Kristin] (13:49 - 14:14)
Yeah, that's correct, Regina, and it does sound pretty good, but it's not enough just to have good sensitivity. We also need good specificity, meaning that the test is specific, that it avoids false positives. So, specificity asks, out of everyone who does not have the disease, how many correctly test negative?
And quiz time again, Regina, what's the specificity of the cellulite DX test?
[Regina] (14:14 - 14:56)
All right, specificity. We're going to look now at the 200 women without cellulite, the 200 controls, and you said 148 of them carried the allele, so 148 tested positive. But those are the false positives because they don't have cellulite, and for specificity we want to know how many correctly tested negative. So 200 minus 148: 52 tested negative. So the specificity is 52 divided by 200, and then divide top and bottom by 2, 26 percent, which is not great. Not a very specific test, is it?
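For anyone following along at home, the sensitivity and specificity arithmetic above can be checked in a few lines of Python. The counts are the ones quoted in the episode; the variable names are our own.

```python
# Counts reported for the CelluliteDX case-control study (from the episode).
cases_total = 200        # women with moderate-to-severe cellulite
controls_total = 200     # matched women with no cellulite
cases_positive = 168     # cases carrying at least one D allele (test positive)
controls_positive = 148  # controls carrying the D allele (false positives)

# Sensitivity: of everyone who HAS the condition, what fraction tests positive?
sensitivity = cases_positive / cases_total

# Specificity: of everyone who does NOT have the condition,
# what fraction correctly tests negative?
specificity = (controls_total - controls_positive) / controls_total

print(f"Sensitivity: {sensitivity:.0%}")  # Sensitivity: 84%
print(f"Specificity: {specificity:.0%}")  # Specificity: 26%
```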
[Kristin] (14:57 - 15:03)
It is not. I mean, we can see almost three quarters of the controls are falsely testing positive, so not a great test.
[Regina] (15:04 - 15:20)
This is fascinating. So, this specificity, this low specificity means you are going to be potentially freaking out a lot of people who do not need to be freaked out here. They might be running out to get treatment that they don't need.
[Kristin] (15:20 - 15:26)
Yeah, exactly. So, already we can see that this is probably not a great test, but it gets worse.
[Regina] (15:26 - 15:27)
How could it be worse?
[Kristin] (15:28 - 15:50)
Well, patients don't care about sensitivity or specificity, right? What a patient wants to know is when they get a result, what's the chance that the result is correct? Meaning, if you test positive, that you actually have the disease, and if you test negative, that you actually do not have the disease.
And these quantities are what are called positive and negative predictive value.
[Regina] (15:50 - 16:02)
Right. We want to personalize it, right? Sensitivity and specificity are helpful, but they're about the test itself. But what we care about is us. What does it mean for me and my thighs?
[Kristin] (16:03 - 16:19)
Exactly. So, let's start with the positive predictive value, also called PPV. That tells you, if you test positive, what is the chance that you actually have the disease or condition?
So, pop quiz again, Regina. What was the positive predictive value for cellulite DX?
[Regina] (16:20 - 16:26)
Nice try, Kristin. This is a trick question. You cannot fool me that easily.
Nope.
[Kristin] (16:26 - 16:35)
Yeah, I didn't, I didn't think you were going to fall into the trap, Regina. I expected you would see through this one, but I want you to explain for listeners why you cannot calculate this.
[Regina] (16:36 - 16:44)
Right. It's a trick question because you cannot calculate positive predictive value from a case control study.
[Kristin] (16:44 - 17:55)
Right. This is where, as I alluded to earlier, case control studies have some statistical limitations. Students may be tempted to calculate positive predictive value by counting up the total number of positive tests in the sample and then determining the fraction of the positive tests that are cases. But that is wrong.
Positive predictive value depends on how common the disease or condition is in the population you're testing. That's what we call the base rate. But the prevalence of cellulite in our sample is artificial.
Remember, here the researchers forced the sample to have 50% cases and 50% controls, but 50% is not the prevalence in the real world. So if you use the numbers from the sample to calculate PPV, you are baking in this fake prevalence and you are going to get PPV wrong. To calculate PPV correctly, you need to incorporate information about the true base rate in the population.
And calculating PPV wrong, it's actually a common mistake and we're going to see a real world example of this mistake later in this episode.
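A minimal sketch of the mistake Kristin describes: if you naively pool the positive tests in a case-control sample, you bake in the artificial 50/50 prevalence. The counts are from the episode; the calculation is our illustration.

```python
# Positive tests in the case-control sample (from the episode).
cases_positive = 168     # true positives among 200 cases
controls_positive = 148  # false positives among 200 controls

# Naive PPV: fraction of all positive tests that are actually cases.
# WRONG for a case-control study -- it silently assumes a real-world
# prevalence of 50%, which the study design forced, not nature.
naive_ppv = cases_positive / (cases_positive + controls_positive)
print(f"Naive PPV: {naive_ppv:.0%}")  # Naive PPV: 53%
```

That naive 53% is not the number the company reported; as the episode explains, getting PPV right requires plugging the true base rate into Bayes' rule.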
[Regina] (17:55 - 17:57)
Oh, no. Did the researchers here make that mistake?
[Kristin] (17:58 - 18:17)
You know what? Actually, no, Regina. Believe it or not, they did a good job.
They got this part right. So here's the fun part. In their marketing materials for physicians, they use something called Bayes' rule to incorporate the correct base rate of cellulite.
And they even cited Bayes' rule in their marketing materials, which I just got a kick out of.
[Regina] (18:17 - 18:40)
Oh, that is so cool. Reverend Thomas Bayes, the developer of the Bayes' theorem. You know, side note, I actually made a pilgrimage to see his grave in London, which also happens to be not far from the Royal Statistical Society that I was visiting.
And that's all a fun, useless fact. So there you go. You're welcome.
[Kristin] (18:40 - 18:53)
Thank you. Actually, that is a fun fact. Maybe only one that statisticians like us care about.
But you know who else might appreciate that fun fact, Regina? Trivia enthusiasts. This is great for Trivia Night.
[Regina] (18:55 - 19:02)
Statisticians Trivia Night at the local pub? I don't know, Kristin. That would be something scary to behold.
[Kristin] (19:02 - 19:21)
Wait, Regina, this is actually a great idea. We could find dates there at the Statisticians Trivia Night at the local pub. And, you know, we can be a little lenient.
Like, you don't have to carry your establishment statistician card to enter this Trivia Night. We'll take anyone who is tall, dark, and handsome and likes numbers.
[Regina] (19:22 - 19:23)
No, that's your fantasy.
[Kristin] (19:23 - 19:31)
That is my fantasy. Okay, back from fantasy, back to reality. Wait, what were we talking about before?
Oh, Bayes' rule. Bayes' rule.
[Regina] (19:32 - 19:33)
I love Bayes' rule.
[Kristin] (19:34 - 20:20)
Bayes' rule lets us flip what are called conditional probabilities. So here, we know the sensitivity of the test. We know that if you have cellulite, then you have an 84% chance of testing positive.
But what we want to know is the reverse of that, the flip of that. We want to know if I test positive, what's the chance that I will actually develop cellulite? Those two probabilities are not the same.
And to get from one to the other, you have to bring in the base rate, that underlying prevalence, using Bayes' rule. To the credit of the company, they did this part correctly. They used Bayes' rule to calculate that positive predictive value.
And they used a value for the base rate of 65%.
[Regina] (20:20 - 20:29)
Ah, so 65% of women in the general population have moderate to severe cellulite, which is a lot of women, I'll just point out.
[Kristin] (20:29 - 20:33)
Yeah, that's very high. It seems like a reasonable estimate based on other studies.
[Regina] (20:34 - 20:36)
So what was the positive predictive value then?
[Kristin] (20:36 - 20:58)
Regina, I'm not going to go through the whole calculation on air. I'll put the details in the show notes. But when they applied Bayes' rule, they got a value of 68%.
Remember, we talked about their marketing materials earlier. They said that if you test positive, you have a 70% chance of developing moderate to severe cellulite. It's actually 68%, and they rounded it up to make it seem more impressive.
[Regina] (20:59 - 21:05)
I'm not sure that's really kosher. I do not approve of that. That is what I would call reckless rounding.
[Kristin] (21:06 - 22:34)
Yeah, playing a little fast and loose with numbers. They also used Bayes' rule to estimate the negative predictive value. And that is if you test negative, what is the probability that you actually do not have or won't get cellulite?
And that number came out to be 47%. And again, they played a little fast and loose with those numbers there because in their marketing materials, they said that you had a 50% chance of not developing cellulite if you tested negative. Really, it was only 47%.
Of course, a 47% chance of not developing cellulite means you have a 53% chance of developing it. So let me sum up all the numbers here, Regina. If you don't buy their test, you have a 65% chance of getting moderate to severe cellulite just because that's how common it is in women.
If you take the test and test positive, your probability moves from 65% to 68%. And if you test negative, your probability drops from 65% to 53%. And I should note that this finding has never really been replicated.
So these numbers may even be an overestimate of the test's performance. Regina, there is an important take-home lesson from this example. You can have a biomarker that is statistically associated with a disease or condition of interest, and that does not necessarily mean that it will make a useful predictive test.
And we're going to see this again and again in this episode.
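The Bayes' rule arithmetic Kristin just summarized can be sketched like this, using the episode's numbers (84% sensitivity, 26% specificity, 65% base rate):

```python
sensitivity = 0.84   # P(test + | cellulite)
specificity = 0.26   # P(test - | no cellulite)
prevalence = 0.65    # real-world base rate of moderate-to-severe cellulite

# Total probability of testing positive.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' rule flips the conditional: P(cellulite | test +).
ppv = sensitivity * prevalence / p_positive

# Same idea for a negative test: P(no cellulite | test -).
p_negative = (1 - sensitivity) * prevalence + specificity * (1 - prevalence)
npv = specificity * (1 - prevalence) / p_negative

print(f"PPV: {ppv:.0%}")  # PPV: 68% (marketed as "70%")
print(f"NPV: {npv:.0%}")  # NPV: 47% (marketed as "50%")
```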
[Regina] (22:34 - 23:10)
Wow. So I am spending a lot of money to get this test, let's say. And if I'm testing positive, it's not really adding much information onto what I would already know about my butt getting dimply.
If I test negative, I still am more likely than not to get a dimply butt. So I'm not sure this is changing my life very much, Kristin. I think I should maybe just save the money, spend it on a gym membership and some, you know, carrot juice and just go on the assumption that I will probably get cellulite because I am a human woman.
[Kristin] (23:10 - 23:34)
Yeah, that is a great way to sum it up. And, Regina, I want to point out that I've been talking about how much the test moves your probability, like from 65% to 68%. There is a statistic that formalizes that idea, and it's called a likelihood ratio.
I'm going to save that detail for the show notes for anyone who's curious. Certainly, we can conclude from all of this that this was indeed a bad investment, and I'm happy to report, Regina, that the company went defunct eventually.
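For the curious, here is a quick sketch of the likelihood-ratio idea Kristin mentions, again with the episode's numbers. A positive likelihood ratio near 1 means a positive result barely moves the needle.

```python
sensitivity = 0.84
specificity = 0.26

# LR+: how much a positive test multiplies your pre-test odds.
lr_positive = sensitivity / (1 - specificity)   # about 1.14
# LR-: the multiplier for a negative test.
lr_negative = (1 - sensitivity) / specificity   # about 0.62

# Applying LR+ to the 65% base rate, via odds:
prior_odds = 0.65 / (1 - 0.65)
posterior_odds = prior_odds * lr_positive
post_test_probability = posterior_odds / (1 + posterior_odds)
print(f"LR+ = {lr_positive:.2f}, "
      f"post-test probability = {post_test_probability:.0%}")
# LR+ = 1.14, post-test probability = 68%
```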
[Regina] (23:35 - 23:40)
Maybe your students went on to warn everyone else away from your bad investment.
[Kristin] (23:40 - 23:44)
Yeah, I am sure that I single-handedly took down that company, Regina.
[Regina] (23:44 - 23:47)
Could be. You could have if you wanted to.
[Kristin] (23:47 - 24:06)
I'd like to think so. Here's the funny thing, though. This test has reemerged in other forms.
So, Regina, if you really wanted to, you could still get it done. And when I was searching up to see what had happened to Cellulite DX, I actually discovered a wealth of other hilarious direct-to-consumer tests that I want to talk about now.
[Regina] (24:06 - 24:43)
Ooh, I cannot wait to hear about these, but after the break.
Welcome back to Normal Curves. Today, we are talking about the statistics used for diagnostic and screening tests.
And now we are going to hear about some funny genetic tests that are commercially available.
[Kristin] (24:44 - 24:49)
Okay, so we already talked about how I single-handedly put DermaGenoma and Cellulite DX out of business.
[Regina] (24:49 - 24:53)
I am so disappointed that I cannot get the Cellulite test anymore.
[Kristin] (24:53 - 25:43)
Well, don't worry, Regina, though. It turns out you can get the same gene test because it has reemerged in other places sold by other companies. So I was Googling it and I found that there's a company called GenomeLink where you can pay money to upload your DNA data from sites like 23andMe, and it will look for that same gene variant from the Cellulite DX test.
And if you have it, it will flag you for Cellulite. And they even cite that 2010 paper as evidence for that gene. And there's also a company, GeneEx Diagnostics, that sells the DNA skin health test.
And that includes the same Cellulite DX gene and a bunch of other genes supposedly linked to skin health, including one that they say is related to skin glycation.
[Regina] (25:44 - 25:49)
Oh, interesting. Glycation. Didn't we talk about glycation in that Sugar Sag episode about wrinkles?
[Kristin] (25:50 - 26:15)
We did, and everyone should check that episode out because it's fascinating. But, Regina, this company, GeneEx Diagnostics, it is a goldmine of laughs. They offer some hilarious tests.
Ooh, I can't wait. Tell me. You can pay $149 each to find out if you have the female infidelity gene, the promiscuity gene, or the warrior gene.
[Regina] (26:15 - 26:31)
Okay, Kristin, I am Googling this right now. And let me get this straight. The idea is that you send your partner's hair or cheek swab to this company and you find out whether they have, for example, the warrior gene?
[Kristin] (26:31 - 26:39)
Yeah, you're right. You may not be sending this in for yourself. This might be more useful if you're sending it in, you know, for other people like your romantic partner.
[Regina] (26:39 - 27:04)
Okay, Kristin, so you're saying I need to swab my date's cheeks.
[Kristin]
Yes, possibly.
[Regina]
You have a very interesting conception of what my dating life looks like.
Send that in. Okay, but, Kristin, looking through the site again, this is the gene test that I really want. I'm seeing it here.
It's the test for the male pair-bonding gene.
[Kristin] (27:04 - 27:04)
Oh.
[Regina] (27:06 - 27:29)
Okay, but wait a minute, Kristin. This is fascinating. Look a little more closely at this.
The male pair-bonding gene is the exact same gene as the so-called female infidelity gene, but they just market it differently. It's the same thing. They just call it different names.
[Kristin] (27:30 - 27:37)
That's crazy. Oh my goodness, that is fascinating. I hadn't noticed that.
I mean, have a little gender bias, a little judgment on women there. Wow.
[Regina] (27:38 - 28:13)
That is clever. That is clever marketing. I will give them that.
Okay, the warrior gene, I'm interested in that one. Looking at it now, because who can resist that name? Kristin, it says that it will tell you if you are aggressive and make good business decisions.
That is quite a split there. Am I looking for positive or negative on that one for my dates? Like, do I want them aggressive and good at business, or do I want them non-aggressive, making bad business decisions?
[Kristin] (28:14 - 28:21)
I think it depends on your dating goals, Regina. I mean, do you want the seven-boat guy if he comes with a side of aggression? Remember him?
[Regina] (28:23 - 28:30)
Okay, that is an excellent point, and I think the seven-boat guy that you're referring to might have come with a side of aggression.
[Kristin] (28:32 - 28:56)
I suspect so. Regina, there is one test on their website that might not be total bunk. They test for the gene for alcohol intolerance, which we actually talked about that gene back in the alcohol episode.
It's a real thing, and that gene is strongly linked to alcohol use, and therefore they can use it for Mendelian randomization studies, as we talked about.
[Regina] (28:56 - 29:06)
Kristin, I'm going to guess that you probably already know if you're intolerant to alcohol. You probably don't need a gene test to tell you you feel bad after you drink.
[Kristin] (29:06 - 29:35)
Yeah, you can just use the vodka test for that, which is going to be a lot cheaper. Like, pound some vodka and see how you feel, and you're probably right. You don't need this expensive gene test.
Another fun one on their website. They have a test called the DNA fitness test, and this is not about the fitness of your DNA. This one has genes that are supposedly related to athletic performance.
So I guess you can see if you're going to be the next Michael Phelps, and interestingly, this panel of genes also includes the ACE gene. That one comes out again.
[Regina] (29:35 - 29:39)
Okay, so dimply butts or Michael Phelps. It could go either way.
[Kristin] (29:40 - 29:41)
High blood pressure, too.
[Regina] (29:43 - 29:46)
Okay, but Kristin, getting back to the infidelity gene.
[Kristin] (29:47 - 29:49)
You are really fascinated by that one.
[Regina] (29:49 - 30:08)
Okay, I might be a little obsessed. Probably not surprising. All right, here's what I love, Kristin.
Take a big-picture view of this site. Look at this site. They are offering tests for the infidelity gene, and they're also offering tests for STDs.
Do you have a sexually transmitted disease right now?
[Kristin] (30:09 - 30:15)
And we should note, Regina, that those are not gene tests. Those are, like, "are you carrying an icky germ right now?" tests.
[Regina] (30:16 - 30:44)
Right, right. Exactly. But look what they're doing.
Look at this whole thing. Okay, they're offering you a test to find out if you or someone else is prone to infidelity. Then they're offering you a test to find out whether you have an icky STD germ right now.
And look over on the left. They are also offering you paternity tests. So I feel like they have cornered a certain market here.
Don't you think? There's a narrative arc that is happening.
[Kristin] (30:45 - 30:51)
That is fascinating, Regina, and I bet you this is how this company is surviving. This is how they're making their money.
[Regina] (30:51 - 30:53)
Repeat business. People keep coming back.
[Kristin] (30:54 - 30:55)
Oh, yeah.
[Regina] (30:55 - 31:32)
So I am still looking here, Kristin. They have a test. Did you see this?
For increased susceptibility to wrinkles, which cracks me up. How does this help anyone? Because if you are human, you are going to get wrinkles, like guaranteed.
Are you susceptible to wrinkles? Are you susceptible to aging? Yes, you are.
Wait a minute though, Kristin. There is one for freckles. Did you see this?
Can't you just look and see whether you have freckles? The mirror test for freckles has 100% sensitivity and specificity. The mirror test. That's all you need.
[Kristin] (31:32 - 31:33)
It does.
[Regina] (31:33 - 31:37)
You are right. Goldmine of laughs. I love it.
Good find.
[Kristin] (31:37 - 32:05)
It really is. And this is also, though, Regina, why statistical literacy is so important. But, Regina, I am seeing a value in this company, actually, a silver lining of sorts.
This is how you and I should be screening dates. And I don't mean by sending their cheek swabs and paying this money. I have a better and easier idea.
Just bring up the website during the date. And if they laugh hysterically like us, then I say it's a keeper.
[Regina] (32:06 - 32:32)
Okay, so basically you're saying this website has very high sensitivity for being a keeper. And I'm going to say it has okay medium specificity because sensitivity, if they laugh at it, you know, they are definitely a keeper. Now, if they don't laugh at it, I don't know about you, not necessarily a weed out, but, you know, not feeling totally hopeful.
[Kristin] (32:33 - 33:00)
Of course, Regina, all of this got me curious and I started to dig into some of the studies behind these tests. Regina, I know you like the infidelity gene, but I was very curious about the wanderlust gene because in my family, we joke that my mom's side of the family has the worry gene and my dad's side of the family has the wanderlust gene. And I have always felt personally that somehow I got exactly 50% of each and that perfect 50-50 balance actually gets me into trouble.
[Regina] (33:01 - 33:11)
You are an interesting example of a hybrid vigor. Does that mean that you wander but then you worry about it while you're doing it? How does this play out in your life?
[Kristin] (33:11 - 33:37)
Yeah, something like that. I mean, I think sometimes the wanderlust gene gets me into something adventurous and then about halfway through my adventure, the worry gene kicks in and then I'm in trouble. So, for example, when I first moved to California, I wanted to go to wine country and take a hot air balloon ride because it sounded super cool to the wanderlust gene.
But then after I was already like thousands of feet into the air, I suddenly remembered, oh wait, I'm not a huge fan of heights.
[Regina] (33:40 - 33:47)
That's a hilarious story. That was a twist I was not expecting. That is a bad interaction effect.
I will give you that.
[Kristin] (33:47 - 34:03)
It is, yeah. But that's why it really got me laughing when I saw that there's actually a real thing called the wanderlust gene. I thought my family had just made it up.
And of course, I had to look up the story behind this test. So, let's take a minute to go in more depth in this one because it's another good teaching example.
[Regina] (34:03 - 34:08)
Ooh, wanderlust is fun and it's got the word lust in it. So, I'm happy.
[Kristin] (34:08 - 34:19)
Well, we've gotten sex into this episode. I guess we already had with the infidelity and sexually transmitted diseases, but all right, good. So, this gene is called DRD4 and it codes for a dopamine receptor.
[Regina] (34:20 - 34:26)
Dopamine. Oh, that's very hot these days. Related to the brain's reward and motivation systems.
[Kristin] (34:26 - 34:43)
Exactly. And this wanderlust gene traces back to a study published in 1996 that found that people carrying a particular variant of this gene called the 7-repeat allele, they scored a few points higher on a personality trait called novelty-seeking.
[Regina] (34:44 - 34:48)
Novelty-seeking is what exactly? And how do they measure it?
[Kristin] (34:48 - 35:11)
They use the novelty-seeking scale. And let me just read how one paper describes it. People who score higher on the novelty-seeking scale are characterized as impulsive, exploratory, fickle, excitable, quick-tempered, and extravagant.
Whereas those who score lower than average tend to be reflective, rigid, loyal, stoic, slow-tempered, and frugal.
[Regina] (35:11 - 35:24)
That is quite a list of adjectives. I think someone swallowed a thesaurus there. But there was nothing about wanderlust in there.
Why do they think this gene is related to wanderlust?
[Kristin] (35:24 - 35:39)
Right. It might be a bit of a stretch, but researchers speculate that people with the 7-repeat variant might have underactive dopamine receptors. So the theory is they might need more stimulation in order to get the same dopamine hit as other people.
[Regina] (35:39 - 35:45)
Okay. So they have a lust to wander to get the dopamine hit. Okay.
All right. I'll buy it.
[Kristin] (35:46 - 35:54)
So the 1996 study, as you might imagine, Regina, got a lot of press and it inspired a slew of follow-up studies, which found kind of mixed results.
[Regina] (35:55 - 35:56)
Of course they did.
[Kristin] (35:57 - 36:15)
Yeah. Some researchers found small associations. Some researchers found no association.
Eventually, there were enough studies to do a meta-analysis and in 2002, there was a meta-analysis that combined data from 20 studies and about 4,000 people. And Regina, guess what they found when they pooled the data?
[Regina] (36:16 - 36:20)
The wanderlust effect disappeared. The wanderlust wandered off.
[Kristin] (36:21 - 36:34)
Exactly. There was no significant difference between carriers and non-carriers of the gene. The Cohen's d was, drumroll please, 0.06 standard deviations.
[Regina] (36:35 - 36:40)
0.06 is tiny. Remember, we need something like 0.2 before we start to care about it.
[Kristin] (36:41 - 37:00)
Right, because zero means no association and that's awfully close to zero. Now, to be fair, more studies were done after 2002 and there was a 2018 meta-analysis that added in some additional studies and they did find a significant effect, but the Cohen's d was just 0.16. Not really much better then.
[Regina] (37:00 - 37:02)
Still not 0.2. Not much better.
[Kristin] (37:02 - 37:14)
And Regina, I just want to link this back to positive predictive value to bring that point home that we talked about earlier. To calculate things here though, wanderlust is not a binary condition, right? They measured it on a scale.
[Regina] (37:15 - 37:27)
But Kristin, cellulite is not really binary either. You can have a whole range of cellulite, but our researchers before were not deterred by that fact. Can we, like, dichotomize it the same way they did?
[Kristin] (37:27 - 38:25)
Yeah, that's exactly what I did. Just for illustration purposes, I said, hey, let's imagine that we divide people into wanderlusters and non-wanderlusters and I used, if you are in the top quartile, that is the top 25% of that novelty-seeking scale, we're going to call you a wanderluster and everybody else is not a wanderluster. So then we can do some back-of-the-envelope calculations to find the positive predictive value.
If we assume a Cohen's d of 0.16 from that last meta-analysis, we can show that the positive predictive value for that gene test would be 29%. The negative predictive value would be 76%, meaning that 24% of the people who tested negative would be wanderlusters. In other words, before you take the test, you have a 25% chance of being a wanderluster.
If you take the test and test positive, this increases your chances to 29% and if you take the test and test negative, this decreases your chances to 24%.
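Kristin's back-of-the-envelope numbers can be reproduced under some explicit assumptions. These are ours, for illustration only: novelty-seeking scores are normally distributed, carriers' mean is shifted up by the Cohen's d of 0.16 from the 2018 meta-analysis, roughly a quarter of people carry the variant (a hypothetical carrier frequency, not stated in the episode), and "wanderluster" means the top quartile of the scale.

```python
# A sketch of the PPV/NPV calculation for a gene "test" of a continuous trait.
# Assumptions (ours, illustrative): scores are normal with SD 1 in each group,
# carriers' mean is shifted by Cohen's d, carrier frequency is ~25%, and
# "wanderluster" = top 25% of the mixed population.
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

d = 0.16      # Cohen's d from the 2018 meta-analysis
q = 0.25      # assumed carrier frequency (hypothetical)
top = 0.25    # "wanderluster" = top quartile overall

# Bisect for the cutoff c so that 25% of the mixed population scores above it.
lo, hi = -5.0, 5.0
for _ in range(100):
    c = (lo + hi) / 2
    frac_above = q * (1 - phi(c - d)) + (1 - q) * (1 - phi(c))
    if frac_above > top:
        lo = c  # too many above the cutoff, move it up
    else:
        hi = c

ppv = 1 - phi(c - d)   # P(wanderluster | carrier)
npv = phi(c)           # P(not a wanderluster | non-carrier)
print(f"PPV ~ {ppv:.0%}, NPV ~ {npv:.0%}")  # roughly 29% and 76%
```

With these assumptions the sketch lands on the episode's figures: a positive test moves you from a 25% baseline chance to about 29%, and a negative test moves you down to about 24%.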
[Regina] (38:25 - 38:42)
That's not changing things very much at all. This is not worth a lot of my money, Kristin. It feels kind of like a bad fortune teller.
You go and you pay your money and then they just kind of shrug at you and say, maybe, maybe not. Who knows?
[Kristin] (38:43 - 38:55)
Yeah, so once again, we have a statistically significant, statistically discernible association that sounds dramatic in headlines, but has almost no predictive value for an individual person.
[Regina] (38:55 - 38:59)
So how can they continue to offer these tests? How do they get on the market?
[Kristin] (38:59 - 39:17)
Regina, there is a loophole. Of course there is. Many of these direct-to-consumer tests do not require FDA approval or oversight, so pretty much anyone can sell anything and there's no guarantee that the test actually works or has any meaning or has been validated in a real population.
[Regina] (39:17 - 39:21)
That is a pretty big loophole. That's like a gaping crater hole.
[Kristin] (39:21 - 39:26)
It is, but of course people see an opportunity to make money selling you useless information.
[Regina] (39:26 - 39:27)
Yay, capitalism.
[Kristin] (39:28 - 39:37)
Hey, I'm not anti-capitalism because maybe some capitalists want to actually sponsor our podcast and we don't mind if people want to pay for the information we're providing.
[Regina] (39:38 - 39:43)
But Kristin, there's an important difference I want to point out. What we provide is actually useful and accurate.
[Kristin] (39:44 - 40:30)
True, and instead of paying $149 to take the Wanderlust gene test, I say you just pay us a fraction of that so we can tell you why the test is worthless and then you come out ahead, simple math. Regina, the other thing I want to point out about tests like this, it's not just about the statistics. A test is only useful if it's actionable.
For example, a cellulite gene test would only be useful if there were some treatment you could take to prevent cellulite, and it's not clear that any of the tests we're talking about are actionable. All right, Regina, so far we've been talking about some pretty funny tests, but the next test I want to talk about is more serious. It purports to test for autism, which of course is more consequential.
And not only that, but the researchers here got some important numbers wrong.
[Regina] (40:30 - 41:12)
Oh, that sounds like some sleuthing. Cannot wait to hear more, but first a short break. Welcome back to Normal Curves.
Today, we're talking about the statistics that are used for diagnostic and screening tests. And we were about to talk about another case study that had some wrong numbers. Kristin, what happened?
[Kristin] (41:13 - 41:26)
In 2018, a journalist from the publication Spectrum reached out to me about a paper in the journal Biological Psychiatry. The paper was still under embargo, which means it had been released to journalists, but not the public.
[Regina] (41:27 - 41:41)
Embargoes. For people who don't know, this is when the journal gives reporters time to report on a scientific paper they're about to publish, so the journalist's news story can come out right when the paper is published.
[Kristin] (41:42 - 41:58)
So this paper used a subset of data from a larger study that had over a thousand kids. And in this paper, the researchers compared 516 kids who had autism to 164 age-matched kids who were developing normally.
[Regina] (41:58 - 42:02)
Ah, so it's kind of like a case-control sub-study.
[Kristin] (42:02 - 42:34)
Yeah, exactly. The larger study wasn't exactly a case-control study, but the analysis done in this paper is comparing cases to controls, so we'll call it a case-control analysis. And as we saw with the previous case-control study, our sample is enriched for the outcome of interest.
It's enriched for autism here. The aim of the analysis was to identify blood markers that could distinguish the two groups, cases from controls. And these markers were amino acids and ratios of these amino acids.
So, for example, if I take glycine levels and divide that by leucine levels.
[Regina] (42:35 - 42:37)
And amino acids, why?
[Kristin] (42:37 - 42:53)
The researchers hypothesized that there might be differences in metabolism in kids with autism that would show up potentially as altered amino acid levels. But Regina, they actually found no difference in the average levels of these biomarkers in the two groups.
[Regina] (42:53 - 42:55)
Okay, so they found nothing?
[Kristin] (42:57 - 43:22)
Well, almost. They didn't find any differences in the average values, but they found that more kids with autism ended up above a cutoff on just a few isolated amino acid ratios. And they said that being high in these ratios represented a metabolism problem that they named amino acid dysregulation metabotype.
[Regina] (43:22 - 43:36)
Ooh, metabotype. That is kind of fun to say. It kind of sounds like a 1950s robot or like a 1950s laundry detergent.
Hey, housewives, get your metabotype now.
[Kristin] (43:37 - 43:47)
Yeah, it's one of those names that feels like maybe it's overcompensating for something, trying too hard. Maybe the fancy jargony name is masking a lack of substance, Regina?
[Regina] (43:48 - 43:52)
Yeah, I'm wondering about this. Is it just a fishing expedition, Kristin?
[Kristin] (43:52 - 44:36)
It does seem like they're really reaching here, Regina, right? No differences in the amino acids, so look at ratios. No difference in the ratios, so look at cutoffs.
What they're finding could just be noise. But Regina, they did do something statistically to try to address this. I want to give them credit for that.
I'll put more details in the show notes. I'm not sure it entirely fixes the problem, but we're going to go with what they found as if it's a reproducible pattern and then just see where the numbers take us. Okay.
Because here's where it gets interesting. The authors went from saying, hey, we see a potential statistical pattern to, hey, let's now use that as a predictive test. So a positive test here is the presence of any one of these metabotypes, these elevated ratios.
[Regina] (44:37 - 44:43)
So did they calculate any of these diagnostic statistics that we've been talking about?
[Kristin] (44:43 - 45:20)
That's exactly what they did, Regina. And this is where they got something majorly wrong. So let's get into the numbers now.
Remember, we have 516 kids with autism and only 164 without autism. 86 of the kids with autism had one of these metabotypes versus only 6 of the kids without autism. In other words, 86 true positives, 6 false positives.
In terms of percentages, that's about 17% of the kids with autism testing positive versus about 4% of the kids without autism. And so, Regina, quick quiz again. What would the sensitivity and specificity be here then?
[Regina] (45:21 - 45:47)
Luckily, I was paying attention and I like how you did some of the math for me already. So sensitivity is that given that you have the condition, how well does the test detect it? And you said of the kids with autism, 17% tested positive for this metabotype.
So sensitivity is 17%, which is not great. It's a low number.
[Kristin] (45:47 - 45:54)
It's not great, but the authors actually got this right and they transparently reported it. How about specificity?
[Regina] (45:54 - 46:16)
Specificity is how specific the test is, meaning if you don't have the condition, what is the chance that you correctly test negative? And you said of the kids without autism, 4% tested positive, but we want the other side. So 96% tested negative.
So 96% is the specificity.
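Regina's quiz answers fall straight out of the counts as described: a quick sketch, using the study's reported numbers.

```python
# Sensitivity and specificity from the study's counts as described above:
# 516 kids with autism, 164 without; 86 true positives, 6 false positives.
with_autism, without_autism = 516, 164
true_pos, false_pos = 86, 6

sensitivity = true_pos / with_autism          # P(test+ | autism)
specificity = 1 - false_pos / without_autism  # P(test- | no autism)
print(f"sensitivity ~ {sensitivity:.0%}")  # ~17%
print(f"specificity ~ {specificity:.0%}")  # ~96%
```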
[Kristin] (46:17 - 46:23)
That's right. And again, that is exactly what the authors reported. Now, Regina, how about positive predictive value?
[Regina] (46:23 - 46:49)
Kristin, you tried once already to trick me. I know, trick question. We talked about how this is like a case control analysis.
They chose how many people in the sample have autism and don't have autism. That means we don't know the base rate of autism in the population. And if we don't know the base rate, we cannot calculate positive predictive values.
So...
[Kristin] (46:49 - 46:50)
A plus, Regina, A plus.
[Regina] (46:50 - 46:51)
Thank you.
[Kristin] (46:51 - 47:04)
Right, because about three-quarters of their sample was children with autism. Clearly, three-quarters of children in general do not have autism, right? So if you calculate PPV using these fake percentages in the sample, you're going to get PPV wrong.
[Regina] (47:04 - 47:17)
Well, no, don't tell me that's what they did. They incorrectly calculated positive predictive value by looking at everyone who tested positive in their sample and asking how many had autism, which is wrong.
[Kristin] (47:17 - 47:20)
That is exactly what they did, Regina. And yes, they would fail my class.
[Regina] (47:21 - 47:23)
I am so disappointed in them.
[Kristin] (47:23 - 48:12)
Yeah, they said, hey, look, 92 total positives in our sample, right? 86 plus 6. 86 of them have autism.
So PPV is 86 divided by 92, which is a whopping 93.5%. And to see why this is wrong, just think about if we applied this test here to the general population. So let's imagine we test 1,000 random kids and let's say about 2% of kids have autism in the population. So that would mean that there would be 20 kids with autism that we were trying to find.
If the sensitivity is 17%, that means we're only going to find three out of those 20 kids. Only three out of those 20 kids are going to test positive. We have 980 kids, on the other hand, who don't have autism.
And when we apply the test to them, 4% of them are going to falsely test positive. And that's about 39 kids who would falsely test positive.
[Regina] (48:13 - 48:30)
Oh, I love the way you have laid this out because you can really see how those numbers come out. Because the base rate is so low, most kids don't have autism. So even though we're just talking about 4% false positive, 4% of a lot of kids is a lot of kids.
[Kristin] (48:31 - 48:44)
So if you look at the total positive tests, there are 42 kids testing positive. Only three of them have autism. So the positive predictive value is just 7%, nowhere near the 93.5% that they were reporting in their paper.
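Kristin's 1,000-kid walkthrough can be sketched in a few lines, using the episode's rounded rates (sensitivity about 17%, false-positive rate about 4%, and an assumed 2% prevalence), alongside the paper's in-sample calculation for contrast.

```python
# The worked example from the episode: apply the test to 1,000 random kids,
# using the rounded rates discussed above and an assumed 2% prevalence.
n, prevalence = 1000, 0.02
sens, fpr = 0.17, 0.04

kids_with = n * prevalence        # 20 kids with autism
kids_without = n - kids_with      # 980 kids without

true_pos = round(kids_with * sens)       # ~3 kids correctly flagged
false_pos = round(kids_without * fpr)    # ~39 false alarms
ppv = true_pos / (true_pos + false_pos)  # ~7%

# The paper's (incorrect) in-sample calculation, for contrast: ~93.5%
naive_ppv = 86 / (86 + 6)
print(f"population PPV ~ {ppv:.0%} vs the paper's claim of {naive_ppv:.1%}")
```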
[Regina] (48:44 - 49:21)
Oh, I love how you laid that out. You're saying, hey, there are way more false positives, false alarms than anything else if you administer this test in the general population. So I love this as a teaching example, because it's not just that this kind of test is useless.
The thing is, it could be potentially harmful, psychologically harmful, or it could cost money, because it's needlessly going to worry a bunch of parents who are just getting what turn out to be false alarms about their kids. So Kristin, you saw this mistake in the paper very clearly. It's very clear.
What did you do next?
[Kristin] (49:21 - 49:42)
Well, since I was reviewing this paper for a journalist, I let the journalist know about this glaring error, as well as some other major problems in the paper. But then I also reached out to the corresponding author of the paper because the paper hadn't come out yet. And that meant he still had a chance to rescue the paper and fix his mistake prior to publication.
So I wanted to let him know.
[Regina] (49:42 - 49:49)
That's you being a good statistical Samaritan, isn't it? Instead of just a statistics cop. I admire that.
[Kristin] (49:49 - 50:08)
Yeah, I mean, I think it's only fair to let authors know when you spot errors so they have the opportunity to correct them. But Regina, unlike the gracious response that we got when we reached out to the authors of that marathon paper that we talked about back in the marathon fueling episode, this time their response was less satisfying.
[Regina] (50:08 - 50:16)
Less satisfying does not sound very good. Were they not grateful? I mean, they should have been thanking you.
What did they do?
[Kristin] (50:16 - 51:04)
They just kind of ignored me or brushed me off. So they never actually acknowledged that their number was just flat out wrong. They came back and tried to argue.
Oh, but wait, we're not planning to apply this test to the general population. We're going to apply it to a group of higher risk kids. So the prevalence of autism would be higher than 2%.
But that is still a total fudge because their 93.5% value for PPV is only correct if they are applying the test to a population in which three quarters of the kids being tested have autism, which is just not a population you're ever going to see in the real world. Just as an example, one of the highest risk populations out there is male younger siblings of girls with autism. Even in that very high risk population, only about one-sixth of the kids have autism.
[Regina] (51:04 - 51:37)
Which is about, what, 16%? You are saying there is just no real-world scenario where the tested population would have a prevalence of autism as high as what they claim, 75%. And that is why it's misleading for the researchers to have used those numbers for their conclusions.
Kristin, this just gets back to our teaching point about positive predictive value. It depends on the prevalence of the condition in the population that you're testing.
[Kristin] (51:37 - 52:00)
Exactly. And, you know, I followed their logic to its natural conclusions. I said, okay, let's say what would happen if we applied this test to a high-risk population like male younger siblings of girls with autism?
The positive predictive value there would be 48%. And that 48% actually had a really wide confidence interval, meaning it could be a lot lower than 48%.
[Regina] (52:00 - 52:16)
Okay, still nowhere near their 93.5% that they reported and nowhere near what would be needed for this to be a good diagnostic test. So, Kristin, after you brought this to their attention, did they not fix the paper?
[Kristin] (52:16 - 52:24)
No, they basically politely blew me off. And then within a few days, the paper was published and it got some media attention.
[Regina] (52:24 - 52:25)
I bet it did.
[Kristin] (52:26 - 52:44)
So the piece from Spectrum, from the journalist that I was helping, that piece was appropriately skeptical, but a number of other outlets were much more enthusiastic. And what I realized then, Regina, was why the researchers had blown me off. It turns out that they had a major financial stake in this paper.
[Regina] (52:44 - 52:53)
Oh, we need to come up with a good name for these people. I don't want to call them grifters, but they're not not grifters either.
[Kristin] (52:54 - 53:09)
Yeah, believe it or not, Regina, there was already a company involved at this point, and they were already selling this metabotype test described in the paper as a commercial test for detecting autism at $1,000 a test.
[Regina] (53:10 - 53:16)
Okay, they were already selling it. That is fast. And a thousand bucks?
Whoa.
[Kristin] (53:16 - 53:43)
Yeah, they could not afford for the paper to have mistakes in it because it was the justification for a product that was already on the market. And that test is called NeuroPoint DX, and it was already being sold direct to consumer. And the disclosures in the paper showed that multiple authors were employees or equity holders in Stemina.
That's the company commercializing the test. And they were all listed as inventors on the patent for the test.
[Regina] (53:43 - 54:10)
Wow. So this changes things, right? It would have been one thing if they and their paper were basically saying, hey, we found these amino acid patterns, whatever, and it's an interesting scientific observation, which would be fine and reasonable for an exploratory paper.
But they were basically saying, we have a commercially available test that we claim you can use to identify autism early. Totally different.
[Kristin] (54:10 - 54:24)
That's it, exactly. And Regina, some of the press coverage of this paper and this test was wild. So here is a quote in the press that came out from the CEO of that company.
And she says, it is not an exaggeration to say.
[Regina] (54:24 - 54:29)
When they start out that way, you know, instantly it is an exaggeration.
[Kristin] (54:29 - 54:58)
Oh, yes. Okay, so it is not an exaggeration to say that NeuroPoint DX will revolutionize diagnosis and precision medicine by identifying imbalances in the patient's metabolism. We can diagnose neurological disorders and identify targeted treatments.
These interventions may be as simple as modifying diet or dietary supplements or as complex as developing new drugs to correct the imbalance.
[Regina] (54:59 - 55:19)
Wow. So they were not only selling this as a diagnostic test to predict autism, but also as what they called a precision medicine tool, like you might just modify your diet. I mean, that's a whole new level.
They were attributing biological meaning to these patterns they found.
[Kristin] (55:20 - 55:31)
And at this point, this whole saga rose to a whole different level for me. So no more Good Samaritan on my part. I wrote a letter to the editor with my colleague Steve Goodman criticizing their paper.
[Regina] (55:31 - 55:40)
Ooh, I know Steve, the two of you together, gloves are off. These guys were in trouble. So what do you put in the letter?
[Kristin] (55:40 - 55:48)
So it was actually a lot of things. Of course, we started with pointing out the wrong PPV calculation, but then we pointed out some other problems as well.
[Regina] (55:49 - 55:58)
Okay, let's say we are going to put the details in the show notes along with a link to the infamous letter, but maybe just share one or two of your points.
[Kristin] (55:59 - 56:45)
Yeah, I'm sure everybody is going to want to read this as exciting bedtime reading, but we'll just stick to two issues here. So one of the problems actually illustrates a cool statistical point. So I want to bring this one up.
Remember the average values of the amino acids and of the ratios did not differ between kids with autism and kids without autism. The only thing that differed is that the kids with autism had more variation in these measures. The spread of values was wider in the kids with autism.
That means there were more kids with autism in both the high and low tails of the distribution. So when you impose an upper cutoff, more kids with autism are naturally going to fall above that line. But interestingly, had you imposed a lower cutoff, more kids with autism also would have fallen below the line.
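Kristin's point about spread can be illustrated with toy numbers (ours, not the paper's): two groups with the same mean, where one simply has a wider distribution. A cutoff in either tail "catches" more of the wider group, with no mean difference at all.

```python
# Illustrating the wider-distribution point with toy numbers: same mean,
# different spread. More of the wide group falls above an upper cutoff,
# and equally more falls below a lower cutoff.
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

cutoff = 1.645                  # 95th percentile of the narrow (SD=1) group
narrow_sd, wide_sd = 1.0, 1.5   # assumed SDs, purely illustrative

above_narrow = 1 - phi(cutoff / narrow_sd)  # 5% by construction
above_wide = 1 - phi(cutoff / wide_sd)      # ~14%
below_wide = phi(-cutoff / wide_sd)         # also ~14%, in the LOW tail

print(f"above cutoff: narrow {above_narrow:.0%}, wide {above_wide:.0%}")
print(f"below -cutoff, wide group: {below_wide:.0%}")
```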
[Regina] (56:46 - 56:58)
Oh, that is an interesting statistical point, Kristin, a wider distribution. So what does it mean biologically, right? If kids with autism are more likely to be either very low or very high.
[Kristin] (56:58 - 57:21)
Right. Does this really indicate a, quote, imbalance in metabolism or is it just picking up on something like kids with autism have more varied and weird diets? Maybe it's not a signature of the condition at all, right?
Which means this whole argument about how this test might be used for precision medicine, like you could correct autism by correcting these amino acid variations, seems like total nonsense to me.
[Regina] (57:21 - 57:22)
Yeah, that's kind of sketchy.
[Kristin] (57:23 - 57:48)
Also, Regina, in their marketing, this company was claiming that the value of the test was that it could be used to flag kids with autism earlier than they would have been flagged clinically. Here's the flaw in that logic. The kids in the study had already been diagnosed clinically with autism.
So the test was not identifying them early. It was identifying them after the fact, after a diagnosis.
[Regina] (57:49 - 58:14)
Okay, to make the claim that they wanted to, that this test can predict autism, they needed to have applied this test to young kids, right? Younger kids who'd not been diagnosed with autism, then followed over time to see if they were later diagnosed with autism. But the problem with this, and probably why they didn't do it, is that it takes longer and costs more, and there was money to be made.
No time to waste.
[Kristin] (58:14 - 58:22)
Yeah, this test is only useful if it actually catches kids early, but the authors provided no evidence that it catches kids early.
[Regina] (58:23 - 58:33)
Okay, feels like this company was totally exaggerating, despite this grandiose caveat of, it is not an exaggeration to say.
[Kristin] (58:34 - 59:39)
I feel like this is going to be a catchphrase now on Normal Curves, that you and I are just going to go, it is not an exaggeration to say whenever we want to make fun of something. I mean, it's kind of wild, right? Companies, of course, we know companies exaggerate, but this is like a wild exaggeration when you actually look at the data.
So I wanted them to start the quote, it is not a totally wild and inappropriate exaggeration to say.
The good news is, Regina, it does not appear that NeuroPoint DX was a success. I followed it for a few years after 2018, and for a while, it seemed that they might get away with their misrepresentations of the product. But in preparing for this podcast, I checked their website again, which I hadn't done in a few years, and I can still find NeuroPoint DX listed on Stemina's website, but all the links from that landing page to things like marketing materials or shopping carts are now defunct. So I think the test is no longer being sold. And in fact, it's not clear to me if Stemina is still doing much either, because a lot of the links on that website are broken.
[Regina] (59:40 - 59:44)
Maybe you helped shut this company down. What do you think?
[Kristin] (59:44 - 1:00:05)
I'd like to think so, Regina. Or maybe the fact that the test was useless and expensive just became obvious over time to both consumers and doctors. Regina, though, I think we could have made money, right, as anti-investors.
Isn't that a thing, betting against these companies? Of course, that would have required me to know something about investing, but I think we missed an opportunity.
[Regina] (1:00:05 - 1:00:10)
Hmm. I'm not sure we have time to do that if we're doing this podcast.
[Kristin] (1:00:11 - 1:00:17)
Somebody out there is going to teach us how to be anti-investors because I think we can do very well. Pick out all the companies that are going to fail.
[Regina] (1:00:17 - 1:00:42)
Yeah. So Kristin, we've been talking about these stories of the companies that you've encountered, but I know that you and I both have personal stories about positive predictive value when it comes to our own disease diagnoses and screening. And I think we're not alone in that, especially if you are our age.
So this stuff really matters to individual people, too, not just investors.
[Kristin] (1:00:42 - 1:01:54)
It really does. Most people in their life are going to encounter sensitivity, specificity, positive predictive value, negative predictive value in their health care at some point. And so let me share a personal story.
So I had invasive breast cancer two years ago, and it was diagnosed because I found a lump and went in for a mammogram and ultrasound. Going in for those tests, my PPV was quite low, because most lumps, it turns out, aren't cancer. So I wasn't too worried.
But because of what they were seeing on the imaging, they decided to biopsy me right away. And biopsy results, unfortunately, take a few days to come back, but I could see those imaging reports right away. So, of course, the moment I got home, I started looking up academic papers about the features that I was seeing on my imaging report.
And I was able to figure out from those features that my positive predictive value was about 85%, which means that I pretty much knew that I had cancer before I got the biopsy results back. And that was actually helpful to me because I was totally prepared. I knew when the doctor called me that he was going to tell me I had cancer.
I pretty much knew, so I was prepared at least.
[Regina] (1:01:55 - 1:03:15)
I think it shows how knowledge really can be power. It didn't change the outcome, but that knowledge helped you prepare. I have an example where I used these things to make a medical decision, so not quite the same as you, but for screening.
So I had an ultrasound and mammogram, and they found a little aberration, something called a microcalcification: a little spot that could either be a tumor or could just be nothing, a blip. And because of where the spot was located, they wanted to do a surgical biopsy, where they would put me under to poke around and try to find this little spot. But I didn't want surgery if I didn't need it.
So I went home, same as you, started looking up the academic literature to find out the positive predictive value of the particular type of spot that I had. Given that I had this microcalcification, what was the chance it would be an actual tumor? And I found that it was somewhere between 2 and 10 percent.
That was the estimate of the positive predictive value. So I decided not to get the biopsy, and I had to explain positive predictive value over the phone to my doctor, who was very upset. I honestly think he did not understand.
He did not understand probability.
[Kristin] (1:03:16 - 1:03:17)
Needs to listen to our podcast.
[Regina] (1:03:18 - 1:03:27)
Oh, goodness. It was, it was worrisome. So I decided to take that risk, not get the biopsy.
And at my next checkup, the spot had disappeared.
[Kristin] (1:03:27 - 1:03:44)
Oh, well, so good decision. I'll point out that it's a very personal decision, because some people, probably myself actually, since I have a family history, would have gotten the surgery even knowing the positive predictive value was 2 to 10 percent. But different people have different risk tolerances and, you know, different histories.
[Regina] (1:03:45 - 1:03:57)
So, on a very serious note now, let's bring it back. I think we are ready to rate the claims. So Kristin, remind us of what the claim was.
[Kristin] (1:03:57 - 1:04:09)
Right. So the claim today was many direct-to-consumer genetic and biomarker tests are useless junk. And Regina, do you like how I set that claim up so that we can give it a lot of smooches?
[Regina] (1:04:10 - 1:04:18)
You did a very nice job here. This is kind of a negative claim. I think there was a little bit of claim hacking going on here.
[Kristin] (1:04:19 - 1:04:21)
But that's the take-home message that I wanted to convey.
[Regina] (1:04:23 - 1:04:39)
Okay. All right. I am fine with that.
So just to remind everyone, our smooch scale, one to five smooches, one being little to no evidence for the claim and five smooches being lots and lots of evidence for this particular claim. So Kristin, you can go first.
[Kristin] (1:04:39 - 1:04:58)
So I'm going five smooches. I've nicely set up the claim in a way that I feel very comfortable saying there's strong evidence that many of these tests, we haven't looked at all of them, but many tests that are out there are useless junk. Somebody finds a small statistical association that might be real, but that doesn't mean it makes a good predictive test. How about you, Regina?
[Regina] (1:04:58 - 1:05:25)
I like it. So I think I'm going to go a little lower, actually, for smooches.
Not because I think that they are going to give me any actionable information. I don't think that. I agree with you in sentiment, but they are fun to talk about at a cocktail party or on a date.
It makes for an interesting conversation, which is worth money because I have been to some boring cocktail parties.
[Kristin] (1:05:27 - 1:05:42)
Fair enough, Regina. Kind of like playing the lottery. As long as you know that you're doing it only for entertainment purposes, then maybe it's giving you a dopamine hit if you think for a minute that you're going to win the lottery, or you learn that you have the Wanderlust gene, and maybe that's worth paying for, for some people.
[Regina] (1:05:44 - 1:05:50)
True, true. Okay, methodological morals. I feel like we have so many from this episode.
[Kristin] (1:05:50 - 1:05:50)
We do.
[Regina] (1:05:52 - 1:05:53)
Okay, so what are you starting with?
[Kristin] (1:05:53 - 1:06:01)
A biomarker paper is not the same thing as a biomarker test.
[Regina]
That's great.
[Kristin]
How about you, Regina?
[Regina] (1:06:01 - 1:06:14)
I was really torn on this one, but how about this? If your sample doesn't match the real world, then for all of your positive predictive value needs, call on Bayes' theorem.
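(For the show notes: Regina's moral can be made concrete with a quick Bayes' theorem calculation. The sensitivity, specificity, and prevalence numbers below are hypothetical, chosen only to show how PPV collapses when a test is moved from an artificially balanced case-control sample to a real-world base rate.)

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem:
    P(disease | positive test) = P(pos | disease) * P(disease) / P(pos)."""
    true_pos = sensitivity * prevalence            # P(pos and disease)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(pos and no disease)
    return true_pos / (true_pos + false_pos)

# In a 50/50 case-control sample, a 90%-sensitive, 95%-specific test
# looks terrific...
print(round(ppv(0.90, 0.95, 0.50), 2))  # 0.95

# ...but at a real-world prevalence of 2%, most positives are false.
print(round(ppv(0.90, 0.95, 0.02), 2))  # 0.27
```

The same sensitivity and specificity give wildly different PPVs at the two base rates, which is exactly why a biomarker paper built on a matched case-control sample is not the same thing as a biomarker test.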
[Kristin] (1:06:14 - 1:06:18)
Oh, I feel like that goes on every student's cheat sheet for my class.
[Regina] (1:06:19 - 1:06:41)
Kristin, this has been a lot of fun, and I'm considering breaking into my piggy bank so I can go find out if I have the female infidelity gene and the Wanderlust gene and freckles and cellulite. It's worth it. Thanks, Kristin, and thanks, everyone, for listening.
[Kristin] (1:06:41 - 1:06:42)
Thanks, Regina.