Oct. 6, 2025

Ultramarathons: Can vitamin D protect your bones?


Ultramarathoners push their bodies to the limit, but can a giant pre-race dose of vitamin D really keep their bones from breaking down? In this episode, we dig into a trial that tested this claim – and found a statistical endurance event of its own: six highly interchangeable papers sliced from one small study. Expect missing runners, recycled figures, and a peer review that reads like stand-up comedy, plus a quick lesson in using degrees of freedom as your statistical breadcrumbs.


Statistical topics

  • Data cleaning and validation
  • Degrees of freedom
  • Exploratory vs confirmatory analysis
  • False positives and Type I error
  • Intention-to-treat principle
  • Multiple testing
  • Open data and transparency
  • P-hacking
  • Parametric vs non-parametric tests
  • Peer review quality
  • Randomized controlled trials
  • Research reproducibility
  • Salami slicing
  • Statistical sleuthing

Methodological morals

  • “Degrees of freedom are the breadcrumbs in statistical sleuthing. They reveal the sample size even when the authors do not.”
  • “Publishing the same study again and again with only the outcomes swapped is Mad Libs Science, better known as salami slicing.”


References

Kristin and Regina’s online courses: 

Demystifying Data: A Modern Approach to Statistical Understanding  

Clinical Trials: Design, Strategy, and Analysis 

Medical Statistics Certificate Program  

Writing in the Sciences 

Programs that we teach in:

Epidemiology and Clinical Research Graduate Certificate Program 


Find us on:

Kristin - LinkedIn & Twitter/X

Regina - LinkedIn & ReginaNuzzo.com


 00:00 Intro & claim of the episode
 00:44 Runner’s World headline: Vitamin D for ultramarathoners
 02:03 Kristin’s connection to running and vitamin D skepticism
 03:32 Ultramarathon world—Regina’s stories and Death Valley race
 06:29 What ultramarathons do to your bones
 08:02 Boy story: four stress fractures in one race
 10:00 Study design—40 male runners in Poland
 11:33 Missing flow diagram and violated intention-to-treat
 13:02 The intervention: 150,000 IU megadose
 15:09 Blinding details and missing randomization info
 17:13 Measuring bone biomarkers—no primary outcome specified
 19:12 The wrong clinicaltrials.gov registration
 20:35 Discovery of six papers from one dataset (salami slicing)
 23:02 Why salami slicing misleads readers
 25:42 Inconsistent reporting across papers
 29:11 Changing inclusion criteria and sloppy methods
 31:06 Typos, Polish notes, and misnumbered references
 32:39 Peer review comedy gold—“Please define vitamin D”
 36:06 Reviewer laziness and p-hacking admission
 39:13 Results: implausible bone growth mid-race
 41:16 Degrees of freedom sleuthing reveals hidden sample sizes
 47:07 Open data? Kristin emails the authors
 48:42 Lessons from Kristin’s own ultramarathon dataset
 51:22 Fishing expeditions and misuse of parametric tests
 53:07 Strength of evidence: one smooch each
 54:44 Methodologic morals—Mad Libs Science & degrees of freedom breadcrumbs
 56:12 Anyone can spot red flags—trust your eyes
 57:34 Outro: skip the vitamin D shot before your next run 


[Regina] (0:00 - 0:09)
It's like a research Mad Libs. You remember those? You just swap out the outcome, and then, ta-da, you've got yourself a whole new paper.


[Kristin] (0:10 - 0:43)
Mad Libs is a great way to describe it, Regina. These six papers are a lot like Mad Libs. Welcome to Normal Curves.


This is a podcast for anyone who wants to learn about scientific studies and the statistics behind them. It's like a journal club, except we pick topics that are fun, relevant, and sometimes a little spicy. We evaluate the evidence, and we also give you the tools that you need to evaluate scientific studies on your own.


I'm Kristin Sainani. I'm a professor at Stanford University.


[Regina] (0:43 - 0:49)
And I'm Regina Nuzzo. I'm a professor at Gallaudet University and part-time lecturer at Stanford.


[Kristin] (0:49 - 0:54)
We are not medical doctors. We are PhDs, so nothing in this podcast should be construed as medical advice.


[Regina] (0:54 - 1:00)
Also, this podcast is separate from our day jobs at Stanford and Gallaudet University.


[Kristin] (1:00 - 1:07)
Regina, just last week, I saw a headline on Twitter and in Runner's World online that caught my attention.


[Regina] (1:08 - 1:13)
Oh, once again, covering breaking news here on Normal Curves. I know.


[Kristin] (1:13 - 1:38)
Here's the headline from Runner's World, September 25th. Pre-race vitamin D could do wonders for ultra-runners' bone health, according to science. And this is how the article summarized the study.


The researchers found that the runners who took a hefty single dose of vitamin D before the ultramarathon were better guarded against bone damage than those who didn't.


[Regina] (1:38 - 1:46)
Kristin, this is right up your alley, isn't it? Runners and bones and vitamin D.


[Kristin] (1:46 - 2:02)
Yes. This is why I pulled this study, because I've been a statistician for many projects about runners and bones. And I've even analyzed data on ultramarathoners. But because I'm a vitamin D therapeutic nihilist, as we've talked about before, I definitely pulled this paper with a skeptical eye.


[Regina] (2:03 - 2:11)
Right. We do have two episodes debunking research on vitamin D. See vitamin D episodes one and two.


[Kristin] (2:11 - 2:20)
Yes. But, Regina, when I pulled this study, I actually got all excited because in the title it says double-blind randomized controlled trial.


[Regina] (2:20 - 2:31)
Probably your heart was like mine, all aflutter, double-blind and randomized. You are melting hearts here, Kristin. Yes.


[Kristin] (2:31 - 3:01)
So, I actually, when I saw this, I was determined to keep an open mind because maybe this is a really good study. And this is a huge dose of vitamin D in a very specific context. So it is totally possible that this does something even if the daily vitamin D pill doesn't.


[Regina]
I like that you're open-minded.


[Kristin]
I went into this with a very open mind, Regina. So the claim for this episode is just the claim of the paper that taking a big dose of vitamin D right before an ultramarathon can reduce bone breakdown from the race.


[Regina] (3:02 - 3:06)
Oh, interesting claim and very targeted too.


[Kristin] (3:06 - 3:31)
It's the only paper that I know of, Regina, that's looked at this narrow question, which should make my life easier for this episode in theory because in theory this episode is just about a single paper. In theory? Yeah, in theory, because for reasons I'm going to reveal later, I actually ended up having to scrutinize six papers for this episode.


[Regina] (3:32 - 3:37)
Six. Nice foreshadowing, Kristin. I like the suspense.


[Kristin]
Yeah.


I'm going to keep you in suspense here.


[Kristin] (3:37 - 3:50)
All right. Let me give a little background. First, ultramarathons are any race longer than a standard 26.2-mile marathon. Regina, I've only done regular marathons, but you have done ultras, and what's your longest?


[Regina] (3:51 - 4:06)
My longest is 50 kilometers, or about 31 miles. Some were a little longer because they were out in the mountains, where it's kind of hard to get the exact distances right. But, you know, these races go up to 100 miles and more.


[Kristin] (4:06 - 4:10)
They can be quite long. And Regina, how did you get into doing ultras?


[Regina] (4:11 - 4:39)
It was a boyfriend.


[Kristin]
The things we'll do for love.


[Regina]
It's true.


My boyfriend at the time was a big ultrarunner. He did a lot of 100-milers and he knew a lot of ultrarunners, the Virginia Happy Trails Run Club, and I kind of fell in with the crowd. You know how, like when you're a teenager, you fall in with a bad crowd that does drugs?


I fell in with the ultramarathoner crowd.


[Kristin] (4:41 - 4:45)
Did you fall out of the club when you broke up with him then?


[Regina] (4:45 - 4:53)
I did, as often happens. Plus, I had an IT band injury, the whole thing.


[Kristin] (4:53 - 5:02)
Now, you mentioned long races. Regina, the race we're going to talk about today, it was actually 240 kilometers. That is almost 150 miles.


[Regina] (5:02 - 5:38)
Woohoo! Okay, that is an exceptionally long one. I remember, can I tell a story about a friend?


[Kristin]
Of course.


[Regina]
I had a friend who did 135-miler, but hers was in Death Valley in July, in the summer.


[Kristin]
Whoa.


[Regina]
Super, super hot. She had a whole crew of pacers and, you know, people keeping her hydrated and well-fed. And get this, she and her pacers needed to run on the white line in the middle of the road, you know, in the middle of the tarmac.


[Kristin] (5:38 - 5:38)
Why?


[Regina] (5:38 - 5:46)
Because otherwise, the black asphalt would melt the rubber off their shoes.


[Kristin] (5:47 - 5:47)
No way. That's how hot it was?


Oh, my goodness.


[Regina] (5:47 - 6:28)
She was in full garb, everything covering her head; it was like she was going out dressed as a beekeeper. You couldn't see anything of her. It took her, guess how long, 42 hours nonstop.


That is a lot of time, especially, okay, get this, also, it was just like a couple of weeks before she had done the Western States 100-miler, so 100 miles just two weeks before, which I guess was a warm-up for this.


[Kristin]
Wow, that is kind of crazy. Wow.


[Regina]
They are super crazy. I know a lot of crazy ultra runners. I would actually say they're crazier than, you know, the drug crowd.


[Kristin] (6:29 - 6:31)
Really? And how would you know, Regina?


[Regina] (6:33 - 6:46)
No comment. I'm going to plead the fifth on this one. But one last thing, that Death Valley runner, a wonderful friend of mine, she's actually a stat teacher in Regina.


[Kristin] (6:46 - 6:53)
Oh, wow. You know, and she might have been in the study that I analyzed data from because I've analyzed data from Western States.


[Regina] (6:54 - 6:55)
Ooh, small world.


[Kristin] (6:56 - 7:01)
Yeah. So, Regina, as you know from personal experience, ultramarathons are brutal on the body.


[Regina] (7:02 - 7:05)
My IT band, RIP. Right.


[Kristin] (7:06 - 8:01)
And there's plenty of research (small studies, but solid ones) showing that this kind of extreme exercise is what we call catabolic, meaning your body breaks things down, including bone. Now, your body is actually constantly breaking down and rebuilding bone, but normally those two processes are in balance, so your total bone mass stays about the same. But during extreme endurance events, bone breakdown, also known as bone resorption, goes way up and bone formation goes down.


[Regina]
Hmm. And how bad is that?


[Kristin]
We don't really know.


It's temporary, so it may not really have any major long-term consequences. And so I want to point out, Regina, that that Runner's World article, they kind of overhyped the study, right? They said that vitamin D did wonders for bone health and that it prevented bone damage.


Both of these are overstating, in fact, the claims of the study.


[Regina] (8:02 - 8:09)
Wonders. Right. "I wonder what it did" is, I think, probably what they should have written.


[Kristin] (8:09 - 8:46)
Yeah, maybe they needed to use the verb there and not the noun, yes. But Regina, I do have a boy story here.


[Regina]
Ooh, do tell.


[Kristin]
All right, so after college, I took a year off to run semi-professionally, and I spent three months in Boulder, Colorado, which, by the way, is Runner's Mecca. And I ran with a group that did this workout every Wednesday night that I still remember. I can't remember anyone's name from the group, but I still remember the workout.


It was six by six minutes hard at 10k pace with one minute rest. Awesome workout.


[Regina] (8:49 - 8:53)
I'm going to…stop. Not everyone understands what you're saying, Kristin.


[Kristin] (8:53 - 8:59)
Really?


[Regina]
No.


[Kristin]
I literally think in intervals.


No? People are not familiar with this concept?


[Regina] (8:59 - 9:02)
I promise you, this is worse than stats jargon.


[Kristin] (9:03 - 9:03)
No.


[Regina] (9:03 - 9:14)
Yes. But it does not matter because I do not want your nerd running stats stuff. I want to get back to the boy part.


The boy part.


[Kristin] (9:14 - 9:54)
Back to boys. Right. One of the men in the group who was young and cute, and I had a slight crush on, he did ultramarathons.


And he was telling me about how he had run this 100 miler and he wound up developing four stress fractures, these micro cracks in the bone, during the race.


[Regina]
Four? During the race.


[Kristin]
Yeah.


During the race. So maybe there really is something to this idea of bone damage from ultramarathons.


It's certainly a lot of stress on the body.


[Regina]
But he finished. He finished the ultra.


[Kristin]
Yeah. He finished the race. And I think only after the fact did he realize that he had broken four bones during the course of the race.


[Regina] (9:54 - 9:59)
Yes. Sounds like an ultra-runner. Yeah. That's about par for the course.


Yeah.


[Kristin] (9:59 - 10:08)
Yeah. I always thought, Regina, that ultra-runners were a little crazy. But of course, us regular marathon folks, we are totally normal and sane.


[Regina] (10:10 - 10:13)
Super normal.


[Kristin] (10:13 - 10:14)
Normally distributed. Normal curves.


[Regina] (10:15 - 10:19)
You keep believing that, Kristin. Keep telling yourself that.


[Kristin] (10:20 - 10:53)
All right, Regina, let's jump into the paper now. This study was published recently in the Journal of the International Society of Sports Nutrition. It involved 40 male ultra-runners, average age 41.


They were all running the same ultramarathon race in Poland.


[Regina]
And why only men?


[Kristin]
Well, it's true that more men run ultras than women.


In the data set I analyzed from the Western States 100 miler, that contained about a two-to-one male-to-female ratio. So I imagine it was just logistically easier for them to recruit men.


[Regina] (10:54 - 11:09)
Okay. Makes sense. I'd like to point out that as we age, I believe that women are better at ultra-running than men.


We age better, and I think that we are better at enduring things. So just a little fun fact.


[Kristin] (11:10 - 11:33)
Oh, I like that fun fact. I want to hear something good about aging. There's not much good about it, but glad to hear there's something good about aging, Regina.


Thank you.


[Regina]
Yep.


[Kristin]
So they randomized the runners to vitamin D or placebo, 20 per group.


But here's the catch. They didn't end up analyzing data from all 40. They reported that they excluded five runners who didn't finish the race, leaving a final sample size of 35.


[Regina] (11:33 - 11:38)
Hmm. So they did not follow the intention-to-treat principle.


[Kristin] (11:38 - 11:59)
No, they did not. And we talked about intention-to-treat in the exercise and cancer episode. It means once randomized, always analyzed.


In this case, 40 were randomized, but only 35 were analyzed. So they violated intention-to-treat. They ended up with 16 in the vitamin D group and 19 in the placebo group.


[Regina] (11:59 - 12:16)
This is a worry because the people that you're left with then might not be representative anymore. Maybe the people who dropped out of this ultramarathon were just different in their bones, you know, in their body somehow. So now you are losing the benefits of randomization.


[Kristin] (12:16 - 12:28)
Maybe the four that dropped out from the vitamin D group are the four that got stress fractures during the race, right?


[Regina]
Right.


[Kristin]
Yeah.


So it's not ideal. Another complaint I have, Regina, they did not include a participant flow diagram.


[Regina] (12:29 - 13:01)
That really should be standard in every randomized trial. And it sounds kind of geeky, but here's just a reminder. A flow diagram shows how many people were recruited, how many were excluded and why, how many were randomized, how many were then lost to follow-up (you know, dropped out of the race), and then how many were actually analyzed.


It gives you all the information in one easy-to-read diagram. And without it, it's hard to track what really happened. So it's geeky, but important.


[Kristin] (13:02 - 13:20)
Very important. And as we're going to see, keeping track of the exact number of participants is going to be an issue here. And a flow diagram would have helped with that.


Okay, Regina, all the participants ran the Lower Silesian Mountain Run Festival in Poland. Are you familiar with that race?


[Regina] (13:21 - 13:32)
I am not, but Poland Mountain sounds kind of fun. And I know people who travel internationally for these kinds of ultras. What time of year? Should we go?


[Kristin] (13:33 - 13:45)
First of all, I want their job that allows them to travel around the world doing ultras. Can I have their job, please?


It was in summer, July of 2018, actually.


[Regina] (13:46 - 13:56)
Wait a minute. 2018? And the study was just published, 2025? So that's what, a seven-year lag between study and publication?


What happened?


[Kristin] (13:56 - 14:34)
Well, yeah, this is important. This is a red flag that should jump out at you when you're reading papers. I'm not going to tell you exactly what happened yet, Regina.


Hold that thought, because we're going to get back to that in a minute. First, I want to tell you a little bit more about the race. Again, almost 150 miles.


It was in the mountains, and the runners had nearly five miles total of straight vertical ascent and descent. The highest summit was almost a mile above sea level. So this is kind of a brutal race.


It took the runners in the study an average of about 42 hours to complete the race, so almost two full days. But the winning time was 30 hours and 22 minutes.


[Regina] (14:35 - 14:48)
Okay, that is indeed brutal. But, Kristin, it actually sounds like an ultra-runner's idea of a good weekend.


[Kristin]
Really?


[Regina]
I am not joking. This is a good time. Yep.


[Kristin] (14:49 - 15:08)
So, Regina, now let's talk about the intervention. 24 hours before the race, the runners were given either a placebo or a 150,000 IU dose of vitamin D. That's a lot.


That's 250 times the recommended daily allowance for vitamin D.


[Regina] (15:08 - 15:21)
Is that even safe? Because I know we talked about a clinical trial where the participants got a 30,000 IU dose once a month, but this is, what, five times more.


[Kristin] (15:21 - 15:32)
Right. It's pretty high. But these were pretty healthy people, and they didn't report any adverse effects in the study, so I guess it was okay.


But yeah, it might be pushing it.


[Regina] (15:32 - 15:34)
You know I'm tempted to try this now.


[Kristin] (15:35 - 15:37)
Do not try this at home.


[Regina] (15:37 - 15:46)
Okay, okay. But how did they actually give this to the runners? Are we talking about like a giant horse pill?


What did the placebo look like?


[Kristin] (15:47 - 15:51)
The vitamin D was given in an oil solution that they had to drink.


[Regina] (15:51 - 15:52)
Yum.


[Kristin] (15:54 - 16:48)
Right. And the placebo was an anise-flavored oil. They added that taste, because it's kind of an odd taste, to make people think that they were getting the vitamin D.


Both the participants and the researchers administering the intervention, they were both blinded, and they write in the paper, the supplementation and placebo solutions were presented in carefully sealed, sintered glass bottles marked with randomly assigned numbers.


[Regina]
Oh, nice. That's good.


So nobody knew which solution they were getting.


[Kristin]
Exactly. But after that, it gets murky, because they gave zero details about the randomization.


They just say they randomized 20 runners to each group. But we don't know how they were randomized. Did they use block randomization?


Did they stratify by anything, like age or training? Who actually filled the bottles or made sure that the right runner got the right one? None of that is described.


[Regina] (16:50 - 17:11)
And in a clinical trial, as we've talked about, Kristin, in our clinical trials course, those details are boring but crucial, because without them, you cannot be confident, right, that the study was properly randomized, which, by the way, is the whole point of a randomized clinical trial. So that's not good.


[Kristin] (17:13 - 17:39)
Regina, moving on to outcomes. This is where things get even thornier. They drew the runners' blood three times: 24 hours before the race, at the same time they were getting the intervention; immediately after the race; and then 24 hours after the finish.


Regina, I can't even picture that. So, like, you stagger across the line after 42 hours straight of sleepless running, and then they, like, put you on a gurney and stick you with a needle?


[Regina] (17:39 - 17:52)
You know, you're in no shape to protest, though, at that point. You are ready to get this done so you can move on to your celebratory vodka shots and pizza.


[Kristin] (17:52 - 17:56)
Right. So they literally lie you down on the gurney, stick the needle in, and hand you a vodka shot.


[Regina] (17:57 - 18:05)
Mm-hmm. Yeah. That is at least what I'm picturing, and not that far off from the ultra parties that I have seen.


Yep. All right.


[Kristin] (18:05 - 18:12)
Well, yeah, I guess you're burning a lot at this point. It's catabolic, so you might as well put the vodka in, and I'm sure that gets burned off, too. No consequences, right? And a bunch of chocolate.


[Regina] (18:13 - 18:17)
Right, right. It's all free.


Calories and alcohol.


[Kristin] (18:18 - 18:18)
Free calories, yes.


[Regina] (18:18 - 18:22)
Uh-huh. So what were they actually measuring in the blood?


[Kristin] (18:22 - 18:39)
They were measuring different biomarkers. In this paper, they report six bone-related biomarkers, including markers of bone resorption and formation, which we talked about earlier. But Regina, they did not specify which of these markers was the primary outcome.


[Regina] (18:40 - 19:11)
Oh, not good. In a randomized trial, you must decide ahead of time what your primary outcome is, what is the one result that really matters the most to you, and then commit to it. Because if you don't commit to it, Kristin, you're like the multiple-testing bad boyfriend that we talked about in the review episode.


You are just leading the variables on under false pretenses, leading to false positives and broken hearts.


[Kristin] (19:12 - 20:00)
Yes. False positives, there's a huge risk of false positives if you don't pre-specify that primary outcome. And the problem here is I don't think they ever specified a primary outcome because usually you would specify that in a publicly available protocol ahead of time posted on a site like clinicaltrials.gov. And in the paper, they do cite a clinicaltrials.gov registration number.


[Regina]
Oh, good.


What did the registered protocol say then?


[Kristin]
So I went to clinicaltrials.gov, pulled up that number. Unfortunately, the protocol that came up had nothing to do with ultramarathons.


What? Yeah, it was by the same research group, but it was a totally different protocol. It was a protocol for a Nordic walking study that also involved vitamin D, but it was not the ultramarathon study.


[Regina] (20:01 - 20:15)
Nordic walking, not the same as running for 150 miles, just in case there was any confusion about that. So we have no pre-registered protocol here then?


[Kristin] (20:15 - 20:34)
No, we don't. At least not on clinicaltrials.gov, which is where the authors refer us to. And it gets even more problematic because what I found next, Regina, suggests that this whole study was just a wild fishing expedition.


Oh, no. Yeah, but I'm going to keep you in suspense, Regina. Let's take a short break first.


[Regina] (20:57 - 21:10)
Welcome back to Normal Curves. Today we're looking at a randomized trial of ultramarathoners. Kristin, you were about to tell us something shocking that your statistical sleuthing turned up.


What do you have?


[Kristin] (21:11 - 21:31)
So there's a line in the paper that really caught my eye. They write, details of the participants' physical characteristics, training loads, and performance are reported elsewhere. And then they cite a paper that turns out to be a 2021 paper from the same study, using the same participants and the same data set.


[Regina] (21:31 - 21:44)
Wait a minute. So this is not the first paper on this study or data set?


[Kristin]
Nope.


[Regina]
And, oh. So this explains that seven-year gap then between the race and the publication, doesn't it?


[Kristin] (21:44 - 21:58)
It does. And I started Googling around for other papers on this study, and I didn't just find one more. I found five more papers, all published from this same data set.


One published every year since 2020.


[Regina] (21:59 - 22:08)
Like clockwork. Yep. So they have been slicing off pieces of this data set and publishing them for years.


[Kristin] (22:09 - 22:16)
Yes. It's a classic case of salami slicing the data. We've talked about salami slicing before on this podcast, Regina.


[Regina] (22:17 - 22:41)
We have. We talked about it in the Dating Wishlist episode. And in that case, we actually said the paper there was the opposite of salami slicing.


That was a full salami there. That was the big salami, but here we have the perfect example of this sliced up salami.


[Kristin] (22:42 - 23:00)
Yes. Let me remind everybody what salami slicing is. It's what happens when researchers take a single data set and they chop it into as many papers as possible. So instead of writing one solid study, they publish little fragments, each with the same participants, same methods.


But for example, they might just swap out the outcome variables.


[Regina] (23:02 - 23:19)
And why do they salami slice their data? They do this to pad their publication record. It's kind of clever, actually.


It's like a research Mad Libs. You remember those? You just swap out the outcome and then ta-da, you've got yourself a whole new paper.


[Kristin] (23:20 - 23:43)
Mad Libs is a great way to describe it, Regina. These six papers are a lot like Mad Libs. They cut and pasted a lot of the text from the intro, methods, and statistics sections from one paper to another, but then they just swapped out which biomarkers they were looking at.


The papers even have the same basic tables and figures. They look exactly the same, except they, again, just swapped the biomarkers out.


[Regina] (23:44 - 24:30)
So in addition to being just lazy science, the problem with this is that when you test a bunch of outcomes, like they've done here, your chances of false positives shoot way up. And if you're salami slicing those results across separate papers, then you're just losing the bigger picture here. So readers are just seeing one thin slice, not the whole salami.


And we want to see the whole salami. So readers have no idea how many tests were run. And that means they do not realize how high the risk of false positives really was.
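
(A quick sketch of the arithmetic behind that risk: if each outcome is tested independently at the usual 0.05 level, the chance of at least one false positive is 1 minus 0.95 raised to the number of tests. The test counts below are illustrative, not taken from the papers.)

    alpha = 0.05  # conventional significance threshold
    for k in (1, 6, 20, 60):  # number of independent tests (illustrative)
        fwer = 1 - (1 - alpha) ** k  # chance of at least one false positive
        print(f"{k:>2} tests -> {fwer:.0%} chance of a false positive")
    # 1 test -> 5%, 6 tests -> 26%, 20 tests -> 64%, 60 tests -> 95%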


[Kristin] (24:30 - 24:56)
Yeah, that's it exactly, Regina. When you see only one paper in isolation, it looks more convincing than it really is because you're missing the rest of the picture, the rest of the salami. Also besides the false positive risk, salami slicing is also just a waste of space, right?


It's padding your resume with recycled, really self-plagiarized material, and it's a mark of someone trying to game the system rather than to do good science.


[Regina] (24:57 - 25:01)
Yep, absolutely. This is why we do not like salami slicing.


[Kristin] (25:01 - 25:41)
Now it's true that sometimes with randomized trials, the authors will publish one main paper with the primary outcomes first, and they may publish some additional papers on secondary and exploratory outcomes later. But those should always be clearly labeled as exploratory, and that did not happen here.


[Regina]
So what were the different papers about, Kristin?


[Kristin]
The first paper was published in 2020, and that was just looking at vitamin D levels in the blood. A few different variants of vitamin D. They found that vitamin D levels in the blood increased in both groups, but not surprisingly, it increased more in the group to which they had given vitamin D.


[Regina] (25:41 - 25:51)
Uh, so isn't that kind of obvious? Isn't it just showing, yeah, people took the vitamin D they were supposed to take?


[Kristin] (25:51 - 26:23)
Yeah, it's just like a compliance check. It does not deserve a whole separate paper, for sure. Oh, goodness. Then in 2021, they published a paper where they looked at markers of inflammation, like IL-6.


Then in 2022, it was markers that were related to the tryptophan breakdown pathway. This influences, apparently, stress and brain function. Then in 2023, this one's fun, they looked at all 20 amino acids in your blood, all amino acids separately, like leucine, arginine, and so on, and then a bunch of ratios of amino acids as well.


[Regina] (26:23 - 26:30)
Oh, my goodness. So, this is getting in the weeds, and what do these things even tell you?


[Kristin] (26:30 - 26:50)
I don't know. What does the level of leucine in your blood tell you? I don't know that it tells us anything, yeah.


In the 2024 paper, they reported markers related to iron levels and iron metabolism, and also cardiac markers like troponin, which signals the breakdown of heart tissue. And then finally, in 2025, in the paper we're looking at today, they reported markers of bone health.


[Regina] (26:51 - 26:58)
Oh, goodness. So, do they admit that they were salami slicing, or at least were they transparent about all their previous papers?


[Kristin] (26:59 - 27:57)
No, they weren't. Now, they didn't completely hide the other papers, but they also did not spell out, oh, by the way, this is the sixth paper from the same study. So, remember, in the paper we're looking at today, they did reference their 2021 paper, at least in passing, to indicate that some of these data had been published before.


But they never cite the 2022 or 2023 papers. They did cite their 2020 and 2024 papers, but they include them in the introduction and present them as if they're just unrelated background papers. They never tell us that those are papers from the same data set.


[Regina]
Oh, misleading.


[Kristin]
Very misleading. Another issue, Regina, when I looked across all six papers, I found some troubling inconsistencies. The big one is that in the 2020 paper, they say that their sample size was 27 people.


They randomized 13 to vitamin D and 14 to placebo. They never mention anywhere that the study began with 40 men.


[Regina] (27:58 - 28:19)
Wait, so they dropped from 40 men down to 27 without explanation? Yeah, this is definitely a problem there, not being transparent. Yeah, that is a big red flag.


And getting back to that participant flow diagram, this is why journals ask for that, so you can see these kinds of inconsistencies a little bit more clearly.


[Kristin] (28:20 - 29:05)
And, Regina, I'm going to hold off on telling you why I think they only reported 27 in this paper until a little bit later, but I also want to point out that in the 2021 through 2024 papers, they say the sample size was 35, and they never mention that they started with 40 men. Now, this might just be a reporting omission, but it's misleading to not report that you had dropouts. And, Regina, there are other details that shift from paper to paper.


For example, their exclusion and inclusion criteria change. In the first paper, they said one of the inclusion criteria was that participants had to be experienced ultramarathoners, and they defined that as having started at least two ultras before. But in later papers, that number jumps to five ultras.


[Regina] (29:05 - 29:10)
Well, that's a big difference. Two and five, definitely not the same when it comes to ultras. Exactly.


[Kristin] (29:11 - 29:42)
And then for the first five papers, they say that their age criterion was at least 30. But in the last paper, the one we're looking at today, this suddenly becomes a requirement to be between 30 and 50 years old.


[Regina]
Again, very different.


[Kristin]
Right. They're capping it at 50 all of a sudden. And then in this 2025 paper, they also added two new inclusion criteria about BMI and your baseline vitamin D level.


And you can't just change your inclusion and exclusion criteria after the fact, because those are the criteria that you used originally to select your participants.


[Regina] (29:43 - 29:57)
Oh, my God. So this is beyond sloppy, because if they cannot keep track of something as basic as how they chose participants for their own study, how can we trust any of their results?


[Kristin] (29:57 - 30:09)
That's the problem, Regina. If the write-up is this careless, then how careful were they with the parts that we can't see, like randomization, blinding, data collection, and data checking and cleaning?


[Regina] (30:10 - 30:21)
Exactly. Exactly. So data cleaning, for example, might sound boring and not very sexy, but it is super foundational, super important.


[Kristin] (30:22 - 30:23)
I find it very sexy, Regina.


[Regina] (30:24 - 31:06)
I know you do. I wanted to give you a chance to say that you found it sexy. The aphrodisiac of data cleaning.


Not boring at all. See, you really do like wholesome things, Kristin.


Because if you mess that up, if you mess up the data cleaning, if you mislabel samples, if you let all these data errors slip in, then your entire analysis can be wrong. And, yeah, you can still churn out these pretty graphs and pretty p-values, but really it's just garbage in, garbage out.


[Kristin] (31:06 - 31:53)
And I'm afraid there's a lot of garbage out there in the literature, Regina.


[Regina]
Yep, yep, yep.


[Kristin]
Regina, there were some other sloppy elements scattered all over these papers, too.


So in the paper we're looking at today, you go to the end of the discussion section, and it literally ends with a stray note, half in Polish. Something the authors just forgot to delete.


[Regina]
You're kidding.


[Kristin]
Nope, just sitting there, right? In another paper, they left "Table 1 near here" as a note in the text, like a reminder to the copy editor. Again, they forgot to delete it or take it out.


That means nobody checked the proofs carefully. And across all the papers, I found misnumbered references. Like, they cite reference 21 when they clearly meant to cite reference 28.


And I wasn't digging for mistakes. These were just the ones I stumbled on.


[Regina] (31:53 - 32:10)
Really sloppy. Kristin, you and I have talked about preprints needing to brush their teeth and wear deodorant. Right.


[Kristin]
See our episode on that ChatGPT study.


[Regina]
But it's even more important when we're talking about peer-reviewed publication. What would be the equivalent now, Kristin?


[Kristin] (32:10 - 32:15)
Right. It's not even like a first date anymore. It's like your engagement or something, right?


[Regina] (32:16 - 32:39)
So you've got to raise the bar above just showered. I'm thinking you need to put on maybe like a suit or at least some shoes, you know? Right.


[Kristin]
Whiten your teeth.


[Regina]
Whiten your teeth. Oh, I like that.


They did not use a comb or put on a suit or whiten their teeth for this, not with stray little notes half in Polish that they forgot to delete.


[Kristin] (32:39 - 33:01)
Yeah, exactly. And Regina, speaking of peer-reviewed publication, another thing that doesn't give me confidence in this paper is the whole peer review process behind the scenes. The first five papers in this salami slicing were all published in the same journal called Nutrients. And this is an MDPI journal.


And we've talked about MDPI before in the sugar sag episode.


[Regina] (33:02 - 33:08)
MDPI is an open access publisher with, let's say, a questionable reputation.


[Kristin] (33:09 - 33:29)
Yes. It doesn't mean every MDPI paper is flawed, but given the publisher's reputation, we need to approach these with extra caution. And what I saw of the peer review for these papers didn't give me much comfort.


This journal does allow authors to opt into publishing their peer review reports publicly, which is great.


[Regina] (33:29 - 33:33)
Oh, interesting. And did our authors opt in?


[Kristin] (33:33 - 33:37)
They actually did, but only for the 2020 and 2021 papers.


[Regina] (33:39 - 33:44)
The fact that they stopped opting in might just tell you something.


[Kristin] (33:44 - 34:35)
Right. I will give them credit for the two peer reviews that they did release. That is at least better than nothing.


[Regina]
So what do we learn from those two?


[Kristin]
Regina, I was not terribly impressed with the peer reviewers. No one asked for basic stuff like a participant flow diagram or details on randomization. And one of the 2021 reviews is so thin, I wouldn't even count it as a peer review.


I can read the whole thing for you, Regina, right here.


[Regina]
It's that short?


[Kristin]
Yes.


[Regina]
All right. Go ahead. All right.


[Kristin]
Again, this is for the 2021 paper. Reviewer 1 writes, an interesting study on the use of high doses of vitamin D in marathon runners. I have some queries.


You did not mention smoking in the exclusion criteria. I think it should be added.


[Regina] (34:39 - 34:49)
That is hilarious. Yeah. I have some queries.


One thing, go back in time, change your entire study. Off you go. Off you go.


[Kristin] (34:50 - 34:54)
Go back, make new exclusion criteria, select new participants, and start all over again.


[Regina] (34:56 - 34:58)
I don't think they understand how time works.


[Kristin] (34:59 - 35:36)
Right. So there's a little more, though. Here's the rest from illustrious reviewer 1.


They write, no need to specify from what vein blood was collected. I also find it very unlikely that in 35 patients, all blood samples were from a single vein. And a small description of vitamin D and its function would be a great addition to the introduction.


Here's an article you should consider. And that's the whole peer review. To summarize, this peer review basically says, good study.


Please add an exclusion criterion after the fact. I want to quibble about how you described a vein. And please define vitamin D.


And that's it. No real substance.


[Regina]
That is it?


[Kristin]
Yes.


[Regina] (35:36 - 35:43)
Oh, my God. This is amazing. Kristin, we need to frame this somewhere, maybe.


We'll put it on a t-shirt.


[Kristin] (35:43 - 35:46)
Coffee mug. This would be a great merch item, Regina.


[Regina] (35:47 - 36:06)
If your peer review can fit on a t-shirt or coffee mug, maybe you should do a better job. Also, I think this is why the authors stopped opting into releasing their peer review. Because this really exposes a lot.


[Kristin] (36:06 - 36:08)
I know. It airs their dirty laundry, right?


[Regina] (36:08 - 36:14)
This is not a peer review. This is like comedy gold, though. It's like a parody of peer review.


[Kristin] (36:15 - 36:58)
Yeah, Regina, I agree. This is comedy gold, but only to nerds like us. They are not going to fill the comedy club with this material, as funny as you and I find it.


Now, to be fair, peer reviewers are volunteers. So I want to point out that even in good journals, you sometimes get a lazy reviewer. But a good editor is going to throw that review away and get another one.


[Regina]
And that didn't happen here.


[Kristin]
Yep. Now, for that 2021 paper, reviewer two was a little better.


They at least flagged a few important things, including that they noticed that the authors had published a paper previously in 2020. And they noticed that that previous study had only 27 participants and not 35. And they asked the authors about that discrepancy.


[Regina] (36:58 - 36:59)
Oh, good for them. They were on the ball.


[Kristin] (36:59 - 37:04)
I'm not going to give them a medal in statistical sleuthing for this one, but at least they noticed that.


[Regina] (37:04 - 37:07)
And did the authors come back with an explanation?


[Kristin] (37:07 - 37:28)
Yes. And if I'm interpreting their explanation correctly, they basically said, hey, we only ran the vitamin D assays on 27 out of the 35 people back in that 2020 paper because that was sufficient to give us statistical significance. So why would we keep going and spend the money running the assays on those other eight samples?


[Regina] (37:29 - 38:01)
This is not how science should work at all. So, Kristin, it sounds like they are admitting that they stopped analyzing blood samples, right, in the first paper because they got the p-value that they wanted. Yeah, exactly.


And this is p-hacking, plain and simple. No two ways about it. You do not just stop measuring once you get the result that you want.


That is another red flag here. Sorry. Nope.


[Kristin] (38:01 - 38:31)
Yeah, it's a type of torturing the data.


[Regina]
Yeah. Yep.


[Kristin]
Let's go back now, Regina, though, and look at the reviews for the 2020 paper, because there's another piece of comic gold in here. So one of the reviewers asked the authors to run a statistical test that's pretty standard for this type of data.


It's built into every major stats package. And I think you're going to get a kick out of the author's response. They respond, we agree that this approach would be beneficial.


However, our statistical software package was unable to perform this test.


[Regina] (38:34 - 38:42)
So translation, we didn't know how to do it, and we didn't want to learn how. Exactly.


[Kristin] (38:42 - 38:52)
Yes, I actually looked up their software, and it can run this test. So basically, they just didn't want to read the software manual. Granted, those things are really boring to read.


[Regina] (38:52 - 39:07)
Yeah, but no excuse at all. Oh, my gosh. Okay, to sum up, the peer review was not particularly, let's say, rigorous, like at all.


[Kristin] (39:07 - 39:12)
No rigor. Yep. That's right. And, Regina, this is the lens with which we need to interpret the results, which I now want to get to.


[Regina] (39:13 - 39:51)
Sounds good. Let's take a short break first. Welcome back to Normal Curves.


We're looking at a randomized trial of a megadose of vitamin D to help ultramarathoners' bones. And we were about to get to the results. Kristin, what did they find?


[Kristin] (39:51 - 41:15)
Right. So now let's look at the results from the 2025 paper on bones. Remember, they reported six biomarkers related to bones.


And five of those six showed statistically significant differences over time between the two groups. Let's look at two key results, Regina. They found significant differences in markers of bone resorption and bone formation, things we talked about earlier.


So one of those markers is CTX, and it's a marker of bone resorption, basically how fast a bone is being broken down. Previous studies show that CTX spikes after a major endurance event like an ultramarathon. But interestingly here, the placebo group barely changed, and the vitamin D group actually went down, suggesting that breakdown was reduced, not increased.


[Regina]
Well, that doesn't make any sense.


[Kristin]
It's a little contrary to what you might expect. They also looked at PINP, which is our marker of bone formation.


Other studies show that bone formation usually drops after an event like this. But in this study, the placebo group did not change, and the vitamin D group actually went up, suggesting that they were building bone. And Regina, I find these results a little hard to believe, because they imply vitamin D not only prevented bone damage, but somehow promoted bone growth while people were running 150 miles.


[Regina] (41:18 - 41:51)
Apparently, running 150 miles is good for you. Everyone should do it. Okay, so yes, they are finding positive results here in this study, but we are not sure how much to trust them, right?


Because of these weird, unexpected findings that don't really match with what we know about bones and races. And on top of that, we've got the paper being the equivalent of, what, an unshowered date.


[Kristin] (41:51 - 42:23)
And I'm going to add another red flag, Regina, which is their ever-shifting sample size, because we have more shifts. So this paper claims to have analyzed data on 35 men, but when I look closely at the statistics, that does not seem to be true. In Table 2, they report something called degrees of freedom, which are directly related to sample size.


And from those, I was able to work back the sample size included in those analyses. Regina, can we take a quick statistical detour now to explain what degrees of freedom are?


[Regina] (42:24 - 43:24)
Oh, great idea. I love degrees of freedom, and students find these really mysterious. They find them completely mysterious, yes.


But really, they're just the number of independent pieces of information that you have when you're estimating something. That sounds weird, but kind of think of it like this. So suppose, Kristin, you and I and a group of statistician friends are all going out for pizza, and we all chip in for the pizza one by one.


And the last person to chip in, let's say it's you, doesn't get to choose how much to pay, because we've already thrown in our dollars, so you are stuck with paying whatever makes the total add up, right? So you have no freedom in the matter. The degrees of freedom are kind of like counting how many people actually had a choice in the matter, and it was all of us except for you.


[Kristin] (43:25 - 43:27)
Right, n minus one. That's how many degrees of freedom.


[Regina] (43:28 - 43:59)
Right, exactly, exactly. So the simplest case in stats is when you are estimating a mean, which is kind of like talking about the sum here. If you've got 10 people's test scores and you already know the average, then nine scores can vary.


They can change, but the tenth is locked in, just like the last person paying whatever gets us to the total for the pizza. So for a simple mean, the degrees of freedom are n minus one.
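
(A tiny sketch of that idea with made-up scores: once the mean of ten numbers is fixed, nine are free to vary and the tenth is forced.)

    scores = [82, 75, 91, 68, 88, 79, 95, 73, 86]  # nine freely chosen scores
    n, known_mean = 10, 80.0                       # the average is already fixed
    tenth = known_mean * n - sum(scores)           # forced: no freedom left
    print(tenth)                                   # 63.0
    print((sum(scores) + tenth) / n)               # 80.0, as required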


[Kristin] (43:59 - 44:34)
In this case, in this paper, it's not quite as simple as a mean and an n minus one, because they're using something called a repeated-measures ANOVA, so the degrees of freedom are a little more complicated. But the principle is the same: each effect and each interaction uses up some of the information, and that's reflected in the degrees of freedom they reported here. It's kind of like a statistical footprint. From those numbers, I worked backwards to the sample size, and it tells me that there were 32 men included in those analyses, not 35.
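
(Here's a sketch of that back-calculation, assuming a standard two-group by three-timepoint repeated-measures ANOVA like the design here; the degrees-of-freedom value below is hypothetical, just to show the arithmetic.)

    g, t = 2, 3      # groups (vitamin D, placebo) x time points (three draws)
    df_error = 60    # hypothetical within-subjects error df from a results table
    # For this design, the within-subjects error term has df = (N - g) * (t - 1),
    # so the reported df pins down the total sample size N:
    N = df_error // (t - 1) + g
    print(f"Implied sample size: N = {N}")  # 60 / 2 + 2 = 32, not 35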


[Regina] (44:34 - 44:41)
I love this, Kristin. Using degrees of freedom is such a simple, clever way to do statistical sleuthing.


[Kristin] (44:41 - 45:08)
Yeah, and it can be a little bit tricky. I have to admit, sometimes I don't remember exactly how the degrees of freedom work for a particular model, and I have to look it up again. A little cheat that people can use now, though, is to go to an AI like ChatGPT and explain the model that's being used, and it can actually accurately tell you where the degrees of freedom came from and how to work backwards to the sample size.


So, anybody really can do this, and this is a great little clue in the papers.


[Regina] (45:09 - 45:21)
I love it. This is powerful. So, Kristin, getting back to this study, you are telling us that you found, yet again, the sample size is slippery and misreported.


[Kristin] (45:21 - 47:06)
Yeah, they claim to have analyzed data on 35 men, but it appears that they only analyzed data on 32 men. They dropped three people from these analyses, and they didn't tell us that they did that, and they didn't tell us why, right? Did the blood samples just not run for these three people?


We need to know that. Another suspicious detail related to degrees of freedom, Regina, is when I was looking through the 2021 to 2024 papers, the degrees of freedom in those four papers indicate that all 35 men were analyzed. Here's the strange part, though.


The degrees of freedom in every single analysis for every biomarker across those four papers, that was over 60 biomarkers, they are all identical. And the problem with that is it would mean that not a single data point was missing for any biomarker in any of the 35 men, and I find that highly implausible. Highly implausible.


[Regina]
Why?


[Kristin]
With biomarker data, you almost always lose at least a sample or two when an assay fails. You're not expecting perfect coverage, and that should show up in the degrees of freedom: a degrees of freedom of 32 here, 33 there. But instead they're all identical.


So what happened here, you think? I suspect, actually, that they just cut and pasted the degrees of freedom across all of the tables and didn't actually tell us the real degrees of freedom from each analysis. But again, that is just sloppy.


Regina, we also never get individual-level data, so the figures show means and standard deviations, but they do not show individual-level data points. I need to see that individual-level data, especially with a sample this small, and with biomarker data, which tends to be skewed and have extreme values, it's important for me to understand what's going on underneath the hood and to see those individual-level data.


[Regina] (47:07 - 47:19)
Yeah, not enough transparency here. So again, to come back to the big picture and the study, yes, we're seeing some positive results, but they're not really trustworthy, are they? Exactly, Regina.


[Kristin] (47:20 - 48:11)
So where are we at? Honestly, at this point, I really need to see the raw data from this study. And in their data availability statement, which many journals require, the authors do say that the data are available if you just email them.


So I did, I emailed.


[Regina]
Okay, and?


[Kristin]
I haven't heard back yet, but I want to be totally fair, I gave them a very short turnaround.


I found this study last week and I just emailed them, so I didn't give them much time to respond before taping this episode. So they still may get back to me, and if they do, I'm definitely going to give everyone an update when I look at that actual data. And Regina, this is exactly why open data are so critical.


It's not really good enough to say data are available upon request. You really should provide your data publicly. That helps me to actually trust your findings more if you've made your data available.


[Regina] (48:11 - 48:13)
Yeah, transparency, show us the receipt.


[Kristin] (48:14 - 48:41)
Exactly. All right, Regina, that wraps up our discussion of the paper, but I want to zoom out for a minute because I've actually worked on ultramarathon datasets myself, and I have a few lessons from my own experience that I can share and that apply to the paper we're looking at today. I have a good story to show why fishing expeditions like this are problematic, and I also want to point out a problem with the statistical tests that the authors of this 2025 paper chose to run on their data.


[Regina] (48:41 - 48:44)
Hmm, cool. Tell us more about the dataset you worked on then.


[Kristin] (48:44 - 49:29)
I analyzed data from the 2018 and 2019 Western States 100 miler. It was a very cool observational study, not randomized, but they went out and gave questionnaires to 123 participants. They also collected blood samples and did bone density scans in the field on some participants.


Here's the catch though, only 51 people had both blood samples collected and bone density data, 19 women and 32 men. And they were looking at hormones, so that's very different in men and women, so we needed to analyze those groups separately, leaving us with pretty small numbers. Of course, small numbers are always the case in ultramarathon studies.


You're lucky if you end up with a few dozen runners with complete data.


[Regina] (49:31 - 49:39)
Because how many people are crazy enough to run a 100-miler in the first place, right?


[Kristin] (49:39 - 50:15)
Right. It's just an inherently limited group in terms of sample size.


Now, for this paper on the Western States data, the authors wanted to run a bunch of correlational analyses correlating different hormone levels to bone density at different sites in your body. But I didn't think this was a great idea because I didn't think those analyses were going to be very robust. So I advised them to instead focus on descriptive statistics, describing the population, like how many women had menstrual irregularities, how many people had had previous fractures.


Descriptive statistics, as we've talked about in the past, are really useful and important and sometimes they're just more appropriate than cranking out a bunch of p-values.


[Regina] (50:15 - 50:26)
Exactly. We talked about the importance of descriptive statistics in our male equipment size episode. So did they take your advice?


[Kristin] (50:26 - 51:21)
Yes, they published a descriptive paper. They did include a few exploratory correlations, but we clearly labeled those in the paper as exploratory and they weren't the focus of the paper. A fun thing happened a few years later, though, because they went out and got more data from the 2021 race. And two of my PhD students, Aubrey Roberts and Megan Roche, who are both runners themselves (and I have to say, female runners make the best students), wrote up a statistics column with me showing the problem with the kinds of fishing expeditions and exploratory analyses that we've been talking about today.


So what they did is they took this updated data set with the new data from 2021 added and they re-ran those correlational analyses that we had published in the original paper. And they showed just how much those correlations shifted. And the point of our column was to show that these exploratory analyses are unstable and we explained also why they're unstable.


[Regina] (51:22 - 51:29)
Wow, what a great column and a cool idea. Did you say Megan Roche? Isn't she a podcaster too?


[Kristin] (51:29 - 53:06)
Yeah, she is co-host with her husband on the podcast, Some Work, All Play. It is a great podcast because they do similar things to what we do in this podcast, Regina. They look at studies, but studies focused on performance in runners.


All of our runner listeners should go and listen to that podcast.


[Regina]
She and her husband are quite funny too.


[Kristin]
They are quite funny, yes.


The other important lesson that I want to point out here, besides the problem with fishing expeditions, is what kinds of tests are appropriate for data like this. Because in the 2025 bone paper that we looked at today, they used what are called parametric tests. And I'm not sure that those were appropriate for these data.


We covered in the healthy hookworms episode, Regina, the difference between parametric and non-parametric tests. So to remind everyone, parametric tests rely on assumptions about the data. If you have big samples, those assumptions often hold, but with small samples, especially when you have data that are skewed, that are not nice and bell-shaped, as biomarker data tend to be, these assumptions often don't hold.


That's why with small samples like this, I always default to non-parametric tests, which don't rely on those assumptions. So for all of those correlational analyses we just talked about that we did for the Western States data, we only used non-parametric tests, something called Spearman's rank correlation coefficients, which look at the ranks of the data rather than the underlying numbers.


But the authors of this 2025 vitamin D paper did not do that. They used parametric tests, and I doubt that the assumptions were met here, which makes their results even shakier.
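
(As a sketch of the difference, on simulated data rather than the study's: with a small sample and one extreme value, the parametric Pearson correlation gets dragged upward, while the rank-based Spearman correlation barely moves.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(size=19)    # e.g., a biomarker; small sample, n = 19
    y = rng.normal(size=19)    # e.g., bone density; truly unrelated to x
    x[0], y[0] = 6.0, 6.0      # one extreme, skew-inducing observation
    r_pearson, _ = stats.pearsonr(x, y)    # parametric: pulled up by the outlier
    r_spearman, _ = stats.spearmanr(x, y)  # rank-based: barely affected
    print(f"Pearson r  = {r_pearson:.2f}")
    print(f"Spearman r = {r_spearman:.2f}")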


[Regina] (53:06 - 53:23)
That's an excellent point, Kristin, about parametric and non-parametric tests, and just another thing to keep in mind when we're looking at the results and trying to figure out how trustworthy they are. So what do you think, Kristin? Are we ready to wrap up now and rate our strength of evidence for the claim?


[Kristin] (53:24 - 53:36)
Yes, I think we're ready, Regina. Okay, remind us of the claim. The claim is that taking a big dose of vitamin D right before an ultramarathon can reduce bone breakdown from the race.


[Regina] (53:37 - 53:58)
Okay, very specific claim there. And we are going to evaluate this with our smooch rating scale, one to five. One smooch means little to no evidence in favor of the claim; five means very strong evidence.


And Kristin, what do you say? Are you kissing it or are you dissing it this time?


[Kristin] (53:58 - 54:25)
I'm dissing it, as you might guess. I'm going to go with one smooch. They do have positive results in this finding, but if we look at the totality of the evidence, the fact that they salami-sliced so many papers out of this data set, all of the red flags and shifting sample sizes that are not transparent and unexplained, I just don't trust these results.


I'm going to need to see the data myself. And so I'm going one smooch. What about you, Regina?


[Regina] (54:25 - 54:43)
I'm going to do one smooch, too, for all the reasons that you mentioned. It just does not feel very robust when you are looking at this one narrow study in the context of everything. So one smooch, not convinced.


[Kristin]
Once again, vitamin D is not the wonder drug everyone claims.


[Regina] (54:45 - 55:07)
What about a methodological moral?


[Kristin]
Yeah, I'm going to focus today on the salami slicing because this is one of the more blatant cases of salami slicing that I've ever seen. So here's mine.


Publishing the same study again and again with only the outcomes swapped is Mad Libs Science, better known as salami slicing. Oh, I like it. How about you, Regina?


[Regina] (55:08 - 55:25)
I'm going to pick on the degrees of freedom thing in here because that's kind of a new one for us. So here's my methodological moral. Degrees of freedom are the breadcrumbs in statistical sleuthing.


They reveal the sample size even when the authors do not.


[Kristin] (55:25 - 55:37)
Oh, I love that. That's such a good little trick for people to know. And again, even if you think I'm not up on my degrees of freedom, you can now go get a little help from AI and you can work back those sample sizes.


[Regina] (55:38 - 56:11)
The sorts of things we're talking about today, it's very easy to go pull the original paper like you did and look through it for clues. I think it's important because sometimes people feel, oh, I'm not a statistician, I'm not a scientist, I see this interesting thing, you know, in the popular media and I need to trust it. Or if I were to go pull the study, I wouldn't know how to read it.


But this is showing that, no, go ahead, pull the study, look at it, and there might be some things popping out at you and you don't need a stats degree.


[Kristin] (56:12 - 56:46)
No, and this is a perfect example, Regina. This was an open access paper, so anybody can go access the full text of this paper. And the kinds of problems that we found in this paper definitely do not require a degree in statistics.


Yes, the degrees of freedom are probably the most complicated thing, but think of things like stray text at the end of the discussion section, misnumbered references, a sample size that keeps changing, or the fact that there are actually six papers published on this data set that they didn't tell us about. That was easy to Google and find. You do not need a statistics degree to figure out that this study is not trustworthy.


[Regina] (56:47 - 56:55)
Yeah, I think this is the perfect example of that. And it feels like maybe the results were a little overhyped in the journalism piece.


[Kristin] (56:56 - 57:17)
Yeah, I'm going to critique that Runner's World article. Sorry, just like we critique researchers in this podcast, sometimes we might critique science writers. I think that author did not really do their homework and they were overly excited about the results and probably should have given the underlying full text paper a quick look.


[Regina] (57:18 - 57:33)
Yep, Kristin, I agree. So this whole thing has been very interesting and I think that I am now going to be able to restrain myself from taking 150,000 IUs of vitamin D before I go out for my next run.


[Kristin] (57:34 - 57:35)
Good, Regina, very good.


[Regina] (57:36 - 57:43)
Yes. So thank you, Kristin. This has been delightful as always.


[Kristin]
Thanks, Regina, and thanks everyone for listening.