Feb. 23, 2026

Marathon Performance: Does high-carb fueling work?


How many carbs do you need to run your best marathon? Recent headlines suggest that 120 grams per hour is the magic number. But what’s the science behind that claim? To find out, we dug into the study fueling the hype — and were surprised by what we found. In this episode, we uncover numbers that mysteriously shift after peer review, figures that don’t match the text, and p-values that refuse to line up with their confidence intervals. Along the way, we swap bonking stories, revisit repeated-measures ANOVA, renew our antipathy for spreadsheets, and follow a trail of statistical termites to a surprisingly happy scientific ending.


Statistical topics

  • Article in press vs final publication
  • Data management and workflow
  • Multiple testing
  • P-values and confidence intervals
  • Repeated Measures ANOVA
  • Statistical sleuthing
  • Version control in research
  • Within-person study design


Methodological morals

  • “Everyone makes statistical mistakes, not everyone fixes them.”
  • “If the numbers aren't consistent, Excel is often part of the story.”
  • “If a p-value doesn't survive the trip from text to figure, there's a problem.”

Statistical Sleuthing Extended Notes


References


Kristin and Regina’s online courses: 

Demystifying Data: A Modern Approach to Statistical Understanding  

Clinical Trials: Design, Strategy, and Analysis 

Medical Statistics Certificate Program  

Writing in the Sciences 

...

00:00 - Intro

04:59 - Why carbs?

11:06 - History of carb recommendations for endurance events

22:23 - Most recent study setup

29:14 - Study stats

36:49 - Data issues in the study

42:33 - graph2table for data extraction

46:53 - More statistical sleuthing

52:07 - Sexy data processing

56:14 - What happened in the meeting with the research team

01:02:30 - Wrap-up and methodological morals

[Regina] (0:00 - 0:10)
So, after you told me about this, I went and I tried it, and you were right, it's kind of like magic. This is the magic boyfriend. It gives you everything you want.


[Kristin] (0:16 - 0:39)
Welcome to Normal Curves. This is a podcast for anyone who wants to learn about scientific studies and the statistics behind them. It's like a journal club, except we pick topics that are fun, relevant, and sometimes a little spicy.


We evaluate the evidence, and we also give you the tools that you need to evaluate scientific studies on your own. I'm Kristin Sainani. I'm a professor at Stanford University.


[Regina] (0:40 - 0:45)
And I'm Regina Nuzzo. I'm a professor at Gallaudet University and a part-time lecturer at Stanford.


[Kristin] (0:46 - 0:51)
We are not medical doctors, we are PhDs, so nothing in this podcast should be construed as medical advice.


[Regina] (0:51 - 0:56)
Also, this podcast is separate from our day jobs at Stanford and Gallaudet University.


[Kristin] (0:56 - 1:04)
Regina, today we are going to talk about fueling for marathons or for other grueling long-distance endurance events.


[Regina] (1:04 - 1:09)
I am super excited because finally an episode about pizza and French fries.


[Kristin] (1:10 - 1:15)
Those would actually be pretty hard to eat during a marathon, so we're talking more like energy drinks.


[Regina] (1:15 - 1:22)
I'm still on board, though, because carbohydrates are yummy, and if they help you run too many miles then it's still good.


[Kristin] (1:22 - 1:42)
Yeah, and energy drinks actually taste really good at like mile 16. Regina, there was a paper that came out online recently, in November of 2025, in the Journal of Applied Physiology, and it made quite a splash. One of the headlines in the media coverage was "The new high-carb study that's rocking the running world."


[Regina] (1:43 - 1:45)
Ooh, that got my attention. Yeah.


[Kristin] (1:45 - 1:56)
So today we're going to look at the research on what's called high-carb fueling. And we're going to examine the claim that high-carb fueling during long-endurance events will help you race faster.


[Regina] (1:57 - 2:07)
So Kristin, what counts as a long-endurance event? Are we talking like all-night love fest? How many carbs do you really need to take in for a long night of lovemaking? Do we know that?


[Kristin] (2:07 - 2:28)
Good question, Regina, and none of the research we're going to talk about today addresses that question. Here we are talking about long-distance endurance races, marathons, long bike rides, long swims.


And by high-carb fueling, we're going to talk about a very specific number, 120 grams per hour of carbs.


[Regina] (2:29 - 2:35)
120 grams of carbs every hour is a lot of carbs. How many calories is that?


[Kristin] (2:35 - 2:43)
That is about 480 calories per hour, which is definitely more than I ever took in during a marathon. Yikes.


[Regina] (2:43 - 3:01)
So I looked up how many carbs are in a can of Coca-Cola, and it's just under 40 grams. So are we talking about three cans of Coke per hour?


[Kristin]
Yeah, I guess so.


[Regina]
For a four-hour marathon, that's a 12-pack of soda? That's a lot of soda.
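The arithmetic in this exchange can be checked with a quick sketch. It assumes the usual approximation of 4 kilocalories per gram of carbohydrate and 39 grams of carbs per can of Coke (the episode only says "just under 40"); the function names are made up for illustration.

```python
KCAL_PER_GRAM_CARB = 4   # common approximation for carbohydrate
COKE_CARBS_G = 39        # assumed value for one can ("just under 40 grams")

def carb_kcal(grams):
    """Kilocalories supplied by a given mass of carbohydrate."""
    return grams * KCAL_PER_GRAM_CARB

def cans_needed(grams_per_hour, hours=1.0):
    """Cans of soda needed to supply the target carb intake."""
    return grams_per_hour * hours / COKE_CARBS_G

kcal_per_hour = carb_kcal(120)                 # 120 g/h target
cans_per_hour = cans_needed(120)               # about three cans
cans_per_marathon = cans_needed(120, hours=4)  # about a 12-pack

print(kcal_per_hour, round(cans_per_hour, 1), round(cans_per_marathon, 1))
```

Under those assumptions, 120 grams per hour works out to 480 kilocalories and a little over three cans per hour, which matches the rough figures quoted in the conversation.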


[Kristin] (3:01 - 3:08)
That is a lot of soda. And I don't think you want to drink Coke during a marathon, Regina, unless it has been defizzed.


[Regina] (3:09 - 3:12)
Oh, yeah, yeah, yeah. No, tummy issues with all the bubbles.


[Kristin] (3:13 - 3:14)
Yeah, exactly.


[Regina] (3:15 - 3:21)
Okay, 120 grams of carbs per hour, where do they come up with this number? Is that what we're talking about in the paper today?


[Kristin] (3:22 - 3:48)
Well, there is a long line of research leading up to this 120-gram number, Regina. We're going to start today by reviewing some of that earlier research, and then we will jump into this specific 2025 marathon paper, which did also use 120 grams. Importantly, Regina, there are two versions of this paper floating around.


There is what's called the article in press, and then there is the final formatted version of the paper, and we're going to have to talk about both today.


[Regina] (3:48 - 3:58)
Oh, interesting. We have not talked about articles in press before on this podcast. Usually they're identical to the final version of the paper, but it's a big deal when they're not.


[Kristin] (3:59 - 4:18)
Yeah, exactly, and we'll unpack that later today. But before we jump in, I need to give a huge shout out to one of our listeners, Michelle Hummel. She's a mathematician at Sandia National Laboratories.


She's also an avid endurance athlete, and she is the one who tipped us off to this paper. She also did a lot of the statistical sleuthing that we're going to discuss today.


[Regina] (4:19 - 4:21)
Oh, statistical sleuthing. Love it.


[Kristin] (4:22 - 4:33)
Yes. Good old statistical sleuthing. And Regina, as you know, this episode has a twist and a happy ending.


I'm not going to give it away. Everyone will need to listen to the end to find out what happened.


[Regina] (4:34 - 4:44)
Right. All right. Suspense.


Always good. Okay, so we're talking for statistics today about articles in press, statistical sleuthing. What else?


[Kristin] (4:44 - 4:59)
We're also going to talk about repeated measures ANOVA, the correspondence between p-values and confidence intervals, data digitization from figures, and the importance of good data management.


[Regina]
This is all pretty juicy.


[Kristin]
Very juicy for nerdy people, yes.


[Regina] (4:59 - 5:24)
But let's start with some background on the carbs thing, though, Kristin, because I remember carbo-loading was the thing back in the day. So I was at Stanford in the late 90s, and I was doing these baby 10K runs with you and our boyfriends. I could swear that we would go eat a big spaghetti dinner the night before.


Am I imagining that? Is that the idea behind all of this?


[Kristin] (5:25 - 5:52)
That sounds right. I have a vague memory of that, too, Regina. And yes, that's the idea.


I was a competitive runner back in the 80s and 90s. And back then, when we talked about fueling, it was all about carb loading. That was all the rage.


It was a little more precise than just eating a bowl of spaghetti, though. You were actually supposed to eat very low carbs for a few days and then, like, binge on carbs before the race. And somehow this was supposed to help you store extra carbohydrates in your muscles.


[Regina] (5:52 - 5:57)
And the whole spaghetti ritual was more fun than the run, of course.


[Kristin] (5:59 - 6:47)
Yeah. Any excuse to eat carbs, absolutely. But the basic biology here, Regina, is that your body can only store a limited amount of carbohydrates.


Carbs are stored in your liver and muscles as glycogen, and the most you can store is about 500 to 600 grams. So the whole idea of carb loading was to try to optimize that and, you know, store a little bit of extra. The focus in the last few decades, though, has not been about storing more before the race, but actually eating more carbs during the race.


Because during a long endurance event, you are basically using up your stores. And if you don't eat carbohydrates during the exercise, if the event is long enough, eventually those stores are going to run low. And when that happens, your body shifts from burning mostly carbs to burning more fat for energy.


[Regina] (6:48 - 6:52)
Don't we want to burn fat, though? It helps you keep a healthy weight, right?


[Kristin] (6:53 - 7:17)
Yeah, it will help you lose weight, but unfortunately it will not help you go faster. So fat provides a lot of energy, but it takes longer to break down and use than carbs. So when you run low on carbs, you can't burn fat fast enough and you are going to slow down and you really suffer.


And this is what people know as bonking or hitting the wall. Have you ever experienced bonking, Regina?


[Regina] (7:18 - 7:58)
I have experienced bonking, as a matter of fact. One time I was doing this 50k run in D.C. and at the time I was making my own carb goo mixture to take in as fuel during the race. And for this one, I had severely underestimated how many carbs I was taking in.


And amazingly, I made it to mile 18, but then I completely bonked. All I remember is I just stopped, I bent over and put my hand on my knees and I just started bawling. I could not stop crying.


It felt like the whole world was ending. It felt like the apocalypse. It was crazy.


[Kristin] (7:58 - 8:11)
Regina, that is actually a marvelous description of bonking. Your body really feels in crisis.


Hitting the wall, also a very good descriptive term there.


[Regina] (8:12 - 8:24)
Oh, it was horrible. It was so horrible.


[Kristin]
Regina, what was your goo made of?


[Regina]
Basically cornstarch, soy protein and oil. And I mixed all of it into a paste that was the consistency of cake batter.


[Kristin] (8:24 - 8:29)
That sounds gross and no wonder you bonked. That sounds too horrible to eat, actually.


[Regina] (8:30 - 8:39)
It tasted good at the time and usually it worked. I just mixed it wrong that one time, changed my recipe and that's when I bonked. Oh my goodness.


[Kristin] (8:40 - 9:10)
All right. So the idea of carb fueling during a race is that you don't want to run out of carbs. And the amount that you store in your body alone is not enough for really long races like a marathon or ultra marathon, meaning you need to take in carbs during the race.


And there are good studies from the 80s and 90s showing that if you give people carbs during endurance events, it does improve performance over a placebo. Of course, Regina, most people aren't relying on their own homemade goo. Really, this has led to an entire marketplace of fueling products.


[Regina] (9:11 - 9:18)
And some of these off-the-shelf products are pretty helpful, I have to say, but Kristin, what about you? Do you use any of these in your runs?


[Kristin] (9:18 - 9:43)
Well, I don't run enough anymore to need to use these. And back in my competitive days, things like gels were only starting to come online. So in my races, I only took in energy drinks like Gatorade.


I probably wasn't taking in more than like 25 grams of carbs the entire race. Maybe I would have been faster if I'd had more carb fueling. But I was doing marathons, not ultra marathons, so shorter duration than you.


[Regina] (9:44 - 9:51)
You know, for some of the ultra runs I've done at the aid stations, they offer things like mashed potatoes and jelly sandwiches.


[Kristin] (9:52 - 9:56)
Oh, mashed potatoes and jelly sandwiches? That's kind of hilarious. And it sounds really hard on the stomach.


[Regina] (9:56 - 10:11)
Kristin, it's all about the pace, though. You got to remember this.


If you're going more like a brisk hike, which is what I was doing, rather than what, like a six-minute pace, like what you were doing, it's a whole lot easier to get some jelly sandwiches down.


[Kristin] (10:12 - 10:30)
Yeah, I think you're right. Because if you are running really at a fast clip, it is hard to open a package. It's hard to eat anything.


And I think that's why the gels and drinks and chews and tablets have become popular, because it is easier to get them down and also to keep them down than, say, a jelly sandwich.


[Regina] (10:31 - 10:39)
But now you've got to try to convince people that your goo, right, if you're a manufacturer, your goo works better than everyone else's goo out there.


[Kristin] (10:39 - 11:02)
Yes. And that means there is a conflict of interest with a lot of this research. And there is a conflict of interest with the specific paper that we're talking about today.


So the research was funded by a grant from Science in Sport, and one of the authors is a consultant for Science in Sport. And this is a company that makes these products. So they definitely have an interest in showing that their formula stands out from the pack.


[Regina] (11:03 - 11:05)
Good to keep in mind all those conflicts of interest.


[Kristin] (11:06 - 11:32)
We got to keep that in mind. OK, Regina, now on to the numbers. So before the early 2000s, researchers thought that there was basically a hard ceiling on how much carbohydrate you could use from food during exercise.


They thought it was about one gram per minute or 60 grams per hour. And the idea was that if you took in more than that, it wouldn't actually get absorbed and burned fast enough to help you out. So it was just wasted.


[Regina] (11:32 - 11:37)
Wasted. Where did all those carbs go then, like straight into your butt and thighs and stomach?


[Kristin] (11:38 - 11:51)
Not exactly. The thinking was that it mostly just hangs out in your gut, so it will eventually get absorbed, but it's just not absorbed quickly enough to help you, say, during your run. And because it's sitting around there, it might even cause you GI distress.


[Regina] (11:52 - 11:57)
OK. Got it. Not good.


So how did they figure all these numbers out?


[Kristin] (11:58 - 12:08)
So these studies are really cool, actually. They used carbohydrates that have been labeled with a tracer. The sugars that they used have some carbon-13 in them.


[Regina] (12:08 - 12:11)
Carbon-13. Is that like radioactive?


[Kristin] (12:12 - 12:48)
No, it's not radioactive. It's just an isotope of carbon. And let's do a little quick chemistry review here, Regina.


Most carbon in nature has a mass of 12, 6 protons and 6 neutrons. Carbon-13 has an extra neutron, so it's just a little heavier, and that makes it detectable. So if you put carbon-13 labeled sugar in the drink, then you can trace what happens to that sugar.


Because when the body breaks the sugar down, it creates carbon dioxide, which you then breathe out. So what they do is they put a mask on the athletes during their exercise, and they measure the carbon dioxide that they're breathing out.


[Regina] (12:49 - 13:00)
Oh, that is so cool. So if some of that carbon dioxide that they're breathing out contains carbon-13, then you know the person did use some of the energy drink for fuel.


[Kristin] (13:00 - 13:13)
Yeah, exactly. Isn't that cool? It tells you that the carbohydrates they burned were not just coming from internal stores.


And you can quantify very precisely exactly how many carbs from the drink that the athletes used during the run.


[Regina] (13:13 - 13:15)
I love science. That is so nifty.


[Kristin] (13:16 - 13:41)
Yeah. Regina, when they first did these studies, they noticed that if you fed people more than 60 grams per hour of carbs, you didn't get any benefits. You were still burning only one gram of carbs from the drink per minute, even if you took in more than that.


So they believed there was this kind of ceiling, and there were formal guidelines published at the time that said athletes should take in 30 to 60 grams of carbs per hour if they were exercising for more than an hour.


[Regina] (13:41 - 13:47)
But 30 to 60 grams of carbs, that's a pretty big difference from that 120 that we just talked about.


[Kristin] (13:47 - 14:03)
Exactly. Because science has progressed since then. In the early 2000s, researchers discovered that if you mixed different types of sugars together, you could actually burn more than one gram of sugar per minute.


In other words, you could break through that ceiling that they had thought existed.


[Regina] (14:03 - 14:07)
Oh, so if you mix, what, like glucose and fructose together?


[Kristin] (14:07 - 14:11)
Exactly. Those studies combined glucose and fructose.


[Regina] (14:11 - 14:15)
And glucose is what? White rice, right?


[Kristin] (14:15 - 14:46)
Yeah, white rice is basically a bunch of glucose molecules stuck together, yes. And fructose is the sugar found in fruit. So these studies found that if you fed athletes 90 grams of carbs per hour as a combination of glucose and fructose, then the athletes could break through that one gram per minute ceiling.


90 grams per hour, that translates to 1.5 grams per minute, and they didn't quite burn everything they were taking in, but they were able to get up to about 1.3 grams per minute using this mix. So above that 1.0.


[Regina] (14:46 - 14:57)
Hmm. So now we're going from recommending 60 grams per hour to 90 grams per hour as long as it's a mix of sugar, so white rice and bananas.


[Kristin] (14:57 - 15:18)
Yeah. And it's kind of cool biology because they think that the reason this works is that glucose is absorbed in the gut by specific transporters, and they think that those transporters get saturated at about one gram per minute. But fructose is absorbed by different transporters, so by layering on fructose, you can absorb more.


[Regina] (15:18 - 15:31)
Mmm, fructose layers, yummy. Okay, so 90 grams of carbs, which is a lot. That's 360 calories, and that would be, what, two Cokes an hour?


[Kristin] (15:31 - 15:35)
Yeah, or like three gels or three energy drinks per hour.


[Regina] (15:35 - 15:49)
Okay, so getting up to 90 grams of carbs per hour means you can absorb, you can use more of the carbs from the energy drinks or gels, but is there any data showing that this helps anything? This helps performance, the thing we actually care about?


[Kristin] (15:49 - 15:59)
Yes, there is, but I want to be cautious here, Regina. The evidence saying that any carbs versus no carbs, that that improves performance, that is very strong.


[Regina] (15:59 - 16:04)
Okay, but that's any versus none. What about 90 versus 60?


[Kristin] (16:04 - 16:34)
Right, the evidence there for performance is actually much weaker, even though everyone believes this. In fact, there were really just a handful of studies that involved like 8 to 10, usually male cyclists. And actually, one of the studies that people point to as evidence, when I looked it up, I realized that there's actually a problem with it.


It doesn't show a performance benefit between the 90 and 60 conditions, even though people think it does. And Regina, you're going to love this. This study used magnitude-based inference.


[Regina] (16:35 - 16:47)
Oh, no, you're kidding. Your nemesis, magnitude-based inference. We talked about this one, this very flawed method in the p-values episode and thoroughly debunked it, I think.


[Kristin] (16:48 - 17:10)
Yeah, and people should go and listen to that episode for more detail. But if you actually look carefully at the data in this paper, there's no statistically significant difference between the 90 and 60 conditions for performance. But because magnitude-based inference lowers the bar for what counts as a positive result, these authors were able to claim a positive result when in fact the evidence there is extremely weak.


[Regina] (17:10 - 17:18)
Oh, no. OK, so we are saying that the evidence for performance going from 60 to 90 is, let's say, not rock solid.


[Kristin] (17:18 - 17:34)
Right. There is some evidence, but it's not as solid as people think for those performance outcomes. But in 2016, the official guidelines for athletes were updated to say that if you are exercising longer than 2.5 hours, you should aim for 90 grams of carbs per hour.


[Regina] (17:35 - 17:48)
So wait a minute, 90 grams per hour, that's the target for, like, what, 10 years now? Yep. That's interesting.


So 90 grams per hour, that's one and a half grams of carbs per minute. You're saying that's the goal?


[Kristin] (17:48 - 18:05)
Exactly, if you're exercising long enough. And that leads us up to the current moment, Regina, because now it's all the rage to try to push this even further. The idea being that maybe you can eke out even more benefit at 120 grams per hour of this mixture.


[Regina] (18:06 - 18:18)
Oh, you're pushing it even further. Okay, 120 grams per hour, that's two grams per minute. And oh, this is what they mean by high-carb fueling, because now it's even higher, right?


[Kristin] (18:18 - 18:51)
Yep. And they've shown that if you push up to this 120 grams per hour, you can indeed burn more carbs from food. So remember, with 90 grams, we were able to get up to around 1.3 grams per minute. But with 120, you can burn somewhere around 1.7 grams per minute from food. But, Regina, none of these studies showed that high-carb fueling, the 120 grams, could improve performance. They were just focused on carbohydrate metabolism.


So some of the hype around this in blogs and podcasts is actually overstepping the evidence.
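To keep the per-hour and per-minute numbers straight, here is a small sketch that converts the intake targets and compares them with the approximate burn rates quoted in the episode (about 1.3 g/min at 90 g/h and about 1.7 g/min at 120 g/h). The "fraction burned" is a rough illustration built from those rounded figures, not a result reported by any of the studies.

```python
def per_minute(grams_per_hour):
    """Convert an hourly carb intake to grams per minute."""
    return grams_per_hour / 60

# intake (g/h) -> approximate burn rate from the drink (g/min), as quoted in the episode
quoted_burn = {90: 1.3, 120: 1.7}

summary = {}
for intake, burned in quoted_burn.items():
    taken = per_minute(intake)
    summary[intake] = (taken, burned / taken)  # (g/min taken in, fraction burned)
    print(f"{intake} g/h = {taken:.1f} g/min taken in, "
          f"about {burned} g/min burned ({burned / taken:.0%})")
```

In both conditions the quoted burn rate is somewhat below the intake rate, which is consistent with the remark that athletes "didn't quite burn everything they were taking in."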


[Regina] (18:52 - 19:01)
Oh, interesting. So this high-carb fueling might be one of these things that sounds totally reasonable in theory, but the data just aren't there yet?


[Kristin] (19:01 - 19:17)
Yeah, exactly. In fact, there was one study in 2013 that tested a bunch of different levels of carbohydrate intake, 10, 20, 30, 40, 50, all the way up to 120. And that study found no clear benefits for performance above about 80 grams per hour.


[Regina] (19:17 - 19:23)
Oh, interesting. So maybe somehow it's not translating to real-life performance gain.


[Kristin] (19:24 - 20:10)
Yeah. And the reason this is an important question, Regina, is that there is a trade-off, right? Why not just take as many carbs as possible?


Well, the trade-off is, especially in running, it can cause stomach problems to eat too much. And Regina, stomach problems can really do you in in a race.


[Regina]
Oh, yeah.


[Kristin]
Actually, fun story about stomach problems. So I am a Dartmouth running alum, and another Dartmouth running alum, Bob Kempainen, won the 1996 U.S. Olympic marathon trials. But that race is infamous.


[Regina]
Oh, why?


[Kristin]
He vomited multiple times in the final miles, very dramatically, in the TV coverage. He still won the race.


And of course, we don't know if he vomited because of high-carb fueling, probably not back then.


[Regina] (20:10 - 20:16)
So did he stop, or was he just running and then turned his head to the side and yak it all out?


[Kristin] (20:16 - 20:26)
He kept running and just turned his head to the side, yeah. So very dramatic television coverage. Oh, you should go to YouTube if you want to see it.


But I'm sure it's in the archives somewhere there.


[Regina] (20:27 - 20:31)
Oh, I can't wait. All right. That's gross.


And we're going to put a link in the show notes.


[Kristin] (20:32 - 20:49)
It is an important moment in marathon history, Regina. Actually, funny story, Kempainen, he went on to become a doctor, but he worked in the lab of one of my biology professors at Dartmouth before I was there. But my professor was telling me how he was very spacey in the wet lab.


[Regina] (20:49 - 20:53)
Oh, maybe he was just low in glucose and just bonking all the time.


[Kristin] (20:53 - 21:05)
Probably, from all of that running, yes. All right. On that gross note, I say we take a quick break before we get into the details of the recent study that we're focusing on today, which was on marathoners.


[Regina] (21:26 - 21:32)
♪♪♪ Welcome back to Normal Curves. Today we are talking about high-carb fueling in endurance races.


[Kristin] (21:32 - 22:25)
All right. Let's talk about the paper now. Again, this was published online in early November of 2025.


[Regina]
And what was the study design? What was the sample size?


[Kristin]
So like many of the studies before this, this was just an eight-person study.


[Regina]
Oh, that's small.


[Kristin]
Yeah. This entire line of research, as I mentioned earlier, it's really been like eight to ten athletes per study.


Regina, actually, I saw a funny post on LinkedIn a few months ago that I've been dying to use on the podcast. So I'm going to get it in here. This was Zacharias Papadakis, and he says, N equals seven is not a sample size. It's a dinner party.


And I felt like that's a Normal Curves methodological moral right there. So this is kind of a dinner party study, N equals eight.


To be fair, though, they used what's called a within-person design, which means every runner did all three conditions and served as their own control.


[Regina] (22:25 - 22:48)
Which is a very important distinction because actually that within-person design really increases your statistical power. And that's because there is so much variability from person to person. So if everyone is their own control, it means we can really see how much things change for each individual person.
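A small simulation can illustrate why a within-person design helps. All numbers below are hypothetical (they are not from the study): eight runners differ a lot from one another, but each runner's own measurements are fairly stable. The sketch compares the standard error of the condition difference when each runner is their own control with the standard error you would get by treating the two conditions as independent groups.

```python
import random
import statistics

random.seed(1)
N = 8                # runners, as in the study
BETWEEN_SD = 10.0    # hypothetical person-to-person spread
WITHIN_SD = 1.0      # hypothetical run-to-run noise within one person
TRUE_EFFECT = 2.0    # hypothetical difference between two conditions

# Each runner has their own baseline level
baseline = [random.gauss(50, BETWEEN_SD) for _ in range(N)]
cond_a = [b + random.gauss(0, WITHIN_SD) for b in baseline]
cond_b = [b + TRUE_EFFECT + random.gauss(0, WITHIN_SD) for b in baseline]

# Within-person analysis: work with each runner's own difference
diffs = [b - a for a, b in zip(cond_a, cond_b)]
se_paired = statistics.stdev(diffs) / N ** 0.5

# Same data treated as if the two conditions were independent groups
se_unpaired = (statistics.variance(cond_a) / N
               + statistics.variance(cond_b) / N) ** 0.5

print(f"SE within-person: {se_paired:.2f}, SE as independent groups: {se_unpaired:.2f}")
```

Because the person-to-person spread cancels out of each runner's difference, the within-person standard error is much smaller, which is the sense in which the design "increases your statistical power."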


[Kristin] (22:48 - 23:27)
Exactly. So actually, N equals eight isn't so bad here because of that within-person design. Also, keep in mind that this group of athletes was very select. So probably hard to find people for this study because they were all elite male runners who had run under 2:30 in a marathon or under 1:13 in a half marathon.


No biggie. I do that every day. Actually, Regina, I have run under a 1:13 for a half marathon.


I did a 1:11 once, but that does come with a caveat because it was on a course with a substantial downhill portion. So I don't know if that counts for getting into this study.


[Regina] (23:28 - 23:30)
Oh, my goodness. Okay. That counts in my book, at least. That's crazy.


[Kristin] (23:31 - 23:46)
It was also like 30 years ago, so that probably also hurts my chances. All right.


This was a double-blind, randomized crossover trial with three different conditions where each runner did three different runs on a treadmill, and each time they received a different drink.


[Regina] (23:48 - 24:03)
You and I love those terms, Kristin, but let's unpack some of the jargon. First of all, double-blind means both the participants and the researchers did not know which drink they were getting each time, so that is going to reduce bias.


[Kristin] (24:03 - 24:19)
Yes. Excellent. And crossover means the participants crossed over and experienced all the different drinks, again, serving as their own control, so we love that.


Random means that the order in which they got the drinks in the different trials was random, which helps prevent systematic differences between the trials.
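As a rough sketch of what randomizing the order in a crossover trial can look like, the code below shuffles the three drinks independently for each of eight runners. This is simple randomization for illustration only; the episode does not say how the authors generated their sequence (trials often use counterbalanced schemes instead), and the function name and seed are made up.

```python
import random

DRINKS = ["60 g/h glucose", "90 g/h glucose-fructose", "120 g/h glucose-fructose"]

def assign_orders(n_runners, seed=None):
    """Give each runner all three drinks in an independently shuffled order."""
    rng = random.Random(seed)
    orders = {}
    for runner in range(1, n_runners + 1):
        order = DRINKS[:]      # copy so the master list is untouched
        rng.shuffle(order)
        orders[runner] = order
    return orders

orders = assign_orders(8, seed=2025)  # arbitrary seed for reproducibility
for runner, order in orders.items():
    print(runner, order)
```

Every runner still receives every drink (the crossover part); only the sequence differs, so order effects such as fatigue or familiarity with the treadmill protocol are not systematically tied to one drink.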


[Regina] (24:19 - 24:25)
So, three different drinks on three different days now. Tell us about the drinks.


[Kristin] (24:25 - 24:40)
The three drinks contained 60 grams of just glucose per hour, 90 grams of a glucose-fructose mix per hour, or 120 grams of that mix per hour, and the sugars, the carbs here, were labeled with carbon-13.


[Regina] (24:41 - 24:48)
Ah, and that's the one where you breathe out into a mask, and the researchers can see how much of the carbs they're using from these drinks.


[Kristin] (24:48 - 24:57)
Exactly. And they had a very strict protocol before each of the three sessions. They had to eat a prescribed high-carb diet in the 24 hours before the trial.


[Regina] (24:58 - 25:01)
Oh, nice. So, they all had the same amount of starting fuel each time.


[Kristin] (25:02 - 25:26)
Exactly. You want to reduce any variability in the amount of fuel that you have stored, exactly. All right, the actual run, each trial was a two-hour treadmill run.


The first and last 15 minutes were run at a fairly hard but controlled pace. For the men, on average, that was a 6:19 per mile, and in the middle 90 minutes, they had to run just slower than marathon pace, and on average, that was 5:53 per mile.


[Regina] (25:27 - 25:30)
Okay, these adjectives and these numbers do not line up.


[Kristin] (25:30 - 25:30)
Oh.


[Regina] (25:31 - 25:41)
Fairly hard but controlled pace, 6:19 per mile. Okay, so that is very quick, let's just say that, and they did this for two hours straight.
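To put those paces in perspective, here is a sketch that converts them to speeds and adds up the distance covered in the two-hour protocol (30 minutes at 6:19 per mile plus 90 minutes at 5:53 per mile), along with the average pace implied by a 2:30 marathon. The helper names are made up for illustration, and the paces are the group averages quoted in the episode.

```python
MARATHON_MILES = 26.2188  # 42.195 km

def pace_to_seconds(pace):
    """Convert an 'M:SS' per-mile pace to seconds per mile."""
    minutes, seconds = pace.split(":")
    return int(minutes) * 60 + int(seconds)

def mph(pace):
    """Speed in miles per hour for a given per-mile pace."""
    return 3600 / pace_to_seconds(pace)

def miles_covered(pace, minutes):
    """Distance run in the given number of minutes at a given pace."""
    return minutes * 60 / pace_to_seconds(pace)

def marathon_pace(finish_minutes):
    """Average per-mile pace ('M:SS') for a marathon finishing time in minutes."""
    sec = round(finish_minutes * 60 / MARATHON_MILES)
    return f"{sec // 60}:{sec % 60:02d}"

total_miles = miles_covered("6:19", 30) + miles_covered("5:53", 90)
print(f"6:19 pace = {mph('6:19'):.1f} mph, 5:53 pace = {mph('5:53'):.1f} mph")
print(f"Two-hour protocol covers about {total_miles:.1f} miles")
print(f"A 2:30 marathon averages {marathon_pace(150)} per mile")
```

By this arithmetic the runners covered roughly 20 miles per session, three times over, at a pace only a little slower than what a 2:30 marathon requires.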


[Kristin] (25:41 - 26:20)
Yes, exactly. Regina, I think I might have been able to survive this protocol when I was in my absolute best shape, because I once did do a 16-mile training run on a hilly course in 5:56 pace, so I might have been able to hold under 6 pace on a treadmill for 90 minutes, except it wasn't quite as easy as just running. So, the runners had to do all this other stuff in the middle of running, and this is where I literally would have flown off the back of the treadmill. Every 15 minutes, they had to drink four ounces of the drink, and they also had to don a mask for three minutes, so the researchers could measure the oxygen that they were breathing in and the carbon dioxide that they were breathing out.
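For a sense of how concentrated these drinks would be, the sketch below splits each hourly carb target across four servings of four ounces. It assumes US fluid ounces and that all of the carbohydrate was delivered evenly through those servings; neither detail is confirmed in the episode.

```python
ML_PER_US_FL_OZ = 29.5735     # assumes US fluid ounces
SERVING_OZ = 4                # "four ounces of the drink"
SERVINGS_PER_HOUR = 60 // 15  # one serving every 15 minutes

def per_serving(grams_per_hour):
    """Return (grams of carbs per serving, grams per 100 mL) for an hourly target."""
    grams = grams_per_hour / SERVINGS_PER_HOUR
    serving_ml = SERVING_OZ * ML_PER_US_FL_OZ
    return grams, 100 * grams / serving_ml

doses = {target: per_serving(target) for target in (60, 90, 120)}
for target, (grams, conc) in doses.items():
    print(f"{target} g/h: {grams:g} g per serving, about {conc:.0f} g per 100 mL")
```

Under these assumptions the 120 g/h drink carries 30 grams of carbs in each four-ounce serving, so each small cup is roughly three-quarters of a can of Coke's worth of sugar.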


[Regina] (26:21 - 26:24)
Okay, mask means they could not have thrown up like your friend.


[Kristin] (26:24 - 26:32)
Oh, that would have been a problem, yes, definitely. It's kind of amazing, though, that they could run with this mask on and still hold 5:53 pace.


[Regina] (26:32 - 26:35)
Okay, so they're wearing the mask, but how did they drink the drink then?


[Kristin] (26:36 - 27:03)
Right, so I'm thinking that every 15 minutes, they had them, like, drink first and then put on the mask, or vice versa. So, sequential, not simultaneous. Also, every 30 minutes, they got a finger prick blood test, and they had to step off the treadmill and be weighed, and they also had to answer a series of questions about how their stomachs felt, how sweet the drink tasted, and whether they wanted to keep drinking, meaning they had to be able to hold this pace and keep talking.


[Regina] (27:04 - 27:14)
Keep talking and make sense at the same time. So, this is a lot of variables. The researchers really were quite detailed on the things that they were measuring, weren't they?


[Kristin] (27:14 - 27:17)
This is very complicated and a lot of work, yes.


[Regina] (27:17 - 27:22)
A lot of work, yeah. Okay, so of all of those variables, which one was the primary outcome?


[Kristin] (27:22 - 27:57)
So they don't clearly define this in the paper, but we can guess from the context that their primary outcome was the rate at which the athletes were burning the fuel from the drink. They were trying to see, again, how high could they get that number above the one gram per minute that we talked about earlier. They also had a lot of other outcomes, like how many total carbs they were burning per minute, both from internal stores and from the drink, and how much fat they were burning per minute, and also things like oxygen uptake.


And that one is important because a lot of the press coverage focused on the oxygen uptake.


[Regina] (27:58 - 28:01)
What does oxygen uptake tell you? Like, is that a big deal?


[Kristin] (28:02 - 28:17)
Yeah, so if you're holding the same pace, but using less oxygen, that means that you're running more efficiently, and they also call this running economy, and it's really the closest that we get to a performance measure in this study, and that's why people focused on it.


[Regina] (28:17 - 28:28)
Oh, I love that word, running economy. Excellent. Okay, so that is the experimental setup, but what about the stats?


How did they analyze all this data?


[Kristin] (28:28 - 28:40)
They used repeated measures ANOVA, and this is a test that's used to compare numeric outcomes when you have the same person measured both under different conditions and at different time points.


[Regina] (28:40 - 28:51)
Right, these are correlated observations, because as we've said, it's the same person, and you need to handle correlated observations correctly. We talked about this in the pheromones episode.


[Kristin] (28:51 - 29:21)
Exactly, and here, repeated measures ANOVA does handle that issue.


[Regina]
But let's be clear about what the ANOVA itself is doing.


[Kristin]
Right, ANOVA lets you compare all three conditions at once.


So let's say you are looking at running economy, for example. From the ANOVA, you might get an overall p-value of 0.02, statistically significant. This would tell you that at least one of the conditions differed on running economy, but it doesn't tell you specifically which conditions differ.


[Regina] (29:21 - 29:32)
Right, so if you want to conclude something like the 120 condition is significantly different from the 90 condition, that would need to be a separate step beyond the ANOVA.


[Kristin] (29:33 - 29:46)
To make those kinds of specific claims, you are going to have to compare each pair. And there are three comparisons then, right? There's 60 versus 90, 90 versus 120, and 60 versus 120.


So you're doing multiple testing, three tests.


[Regina] (29:47 - 30:02)
We talked about multiple testing and the multiple testing dude in our Stats Reunion episode. And with multiple testing, you increase the risk of getting a false positive in your conclusions. Not good.
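
The false-positive risk described here can be put into rough numbers. The sketch below is a minimal illustration, assuming the tests are independent and each is run at the usual 0.05 level; pairwise comparisons on the same runners are not truly independent, so the figures are only a guide. The function name is hypothetical.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across `num_tests` independent
    tests, each run at significance level `alpha` (independence is assumed)."""
    return 1.0 - (1.0 - alpha) ** num_tests


# One test, the three pairwise comparisons, and a larger batch of tests.
for m in (1, 3, 10):
    print(m, round(familywise_error_rate(m), 3))
```

With three tests the chance of at least one false positive is already well above the nominal 5%, which is why some adjustment is needed.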


[Kristin] (30:03 - 30:25)
Right, so you need to adjust for that. And fortunately, most statistical software already has this built in. It automatically does the adjustment for you.


So in this case, they use what's called a Holm adjustment. And what that adjustment does is it increases the p-value, which makes it harder to reach statistical significance. In other words, it's being more stringent to help guard against false positives.
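
For readers who want to see what a Holm adjustment does to a set of p-values, here is a minimal sketch of the standard step-down procedure. The three raw p-values are hypothetical and are not taken from the paper; in practice statistical software does this step for you.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input.

    The smallest p-value is multiplied by m, the next by m - 1, and so on,
    and the adjusted values are never allowed to decrease along that ordering.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        scaled = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, scaled)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for 60 vs 90, 90 vs 120, and 60 vs 120.
raw = [0.010, 0.040, 0.030]
print(holm_adjust(raw))
```

Every adjusted value is at least as large as the raw one, which is the "more stringent" behavior described above.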


[Regina] (30:25 - 30:41)
Right, but that only handles testing of multiple conditions within a single variable, within running economy, whatever. It doesn't account for the fact that they tested a lot of different outcome variables, all those variables you mentioned.


[Kristin] (30:42 - 30:58)
Yeah, this is a very important point, Regina, because they looked at the rate at which you were burning carbs from the drink, the rate at which you were burning carbs overall, the rate you were burning fat, heart rate, glucose, lactate, running economy, and so on. And they didn't account for this source of multiple testing, the multiple outcomes.


[Regina] (30:58 - 31:18)
Right, it's a subtle difference, really. But it means when you're reading a paper and you see them say, we adjusted for multiple comparisons, we don't want to be fooled into thinking that the multiple testing problem, the big overall problem, is automatically totally taken care of.


[Kristin] (31:18 - 31:21)
Exactly, always want to keep that in the back of your mind.


[Regina] (31:21 - 31:25)
Okay, that's the stats, check. What about the results? What did they find?


[Kristin] (31:25 - 31:39)
All right, they reported that as you go from 60 to 90, and as you go from 90 to 120 grams per hour of carbohydrates, you do indeed burn more carbohydrate from the drink, and you rely less on burning internal fat for fuel.


[Regina] (31:40 - 31:46)
Okay, but this is not a surprise, right? Isn't that what they found in earlier studies in cyclists?


[Kristin] (31:46 - 32:12)
Yeah, that's right, but this is the first study in runners, and fueling is more challenging in runners than in cyclists, so I still think it's an important finding. But Regina, the reason I think this got so much press is that some media outlets strayed a bit from the actual findings in the paper. Some of the media coverage claimed that as you went from 90 to 120, you were also improving running economy, but that was not quite right.


[Regina] (32:12 - 32:14)
Oh, explain what they got wrong there.


[Kristin] (32:14 - 32:58)
Yeah, so in the body of the paper and in the abstract, the authors were clear that running economy was significantly better in the 120 condition compared only with the 60 condition, but the 120 and 90 conditions in terms of running economy, those were not statistically distinguishable. But it's actually not surprising to me that the media got a little bit confused because the title of this paper is misleading, and I'm going to read it to you. The title is, Carbon-13 Labeled Glucose Fructose Shows Greater Exogenous and Whole Body Carbohydrate Oxidation and Lower Oxygen Cost of Running at 120 Versus 60 and 90 Grams per Hour in Elite Male Runners.


[Regina] (32:59 - 33:00)
That is a mouthful of a title.


[Kristin] (33:01 - 33:02)
That is complicated, yes.


[Regina] (33:02 - 33:30)
Really long title, so let's parse some of that apart. So we are going to focus on the part where they're talking about lower oxygen cost of running because that's running economy, right? And they are saying lower oxygen cost of running, so better running economy at 120 versus 60 and 90, but you just told me that is not true.


[Kristin] (33:30 - 33:42)
Yeah, that title says that the 120 condition beat the 90 condition for running economy, but the data in the paper do not show that. So someone screwed up on the title and that may have confused the media.


[Regina] (33:43 - 33:54)
Huh, and this is the part that was most exciting and novel, right? So it looks like the press latched onto the sexiest finding and it turns out just to be a phantom.


[Kristin] (33:55 - 34:06)
Exactly. The other thing is that the paper showed that there was more stomach discomfort in the 120 gram condition compared with the 90 condition, at least on a few of the stomach variables.


[Regina] (34:07 - 34:16)
Stomach problems, not good. So overall, it's feeling like the benefits of this 120 grams per hour, not entirely clear.


[Kristin] (34:16 - 34:38)
Yeah, I wouldn't rush out of this paper and say, OK, now everybody should be taking 120 grams per hour. But Regina, there's more to the story and this is where Michelle Hummel comes in, our mathematician friend, because she noticed some inconsistencies in the paper and she contacted us and that led us both to do some extensive statistical sleuthing which is what we're going to talk about next.


[Regina] (34:39 - 35:08)
Oh, excellent, but let's take a short break first.


Welcome back to Normal Curves. Today we're discussing a study on high carb fueling in marathoners.


[Kristin] (35:08 - 35:33)
All right, now we're going to talk about some data issues in the 2025 paper. And again, these were first noticed by one of our listeners, Michelle Hummel. She's a mathematician at Sandia National Laboratories and she did a lot of the initial statistical sleuthing.


So what she noticed is that the Article in Press, which was published early last November, that differed from the final formatted manuscript that was published later in 2025.


[Regina] (35:34 - 35:44)
OK, let's stop and talk for a moment, though, about Articles in Press, Kristin, and why it would be surprising to find differences between an Article in Press and the final version of the paper.


[Kristin] (35:45 - 36:00)
Yeah, I mean, an Article in Press is just an early release of the paper. After the paper has been accepted for publication, the journal may release an unformatted version of the paper online so people can access it without waiting for the official version.


[Regina] (36:00 - 36:14)
Right, the official version is formatted for publication. It looks pretty, but it's not supposed to be different in content from the Article in Press because that Article in Press version is the version that was peer-reviewed and accepted.


[Kristin] (36:15 - 36:22)
Exactly. You shouldn't be changing anything substantive post-peer review without calling out what was changed.


[Regina] (36:22 - 36:34)
OK, so Michelle found differences between the two versions of the paper, which is unusual and worrisome. And tell us about what actually changed then.


[Kristin] (36:34 - 37:09)
Michelle noticed that the running economy statistics completely changed between the Article in Press and the final version, and what's more, the numbers within each separate version were not internally consistent, meaning that the paper gave different numbers in the abstract than in the main text, and those numbers, in turn, did not match the values displayed in the figures. I independently verified what Michelle had noticed. She totally nailed it.


And that led me to spend a whole weekend checking all the statistics in the paper, and I found additional inconsistencies.


[Regina] (37:09 - 37:23)
OK, so this is pointing to some problems with their statistical workflow. And we'll put the full details in the show notes, but how about for now, Kristin, you just give us some quick, simple examples of the problems.


[Kristin] (37:23 - 37:43)
Sure. Let's start with running economy. So the p-value comparing the 60-gram condition to the 120-gram condition for running economy, that p-value changed between the Article in Press abstract to the Article in Press text to the final paper abstract to the final paper text.


[Regina] (37:44 - 37:52)
OK, so Kristin, four different p-values. Yes. For what should be just the same number, one single number.


[Kristin] (37:53 - 38:04)
Exactly. All the p-values were statistically significant at 0.05, but the fact that there are four different p-values for the same thing tells us that they are having trouble keeping track of their data.


[Regina] (38:04 - 38:17)
Oh, yeah, not good at all. Not only is it sloppy, but the thing is we have no idea now what is right, and we're kind of losing trust in the integrity of the study at this point.


[Kristin] (38:18 - 38:32)
Another issue I noticed that is a little more subtle, Regina, is that the p-values reported did not always match the confidence intervals reported. And I want to point this out because it's another important tool for the statistical sleuthing toolkit.


[Regina] (38:32 - 38:48)
Right. Oh, this is fascinating because p-values and confidence intervals, they both come from the same underlying math. They're essentially two sides of the same math coin.


So they absolutely need to correspond.


[Kristin] (38:48 - 38:57)
Yeah, exactly. And you can check if they do. So you can take a confidence interval and mathematically work back what the corresponding p-value should be.


[Regina] (38:57 - 39:08)
And even if you cannot do the math yourself or maybe you just don't want to do the math yourself, your favorite generative AI chatbot can happily do this for you very easily.


[Kristin] (39:08 - 39:31)
Yes, AI is really good at this. And this is a great internal check. So for example, in this paper, one of the confidence intervals for running economy was given as negative 2.85 to 10.23. That confidence interval corresponds to a p-value of about 0.22. But they reported a p-value for this comparison of 0.097.
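
The back-calculation can be sketched in a few lines. This is a minimal version, assuming the interval is a symmetric 95% confidence interval for a difference and using a normal approximation; the function name is hypothetical. With a small sample, the exact calculation would use a t distribution and the study's degrees of freedom, so the result will not match a t-based figure exactly.

```python
from statistics import NormalDist


def p_from_ci(lower: float, upper: float, confidence: float = 0.95) -> float:
    """Approximate two-sided p-value implied by a symmetric confidence interval.

    Assumes the interval is estimate +/- z_crit * SE (normal approximation).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    estimate = (lower + upper) / 2.0                      # midpoint of the interval
    std_error = (upper - lower) / (2.0 * z_crit)          # half-width divided by z_crit
    z = abs(estimate) / std_error
    return 2.0 * (1.0 - std_normal.cdf(z))


# The running economy interval quoted in the episode.
print(round(p_from_ci(-2.85, 10.23), 2))
```

Under the normal approximation this comes out somewhat higher than the 0.22 mentioned above, presumably because that figure reflects a small-sample calculation. Either way it is nowhere near the reported 0.097.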


[Regina] (39:31 - 39:43)
Ouch, big difference. So Kristin, we tried to backtrack and figure out, oh, was this just a problem that came from multiple testing adjustment that we had talked about before? But we could not find any scenario that would explain it.


[Kristin] (39:43 - 40:04)
Yeah, exactly. Regina, now I want to talk about another type of inconsistency that I noticed and another thing that you should check when you are statistical sleuthing. So I extracted the coordinates of every data point from the figures in the paper.


And I checked those numbers against the numbers reported in the text. And I found that in many cases here, they didn't match.


[Regina] (40:04 - 40:08)
Ooh, that is a problem. How about you give us a concrete example?


[Kristin] (40:09 - 40:50)
Sure, so let's look at running economy again. As we talked about, running economy is measured as oxygen uptake or oxygen cost. The units are milliliters of oxygen per kilogram of body weight per kilometer.


In the text of the Article in Press, they report that the 90 and 120 gram conditions differ by 3.2 on this measure. But in the figure in the Article in Press, that difference is half of that, just 1.6. Then in the text in the final paper, they report a difference for that same comparison of 4.42, while the corresponding figure has a value of 4.30. So these don't match even if you account for like rounding error.


[Regina] (40:52 - 41:02)
But Kristin, can you explain now for listeners how you can get such precise values just from what, are you looking at the graph?


[Kristin] (41:03 - 41:17)
No, Regina, I wasn't just looking at the graph. My eyes are not that precise that they can eyeball to the second decimal place. Also, I didn't just take out a ruler, which I'm sure all of you have done before, right?


You can take out a ruler and try to go back to the x and y-axis. I didn't do that either.


[Regina] (41:17 - 41:35)
I think a lot of people might not know there are tools that you can use that actually allow you to extract these very precise values, tools that are not just a ruler. Yes. So let's take a little statistical detour here and explain what these cool tools are all about.


[Kristin] (41:35 - 41:53)
Yeah, there are tools on the computer that are super important in the statistical sleuthing toolkit. And here I used two different tools, WebPlotDigitizer and graph2table. Both tools figure out the x and y coordinates of every data point on a graph, basically giving you access to the underlying data.


[Regina] (41:54 - 42:02)
Right, the underlying data, which we often need and want, but unfortunately the authors don't always provide. Right, but this is a sneaky way to get it anyway.


[Kristin] (42:02 - 42:27)
Now, WebPlotDigitizer is fantastic and I've been using it for years, but it requires some manual work. So it does take time and I do get slightly different answers depending on how careful I am being on a given day. But for this paper, there were a lot of figures to check and I wanted to be really precise.


So I tried a new tool called graph2table and Regina, it was like love at first sight.


[Regina] (42:29 - 42:31)
Kristin, that is strong language for you.


[Kristin] (42:32 - 42:37)
Regina, I am officially dumping WebPlotDigitizer. I have a new statistical sleuthing boyfriend.


[Regina] (42:38 - 42:41)
Oh, this might be your soulmate right here.


[Kristin] (42:41 - 43:21)
This tool is unbelievable. Okay, it uses AI and you can upload a whole bunch of figures at once. You don't have to do this one at a time.


There is zero manual work and not only does it extract the data automatically, it also extracts the variable names from the axis labels so it makes a complete ready-to-go dataset which you can just download and immediately analyze. Also, I stress tested it. I uploaded different versions of the same figure multiple times and it was incredibly consistent from run to run.


So it takes out all the guesswork and Regina, you have no idea. This is going to save me so much time in my statistical sleuthing.


[Regina] (43:21 - 43:23)
You know, I can just hear the love in your voice.


[Kristin] (43:25 - 43:26)
I'm very excited about this.


[Regina] (43:27 - 43:48)
You are very excited. I think this is the threshold now that men need to aspire to if they want to be with you. Can you top this statistical tool?


Okay, so after you told me about this, I went and I tried it and you were right. It's kind of like magic. This is the magic boyfriend.


It gives you everything you want.


[Kristin] (43:50 - 44:03)
For our listeners, these tools are both available online. WebPlotDigitizer, just as it sounds. WebPlotDigitizer, all one word.


graph2table is graph, the numeral two, and then table, all one word.


[Regina] (44:04 - 44:16)
And let's just talk about money. So Kristin, WebPlotDigitizer, it's more work, but it's totally free. Whereas graph2table gives you some free uses, but then they do charge a fee.


[Kristin] (44:17 - 44:46)
Yeah, unfortunately, no boyfriend is perfect. In this case though, Regina, I had so many figures to check that I did end up spending a few dollars, but it was totally worth it. And Regina, I was so enamored of graph2table that I actually reached out to the software's creators.


And they were so nice and agreed to give a discount for our listeners, 20% off. I am super excited about this. We'll put this on our website and in the show notes, but the discount code is normalcurves20


That's all lowercase.


[Regina] (44:47 - 45:09)
Full disclosure before we go on, we are super excited. graph2table is now an affiliate partner of Normal Curves. So if you use this code, you get a 20% discount and we get a cut.


And anything from the cut is just gonna help us keep the lights on around here. But this is kind of an organic affiliation, I have to say.


[Kristin] (45:09 - 45:11)
Absolutely. This is just a win-win, Regina.


[Regina] (45:12 - 45:21)
So back to the paper, Kristin, away from the magic boyfriend. This is how you realized that the numbers in the text were different than the values in the figures.


[Kristin] (45:21 - 45:47)
Exactly. And the issue wasn't limited to just the running economy variable. So for example, they looked at the total number of carbohydrates burned in the run measured in grams.


In the text, the mean values for the three conditions are reported as 250, 295, and 368 grams. But the means displayed in the corresponding figure are 286, 336, and 414 grams.


[Regina] (45:48 - 45:51)
Okay, those are really different, which is really worrisome.


[Kristin] (45:52 - 46:06)
Yeah, you can pick that one up by eyeballing, by the way. Also, they highlight in the abstract that the 120 condition was burning 1.68 grams per minute of carbohydrates from the drink during the final hour of the run.


[Regina] (46:07 - 46:15)
Which is the most important variable, right? We talked about it earlier. It's the primary outcome.


That's what we're trying to improve by giving runners more carbs.


[Kristin] (46:15 - 46:16)
Exactly.


[Regina] (46:16 - 46:28)
But in the figure, the value is 1.64 grams per minute, not 1.68. And you've ruled out that this could be just like a rounding error or data extraction error.


[Kristin] (46:29 - 46:44)
Yeah, I'm pretty sure that this is an actual error. Now, 1.64 versus 1.68, that might seem like a trivial difference, but that's the primary outcome. And when it's different in two different places, that tells you that something went wrong in their workflow.
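
The text-versus-figure check can be sketched as a simple comparison with a tolerance. The numbers below are the ones quoted in this episode. The 1% relative tolerance, meant to allow for rounding and digitization error, is an assumption chosen for illustration, and the function name and labels are hypothetical.

```python
def find_mismatches(pairs, rel_tol=0.01):
    """Return entries whose text and figure values differ by more than `rel_tol`
    (relative to the figure value). The tolerance is an illustrative assumption."""
    flagged = []
    for label, text_value, figure_value in pairs:
        rel_diff = abs(text_value - figure_value) / abs(figure_value)
        if rel_diff > rel_tol:
            flagged.append((label, text_value, figure_value, round(rel_diff, 3)))
    return flagged


# (label, value reported in text or abstract, value extracted from figure)
reported = [
    ("running economy difference, 90 vs 120, Article in Press", 3.2, 1.6),
    ("running economy difference, 90 vs 120, final paper", 4.42, 4.30),
    ("total carbs burned, first condition listed (g)", 250, 286),
    ("total carbs burned, second condition listed (g)", 295, 336),
    ("total carbs burned, third condition listed (g)", 368, 414),
    ("carbs burned from the drink, 120 condition (g/min)", 1.68, 1.64),
]

for row in find_mismatches(reported):
    print(row)
```

Every pair quoted in the episode exceeds the tolerance, including the small-looking 1.68 versus 1.64 gap on the primary outcome.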


[Regina] (46:45 - 46:54)
Kristin, in the past, we've called these kinds of errors statistical cockroaches. Why? Because very often when there is one, there are many.


[Kristin] (46:54 - 47:16)
Yes, this paper unfortunately had cockroaches. But Regina, this is not at all unique to this paper. These kinds of errors show up all the time.


And I want to talk about how situations like this arise because there are real lessons here. And that's what is going to lead us into our favorite part of this episode, the happy ending.


[Regina] (47:17 - 47:20)
The favorite part of everyone's episode, the happy ending.


[Kristin] (47:21 - 47:24)
Oh, you did not go there, Regina. I'm talking fairy tales, not sex.


[Regina] (47:25 - 47:27)
Not that kind of happy ending.


[Kristin] (47:29 - 47:43)
But thank you for bringing the sex into this episode because we had not done that yet.


[Regina]
It had been a little while. It had been a little while.


[Kristin]
The happy ending here, though, Regina, is that they didn't just ignore the cockroaches and let them chew through the house until everything collapsed.


[Regina] (47:44 - 47:48)
I think you might be mixing your metaphors, Kristin, with termites.


[Kristin] (47:51 - 48:28)
You're right. I'm thinking of termites. Oh my gosh, sorry.


Right. Cockroaches don't collapse your house. They don't eat the wood.


That's termites. They just make you want to move out. Maybe we should call them statistical termites, though, for this episode, because that metaphor, I think, works better here.


[Regina]
Oh, I like it. Statistical termites.


[Kristin]
Now, the journal where this was published, the Journal of Applied Physiology, I am going to call them out because they chose to ignore the statistical termites or cockroaches because Michelle Hummel, our mathematician friend, she sent a letter to the editor pointing out some of these errors, but the journal rejected that letter.


[Regina] (48:29 - 48:47)
So, this is awful, but it really does happen because journals don't always want to deal with errors or admit mistakes, especially, I mean, really, when some of these issues should have been caught during peer review. So, I agree. The journal has some responsibility.


[Kristin] (48:48 - 49:02)
Oh, absolutely. And in this case, the journal essentially buried its head in the sand, but I am happy to say that the authors did not because after I documented these inconsistencies, I reached out directly to the corresponding author and I shared my findings.


[Regina] (49:02 - 49:08)
Your findings, which were very complete and very detailed in a multi-tabbed spreadsheet.


[Kristin] (49:09 - 49:32)
Yes, I am nerdy, yes. And Regina, this is probably not the most fun email that this corresponding author has ever received, but instead of ignoring me or getting defensive, which is often what happens in these kinds of situations, instead, he wrote back right away, was incredibly gracious, and we set up an online meeting with their team.


[Regina] (49:33 - 49:40)
This kind of gracious response is rare. They're the exception and we love this, frankly. We do, yeah.


[Kristin] (49:40 - 50:10)
So, when Michelle and you and I talked with them, what we learned happened is something that we believe happens in a lot of applied fields like sports science. You know, the researchers were very focused on figuring out the correct statistical tests for their data and just getting those tests to run. And by focusing too much on that piece, they neglected the most important part of a data analysis, which is what comes before formal statistical tests.


And this is data entry, processing, cleaning, and checking.


[Regina] (50:11 - 50:25)
All of that before stuff, it sounds kind of dull and trivial and certainly a whole lot less exciting than getting, whoo, a p-value, but it's actually the foundation of good data analysis.


[Kristin] (50:26 - 50:35)
Yeah, Regina, when I'm doing primary data analysis, data processing and checking is about 90% of what I do. It's like 90% of the code I write.


[Regina] (50:35 - 50:53)
I always think of it like those TV shows, you know, where they do forensic science and they have the DNA answer in like 10 minutes after the crime. And it all seems glamorous and easy, but in reality, that process probably takes like weeks of careful, boring attention to detail.


[Kristin] (50:54 - 51:08)
That is a great analogy, Regina. And now I want a TV show featuring statisticians where someone hands us the data and we go, voila, aha, here's the answer. And it's all exciting and it makes statistics seem really glamorous.


[Regina] (51:09 - 51:28)
And then we pop open the champagne and then Brad Pitt comes over and he gently takes off our glasses. Oh, so gently. And he says, wow, staring intently into our eyes.


I never realized how incredibly beautiful you are.


[Kristin]
I see where your mind is going.


[Regina] (51:29 - 51:42)
Okay, I'll stop.


[Kristin]
But, you know, I bet those kinds of forensics TV shows got a lot of people to go into the field of forensics. We could do the same for statistics, Regina.


[Regina] (51:44 - 51:49)
Sorry, Kristin, don't want to burst your bubble, but I don't think anyone's going to steal this one. Nope.


[Kristin] (51:51 - 52:17)
Hollywood's not going to come chasing after us. Okay, truly, yes, statistics doesn't exactly make for exciting TV. I multiplied the expired oxygen by the wrong constant. Cue Brad Pitt. Bottom line though, Regina, is that careful data processing and checking is where most of the work happens and it might not be exciting, but it is where things often go wrong.


[Regina] (52:17 - 52:25)
Kristin, when we talked with the researchers, we learned that they had used Excel for both data entry and processing.


[Kristin] (52:26 - 52:40)
Right, and Excel is not the appropriate tool for data management. It lets you overwrite values, duplicate files, and lose track of what changed and when. And you can accidentally end up with more than one version of the dataset running around.


[Regina] (52:41 - 52:56)
Right, and the best workflow really is to start by using a program that is designed for all of this, for collecting study data and storing it and tracking it. Programs like REDCap or Qualtrics, they make sure that you only have one version of your data.


[Kristin] (52:57 - 53:14)
Yeah, and you should also never process data in Excel. I'm talking about things like making new variables from the raw data, because that involves a lot of writing formulas in Excel cells and cutting and pasting and moving things across tables or spreadsheets, and it's really easy to introduce errors.


[Regina] (53:14 - 53:25)
Yep, for processing data, really the best thing to do is do this in code in a statistical analysis program, something designed for this like R or SAS.


[Kristin] (53:25 - 54:12)
There are so many advantages to doing this in code, Regina. You are much less likely to inadvertently introduce an error. You have a clear record of everything that was done.


So if an error is ever found, it's easy to identify the source of the error and to fix it. Also, it just makes your life easier. Let's say you have to change something, like add new data or update a raw data point or change a formula you're using.


If everything is done in code, this update takes like a second. If you did everything manually in Excel, you're gonna have to repeat everything manually, and that takes time and also gives you another opportunity to introduce errors. Also, data checking is really hard to do in Excel, but easy in a statistical analysis program.


So you're much more likely to miss errors if you're working in Excel. Bottom line, never use Excel for any steps in data handling or processing.
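
As a small illustration of processing in code rather than in spreadsheet cells, the sketch below derives oxygen cost (mL of oxygen per kg per km) from oxygen uptake and treadmill speed, with basic range checks. The episode recommends R or SAS; Python is used here only for illustration. The raw records, field names, and plausibility bounds are all hypothetical, not taken from the study.

```python
def oxygen_cost_ml_per_kg_km(vo2_ml_per_kg_min: float, speed_km_per_h: float) -> float:
    """Oxygen cost of running: oxygen uptake divided by speed in km per minute.

    The range checks use hypothetical plausibility bounds; adjust them to your data.
    """
    if not 10.0 <= vo2_ml_per_kg_min <= 100.0:
        raise ValueError(f"implausible VO2: {vo2_ml_per_kg_min}")
    if not 5.0 <= speed_km_per_h <= 25.0:
        raise ValueError(f"implausible speed: {speed_km_per_h}")
    return vo2_ml_per_kg_min / (speed_km_per_h / 60.0)


# Hypothetical raw records; in practice these would be read from one master file.
raw_records = [
    {"runner": "A", "condition": 60, "vo2": 52.0, "speed": 16.0},
    {"runner": "A", "condition": 120, "vo2": 50.0, "speed": 16.0},
]

# Derived values are always recomputed from the raw data, never typed in by hand.
for rec in raw_records:
    rec["oxygen_cost"] = oxygen_cost_ml_per_kg_km(rec["vo2"], rec["speed"])
    print(rec["runner"], rec["condition"], round(rec["oxygen_cost"], 1))
```

If a formula or a raw value changes, rerunning the script updates every derived number at once, and the script itself is the record of what was done.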


[Regina] (54:12 - 54:25)
That's just kind of a good rule of thumb right there. Actually, Kristin, in our clinical trials class on Stanford Online, we spent a whole unit talking about proper data management using these kinds of tools we've just mentioned.


[Kristin] (54:25 - 54:32)
Yeah, that's a great course, Regina, if we say so ourselves. But I do want to refer people there if they want more details for how to do all of this correctly.


[Regina] (54:33 - 54:44)
Right. So getting back to our meeting with the research team, they were very candid that they felt intimidated by statistics and wish they had more formal training in the area.


[Kristin] (54:45 - 55:07)
You know, Regina, they are not alone. I think this describes a lot of teams in applied areas like sports science. Side note, Regina, I can totally put myself in the shoes of people who are trying to do their own statistics without adequate training because lately, I have been trying to make minor repairs to my house myself and I lack adequate training in that.


[Regina] (55:07 - 55:09)
Oh my, like what, for example?


[Kristin] (55:09 - 55:20)
Wiring fire alarms into the ceiling, attaching towel racks to the wall. And I feel dumb and frustrated and I swear a lot when I have to do these tasks.


[Regina] (55:20 - 55:22)
Oh, and you rarely swear, Kristin.


[Kristin] (55:23 - 55:44)
True, but because I have no training in these things, I don't have a good intuition on like how things fit together or how the tools work or what to do if things go wrong. So I really feel for people who are doing statistics like I'm doing house repairs. It's frustrating and I am intimidated by drill bits and wire caps.


[Regina] (55:44 - 56:09)
Yeah, we have empathy. But I think one unfortunate thing that happens when people are intimidated by stats is that they stop trusting their basic intuition about numbers. Like they don't feel confident in double checking the results or interpreting the results.


And Kristin, you have an example of a funny thing happening with that here.


[Kristin] (56:09 - 56:28)
Yeah, in this case, a reviewer asked the researchers to add effect sizes to the paper, which is very reasonable. But for the gut distress data, those were categorical data. So the researchers decided to add odds ratios to the paper and they ended up reporting odds ratios of infinity in the paper.


[Regina] (56:28 - 56:42)
So you can't have an odds ratio of infinity. That's just telling you that some of the categories you're looking at have zero people in them. And it just means you can't estimate the odds ratio, not that it's infinity.
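
To see where an "infinite" odds ratio comes from, here is a minimal sketch for a 2x2 table. The counts are hypothetical and are not the paper's data. The function declines to report a value when any cell is zero, rather than returning infinity.

```python
def odds_ratio(a: int, b: int, c: int, d: int):
    """Odds ratio for a 2x2 table, or None when it cannot be estimated.

    a: symptom present in condition 1    b: symptom absent in condition 1
    c: symptom present in condition 2    d: symptom absent in condition 2
    """
    if 0 in (a, b, c, d):
        return None  # an empty cell means the ratio is not estimable
    return (a * d) / (b * c)


# Hypothetical counts: nobody reports the symptom in the comparison condition.
print(odds_ratio(4, 6, 0, 10))   # None: division by zero would give "infinity"
print(odds_ratio(4, 6, 2, 8))    # a table with no empty cells
```

A naive calculation of (a * d) / (b * c) with c equal to zero is what produces infinity; the empty cell is the real message.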


[Kristin] (56:43 - 56:56)
And Regina, I'm blaming the peer reviewer for this one because they asked for those effect sizes and they're the ones who should have checked in the revision and should have noticed those values of infinity and told the authors to take them out.


[Regina] (56:56 - 57:09)
Yeah, yeah, absolutely. But Kristin, you know, I really enjoyed meeting with the authors of this paper. It was really enjoyable.


It was. First of all, they said they liked the podcast, right? Which didn't hurt.


[Kristin] (57:09 - 57:11)
Well, flattery always works, Regina.


[Regina] (57:12 - 57:38)
Yeah, but even more than that, they were very transparent, I think, and curious. And they genuinely seemed to want to understand how to fix this current paper and how to improve in the future. And, you know, the corresponding author admitted that he was quite nervous and he said he was wearing two layers so we couldn't see that he had completely sweated through the first layer.


[Kristin] (57:39 - 58:33)
Which makes complete sense because being told that you have errors in a published paper is incredibly nerve-wracking. And I want to give them an A++ for bravery and for owning their errors. Regina, happy ending here is that after talking with us, they went back through all of their data.


It took some digging because they indeed had multiple versions of the data floating around in different places, but they did their own sleuthing and figured out what happened. They had calculated some of their values incorrectly somewhere back in the data processing chain, but they didn't realize they had errors until around the time that the Article in Press version came out. And I think in their haste to fix things before the final version came out, they only corrected the running economy variable when in fact it turned out that most of the key variables in the paper needed to be fixed.


[Regina] (58:34 - 58:54)
Right. These variables are interrelated. They're all derived from the same underlying measurements, right, of oxygen in and carbon dioxide out.


So fixing just one variable will still leave a bunch of these other derived variables still messed up and incorrect.
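
A rough sketch of why the derived variables move together, assuming the commonly cited stoichiometric equations attributed to Frayn (1983) with protein oxidation ignored. The transcript does not say which equations the authors used, so the formulas, the function name `derive_metabolic_variables`, and the input values are illustrative assumptions only.

```python
def derive_metabolic_variables(vo2_l_min, vco2_l_min):
    """Derive several quantities from the same two gas-exchange measurements.

    Inputs are oxygen uptake and carbon dioxide output in litres per minute.
    Oxidation rates (g/min) use the Frayn-style equations, assumed here and
    not confirmed as the authors' method.
    """
    rer = vco2_l_min / vo2_l_min                   # respiratory exchange ratio
    cho = 4.55 * vco2_l_min - 3.21 * vo2_l_min     # carbohydrate oxidation, g/min
    fat = 1.67 * vo2_l_min - 1.67 * vco2_l_min     # fat oxidation, g/min
    return {"rer": rer, "cho_g_min": cho, "fat_g_min": fat}


# Hypothetical values: a wrong VCO2 versus a corrected one.
before = derive_metabolic_variables(vo2_l_min=3.0, vco2_l_min=2.7)
after = derive_metabolic_variables(vo2_l_min=3.0, vco2_l_min=2.8)

# Every derived quantity shifts, not just one of them.
for name in before:
    print(name, round(before[name], 3), "->", round(after[name], 3))
```

Because every output is computed from the same inputs, correcting an upstream value and re-running the whole function keeps the derived variables consistent, which is the idea behind recomputing everything from the raw data.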


[Kristin] (58:54 - 59:31)
Exactly. So, Regina, they did go back to the original individual level data as it was collected, and they re-computed everything from scratch. So we can trust the numbers now. And a lot of the numbers did change, but fortunately the main findings did not change in any material way.


The 120 condition was better than the 90 condition for burning carbs from the drink, but the 120 condition did not beat the 90 condition in terms of running economy. And Regina, they are going to formally correct the paper with the journal, and they are also planning to fix their whole statistics workflow to avoid these kinds of mistakes in the future.


[Regina] (59:32 - 59:39)
Right, which is super important for many reasons, but especially because they have another paper that's in the works that's on women.


[Kristin] (59:40 - 59:55)
Right. There have been almost no studies on this topic in women. So this is very exciting.


It's a paper on women marathoners, and we cannot wait to find out how this turns out. But, Regina, I am sure that that paper is going to be double-checked and triple-checked and it's going to be rock solid.


[Regina] (59:56 - 1:00:11)
Absolutely. Kristin, I think that overall, we can count this one as another Cinderella story. The researchers, they really grow and they turn into wonderful things at the end, just like in the Red Dress episode.


Same thing.


[Kristin] (1:00:11 - 1:00:47)
Absolutely. And Regina, you know, I did feel a little bad having to deliver this bad news to them, but I'd like to think that maybe we saved them from a worse fate in the long run, right? So this kind of thing has a way of coming back to bite you.


I'll just give one case in point. The president of Stanford University in my backyard, Marc Tessier-Lavigne, a few years ago, he had to resign his presidency because of errors spotted in papers that he had written decades before. So, Regina, as I always tell my kids, it's never the mistake that gets you, it's the cover-up.


So, you know, don't lie to mom.


[Regina] (1:00:49 - 1:01:01)
Okay, mom. Mom, I think we're ready to wrap up then and rate the strength of evidence for the claim here.


Kristin, can you repeat the claim that we're evaluating?


[Kristin] (1:01:01 - 1:01:23)
Right. The claim was that high-carb fueling during long-distance endurance events will help you race faster. And how do we rate the strength of evidence for claims in this podcast? With our highly scientific one to five smooch rating scale, where one smooch means little to no evidence for the claim and five smooches means very strong evidence for the claim. So, Regina, kiss it or diss it?


[Regina] (1:01:23 - 1:01:33)
So, Kristin, I think I'm going to have to give this one two smooches, just because it seems like there's not a whole lot of evidence on this performance thing.


[Kristin] (1:01:34 - 1:02:20)
Yeah, Regina, I'm with you on this. There just isn't a lot of performance data. I think that's the bottom line.


A lot of the studies have focused on carbohydrate metabolism. It is biologically plausible that if you can take in more carbohydrates and use them during your run or swim or bike, that that might lead to better performance. But it hasn't really been demonstrated with evidence.


Even this study here didn't have any performance metrics. And for the closest thing, running economy, there wasn't a difference between the 120 condition and the 90 condition. So, we need more data.


I'm going to go with 2.5 smooches, though, Regina, because I think it has some biological plausibility. And also, I'm throwing in an extra half smooch bonus point here because of how well they handled the whole situation.


[Regina] (1:02:21 - 1:02:24)
Okay, sounds good. What about methodological moral?


[Kristin] (1:02:25 - 1:02:42)
Well, Regina, I already kind of gave one, though it wasn't specific to statistics. And that was that it's never the mistake that gets you, it's the cover-up. I'll add one along the same lines for statistics.


Everyone makes statistical mistakes, not everyone fixes them.


[Regina] (1:02:42 - 1:02:48)
Ooh, I love that one. It is both descriptive and prescriptive at the same time.


[Kristin] (1:02:49 - 1:02:50)
It is, yes.


[Regina] (1:02:50 - 1:02:50)
Yeah, beautiful.


[Kristin] (1:02:51 - 1:02:52)
How about you, Regina?


[Regina] (1:02:52 - 1:03:12)
Hmm, will you give me two on this one, Kristin?


[Kristin]
Oh, yeah, of course. I kind of had two.


[Regina]
Okay, thank you. Because I feel like there's so much to learn from this episode. First one I want to talk about is about Excel.


If the numbers aren't consistent, Excel is often part of the story.


[Kristin] (1:03:12 - 1:03:15)
Oh, that is so true. Yes, very good.


[Regina] (1:03:16 - 1:03:27)
Okay, second one talks about that whole internal inconsistency. So, this one. If a p-value doesn't survive the trip from text to figure, there's a problem.


[Kristin] (1:03:28 - 1:03:40)
I love that, Regina. And now I'm picturing in my head, like, p-values fainting and dying along the way as they migrate from the abstract to the text to the figure. They didn't make it.
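
One way to check whether a reported p-value agrees with its confidence interval is to back out the p-value that the interval implies. This is a minimal sketch, assuming a symmetric 95% interval built from a normal approximation; small studies usually use a t-distribution, so treat the result as a rough check only. The function name `implied_p_value` and the example numbers are hypothetical.

```python
import math


def implied_p_value(estimate, lower, upper, z_crit=1.959964):
    """Two-sided p-value implied by a symmetric confidence interval.

    Assumes the interval is estimate +/- z_crit * SE (normal approximation)
    and tests the null hypothesis that the true value is zero.
    """
    se = (upper - lower) / (2 * z_crit)   # standard error recovered from the CI width
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))    # equals 2 * (1 - Phi(z))


# Hypothetical mean difference of 2.0 with 95% CI 0.5 to 3.5.
p = implied_p_value(2.0, 0.5, 3.5)
print(round(p, 4))

# A 95% interval that excludes zero should go with p below 0.05.
print(p < 0.05)  # True
```

If the text reports p = 0.30 for an interval like this one, something did not survive the trip between the analysis and the page.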


[Regina] (1:03:41 - 1:04:00)
Oh, it's like an exodus, the incredible journey. So, this was a lot of fun, Kristin. I'm not sure this is going to change how I fuel when I run my six-minute-mile-pace marathons next week.


[Kristin] (1:04:02 - 1:04:10)
Regina, I feel like we've got to wait and see what happens with the female paper that's coming out, because that might apply to us more than the paper in men.


[Regina] (1:04:11 - 1:04:21)
I'm hoping it comes out and says you need a lot of pizza and french fries before, during, and after the race. Chocolate cake. And then maybe just skip the race, right?


[Kristin] (1:04:23 - 1:04:27)
Chocolate cake is easier to digest than pizza or french fries, you know, actually, while you're running. Just so you know.


[Regina] (1:04:27 - 1:04:32)
You need a layer. Just like you said, you know, you're layering the fructose. I'm going to layer.


[Kristin] (1:04:33 - 1:04:45)
Frosting. I swear, those gels, they're basically just frosting. So, just take a jar of frosting.


You can probably cram in a lot of carbs from frosting, but frosting might have fat too. I'm not sure. Maybe.


Might not be all carbs.


[Regina] (1:04:45 - 1:04:46)
I'm hungry.


[Kristin] (1:04:46 - 1:04:49)
All right. This has been fun, Regina.


[Regina] (1:04:49 - 1:04:51)
Thanks, Kristin. Thanks everyone for listening.