The Batman Effect: Do weird surprises make people nicer?
Description
Nobody expects Batman—but when he shows up in a crowded subway car, are people suddenly more likely to help a passenger in need? This week on Normal Curves, we unpack a recent quasi-experimental field study involving a caped superhero costume, a prosthetic pregnancy belly, and some puzzled Italian commuters. Along the way, we demystify three common ways of describing effects for binary outcomes—risk differences, risk ratios, and odds ratios—and explain what they actually mean in plain language. We also do some statistical sleuthing, uncover a major problem hiding in the paper’s numbers, and debate what really counts as an effective Batman outfit.
Statistical topics
- absolute vs relative effects
- binary outcomes
- coding errors
- data errors and quality control
- effect size interpretation
- field experiments
- odds
- odds ratios
- percentage differences
- quasi-experimental studies
- risk differences
- risk ratios
- statistical sleuthing
Methodological morals
- “We love an uncluttered paper, but when it's missing the basics, it's like an empty fridge. Clean, yes, but dinner is not happening.”
- “Before you make a fancy model, make sure the numbers in the table and in the text match.”
References
- Pagnini F, Grosso F, Cavalera C, et al. Unexpected events and prosocial behavior: the Batman effect. Npj Ment Health Res. 2025;4(1):57. Published 2025 Nov 3. doi:10.1038/s44184-025-00171-5
- PubPeer. Comments on “Unexpected events and prosocial behavior: the Batman effect.” Accessed December 2025.
- Sainani KL. Understanding odds ratios. PM R. 2011;3(3):263-267. doi:10.1016/j.pmrj.2011.01.009
- Nuzzo RL. Communicating measures of relative risk in plain English. PM R. 2022;14(2):283-287. doi:10.1002/pmrj.12761
- Sainani KL. How statistics can mislead. Am J Public Health. 2012;102:e3-4.
Kristin and Regina’s online courses:
Demystifying Data: A Modern Approach to Statistical Understanding
Clinical Trials: Design, Strategy, and Analysis
Medical Statistics Certificate Program
Epidemiology and Clinical Research Graduate Certificate Program
Programs that we teach in:
Epidemiology and Clinical Research Graduate Certificate Program
Find us on:
Kristin - LinkedIn & Twitter/X
Regina - LinkedIn & ReginaNuzzo.com
- (00:00) - Intro
- (03:42) - Why would Batman make people nicer?
- (07:33) - How they ran the experiment
- (17:06) - Did Batman save the day? Different ways to answer that
- (22:16) - What are odds and odds ratios?
- (29:16) - Where people get it wrong
- (34:08) - The plot twist: big numerical errors
- (40:36) - Did men or women give up their seat more often?
- (43:05) - Wrap-up and methodological morals
[Regina] (0:00 - 0:04)
I think we need to first explain what an odds actually is.
[Kristin] (0:04 - 0:43)
Right, because the only place we usually see odds in real life is in gambling. And Regina, do you gamble?
[Regina]
Just with my love life.
[Kristin]
Welcome to Normal Curves. This is a podcast for anyone who wants to learn about scientific studies and the statistics behind them. It's like a journal club, except we pick topics that are fun, relevant, and sometimes a little spicy.
We evaluate the evidence, and we also give you the tools that you need to evaluate scientific studies on your own. I'm Kristin Sainani. I'm a professor at Stanford University.
[Regina] (0:44 - 0:49)
And I'm Regina Nuzzo. I'm a professor at Gallaudet University and part-time lecturer at Stanford.
[Kristin] (0:50 - 0:55)
We are not medical doctors. We are PhDs, so nothing in this podcast should be construed as medical advice.
[Regina] (0:55 - 1:00)
Also, this podcast is separate from our day jobs at Stanford and Gallaudet University.
[Kristin] (1:01 - 1:24)
Regina, this week we're doing a very recent study from November 2025. It was what's called a quasi-experimental field study, which is just a fancy way of saying they experimented on real people in the real world. And here's the setup.
A woman with a fake pregnant belly and Batman walk into a crowded subway car in Italy.
[Regina] (1:25 - 1:31)
Which sounds like the start of a bad joke, so please tell me: and then they run into a bartender?
[Kristin] (1:33 - 1:36)
Sadly, no, but we do run into some data.
[Regina] (1:37 - 1:41)
Okay, a data bartender. That should be a thing.
[Kristin] (1:41 - 2:01)
It should be.
Regina, though, I think it might be fun to start here with some of the headlines, because this study did get some headlines, as you might imagine. And here's one that I found. It says, Seeing Batman on a Milan train doubled acts of kindness.
A new study shows surprise can wake attention and nudge people to help.
[Regina] (2:01 - 2:56)
Who can resist a headline like that? I mean, come on. Kristin, I like this one from the Daily Express.
Batman inspires grumpy subway passengers to be better people. Appearance of the Dark Knight had a remarkable impact on the usually testy commuters. And actually, as someone who frequently rides the metro in our nation's fine capital, I can definitely attest to commuters being testy and grumpy.
[Kristin]
I bet you can.
[Regina]
We're going to talk about the whole strange experiment in a moment, Kristin, but first let's state the claim that we'll be looking at today, which is the presence of an unexpected event, like a person dressed as Batman, will make people behave more altruistically, like offer their seat to a pregnant woman on a crowded metro.
[Kristin] (2:56 - 3:03)
Basically, the claim is that weird, unexpected things in public spaces make people nicer.
[Regina] (3:03 - 3:08)
Nicer. We could all use a bit of nicer, I think, right now, couldn't we?
[Kristin] (3:08 - 3:09)
We absolutely could.
[Regina] (3:09 - 3:18)
Okay, in addition to checking out this clickbait feel-good news, we are also wading into some classic statistics.
[Kristin] (3:18 - 3:25)
Oh, yeah. Odds versus risks, odds ratios versus risk ratios, and a good old-fashioned chi-square test.
[Regina] (3:25 - 3:42)
Kristin, don't forget the plot twist, though, because it's not just Batman that's unexpected. The numbers themselves do something weird and unexpected in this paper. But we'll save the tea on that one for when we get to the results.
[Kristin]
Yeah, don't give it away yet, Regina.
[Kristin] (3:42 - 3:52)
Okay, first some background. Let's talk about why anyone would even think that Batman on the metro would make people nicer to a pregnant woman.
[Regina] (3:52 - 4:31)
Right? I find this kind of hilarious. But there's actually a lot of interesting studies out there on the idea that a person's environment can nudge them to behave differently, even if they don't notice the environmental cues.
And I love these studies. So, first of all, there's the classic eyeball study. I love this one, where people suddenly remembered to pay for their office coffee when there was a poster of eyeballs on the wall versus a picture of flowers or something.
And it turns out they contributed three times as much to the coffee fund when they had these eyeballs staring straight at them.
[Kristin] (4:31 - 4:53)
That is a great study. There's also a really interesting one from a wine shop where they played French accordion music. And guess what?
People bought more French wine. But when they played German Oompa Band music, people bought more German wine, even though no one noticed the music.
[Regina] (4:53 - 5:16)
How could people not notice the music there?
So I like the ones where they found that people acted more generously with strangers when they were primed with romantic love symbols. Like there was one where they got them walking down a street in France that's called Valentine Street versus a normal street. And on Valentine Street, people helped strangers pick up things they dropped more often.
[Kristin] (5:17 - 5:23)
That's kind of a cool study, Regina. But I have to say, Batman, not a romantic cue, I don't think.
[Regina] (5:23 - 5:48)
Oh, no, no, no. Come on, Kristin. You know that there is a site for this somewhere on the internet.
Batman boyfriend? This is absolutely not hard to find. It'd take, like, a second.
I mean, think about Batman. He's got a cape. He's got tights.
He's got a weird pointy black mask. He hangs out on rooftops. And money.
Batman is loaded.
[Kristin] (5:48 - 5:51)
Well, I guess that might do it for some people. Yeah, I can see that.
[Regina] (5:52 - 6:09)
Some people. Not saying who. But you're right.
No, this paper is not about romantic cues and Batman being hot, sadly. It is about Batman being what the researchers called, quote, a novel unexpected event.
[Kristin] (6:10 - 6:33)
Right. The idea here is pretty simple. The idea is that something unexpected might pull people out of their routine.
And that momentary disruption might make them more aware of their surroundings and more likely to notice the needs of other people. So like you're on your usual commute and your head is down and you're looking at your phone and then, bam, Batman walks onto the train.
[Regina] (6:34 - 6:42)
And suddenly you notice, oh, whoa, there's a pregnant woman standing right in front of me and I'm just sitting here and I'm manspreading.
[Kristin] (6:42 - 6:51)
I'm taking up all the room. Right. So Batman is kind of a wake-up call.
It's like a jolt out of your normal, everyday self-absorption.
[Regina] (6:51 - 7:06)
Yeah, but I can't help but wonder, Kristin, why Batman? Right. Is that really the best example of something novel and unexpected on a Metro?
Like, how'd they choose that? Why not like, I don't know, a clown or a guy in a gorilla costume?
[Kristin] (7:07 - 7:16)
Well, I think the authors had a reason for Batman. They suggest that a superhero might trigger ideas about chivalry or traditional gender roles.
[Regina] (7:16 - 7:32)
OK, that makes sense. My personal hypothesis, though, is that someone in the lab had a Batman costume leftover from Halloween, right, and they're sitting around one evening and, you know, a couple bottles of wine later, they're like, hey, guys, you know, what would be a fun experiment?
[Kristin] (7:33 - 8:19)
You might be right. I can totally picture some grad students drinking wine and coming up with epic ideas for their theses and this would fit right in. Yes.
Yep. All right, Regina. So let's talk about what they actually did in this study.
So first of all, it took place on the Milan Underground Metro system. The Metro car had to be full, no seats available, but not too full. Specifically, no more than five people standing because they didn't want it to be so chaotic that you couldn't even see the pregnant woman or Batman.
[Regina]
Which is very Goldilocks. It has to be just right.
[Kristin]
Exactly.
And there were two conditions, control and Batman, and they ran both conditions at the same time on the same train, but in different cars and different spots on the platform.
[Regina] (8:20 - 8:34)
Right. Now, in the control condition, a female researcher boarded that crowded Metro car pretending to be pregnant, right? She wore that prosthetic belly and a sweater that made the belly pretty obvious.
[Kristin] (8:34 - 8:54)
And at the same time, an observer got on the car and watched to see whether anyone gave up their seat for her. That is the outcome variable, whether or not someone gave up their seat. If someone did give up their seat, the observer guessed their age and sex and wrote that down.
And then they also went up to that person and basically asked, why did you do that?
[Regina] (8:54 - 9:00)
Which is super bold to like nab someone on the Metro before you do an exit interview.
[Kristin] (9:01 - 9:07)
Absolutely. So that was the control condition. The experimental condition was identical except for one detail.
[Regina] (9:07 - 9:15)
Batman. Kristin, I feel like we need the music here for full dramatic effects from the TV show.
[Kristin] (9:15 - 9:32)
Yeah, but Regina, I found out last time when you asked me to splice in Jeopardy music that if the music is copyrighted, you can't do it. So we're just going to have to imagine. So at the same time that the pregnant woman boarded through one door, Batman boarded the same car, but through a different door.
Right.
[Regina] (9:32 - 9:45)
And that observer, again, watched to see if anyone gave up their seat and asked them why if they did. But in the Batman condition, the observer asked one extra question. Hey, did you notice Batman?
[Kristin] (9:46 - 9:56)
Exactly. And Batman stood at least nine feet away from the woman, presumably so it wouldn't look like they were together, which would really complicate the story.
[Regina] (9:56 - 10:03)
Right. Because if Batman is your bodyguard, that's a different thing.
[Kristin] (10:03 - 10:10)
Absolutely. Okay. So the observer collects data on whether or not anyone gave up their seat and then details about the chivalrous person.
[Regina] (10:11 - 10:24)
But Kristin, here's what's bothering me, right? It's the logistics of the whole thing. Because field studies, like we're talking about, these are messy.
They're happening in real time. It's not a lab. You can't plan it out and control it.
[Kristin] (10:24 - 10:42)
Right. This is very tricky. The paper says that for an observation to be considered valid, there had to be no open seats and fewer than five people standing.
And I assume that the observer makes that judgment call, but that's a lot to register in a crowded subway car very quickly.
[Regina] (10:43 - 11:02)
And that opens the door to bias, right? Because, like, imagine the observer sees six people standing, not five. But then someone gives up their seat to a pregnant woman in the Batman condition and they're excited.
They might think, you know, it's close enough and count that one. But if they were in the control condition, they might be stricter and not count it.
[Kristin] (11:03 - 11:18)
Yeah, there aren't a lot of details in the paper about quality control. So we don't know how consistent these observers were. For example, they didn't tell us how many trials ended up being valid and invalid in both of the conditions.
And I'd like to know that.
[Regina] (11:18 - 11:30)
And it happened fast. So each time the team had to get off the train at the next stop, which was just two to four minutes later, and get back on the platform and then repeat the whole thing again.
[Kristin] (11:30 - 11:45)
Yeah, this is a lot going on at once. Boarding, scanning the car, deciding whether the situation counts as a valid observation, watching for someone to give up a seat, writing this down, interviewing the person, and then quickly jumping off the train and doing it all over again.
[Regina] (11:45 - 11:53)
It honestly almost sounds kind of fun, right? A little like creating a podcast. Just as chaotic.
[Kristin] (11:53 - 11:59)
Regina, you know, I think we need to do a field experiment as part of the podcast sometime. That might be fun.
[Regina] (12:00 - 12:09)
That would be fun. Okay, but Kristin, we have to talk now about the most important thing, and that is what Batman actually looked like here.
[Kristin] (12:10 - 12:16)
Right, because they included a photo in the paper, and I have to say it was kind of anticlimactic.
[Regina] (12:16 - 12:17)
Yeah.
[Kristin] (12:17 - 12:23)
Black cape, kind of cool. Possibly a Batman logo, but you can't really see it.
[Regina] (12:23 - 12:39)
Yeah, so his top, it was this black, I don't know, quilted, leathery-looking thing, and a utility belt, maybe? Couldn't really tell. Now, black jeans and regular black and white athletic shoes, which is not how I picture Batman.
[Kristin] (12:40 - 12:49)
Yeah, and he had a mask, but he didn't wear it. He held it against his chest because the author said they didn't want to alarm passengers.
[Regina] (12:50 - 12:53)
Hmm. I admit, I had higher hopes for a sexy Batman boyfriend.
[Kristin] (12:54 - 12:54)
Yeah.
[Regina] (12:55 - 13:20)
So, this guy's getup was missing, I'm going to paint the picture for you, the muscle shirt that's clinging to his pecs and some black bikini underwear that goes over the tights. I looked this up, by the way, online. This exists, and let's just say that black bikini underwear is often quite tight, leaves very little to the imagination, and this was not that.
[Kristin] (13:20 - 13:24)
This was definitely not that. No, no, much more PG.
[Regina] (13:25 - 13:32)
And this guy's getup, for the real experiment, this would not at all stand out on the D.C. Metro, no.
[Kristin] (13:32 - 13:43)
You know, probably even less so in San Francisco, Regina, and I have to say, I don't ride the subway often, but my impression is that there are a lot of weird things in San Francisco, so this probably would not stand out at all.
[Regina] (13:43 - 13:53)
Oh, yeah. So, I'm wondering if maybe in Milan this felt more novel, because they dress pretty nicely in Italy. Maybe it's the boring sneakers that stood out.
[Kristin] (13:53 - 13:58)
Yeah, he wasn't wearing fancy Italian shoes, and that could be the unexpected event, yes.
[Regina] (13:59 - 14:19)
Oh, no. Okay, let's talk about the population for this study. It's a field experiment, so no one actually consented to participate ahead of time.
So, technically, the population is everyone who is riding those Metro cars at that moment.
[Kristin] (14:20 - 14:45)
Yeah, and it's interesting from a statistical standpoint, because the unit of observation is not a person anymore, it's everyone on the train car. And the outcome is about the group, not about one person. It's just whether anyone in the car gave up their seat.
And in calculating sample size, it wasn't about how many people they needed, but rather about how many valid train stops they needed to play out the scenario in.
[Regina] (14:45 - 15:00)
Right, and by the way, they did do a sample size calculation ahead of time, and it told them that they would need 136 observations total, so 68 per condition, and they needed that to make this study sufficiently robust.
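For anyone curious how a calculation like that works, here is a minimal sketch of the standard two-proportion sample size formula in Python. The assumed proportions, alpha, and power below are purely illustrative placeholders, not the study authors' actual assumptions (those aren't discussed in the episode), so don't expect this to reproduce their 68 per group.

```python
# Minimal sketch of a two-group sample size calculation for a binary outcome.
# The inputs are hypothetical placeholders, not the study authors' assumptions.
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided comparison of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Example with made-up assumed proportions for the control and Batman conditions:
print(n_per_group(p1=0.40, p2=0.65))  # 59 with these made-up inputs
```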
[Kristin] (15:00 - 15:17)
Regina, I'm impressed that they did a sample size calculation ahead of time, and I'm also impressed that they pre-registered their study in clinicaltrials.gov. And in their protocol, they listed their sample size target as 68 per group, so they did indeed stick to their protocol.
[Regina] (15:18 - 15:26)
Right, and we've talked about pre-registration before. It locks in your protocol, so later you can't go back and cherry pick your results.
[Kristin] (15:26 - 15:43)
Yes, and they also in this paper had a clear single primary outcome, which after the paper we did in the last episode, frankly, Regina, makes me feel pretty good. The outcome here was just someone gave up their seat or didn't, so it's a totally monogamous study, and there was no multiple testing dude involved.
[Regina] (15:44 - 15:58)
Sorry, multiple testing dude, no fun here. Right, so they actually hit their sample size target, which we are excited about. They ended up with 68 Batman trials and 70 control trials.
[Kristin] (15:58 - 16:28)
You know, Regina, I really wondered about the 70 control trials because their target was 68 and they were running the two conditions simultaneously, so how did they end up with two extra control trials? Maybe they were just having too much fun to stop. But I would have thought they were having more fun in the Batman condition and would have had the extras there, but okay, 68 and 70, maybe they just lost count, who knows.
And Regina, that's the study design, and now we are ready to look at the results.
[Regina] (16:29 - 16:32)
Which is going to be fun, but let's take a short break first.
[Kristin] (16:54 - 17:06)
Welcome back to Normal Curves. Today, we're examining the claim that weird unexpected things like Batman in public spaces can make people nicer, and we were about to talk about the results.
[Regina] (17:06 - 17:23)
The moment of truth, did Batman save the day? Kristin, another nice thing about this paper, as we're talking about the results, is that the results themselves are fairly simple. It's not like we've got a ton of numbers running around that we need to keep track of.
[Kristin] (17:24 - 17:42)
You know, Regina, I would describe the paper as wonderfully uncluttered when it comes to the results section. And let's start with some basic numbers that they report in the text. They say that in the control, the non-Batman condition, someone gave up their seat to the pregnant woman about 38% of the time.
[Regina] (17:42 - 17:48)
Thirty-eight percent. Not even half. This is, like, not great humanity.
Not looking good.
[Kristin] (17:49 - 17:57)
Yeah, what happened to chivalry? But, Regina, in the Batman condition, someone gave up their seat about 67% of the time.
[Regina] (17:58 - 17:59)
Which is a real jump there.
[Kristin] (18:00 - 18:04)
Right, so at face value, Batman seems to make a big difference.
[Regina] (18:05 - 18:06)
There's the cape. Capes make people nicer.
[Kristin] (18:07 - 18:15)
Maybe. And, Regina, they also report that the difference between the groups was statistically significant or what you like to call statistically discernible.
[Regina] (18:16 - 18:26)
All right. And their p-value was less than .001.
[Kristin]
And a p-value like that usually attracts a lot of oohs and aahs, Regina.
[Kristin] (18:26 - 18:27)
People love their small p-values.
[Regina] (18:28 - 18:44)
Ooh, aah. Sounds kind of dirty when I say it that way. I should not do that.
Okay, but now let's talk about how big it was. I mean, the magnitude of the effect.
[Kristin] (18:45 - 18:57)
Right. So the outcome here, to remind everybody, was dichotomous, also called binary, because on each metro car, either someone gave up their seat or they didn't. So it's a yes, no, or one, zero variable.
[Regina] (18:58 - 19:04)
Right, and when the outcome is binary, we actually have a lot of different ways to describe how big the effect is.
[Kristin] (19:05 - 19:24)
Yeah, but let's start with the most obvious way to report the data. 67% of the time, someone gave up their seat in the Batman condition versus 38% of the time in the Control condition. 67% minus 38% is a 29 percentage point difference between the groups.
That's big. That's an absolute increase of 29%.
[Regina] (19:25 - 19:38)
Right, I like to frame this as saying, okay, if you've got 100 metro car rides, when Batman was present, about 29 more times people gave up their seat to the pregnant woman.
[Kristin] (19:38 - 19:49)
I like that framing, Regina. And Regina, I think they could have just stopped there with their results section. Those data tell the whole story.
And actually, this is the most transparent way to present the results.
[Regina] (19:50 - 20:09)
I agree. They could have also talked about the results in terms of the relative difference in those percentages. So 67% is almost double 38%.
So Batman almost doubled the chance that someone would give up their seat. Exactly.
[Kristin] (20:09 - 20:18)
And, you know, Regina, the journalists figured this out. Remember in one of the headlines I read, it said that seeing Batman doubled acts of kindness.
[Regina] (20:18 - 20:46)
It's not quite double, because 67% divided by 38% is 1.8. So just to be pedantically precise, Batman increased acts of kindness by 1.8 fold, or 80%, which doesn't make quite as nice a headline. But how did I get that 80% increase? 1.8 minus one is 0.8, and that's an 80% increase.
[Kristin] (20:46 - 20:58)
And we call this 1.8. It's a risk ratio or chance ratio. And you get it just by dividing the chances or risks in the two groups. And I think this measure is pretty intuitive.
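For listeners following along at home, here is a tiny Python sketch of the arithmetic in this exchange, using the 67% and 38% reported in the paper's text:

```python
# Absolute and relative effects from the percentages quoted in the paper's text.
p_batman = 0.67   # share of Batman trials where someone gave up a seat
p_control = 0.38  # share of control trials where someone gave up a seat

risk_difference = p_batman - p_control  # absolute effect: 0.29, i.e., 29 percentage points
risk_ratio = p_batman / p_control       # relative effect: about 1.8, an 80% relative increase

print(f"risk difference = {risk_difference:.2f}")
print(f"risk ratio = {risk_ratio:.1f}")
```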
[Regina] (20:58 - 21:12)
Yeah, but weirdly, Kristin, the authors did not present a chance ratio or risk ratio like this. Instead, they presented something called an odds ratio, which is not intuitive as we are going to see.
[Kristin] (21:13 - 21:25)
Not at all. And the reason an odds ratio appears like Batman out of nowhere in this paper is because they used a statistical technique called logistic regression.
[Regina] (21:25 - 21:54)
Which, again, is just weird because there was no reason they needed to do a logistic regression. They could have gotten a p-value by comparing two simple percentages, 67% versus 38%, with this old-fashioned chi-squared test, which is very simple. Kristin, but I'm wondering if maybe they were like, you know, reviewer two is going to respect me more if I say the word logistic regression in my paper.
I sound smarter.
[Kristin] (21:54 - 22:15)
Yes. Yeah, it was totally unnecessary to run a logistic regression. But I think you're right, Regina, they got sucked into that academic trap where they felt they couldn't just be simple and direct.
They had to make it sound more fancy and complicated. And logistic regression, that model, spits out this mysterious statistic called the odds ratio.
[Regina] (22:16 - 22:28)
Odds ratios. Kristin, I think it's time for a statistical detour on logistic regression and odds ratios because these are ubiquitous in the health and medical literature.
[Kristin] (22:28 - 23:07)
They are, and we haven't unpacked these before on this podcast. So let's do this now. I actually teach a whole class on logistic regression, but let me give you just like the 30-second version.
So logistic regression, the outcome variable is binary. Yes or no, one or zero. In this case, it was simply, did someone give up their seat or not?
And the problem with binary outcomes is that a pile of ones and zeros does not behave nicely if you want to fit a line to the data. And we really like fitting lines. So what we do is we take those ones and zeros and we transform them into something more useful that behaves more nicely called a log odds.
[Regina] (23:07 - 23:23)
Log odds, which is just the logarithm of the odds of the outcome, is one of those mathematically beautiful things that make absolutely zero intuitive sense when you have to interpret them. And my students hate this.
[Kristin] (23:23 - 24:08)
Yeah, let me give you a little feel for why this is mathematically beautiful. So if you have a binary variable, you just have zero and one, and that's very limiting. If you then take an odds, it turns out that you bring back in all the positive numbers.
If you then take a log of the odds, you bring back in all the negative numbers. So now you can span all numbers instead of just zero and one. So that's why it's mathematically beautiful.
The bottom line is though, this model ends up producing this thing called odds ratios. And I'll tell you, there's a lot more that you can get out of a logistic regression model, not just odds ratios. But I think because the computer kind of spits out odds ratios and people are used to them, they end up often just dumping odds ratios into their paper without much thought.
[Regina] (24:08 - 24:10)
Yeah, without thinking it through.
[Kristin] (24:10 - 24:25)
You know, Regina, my sister who is an astrophysicist, one time she happened to be teaching a science for jocks type of class, and she was covering some general science, including medical studies. So she called me up and she was like, hey, what is this stupid odds ratio measure?
[Regina] (24:27 - 24:39)
I love that even astrophysicists are like, I can do Big Bang and heat death of the universe. I don't know, whatever they study. No problem. But this odds ratio thing?
Too weird, hard pass.
[Kristin] (24:40 - 24:55)
Exactly. And I, of course, was like, yeah, it is kind of a silly metric. Now, math people and statisticians love it because it has beautiful mathematical properties, but for actual humans trying to interpret results, it can be really confusing.
[Regina] (24:55 - 25:04)
I've had some dates like that, actually. The guy is theoretically great on paper, but in real life, he's just kind of confusing and weird.
[Kristin] (25:05 - 25:11)
Yes, odds ratios are the amazing on paper, weird in person of statistics.
[Regina] (25:11 - 25:23)
Okay, Kristin, now I think we get to explain this mysterious Batman-like odds ratios, but in order to do that, I think we need to first explain what an odds actually is.
[Kristin] (25:23 - 25:58)
Right, because the only place we usually see odds in real life is in gambling. And Regina, do you gamble?
[Regina]
Just with my love life.
[Kristin]
Well, you know, actually, Regina, that works. I can make concrete examples here for illustration about your love life. Sure.
[Regina]
Uh-oh.
[Kristin]
Okay, first of all, odds. An odds is just the probability of something happening divided by the probability of it not happening.
So, for example, let's say that there's a 60% chance that a man will be good in bed.
[Regina] (25:58 - 26:08)
Oh, I cannot believe that you went there with this one. Okay, 60% chance, hmm, I'm not sure. Maybe overly optimistic.
[Kristin] (26:10 - 26:38)
Given my new situation, we're going to call it hopeful. Hopeful, Regina. Okay, but 60% chance, what does that translate to in odds?
So, if there's a 60% chance that a random man my age is good in bed, then there's a 40% chance that he's not because 100% minus 60% gives us 40%. So, the odds are 60 to 40 or 6 to 4, which we can simplify further to 3 to 2.
[Regina] (26:38 - 26:51)
3 to 2 odds. So, for every three guys who are good in bed, I'm going to get two guys who are not so good in bed. Okay, I think I can live with those odds.
[Kristin] (26:52 - 27:14)
Not bad, right? But, Regina, what if we were taking the more pessimistic and possibly, sadly, more realistic view? What if there was only a 20% chance that a random man our age would be good in bed?
20% chance good in bed means 80% chance not good in bed. So, the odds are 20 to 80 or simplified 1 to 4.
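Here's the same probability-to-odds conversion as a quick sketch, using the made-up 60% and 20% numbers from this bit of the conversation:

```python
# Converting a probability ("chance") into odds: P(event) / P(no event).
def probability_to_odds(p):
    return p / (1 - p)

print(round(probability_to_odds(0.60), 2))  # 1.5, i.e., odds of 3 to 2
print(round(probability_to_odds(0.20), 2))  # 0.25, i.e., odds of 1 to 4
```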
[Regina] (27:14 - 27:27)
So, that means the sex is good one time for every four times it's not. Which, oh my God, Kristin, what kind of bitterness are we channeling and wallowing in right now?
[Kristin] (27:28 - 27:35)
Yeah, that is kind of depressing. And since we're just making up numbers anyway, I'm going to pretend that our first set of numbers was correct.
[Regina] (27:36 - 27:45)
Oh, please, yes. And you're eventually going to have to collect some actual empirical data on this. And report back which one's more accurate.
[Kristin] (27:46 - 28:21)
Yeah, I think we're going to need some data, Regina. All right, so now let's go from odds to odds ratio. And an odds ratio, it's just the ratio of two odds.
Two odds from different groups, like Batman group, control group. And here's the important thing in this paper. We don't need a fancy logistic regression model to calculate the odds ratio for the Batman study.
We can just use those raw percentages that they gave us. So in the Batman condition, 67% of the time someone gave up their seat and 33% of the time they didn't. So the odds are 67 to 33, which is about two to one.
[Regina] (28:22 - 28:40)
Two to one. That means for every two Batman metro car sessions where someone gave up their seat, there was one session where no one did. So two nice seat givers for every one selfish seat hogger.
[Kristin] (28:40 - 29:16)
I like the way you put that. Right. Now in the control condition, it was 38% of the time someone gave up their seat and 62% of the time they didn't.
So the odds are 38 to 62, which is about three to five. The odds ratio is a ratio of those two odds. We just divide the odds in the two groups.
So remember in the Batman group, it was two to one. In the control group, it was three to five. Now when we're actually solving this mathematically, two divided by one is just two and three divided by five is just 0.6. So we're actually dividing two by 0.6 and that gives us a value of 3.3. So the odds ratio is about 3.3. Right.
[Regina] (29:16 - 29:40)
Which means in the Batman condition, there was a bit more than a threefold increase in the odds that someone would give up their seat compared to the control condition. Or we can also say there's a 230% increase in the odds of someone giving up their seat because 3.3 minus one is 2.3 and that's 230%.
[Kristin] (29:40 - 29:54)
Right. Which sounds big and impressive. Now the problem is that people sometimes look at the odds ratio of 3.3 and say, oh wow, a more than threefold increase in the chance or likelihood of someone giving up their seat.
[Regina] (29:55 - 30:00)
Which is wrong because odds and chance and likelihood are not interchangeable.
[Kristin] (30:00 - 30:11)
Right. We already saw that there is a 1.8 fold or 80% increase in the chance of someone giving up a seat. It's not a tripling.
It's not a 230% increase in the chance.
[Regina] (30:12 - 30:44)
Sometimes in casual everyday English, we use odds and chance interchangeably. You know, we say the chances of rain are 75% or the odds of rain are 75%. But really it should be the odds of rain are 75 to 25 or three to one.
But no one says that. They don't go around saying the odds of rain today are three to one. So if you want to annoy a statistician, this is a great trick.
Just say the odds of rain are 75%. It'll drive them crazy.
[Kristin] (30:45 - 30:56)
Yeah, that would annoy me, Regina, for sure. But notice, Regina, if someone confuses odds and chance here in interpreting this odds ratio of 3.3, they end up greatly exaggerating the effect.
[Regina] (30:57 - 31:16)
Right. 230% versus 80%. Right.
This is very different. It looks much more impressive with Batman if we're using odds rather than risk. And as we already said, though, there was no reason they needed to use a logistic regression or present an odds ratio for their data.
[Kristin] (31:17 - 31:42)
There was absolutely no reason here. But Regina, this happens a lot in the medical literature. There is a particular study design where you do have to present odds ratios.
But in the vast majority of cases where people are presenting odds ratios, they could have presented something much more intuitive. And again, the problem with odds ratios is that they often mislead people into thinking that the effect is much bigger than it actually is.
[Regina] (31:42 - 31:53)
You know, Kristin, the cynical side of me thinks that researchers do this on purpose. They present the odds ratio because they want to make the results look more dramatic and exciting.
[Kristin] (31:53 - 32:04)
Oh, I think that's sometimes true, Regina. Yes. So, Regina, do we have time, though, for a quick story that illustrates just how badly odds ratios can mislead people when people focus on them too much?
[Regina] (32:05 - 32:08)
Oh, cautionary tales. We always have time for cautionary tales.
[Kristin] (32:08 - 32:40)
OK, so there's a study that I use a lot in teaching. It was a quasi-experimental field study just like this one. Happened a few years ago.
And in the control condition of that study, about 93% of teens bought a sugary drink at a convenience store. That's a lot. Yeah.
And the study had several interventions. And one of the interventions was that the researchers posted a sign in front of the store that said something like, did you know it would take 50 minutes of jogging to burn off the calories in one can of soda?
[Regina] (32:41 - 32:43)
That would definitely make me reconsider the soda.
[Kristin] (32:43 - 33:05)
Yeah. And it did make some of the teens reconsider the soda because when the sign was up, 86% of teens bought a sugary drink. So that's a drop from 93% to 86%, about seven percentage points.
But in a video that the researchers released to the public, they said that the sign reduced sugary drink purchases by 50%.
[Regina] (33:06 - 33:13)
50%, like cut in half. Yep. But half of 93% is definitely not 86%.
[Kristin] (33:13 - 33:18)
It is definitely not. And I think we can all do that math in our heads, Regina.
[Regina] (33:18 - 33:23)
Let me guess what happened. They confused odds ratios and risk ratios.
[Kristin] (33:24 - 34:03)
Exactly. So just like in the Batman paper, rather than paying attention to those nice raw numbers I just told you, they ran a logistic regression and reported odds ratios. And let's talk about what the odds ratio would look like here.
So if 93% of teens buy soda, that means the odds are about 13 to one in favor of buying soda. But if 86% buy soda, the odds are six to one. We get the odds ratio again by dividing those two numbers.
So six divided by 13, that's roughly half or about 0.5. So the sign dropped the odds of a purchase by 50%.
[Regina] (34:03 - 34:08)
But that is corresponding to just a few percentage points drop in likelihood.
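The soda story works the same way, and it shows how an odds ratio near 0.5 can sit next to a drop of only about seven percentage points. A quick sketch with the numbers from the story:

```python
# Sugary-drink example: the odds ratio sounds dramatic, the risk change is modest.
p_no_sign = 0.93  # share of teens buying a sugary drink with no sign posted
p_sign = 0.86     # share of teens buying a sugary drink with the jogging sign posted

odds_ratio = (p_sign / (1 - p_sign)) / (p_no_sign / (1 - p_no_sign))
risk_ratio = p_sign / p_no_sign
risk_difference = p_sign - p_no_sign

print(round(odds_ratio, 2))       # about 0.46: "odds roughly cut in half"
print(round(risk_ratio, 2))       # about 0.92: only an 8% relative drop in purchases
print(round(risk_difference, 2))  # about -0.07: a seven percentage-point drop
```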
[Kristin] (34:08 - 34:56)
Right. It is really incorrect here to say that that was a 50% drop in the chance or likelihood of buying a sugary beverage. That is a total misinterpretation of an odds ratio.
And this is a perfect example of how odds ratios can sound dramatic and how they can mislead people if you casually translate them into language about risk. Okay, Regina, do you think we're ready for the big reveal now?
[Regina]
The plot twist.
[Kristin]
Yes. Because when I was preparing for this podcast, I wanted to bypass the logistic regression and just calculate the P value from a simple chi-square test. But to do that, I needed the actual numbers.
You need the actual counts to do a chi-square, not just the percentages. So I went to table one to get those numbers, those counts, and I started calculating some things. And then I immediately started texting you.
[Regina] (34:57 - 35:05)
Which was pretty funny. You said, am I crazy or did the authors really screw this up? And I said, yeah, I think the authors really screwed this up.
[Kristin] (35:06 - 35:35)
Yeah. Because table one says that someone gave up a seat in 41 out of 68 Batman trials and in 29 out of 70 control trials. But 41 divided by 68 is not 67%.
It's about 60%. And 29 divided by 70 is not 38%. It's about 41%.
So I was like, wait, where did they get the 67 and 38% that they report in the text?
[Regina] (35:37 - 36:20)
So I think it's good here to point out that so far in this episode, we've been rounding those percentages that they reported to 67% and 38%. But the paper actually gives very precise values that are going to be helpful when we go through our little statistical sleuthing here. And those exact values were 67.21% and 37.66%. Right. So we did some very basic reverse engineering. And we said, okay, let's assume that the numerators are correct. What denominators would give us exactly those numbers?
67.21 and 37.66%.
[Kristin] (36:20 - 36:41)
Right. And it turned out that if we used a denominator of 61 rather than 68 for the Batman group, we would get exactly the 67.21% that they reported in the text. And for the control group, if we used 77 as the denominator rather than 70, we get exactly that 37.66%.
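For anyone who wants to retrace the sleuthing, the check is just a few lines: hold the Table 1 numerators fixed and see which denominators reproduce the exact percentages printed in the text.

```python
# Which denominators reproduce the exact percentages in the paper's text
# (67.21% and 37.66%), if the Table 1 numerators (41 and 29) are taken as correct?
print(round(41 / 68 * 100, 2))  # 60.29 -> Table 1 denominator, does NOT match the text
print(round(41 / 61 * 100, 2))  # 67.21 -> matches the text exactly
print(round(29 / 70 * 100, 2))  # 41.43 -> Table 1 denominator, does NOT match the text
print(round(29 / 77 * 100, 2))  # 37.66 -> matches the text exactly
```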
[Regina] (36:41 - 37:21)
Right. So the simplest explanation that we can come up with is that this was a coding error. That they had 68 Batman trials and 70 control trials, like they said. But it looks like when they went to do the logistic regression, notice there's a seven count difference in both of those groups.
It looks like seven of those Batman trials ended up being coded incorrectly as control trials. And that's what would give those incorrect denominators of 61 and 77. And that's what would yield these very precise percentages that we saw in the text.
[Kristin] (37:23 - 37:54)
I mean, we don't have the raw data in front of us, but this appears to be what happened. I don't think it's coincidental that we can get back those exact percentages to the second decimal place if we use those alternate denominators. So we think the percentages in the table are correct, but the percentages in the text are not.
And this seems to be, as you said, Regina, a case of miscoding the data. But it's kind of bad that they didn't catch it here because it's a pretty glaring and obvious error. It's not like it's buried in the data set or the statistical code and hidden from sight.
[Regina] (37:55 - 38:04)
Right, they're sitting there in table one, the correct percentages. And they completely do not match the text. So you're right, it's surprising no one caught this.
[Kristin] (38:04 - 38:26)
And Regina, we can kind of guess what might have happened in terms of how things got miscoded, because sometimes binary variables are coded as ones and zeros. And it's a common mistake to mix up your ones and zeros. So what might've happened is that someone accidentally coded seven of the Batman conditions as zeros when they were supposed to be coded as ones, something like that.
[Regina] (38:26 - 38:41)
These sort of coding mistakes, coding reversals are pretty common. My favorite is I read a paper correction where there was a minus sign reversal and they blamed it on, quote, gremlins intervening during the preparation of the paper.
[Kristin] (38:42 - 38:47)
Um, if gremlins is code for human error, sure, yeah, we'll buy that.
[Regina] (38:48 - 39:05)
So you and I recalculated the results using what we think are the actual observations, what we see in table one. And it turns out that the difference in seat giving is about 19 percentage points. So going from roughly 41% to 60%.
[Kristin] (39:06 - 39:11)
So the new numbers still show an increase, but just not as dramatic as the original paper.
[Regina] (39:12 - 39:23)
Right, and we think the real risk ratio then is about 1.5, 60 divided by 41%. So now it's 50% increase in seat giving when Batman was present, not a near doubling.
[Kristin] (39:23 - 39:49)
Right, and the recalculated odds ratio is about 2.1, not 3.3. And the p-value is still significant, but it's a lot less dramatic. It's around 0.03 instead of less than 0.001. So everything just looks less impressive, especially when you combine that with the fact that the study design left a lot of room for judgment calls about which observations even counted as valid. So at this point, the results don't exactly scream bulletproof.
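Here is a sketch of that re-analysis using the counts from Table 1. It assumes scipy is available, and it uses a plain Pearson chi-square without a continuity correction; with a correction the p-value comes out slightly larger.

```python
# Re-analysis from the Table 1 counts: 41/68 Batman trials and 29/70 control
# trials with a seat given up. Assumes scipy is installed.
from scipy.stats import chi2_contingency

table = [[41, 68 - 41],   # Batman: seat given up, seat not given up
         [29, 70 - 29]]   # Control: seat given up, seat not given up

chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
print(round(p_value, 3))  # about 0.027, i.e., roughly 0.03

p_batman, p_control = 41 / 68, 29 / 70
print(round(p_batman - p_control, 2))  # about 0.19: a 19 percentage-point difference
print(round(p_batman / p_control, 2))  # about 1.46: recalculated risk ratio, roughly 1.5
odds_ratio = (p_batman / (1 - p_batman)) / (p_control / (1 - p_control))
print(round(odds_ratio, 2))            # about 2.15: recalculated odds ratio, roughly 2.1
```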
[Regina] (39:49 - 40:19)
Mm-hmm, they do not. And Kristin, we're not the only ones who noticed this. You and I were wondering, should we contact the authors or write a letter to the editor?
And we checked on PubPeer, which is a really cool site where anyone can post comments on published papers. And we noticed that several people had already flagged errors in this paper, including the ones we caught. And one of them was someone we know, a guy named Gideon Meyerowitz-Katz.
[Kristin] (40:19 - 40:32)
Yeah, he is one of the statistical sleuths that we like to follow. And he's often flagging these types of errors and sometimes posts little quizzes on X. Can you spot the error in this table type of thing?
And I like to steal those for class.
[Regina] (40:32 - 40:36)
Yeah, he has a nice Substack too. It's called Health Nerd. I like to read it.
[Kristin] (40:36 - 40:46)
Yeah. Okay, so, Regina, they got their numbers wrong. Doesn't totally negate the results, but definitely makes them less impressive.
There were a few other interesting things in the results that we should talk about, though.
[Regina] (40:46 - 40:51)
I like. Guess which sex was most likely to give up their seat?
[Kristin] (40:51 - 40:57)
You would think it would be men. I mean, chivalry and Batman and all. Mm-hmm, nope.
[Regina] (40:58 - 41:04)
Surprisingly, two-thirds in both conditions, women. Women give up their seat more often.
[Kristin] (41:04 - 41:26)
Well, the average age of the seat-giver uppers was 41, which does make me think maybe they may have been women who had been pregnant themselves at one time, so they were more sympathetic, maybe.
[Regina]
Or maybe women are just nicer.
[Kristin]
I'd buy that.
But, Regina, what happened to chivalry? I mean, okay, I have to go back into the dating world to this? Are you kidding me?
[Regina] (41:26 - 41:31)
I am afraid, Kristin, that chivalry is dead. Dead. I'm sorry.
[Kristin] (41:32 - 41:52)
Okay, Regina, one more detail that was interesting. They only managed to interview 32 out of the 41 seat-givers in the Batman condition. I'm gonna forgive them for that because I imagine it was really hard to grab people and interview them in the middle of the subway.
Interestingly, though, almost half of that 32, 14 out of that 32, said they didn't even notice Batman.
[Regina] (41:52 - 41:58)
Kristin, they would have noticed Batman, all right, if he'd been wearing the tight-fitting costume that I have in my head.
[Kristin] (41:59 - 42:07)
You're probably right. And, you know, Regina, then maybe the women would have given up their seats just to get a little closer to him.
[Regina] (42:08 - 42:12)
Maybe accidentally fallen into his arms or lap. Oopsie.
[Kristin] (42:14 - 42:30)
Yeah. The authors interpreted this result to indicate that the effect of seeing Batman might be contagious.
Like, someone who sees Batman, they act more nicely to those around them and that transmits to the person who eventually gives up their seat, something like that.
[Regina] (42:30 - 42:32)
Okay, sure.
[Kristin] (42:32 - 42:57)
I'm not sure how plausible that is, but there is actually other famous research on social contagion that does suggest that niceness and generosity can spread through social networks. And I buy this to some extent, Regina. I mean, I always think, like, if the checker at the grocery store is super nice, that kind of spreads, like everyone around is more jolly rather than being impatient and grumpy in line.
[Regina] (42:57 - 43:04)
It's true. I buy that one. And if there's research on this, I definitely now want to do an episode on this whole idea.
[Kristin] (43:05 - 43:10)
All right, I'm putting it in the queue, Regina. But, Regina, I think we are now ready to wrap up the Batman Effect paper.
[Regina] (43:11 - 43:18)
The claim here is that weird, unexpected things in public spaces make people nicer.
[Kristin] (43:18 - 43:34)
And how do we rate the strength of evidence on this podcast? With our highly scientific, trademarked one-to-five smooch rating system. One smooch meaning little to no evidence for the claim, and five meaning strong evidence for the claim.
So, Regina, do you want to kiss Batman?
[Regina] (43:34 - 43:36)
Oh, very much so. Yes, please.
[Kristin] (43:38 - 43:42)
Oh, interesting. I'm not into the superhero phenotype myself, Regina.
[Regina] (43:42 - 43:48)
Oh, my God. Did you just say phenotype? You know, you are such a nerd. I can't believe it.
[Kristin] (43:48 - 43:54)
I am nerdy, but hello, pot calling kettle black.
[Regina] (43:56 - 44:01)
All right, fair enough, phenotype though. Okay, how many smooches are you giving this one, Kristin?
[Kristin] (44:01 - 44:10)
I actually think there is some plausibility to the claim. So are we basing this on general research on the topic or just this specific paper, Regina?
[Regina] (44:10 - 44:15)
Just this paper, specifically the Batman Effect, not the unexpected effect.
[Kristin] (44:15 - 44:52)
In that case, I'm going to have to go one smooch. I like that they kept the paper simple, but if you can't keep your numbers straight, especially when there are only two main numbers, that is a red flag. I'm also concerned about the logistics.
It seems like a really chaotic study to pull off, and it would have been easy for some bias to creep in. So, for example, they don't tell us how many valid and invalid trials there were in each condition. So how do we know that they didn't just throw out some of the Batman trials if no one gave up their seat with the excuse that, hey, the conditions weren't quite right?
So it's going to be one smooch for me. How about you, Regina?
[Regina] (44:52 - 45:13)
That seems reasonable. So I agree with you. I think the paper was nice and simple.
It was pre-registered. It had a sample size calculation, clear primary outcome, but I am with you in spirit on the problem. That was a pretty big number screw up there.
I'm going to, however, give it two smooches just because I want to kiss Batman one more time.
[Kristin] (45:14 - 45:15)
Fair enough, fair enough.
[Regina] (45:16 - 45:19)
All right, what about methodological morals for this one?
[Kristin] (45:19 - 45:39)
All right, Regina. So I do like the fact that this paper was pretty spare. They didn't over clutter it, but we were missing some pretty key details about the logistics.
So mine is: We love an uncluttered paper, but when it's missing the basics, it's like an empty fridge. Clean, yes, but dinner is not happening. Which is my house a lot.
[Regina] (45:40 - 45:41)
Yeah, now I'm hungry.
[Kristin] (45:41 - 45:42)
What about you, Regina?
[Regina] (45:43 - 45:49)
Okay, here's mine. Before you make a fancy model, make sure the numbers in the table and in the text match.
[Kristin] (45:49 - 46:15)
Yeah, that one is just so basic. I really can't believe that that one got missed. It's really, really glaring.
All right, Regina. So that wraps up this episode. And I want to let everyone know that we are about to go on our holiday break.
So this is the official last episode of season one of Normal Curves. Now over the holidays, we will be replaying some of our favorite episodes from season one with some bonus added commentary. So I hope everyone listens to them.
[Regina] (46:15 - 46:20)
That's going to be fun. So look out for these and we will see you with new episodes in January.
[Kristin] (46:21 - 46:23)
Happy holidays, everyone.
[Regina]
Happy holidays.