# Severity and statistical evidence

## Notes on Tour 1 of “Statistical Inference as Severe Testing”

Anyone who’s had any contact with statistical methods recently knows that there’s a battle being fought over the future of statistical methods. Actually, more than one; the big ones are significance testing vs confidence intervals and Bayes vs frequentism. The so-called “replication crisis” in the various sciences has provided an opportunity for people to advocate various solutions to the issues that plague statistical practice. These issues are real, and the stakes are high: bad choices could mean another 40 years wandering in the desert of bad methodology, as opposed to cleaning up some of the mess in various fields.

I was happy to snag the very last copy of Mayo’s “Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars” at the recent Royal Statistical Society conference in Cardiff. Mayo’s goal, telegraphed in the subtitle, is ambitious: Can we really get beyond the wars that have been raging for decades? *Particularly* at a time when the opportunities for the various actors to shape the future are so great?

But now is the most critical time to get beyond these “wars”. What we need is a discussion at a level above the nuts and bolts of statistical theory: we need philosophy. Only with an expansive view of the landscape of statistical inference in science can we be sure that we don’t harm science while trying to save it. This is the departure point for Mayo’s book.

I am currently reading the text, and I will try to blog with some notes as I finish sections of it. This post is about the first “Tour”, *Beyond probabilism and performance*.

## Beyond Probabilism and Performance

The title of the section is immediately refreshing. Those familiar with Mayo’s work will know she advocates a frequentist perspective on statistical inference; those familiar with my work will know I generally advocate a Bayesian perspective. “Probabilism” refers to one way of understanding Bayesian inference, and “performance” refers to one way of understanding frequentist inference. Bayesians are traditionally skeptical of a focus on performance, and Mayo extends a peace offering if we are willing to take it: our suspicions of the performance viewpoint are indulged. The cost of this peace offering is a willingness to be skeptical of our own probabilism in turn.

The key to making this work is a reliance on meta-statistical principles: philosophy. We need to explore our intuitions about what counts as scientific evidence. What makes for a strong scientific/statistical inference? What do we want from statistics? Answering these questions *outside of any particular statistical theory* is important, because many battles in the “statistics wars” are fought over evaluating one statistical theory by the standards of another, when they should be considered on higher ground.

Mayo gives us a principle we can work with: *severity*. Her weak severity principle is something that few scientists, I think, would object to.

> One does not have evidence for a claim if nothing has been done to rule out ways in which the claim may be false.

You might be right about a claim, but your responsibility, if you say you have *evidence* for a claim, is to show that you’ve tried to rule out ways in which it can be false. A good test of a claim — a *stringent* test — is one which has a strong possibility of ruling something out, if it is false. Mayo’s strong severity principle makes the connection between stringency and severity:

> We have evidence for a claim…just to the extent that it survives a stringent scrutiny.

Mayo’s position is that neither probabilism nor performance goals adequately capture the severity perspective. The long-run performance perspective says nothing about how well-tested any particular claim is, and frequentists might even deny that such a goal is interesting: they just want to control *overall* error rates. Likewise, the Bayesian perspective focuses on coherence and the move from prior to posterior; there is nothing formal in Bayesian statistics that requires severity.
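Mayo formalizes severity later in the book; as a rough numerical sketch of my own (not from this Tour) of what “surviving a stringent scrutiny” can look like, consider the textbook one-sided Normal test with known variance. All the numbers below are hypothetical, chosen only to illustrate the idea that the further a post-test claim sticks its neck out, the less severely it has been probed:

```python
# Hypothetical illustration of severity for a one-sided Normal test:
# T+: H0: mu <= 0 vs H1: mu > 0, known sigma, sample size n.
from math import sqrt
from statistics import NormalDist

sigma, n = 1.0, 25
se = sigma / sqrt(n)   # standard error of the mean = 0.2
xbar = 0.4             # hypothetical observed mean; z = xbar/se = 2.0, so H0 is rejected

def severity(mu1: float) -> float:
    """Severity for the claim 'mu > mu1': the probability the test would
    have produced a *less* extreme result than observed, were that claim
    false (i.e. were mu exactly mu1)."""
    return NormalDist().cdf((xbar - mu1) / se)

for mu1 in (0.0, 0.2, 0.4):
    print(f"SEV(mu > {mu1}) = {severity(mu1):.3f}")
# prints 0.977, 0.841, 0.500
```

The modest claim “mu > 0” passes with high severity (0.977), while the bolder claim “mu > 0.4” is barely probed at all (0.500), even though the same data are consistent with both. Nothing in the coherence requirements of probabilism, nor in the long-run error rates of performance, forces this kind of claim-by-claim assessment.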

This does not mean that scientists applying statistical methods — Bayesian or frequentist — don’t fill the gap. Mayo introduces the idea of “decoupling”: that methods can become disconnected from the philosophy which originally spawned them. We are invited, then, to ask *how our methods meet the requirements for severe testing*, regardless of whether those methods are Bayesian, frequentist, or other. This appears to be Mayo’s roadmap away from the statistics wars, which she will outline in Tour 2.