It isn't that p values need a clearly-defined alternative to compare to: they don't (this is, after all, how nonparametric tests are used). What they do need is to be defined in terms of a dimension of the data to which they are sensitive. One easy way of achieving this is to embed the hypothesis in a larger space that offers an explicit alternative, but it isn't the only way.
When viewed this way, it is obvious what is wrong with Cohen's example (and why it suggests that he might not have actually understood significance tests). Data being rare/unlikely is not the important thing, and nobody who understands significance tests would say it is. Cohen's example is as if he decided to take LARGE p values, instead of small p values, as evidential. But this makes no sense, because it ignores that we choose a test statistic for its sensitivity. His "test statistic" (X is president) is maximally insensitive for testing the chosen hypothesis! He hasn't identified an issue with significance tests; he's shown why they need to be built in a particular way (Neyman wrote very clearly on this in his 1952 book, "Lectures and conferences on mathematical statistics and probability" - I should note that I made a similar mistake to Cohen's in this blog post, where I outline Neyman's point: https://bayesfactor.blogspot.com/2015/03/the-frequentist-case-against.html).
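To make the sensitivity point concrete, here is a minimal simulation sketch (in Python, using scipy; the sample size, effect size, and choice of "insensitive" statistic are all my own illustrative assumptions, not anything from Cohen or Neyman). We test H0: μ = 0 on data actually generated with μ = 0.5, once with a statistic that responds to shifts in the mean (the one-sample t statistic) and once with a perfectly valid but deliberately insensitive statistic (a chi-square test on the sample variance, which ignores the mean entirely):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sims, true_mu = 50, 5000, 0.5   # data actually come from mu = 0.5, sd = 1

p_sensitive, p_insensitive = [], []
for _ in range(n_sims):
    x = rng.normal(loc=true_mu, scale=1.0, size=n)

    # Sensitive choice: the one-sample t statistic responds to shifts in the
    # mean, the dimension of the data that H0: mu = 0 is about.
    p_sensitive.append(stats.ttest_1samp(x, popmean=0.0).pvalue)

    # Insensitive choice: a two-sided chi-square test of the sample variance
    # (variance assumed known to be 1 under H0). It controls its error rate,
    # but a shift in the mean never moves it away from uniform p values.
    chi2 = (n - 1) * x.var(ddof=1)
    cdf = stats.chi2.cdf(chi2, df=n - 1)
    p_insensitive.append(2 * min(cdf, 1 - cdf))

p_sensitive, p_insensitive = np.array(p_sensitive), np.array(p_insensitive)
print("P(p < .05), sensitive statistic:  ", (p_sensitive < 0.05).mean())    # near 1
print("P(p < .05), insensitive statistic:", (p_insensitive < 0.05).mean())  # near .05
```

Both are "real" tests of the same null hypothesis, but only the first one was built to be sensitive to the departure we care about; the second, like Cohen's "X is president," tells you essentially nothing no matter how false the null is.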
But all statistical tools are meant for a specific purpose, and that's ok. Understanding *what* the purpose of a particular p value is - what it is sensitive to, and what it is not - is crucial to using it properly. By understanding the p value as a critique of a particular inference, rather than a way of drawing an inference, we make that explicit - in much the same way that bringing up, say, survivor bias is another *specific* critique.