The p<0.05 Problem
When a drugs company has developed a drug to the point where it has passed enough of the safety qualifications, it is common to set up a double-blind placebo trial. This is a clinical trial in which a number of volunteer test subjects are divided into two groups; all are told they are helping to test a possible drug treatment (not necessarily for any condition related to any existing diagnosis of the test subjects) with the goal of observing any effects or side-effects. One group ("test group") receives the drug being tested; the other group ("control group") receives something of identical appearance but which does not contain the drug being tested ("placebo"). No researcher, assistant, staff member, or participant (subject) is told who is getting which.
Everyone reports their observations, which are compiled as data. The data from the test group are compared to those from the control group using various statistical techniques. Because test subjects were assigned randomly, there might be luck (or "bad luck") causing results that look interesting, but are really just a product of random chance.
To deal with that, statistical hypothesis testing is used. Measures of correlation are calculated, and for each such measure, the probability p is computed; the p-value is the probability that equivalent data from two identical groups (i.e. two "placebo groups") would produce a correlation measure at least as strong as the one observed.
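As a rough illustration of that definition, here is a minimal sketch of a permutation test. It is not the procedure any particular trial uses: the function name, the choice of "difference in group means" as the measure, and the sample numbers are all assumptions made for the example. Shuffling the group labels stands in for "two identical placebo groups", and the p-value is estimated as the fraction of shuffles that look at least as extreme as the real data.

```python
import random
from statistics import mean


def permutation_p_value(test_group, control_group, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper for illustration only. Shuffling the pooled data
    mimics a world in which both groups received the same thing.
    """
    rng = random.Random(seed)
    observed = abs(mean(test_group) - mean(control_group))
    pooled = list(test_group) + list(control_group)
    n_test = len(test_group)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_test]) - mean(pooled[n_test:]))
        # Small tolerance so floating-point noise does not hide exact ties.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up measurements, purely for demonstration.
test_scores = [10, 11, 12, 13, 14, 15]
control_scores = [0, 1, 2, 3, 4, 5]
p = permutation_p_value(test_scores, control_scores)
print("advance to next trial" if p < 0.05 else "do not advance")
```

With groups this clearly separated the estimated p falls well below 0.05; feeding the function two identical groups gives p = 1, since every shuffle is then at least as extreme as the (zero) observed difference.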
Based on this probability p and a threshold (often 0.05), a decision is made as to whether to bring the drug to another (usually larger or more ambitious) clinical trial.
Because of the nature of the calculation of p, out of a large number of such double-blind trials, about 5% of the correlation measures can be expected to have a p less than 0.05 even when the drug has no effect at all. If the drugs company has 100 candidate drugs and performs such trials on all of them, and if only the p-value from a single correlation measure computed in each trial were used to decide whether that drug advances to another trial phase, one could expect about 5 drugs to advance, independently of any actual effects the drugs might have. Put another way, if N out of the 100 drugs advance to the next phase based only on a single p-value decision, then even if none of the drugs worked there would be a better than 50% chance that N is at least 5; so roughly 5 of the N advancing drugs can be expected to be false positives, and at most about N-5 of them actually have an effect.
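The arithmetic behind the 100-drug scenario can be checked with a short calculation. This is only a sketch under simplifying assumptions that are mine, not part of any real trial design: every one of the 100 drugs is taken to be completely ineffective, the trials are independent, and each one passes the p < 0.05 test with probability exactly 0.05. The function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n=100, alpha=0.05):
    """Probability that at least k of n ineffective drugs pass by chance.

    Assumes independent trials, each passing with probability alpha
    (a binomial model).
    """
    prob_fewer_than_k = sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i) for i in range(k)
    )
    return 1.0 - prob_fewer_than_k


expected_false_positives = 100 * 0.05  # 5 drugs on average
p_five_or_more = prob_at_least(5)      # a little over one half
p_one_or_more = prob_at_least(1)       # very close to certainty

print(expected_false_positives, round(p_five_or_more, 3), round(p_one_or_more, 3))
```

Under these assumptions the chance of five or more useless drugs advancing is about 0.56, and the chance that at least one advances is above 0.99.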
SIGNIFICANT xkcd by Randall Munroe
This work of Randall Munroe is licensed under a Creative Commons Attribution-NonCommercial 2.5 License.
How to Challenge Experts Outside Your Field
There are other criticisms of the use of p-values, and of hypothesis testing in general. It is easy for non-experts to write papers based on a criticism of statistical methods, because knowledge of statistical methods and of their flaws is broad, and spreads across many disciplines.
Addressing such criticisms, and providing evidence-based counter-arguments, requires experts in the field. For example, in the drugs-qualification example, effectively falsifying the trial procedures requires analysis of the procedures used to address flaws in hypothesis testing. This would require going inside the institutions responsible (hospitals, universities, drugs companies, etc.) and looking at the details.
If one is not an expert, one has other options: politics, or investment.
The political approach is simply to try to convince other people that the issue is important. With enough support, politicians can be solicited to help inspect the questioned research procedures, and challenge them using public funding (e.g. by having a ministry undertake an investigation).
One can also bypass the political route by investing a large amount of one's own money to set up a foundation for the purpose of conducting independent review.
I wrote a related article about Cross-Discipline Original Research, or "Why I Do Not Solve Physics".
This page was written in the "embarrassingly readable" markup language RHTF, and was last updated on 2019 Feb 19.