What is a p-value? Scientists are regularly tasked with demonstrating that their experimental results are significant, and to do so they test for statistical significance (and insert things that look like this in their studies: p < 0.05). But what does that actually mean?

If you are running an experiment and take a bunch of measurements, those measurements are called 'significant' if the p-value is below 0.05. But what does this mean? And how can the choice of this threshold (p < 0.05) for reaching 'significance' affect a scientist's experiments?

A p-value below 0.05 means that if you tasked 100 monkeys with producing a random "measurement", your experiment's measurement (p < 0.05) would beat the measurements of at least 95 of those monkeys. If your measurement is only just significant, say around p = 0.05, then it is no better than what we should expect from the 5 best of those 100 monkeys; in other words, it is still uncomfortably similar to no effect at all, or, thought of another way, very similar to what random chance alone can produce.
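The monkey analogy above can be sketched as a tiny simulation. This is a hypothetical illustration, not anyone's actual experiment: the observed value and the choice of a standard normal distribution for the monkeys' random "measurements" are assumptions made purely for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 "monkeys" each produce a random measurement under the null
# hypothesis (no real effect); here they draw from a standard normal.
monkeys = rng.standard_normal(100)

# Suppose our experiment produced this measurement (made-up number).
observed = 2.0

# Empirical p-value: the fraction of monkeys whose random measurement
# is at least as large as ours. Beating at least 95 of the 100 monkeys
# corresponds to p < 0.05.
p_value = np.mean(monkeys >= observed)
print(f"empirical p-value: {p_value:.2f}")
```

This is exactly what a p-value is doing under the hood: locating your result within the distribution of results that pure chance would generate.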

As you can imagine, failing to beat the 5 best of 100 monkeys producing random results is nothing we scientists should be getting excited about!

But "statistical significance" holds a special allure for scientists, because demonstrating statistical significance is significantly correlated with the success of scientific studies in the peer review process (p < .05). Since a scientist's career progress is intimately linked with their publication record, and since anonymous peer reviewers continue insisting that studies include p-values, scientists will continue finding their own work 'significant' en masse.

Interestingly, statistical testing with the commonly used t-test was first proposed by a statistician working at a brewery (William Sealy Gosset, publishing under the pseudonym 'Student' while employed at Guinness). Ironic, given that statistical testing, being poorly understood, has a tendency to make scientists seem a little tipsy (alcohol does, after all, tend to inflate one's sense of one's own 'significance').
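For the curious, the t-test mentioned above produces exactly the kind of p-value discussed throughout this piece. A minimal sketch using SciPy's `ttest_ind`; the two groups here are invented for illustration, with an artificial shift added to the "treatment" group:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: a control group and a treatment group whose
# values are shifted upward by 1.0 (a made-up effect for the demo).
control = rng.standard_normal(30)
treatment = rng.standard_normal(30) + 1.0

# Student's t-test asks: how surprising is this difference in means
# if both groups really came from the same distribution?
t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here plays the same role as beating the monkeys: it says the observed difference in means would rarely arise by chance alone, not that the effect is large or important.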