P-hacking

The use of p-values has long been subject to criticism, one strand of which is that they can be ‘hacked’. P-hacking is when a researcher tries many different analyses and data treatments until they get the result they want (i.e., p < .05): fishing for p-values in a dataset, excluding outliers, transforming the data, or analysing many measures but reporting only those with p < .05. All of these are potential selection decisions by the researcher (the sketch below illustrates the last one). As Ronald Coase put it, if you torture the data long enough, it will confess. On 7 March 2016 the American Statistical Association (ASA) published a statement on the use of p-values (see Nature and the Oxford Internet Institute blog for background and commentary on the p-value problem). At least one journal has gone as far as banning the reporting of p-values.
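
As a rough illustration, here is a minimal sketch in Python (using NumPy and SciPy; this is illustrative code, not taken from the original post or the ASA statement). It simulates studies in which 20 purely noise measures are compared between two groups and only the smallest p-value is kept – the nominal 5% false-positive rate balloons to roughly 60%:

```python
# Minimal p-hacking simulation: test 20 noise-only measures and
# "report" whichever comparison gives the smallest p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_subjects, n_measures, n_studies = 30, 20, 5000

false_positives = 0
for _ in range(n_studies):
    group_a = rng.normal(size=(n_subjects, n_measures))  # no real effect
    group_b = rng.normal(size=(n_subjects, n_measures))  # no real effect
    # t-test every measure, then keep only the best (smallest) p-value
    p_values = stats.ttest_ind(group_a, group_b, axis=0).pvalue
    if p_values.min() < 0.05:
        false_positives += 1

print("Nominal alpha per test: 0.05")
print(f"False-positive rate after fishing over 20 measures: "
      f"{false_positives / n_studies:.2f}")
# Expected under independence: 1 - 0.95**20 ≈ 0.64, not 0.05.
```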

One outcome of over-reliance on, misinterpretation of, and misuse of p-values is that much reported social science research is not reproducible – estimates of the failure rate range anywhere between 50% and 80% (see also John Ioannidis’ pioneering article, “Why Most Published Research Findings Are False”).

For further detail on the reproducibility of research, see the Ioannidis video.

To find out how to p-hack (and how to prevent it), see the video by Neuroskeptic.

Watch the “dance of the p-values” video to see how unreliable p-values can be across replications of the same experiment; the sketch below reproduces the basic idea.
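
Again as an illustrative sketch (assumed parameters, not the video’s own code): replicate the same experiment, with the same true effect and sample size (d = 0.5, n = 32 per group, roughly 50% power), and watch the p-value bounce between ‘significant’ and ‘non-significant’ purely through sampling variation:

```python
# "Dance of the p-values": identical experiments, wildly different p-values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n, n_replications = 0.5, 32, 20  # d = 0.5, n = 32 per group

for i in range(n_replications):
    control = rng.normal(0.0, 1.0, size=n)
    treatment = rng.normal(true_effect, 1.0, size=n)
    p = stats.ttest_ind(treatment, control).pvalue
    print(f"Replication {i + 1:2d}: p = {p:.3f} {'*' if p < 0.05 else ''}")
# Roughly half the replications come out 'significant' and half do not,
# even though the underlying effect never changes.
```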
