(Two-minute read. An eternity to apply)
A few thoughts on the nature of hypothesis testing. For those who do not know or have forgotten, hypothesis testing is at the foundational core of science and, some would say, ‘truth’. It is the discipline, the science, the craft, the art and the philosophy of constructing experiments and assessing whether something is ‘true’ or ‘false’. Interestingly, it is impossible to prove that something is true or false¹ in our physical reality, in so-called spacetime. One can only show, to an ever-increasing asymptotic probability, that something is true or false within the constructs of a particular experiment. Remember also that there is no such thing as a 100% closed experimental system. This is interesting².
From a scientific perspective it is essential to start hypothesis testing with a 100% sceptical belief, certainty even, that something is false, in the good traditions of Saint Thomas³ and Descartes. Otherwise, as a scientist, one is simply propagating bad science and should go and do something more useful. Like watch Dune II. Or play football. Or just get out more. This sceptical starting point is what is known as the null hypothesis, often abbreviated as H0. The subject in question, the thing one is attempting to test for and to generate a signal for, is known as the alternative hypothesis. This is often abbreviated as HA, though I prefer the term H1 as this gives rise to other hypotheses such as H2 etc.⁴
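As a concrete, if hypothetical, illustration of the H0/H1 framing, here is a minimal sketch in Python using scipy’s binomtest; the coin-flipping scenario and the numbers are mine, purely for illustration:

```python
# H0 (null): the coin is fair, P(heads) = 0.5
# H1 (alternative): the coin is biased, P(heads) != 0.5
# binomtest is available in scipy >= 1.7.
from scipy.stats import binomtest

heads, flips = 62, 100  # hypothetical data
result = binomtest(heads, flips, p=0.5, alternative="two-sided")
print(f"p-value: {result.pvalue:.4f}")  # ~0.021 for this sample
```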
This brings us to the nature of so-called false positives and false negatives: a false positive is rejecting the null hypothesis when it is in fact true, and a false negative is failing to reject it when it is in fact false. We can use school-level statistics to work out whether the evidence against the null hypothesis clears a defined probability point and confidence level. And one’s confidence will build with data. The more data one has, the greater the confidence level. Which is why Big Data and its progeny such as Claude, Grok and Gemini are so powerful. And so useful. And so dangerous. The headline number here is called the p value, with p standing for probability, and it is normally quoted as a fraction of 1. Strictly, it is the probability of seeing data at least as extreme as that observed, assuming the null hypothesis is true; it is not the probability that the null hypothesis is true, a subtle but much-abused distinction. So a p value of, for example, 0.05 indicates that if the null hypothesis were true, data this extreme would turn up only 5% of the time (this is really, really useful for horse-racing, by the way). But a p value based on just a small amount of data – a small data sample – is going to be subject to a huge margin of error, whereas a p value based on a thousand or ten million data points is going to be more robust, with a smaller margin of error. And the nice thing about this is that it does not matter, within constraints, how noisy the data is, thanks to the law of large numbers (of which the mean reversion of finance is a close cousin) and the nature of Gaussian noise.
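To make the sample-size point concrete, a small sketch along the same hypothetical lines (Python with scipy again): the same observed proportion of 55% heads gives a feeble signal at a hundred flips and an overwhelming one at ten thousand.

```python
from scipy.stats import binomtest

# Same observed proportion (55% heads), very different sample sizes.
for n in (100, 10_000):
    heads = int(0.55 * n)
    result = binomtest(heads, n, p=0.5)
    ci = result.proportion_ci(confidence_level=0.95)
    print(f"n={n:>6}: p-value={result.pvalue:.2e}, "
          f"95% CI for P(heads)=({ci.low:.3f}, {ci.high:.3f})")
# n=   100: p-value ~ 0.37,  CI ~ (0.447, 0.650) -> no real signal
# n=10,000: p-value ~ 1e-23, CI ~ (0.540, 0.560) -> strong signal
```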
So, what p value will we use for a particular experiment and set of tests? Well, that depends. If the null hypothesis is that ‘This plane will crash on this flight’ and the alternative hypothesis is ‘No it won’t’, then I personally would be looking for something like 10⁻¹⁰. If I were looking at the chance that it might rain on the Common later when I am out with my dog, I’d be happy with a p value of 0.2, because I can invoke contingency via a coat and also, more importantly, because I can take shelter in a pub. If the test were about whether Tottenham Hotspur will win the league in the next ten years, then I would ignore the mathematically calculated results. It is clearly an impossibility.
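And as a toy sketch of how the threshold moves with the stakes (the decision rule below is the standard one; the alpha values simply mirror the examples above and are illustrative, not prescriptive):

```python
def decide(p_value: float, alpha: float) -> str:
    """Standard decision rule: reject H0 when the p value falls below alpha."""
    return "reject H0" if p_value < alpha else "retain H0"

p = 0.03  # a hypothetical test result

print(decide(p, alpha=1e-10))  # aviation-grade stakes: retain H0
print(decide(p, alpha=0.2))    # dog-walk-in-the-rain stakes: reject H0
```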
Time for breakfast.
References and selected bibliography
For those who would like to run better experiments, these links covering the basics of hypothesis testing, the nature and dangers of false positives and false negatives, and how to build experimentally robust statistical and explanatory power strike me as good as of this now⁵:
Khan Academy: “Hypothesis testing and p-values”. Last accessed 7th April 2024.
StatQuest: “Hypothesis Testing and The Null Hypothesis, Clearly Explained”. Last accessed 7th April 2024.
BMJ: “Definitions and formulae for calculating measures of test accuracy”. Last accessed 7th April 2024.
Wikipedia: “Power of a test”. Last accessed 7th April 2024.
Wikipedia: “Mean reversion (finance)”. Last accessed 7th April 2024.
Wikipedia: “Gaussian noise”. Last accessed 7th April 2024.
For those who would like a deep dive into the wider aspects touched upon in this short note, my post “Rutherford and Shannon, and the art of Public Relations” is a reasonable starting point. But too conceptual. And a hard read. Hoffman (2024) provides a really good three-page summary that starts to expose the essential mathematics and physics.
Notes
¹ See Gödel.
² The Second Law of Thermodynamics deals with time, space and matter, and its entropy is one-way: entropy can fluctuate up or down within sub-domains, but the net direction is always an increase in entropy across the universal set. The entropy of Shannon Information is two-way. It fluctuates across dynamic information space, across whatever information dimensions are open and relevant, and normalises to unity across any information dimension; it is probability space. The nature of dimensional bifurcation and collapse across and within Hilbert spaces is interesting.
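A minimal sketch of the normalises-to-unity point, in Python and with arbitrary example distributions: the probabilities must sum to one (that is the probability space), while the Shannon entropy computed over them is free to rise or fall, unlike its thermodynamic cousin.

```python
import math

def shannon_entropy(p: list[float]) -> float:
    """Shannon entropy in bits of a discrete probability distribution."""
    assert abs(sum(p) - 1.0) < 1e-9  # the distribution normalises to unity
    return -sum(x * math.log2(x) for x in p if x > 0)

print(shannon_entropy([0.5, 0.5]))  # 1.00 bit: maximal uncertainty
print(shannon_entropy([0.9, 0.1]))  # ~0.47 bits: entropy has fallen
```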
³ I originally meant the apostle Thomas here. The doubting Thomas. But Saint Thomas Aquinas works too, and has a richer and deeper biography than the doubter.
⁴ See Cantor. Conceptually and philosophically, as many as one likes. Practical constraints apply.
⁵ The concept of the now requires some thought. The now of classical physics is a singularity, and it is fractal. And the classical conception of time is as a one-way linear arrow, albeit with the modifications of General Relativity. In Shannon Information probability space-time, now is different; let’s call it Shannon Now. A nice way to think about and visualise this new now is as Brownian motion and its integrals and differentials. No, not the minister’s son: the motion of particles noticed by Robert Brown in 1827. But whereas Brown’s motion was of pollen grains in water, with the invisible hands being water molecules, Shannon Now exists in information space. Now is a node, a singularity, that is both a source and a sink. It clusters and disperses and links to other ‘nows’ as an entity [within dynamic multi-dimensional information probability space]. A neural net of pure information giving rise to network effects. I personally find the words ‘node’ and ‘link’ a little lacking, so prefer the terms ‘filament’ and ‘mesh’. And where are these nows? They exist as individual and collective consciousnesses in our grey matter. And manifest as beautiful mathematics and art and life and poetry and love. I call this model of information and physics Quantum Information Reality (QIR).