Given some reasonable postulates regarding the formulation of explanatory hypotheses, it can be mathematically demonstrated that a probability distribution over all possible explanations will be biased toward simpler explanations — in an overall way the simpler explanations will be more probable than the more complex ones, although there may be individual exceptions.
We make the following postulates:
1) The explanatory hypotheses are described by a language that has a finite number of different words, and each hypothesis is expressed by a finite number of these words. That this allows for natural languages such as English, but also for computer programming languages and so on. The proof in this post will be valid for all cases. This is a reasonable assumption since human beings do not use any infinite languages, nor do they use an infinite number of words to make a point.
2) A complexity measure is assigned to the hypotheses in such a way that there are or may be some hypotheses which are as simple as possible, and these are assigned the complexity measure of 1, while hypotheses considered to be more complex are assigned higher integer values such as 2, 3, 4, and so on. Note that apart from this, we can define the complexity measure in any way we like, for example as the number of words used by the hypothesis, or in another way, as the shortest program which can output the hypothesis in a given programming language. Many other definitions would be possible. The proof is valid for all definitions that follow the conditions laid out, even the ones which would be intuitively somewhat distant from the idea of something simple. This again is a reasonable assumption given what we mean by simplicity — we do not think it is possible to make a thing infinitely simpler, but there is always something simplest.
3) The complexity measure is also defined in such a way that there are a finite number of hypotheses given the measure of 1, a finite number given the measure of 2, a finite number given the measure of 3, and so on. Note that this condition is not difficult to satisfy; it would be satisfied by either of the definitions mentioned in condition 2, and in fact by any reasonable definition of simplicity and complexity. If there are an infinite number of hypotheses that are supposedly absolutely simple (with the measure of 1), and we describe these hypotheses in English, an infinite number of them will not be able to be described without using at least 10,000 words, or without using at least 100,000 words, and so on. This seems very remote from the idea of a simple explanation.
With these three conditions the proof follows of necessity. To explain any data, in general there will be infinitely many mutually exclusive hypotheses which could fit the data. Suppose we assign prior probabilities for all of these hypotheses. Given condition 3, it will be possible to find the average probability for hypotheses of complexity 1 (call it x1), the average probability for hypotheses of complexity 2 (call it x2), the average probability for hypotheses of complexity 3 (call it x3), and so on. Now consider the infinite sum “x1 + x2 + x3…” Since all of these values are positive (and non-zero, since we consider each hypothesis to be at least possible), either the sum converges to a positive value, or it diverges to positive infinity. In fact, it will converge to a value less than 1, since if we had multiplied each term of the series by the number of hypotheses with the corresponding complexity, it would have converged to exactly 1, since the sum of all the probabilities of all our mutually exclusive hypotheses should be exactly 1.
Now, x1 is a finite real number. So in order for this series to converge, there must be only a finite number of terms in the series equal to or greater than x1, and therefore some last term which is equal to or greater than x1. There will therefore be some complexity value, y1, such that all hypotheses with a complexity value greater than y1 have an average probability of less than x1 (the average being taken over the hypotheses with the same complexity value, as above). Likewise for x2: there will be some complexity value y2 such that all hypotheses with a complexity value greater than y2 have an average probability of less than x2. Leaving the derivation for the reader, it would also follow that there is some complexity value z1 such that all hypotheses with a complexity value greater than z1 have a lower probability than any hypothesis with a complexity value of 1, some other complexity value z2 such that all hypotheses with a complexity value greater than z2 have a lower probability than any hypothesis of complexity values 1 or 2, and so on.
From this it is clear that as the complexity tends to infinity, the probability of the hypothesis will tend toward zero in the limit. This will happen in such a way that for any particular probability (e.g. one in a billion), there will be some degree of complexity such that every hypothesis at least that complex will be less probable than the chosen probability (e.g. less than one in a billion.)