Mathematics and the Laws of Nature

In his essay “The Unreasonable Effectiveness of Mathematics in the Natural Sciences,” Eugene Wigner writes, “The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve.” But in reality, it can be shown that a physical world — a world which has an order of place, with one part beside another, and an order of time, with one thing before another — must of necessity either follow mathematical natural laws, or be more or less intentionally designed to avoid doing so.

For example, suppose we attempt to determine how long it takes a ball to fall a certain distance. We do not need any particularly exact method to measure distances; for example, we could be measuring a fall of ten feet, taking “foot” in the presumably original sense of “the length of an adult human foot,” despite the noisiness of this measure. Nor do we need any particularly exact method to measure time; we could, for example, measure time in blinks: something took 10 blinks if it took so long that I blinked 10 times before it was over. This would be even noisier than measuring in feet. But the point is that it does not matter how exact or inexact the measures are. If we have a world with place and time in it, we can find ways to make such measurements, even if they are inexact ones. Nor again do we need a way to get an extremely precise measure, in blinks or feet or whatever unit we use for the physical quantity we are measuring; it is enough if we get a best estimate.

Now suppose we repeatedly measure, in some such way, how long it takes for a ball to fall a certain distance. After we have made many measurements, we can add them together and divide by the total number of measurements, getting an average amount of time for the fall. The question that arises is this: as we increase the number of measurements indefinitely, will that average converge to a finite value? or will it diverge to infinity or go back and forth infinitely many times?

Evidently it will not diverge to infinity. It is difficult to see any reason in principle why it could not go back and forth infinitely many times: for example, the average fall time might tend toward 1/4 of a blink for a long time, then start tending toward 1/5 of a blink for a long time, then go back to 1/4, and so on. But we should notice the kind of pattern that is necessary in order for this to happen. Suppose the average is 1/4 of a blink after 100 measurements. In order to bring the average down to 1/5, there must be a great many subsequent measurements below 1/5, or at least a smaller number which are very far below 1/5. And the more measurements we have taken to get the average, the more such especially low measures are needed. So if we are at an average of 1/4 of a blink after 1,000,000 measurements, this average will be very stable, and it will require an extremely long, more or less continuous series of especially low measurements to bring the average down to 1/5 again. And the length of the “especially low” or “especially high” series needed to move the average will increase each time we want to move it again. In other words, in order to get the average to go back and forth infinitely many times, we need a rather pathological series of measurements, namely one that looks like it was designed intentionally to prevent the series from converging to an average value.

Thus the “natural” result, when things are not designed to prevent convergence to an average, is that such measures of distance and time and basically anything else we might think of measuring, like “how much food does an adult eat in a year”, will always converge to an average value as we increase the number of measurements indefinitely. Given this result it follows that it is possible to express the behavior of the physical world using mathematical laws.
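The convergence of such averages is easy to see in a quick simulation. The sketch below invents its own details — it assumes the measurements are independent noisy draws around a true fall time of 1/4 of a blink, with an arbitrarily chosen amount of noise — but the pattern would be the same for any such non-pathological distribution:

```python
import random

def running_averages(n, true_time=0.25, noise=0.1, seed=0):
    """Running averages of n noisy measurements of a fall time (in blinks)."""
    rng = random.Random(seed)
    total = 0.0
    averages = []
    for i in range(1, n + 1):
        # One crude measurement: the true time plus observer noise,
        # clipped at zero since a fall cannot take negative time.
        measurement = max(0.0, rng.gauss(true_time, noise))
        total += measurement
        averages.append(total / i)
    return averages

avgs = running_averages(100_000)
print(abs(avgs[99] - 0.25))    # deviation after 100 measurements
print(abs(avgs[-1] - 0.25))    # deviation after 100,000 measurements
```

After a hundred measurements the average still wanders noticeably; after a hundred thousand it is pinned down very tightly, and it would take exactly the kind of pathological run described above to move it again.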

Several things however do not necessarily follow from this:

It does not follow that such laws cannot have “exceptions”, since they are only statistical laws from the beginning, and thus are only expected to work approximately. So it is not possible to rule out miracles in the way supposed by David Hume.

It also does not follow that such laws have to be particularly simple. A simpler law will be more likely than a more complex one, for the reasons given in a previous post, but theoretically the laws governing a falling body could have 500 variables, which would be simpler than ones having 50,000 variables. In practice however this does not tend to be the case, or at least we can find extremely good approximate laws with very few variables. It may simply be the case that in order to have a world with animals in it, the world needs to be fairly predictable to them, and this may require that fairly simple laws work at least as a good approximation. But a mathematical demonstration of this would be extremely difficult, if it turns out to be possible at all.

Simplicity and Probability

Given some reasonable postulates regarding the formulation of explanatory hypotheses, it can be mathematically demonstrated that a probability distribution over all possible explanations will be biased toward simpler explanations — in an overall way the simpler explanations will be more probable than the more complex ones, although there may be individual exceptions.

We make the following postulates:

1) The explanatory hypotheses are described by a language that has a finite number of different words, and each hypothesis is expressed by a finite number of these words. Note that this allows for natural languages such as English, but also for computer programming languages and so on; the proof will be valid for all such cases. This is a reasonable assumption, since human beings do not use any infinite languages, nor do they use an infinite number of words to make a point.

2) A complexity measure is assigned to the hypotheses in such a way that there are, or may be, some hypotheses which are as simple as possible, and these are assigned the complexity measure of 1, while hypotheses considered more complex are assigned higher integer values such as 2, 3, 4, and so on. Note that apart from this, we can define the complexity measure in any way we like: for example, as the number of words used by the hypothesis, or as the length of the shortest program which can output the hypothesis in a given programming language. Many other definitions would be possible. The proof is valid for every definition that satisfies the conditions laid out, even ones which would be intuitively somewhat distant from the idea of something simple. This again is a reasonable assumption given what we mean by simplicity — we do not think it is possible to make a thing infinitely simpler; there is always something simplest.

3) The complexity measure is also defined in such a way that there are a finite number of hypotheses given the measure of 1, a finite number given the measure of 2, a finite number given the measure of 3, and so on. Note that this condition is not difficult to satisfy; it would be satisfied by either of the definitions mentioned in condition 2, and in fact by any reasonable definition of simplicity and complexity. If there were an infinite number of supposedly maximally simple hypotheses (all with the measure of 1), and we described them in English, then infinitely many of them could not be described without using at least 10,000 words, infinitely many others not without using at least 100,000 words, and so on. This seems very remote from the idea of a simple explanation.

With these three conditions the proof follows of necessity. To explain any data, in general there will be infinitely many mutually exclusive hypotheses which could fit the data. Suppose we assign prior probabilities to all of these hypotheses. Given condition 3, it will be possible to find the average probability for hypotheses of complexity 1 (call it x1), the average probability for hypotheses of complexity 2 (call it x2), the average probability for hypotheses of complexity 3 (call it x3), and so on. Now consider the infinite sum “x1 + x2 + x3…” Since all of these values are positive (and non-zero, since we consider each hypothesis to be at least possible), either the sum converges to a positive value, or it diverges to positive infinity. In fact, it will converge to a value no greater than 1: if we had multiplied each term of the series by the number of hypotheses with the corresponding complexity (a number which is always at least 1), the series would have summed to exactly 1, since the probabilities of all our mutually exclusive hypotheses should sum to exactly 1.

Now, x1 is a finite real number. So in order for this series to converge, there must be only a finite number of terms in the series equal to or greater than x1, and therefore some last term which is equal to or greater than x1. There will therefore be some complexity value, y1, such that all hypotheses with a complexity value greater than y1 have an average probability of less than x1 (the average being taken over the hypotheses with the same complexity value, as above). Likewise for x2: there will be some complexity value y2 such that all hypotheses with a complexity value greater than y2 have an average probability of less than x2. Leaving the derivation for the reader, it would also follow that there is some complexity value z1 such that all hypotheses with a complexity value greater than z1 have a lower probability than any hypothesis with a complexity value of 1, some other complexity value z2 such that all hypotheses with a complexity value greater than z2 have a lower probability than any hypothesis of complexity values 1 or 2, and so on.

From this it is clear that as the complexity tends to infinity, the probability of the hypothesis will tend toward zero in the limit. This will happen in such a way that for any particular probability (e.g. one in a billion), there will be some degree of complexity such that every hypothesis at least that complex will be less probable than the chosen probability (e.g. less than one in a billion.)
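A toy numerical model can make the structure of the proof concrete. Everything specific below is my own invention for illustration — a language with 2^k hypotheses at each complexity level k, truncated at 30 levels, and one particular prior consistent with the postulates — but any assignment satisfying the three conditions would show the same behavior:

```python
from fractions import Fraction

# Toy model: 2**k mutually exclusive hypotheses at each complexity level k.
levels = range(1, 31)          # truncated at 30 levels for the illustration
counts = {k: 2**k for k in levels}

# An arbitrary prior consistent with the postulates:
# each hypothesis of complexity k gets probability 4**-k.
per_hypothesis = {k: Fraction(1, 4**k) for k in levels}

# Total mass at level k is 2**k * 4**-k = 2**-k, so the grand total -> 1.
level_mass = {k: counts[k] * per_hypothesis[k] for k in levels}
total = sum(level_mass.values())

# The series x1 + x2 + x3 + ... of per-level average probabilities.
averages = [per_hypothesis[k] for k in levels]

print(float(total))           # approaches 1 as more levels are included
print(float(sum(averages)))   # converges to a value below 1 (here near 1/3)
```

The weighted totals approach 1, as the total probability must, while the series of per-level averages converges to a value below 1, with the averages shrinking toward zero as complexity grows — exactly the situation the proof describes.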

More on Induction

Using the argument in the previous post, we could argue that the probability that “every human being is less than 10 feet tall” must increase every time we see another human being less than 10 feet tall, since the probability of this evidence (“the next human being I see will be less than 10 feet tall”), given the hypothesis, is 100%.

On the other hand, if tomorrow we come upon a human being 9 feet 11 inches tall, in reality our subjective probability that there is a 10 foot tall human being will increase, not decrease. So is there something wrong with the math here? Or with our intuitions?

In fact, the problem is neither with the math nor with the intuitions. Given that every human being is less than 10 feet tall, the probability that “the next human being I see will be less than 10 feet tall” is indeed 100%, but the probability that “there is a human being 9 feet 11 inches tall” is definitely not 100%, but much lower. So the math here updates on a single aspect of our evidence, while our intuition is taking more of the evidence into account.

But this math seems to work because we are trying to induce a universal which includes the evidence: if every human being is less than 10 feet tall, so is each individual. Suppose instead we try to go from one particular to another: I see a black crow today. Does it become more probable that a crow I see tomorrow will also be black? We know from the above reasoning that it becomes more probable that all crows are black, and one might suppose that it therefore follows that it becomes more probable that the next crow I see will be black. But this does not follow, since this would be attempting to apply transitivity to evidence. The probability of “I see a black crow today”, given that “I see a black crow tomorrow,” is certainly not 100%, and so the probability of seeing a black crow tomorrow, given that I see one today, may increase or decrease depending on our prior probability distribution – no necessary conclusion can be drawn.

On the other hand, we would not want in any case to draw such a necessary conclusion: even in practice we don’t always update our estimate in the same direction in such cases. If we know there is only one white marble in a bucket, and many black ones, then when we draw the white marble, we become very sure the next draw will not be white. Note however that this depends on knowing something about the contents of the bucket, namely that there is only one white marble. If we are completely ignorant about the contents of the bucket, then we form universal hypotheses about the contents based on the draws we have seen. And such hypotheses do indeed increase in probability when they are confirmed, as was shown in the previous post.
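The contrast between the two situations can be computed exactly in a toy case. The bucket of 10 marbles and the uniform prior below are my own simplifying assumptions (as is drawing with replacement), but they capture the “complete ignorance” case:

```python
from fractions import Fraction

N = 10  # marbles in the bucket

# Complete ignorance: uniform prior over the number w of white marbles.
prior = {w: Fraction(1, N + 1) for w in range(N + 1)}

p_white_before = sum(p * Fraction(w, N) for w, p in prior.items())

# Update on drawing one white marble (with replacement, for simplicity):
# posterior over w is proportional to prior(w) * (w / N).
unnorm = {w: p * Fraction(w, N) for w, p in prior.items()}
Z = sum(unnorm.values())
posterior = {w: q / Z for w, q in unnorm.items()}

p_white_after = sum(p * Fraction(w, N) for w, p in posterior.items())

print(p_white_before)  # 1/2
print(p_white_after)   # 7/10
```

Under complete ignorance, drawing a white marble raises the probability that the next draw is white from 1/2 to 7/10, because it raises the probability of buckets containing many white marbles; whereas with the known one-white-marble bucket, the same draw sends that probability to zero.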

Hume’s Error on Induction

David Hume is well known for having argued that it is impossible to find reasonable grounds for induction:

Our foregoing method of reasoning will easily convince us, that there can be no demonstrative arguments to prove, that those instances, of which we have had no experience, resemble those, of which we have had experience. We can at least conceive a change in the course of nature; which sufficiently proves, that such a change is not absolutely impossible. To form a clear idea of any thing, is an undeniable argument for its possibility, and is alone a refutation of any pretended demonstration against it.

Probability, as it discovers not the relations of ideas, considered as such, but only those of objects, must in some respects be founded on the impressions of our memory and senses, and in some respects on our ideas. Were there no mixture of any impression in our probable reasonings, the conclusion would be entirely chimerical: And were there no mixture of ideas, the action of the mind, in observing the relation, would, properly speaking, be sensation, not reasoning. ‘Tis therefore necessary, that in all probable reasonings there be something present to the mind, either seen or remembered; and that from this we infer something connected with it, which is not seen nor remembered.

The only connection or relation of objects, which can lead us beyond the immediate impressions of our memory and senses, is that of cause and effect; and that because ’tis the only one, on which we can found a just inference from one object to another. The idea of cause and effect is derived from experience, which informs us, that such particular objects, in all past instances, have been constantly conjoined with each other: And as an object similar to one of these is supposed to be immediately present in its impression, we thence presume on the existence of one similar to its usual attendant. According to this account of things, which is, I think, in every point unquestionable, probability is founded on the presumption of a resemblance betwixt those objects, of which we have had experience, and those, of which we have had none; and therefore ’tis impossible this presumption can arise from probability. The same principle cannot be both the cause and effect of another; and this is, perhaps, the only proposition concerning that relation, which is either intuitively or demonstratively certain.

Should any one think to elude this argument; and without determining whether our reasoning on this subject be derived from demonstration or probability, pretend that all conclusions from causes and effects are built on solid reasoning: I can only desire, that this reasoning may be produced, in order to be exposed to our examination.

You cannot prove that the sun will rise tomorrow, Hume says; nor can you prove that it is probable. Either way, you cannot prove it without assuming that the future will necessarily be like the past, or that the future will probably be like the past, and since you have not yet experienced the future, you have no reason to believe these things.

Hume is mistaken, and this can be demonstrated mathematically with the theory of probability — unless Hume asserts that he is absolutely certain that the future will definitely not be like the past; that he is absolutely certain that the world is about to explode into static, or something of the kind.

Suppose we consider the statement S, “The sun will rise every day for at least the next 10,000 days,” assigning it a probability p of 1%. Then suppose we are given evidence E, namely that the sun rises tomorrow. Let us suppose the prior probability of E is 50% — we did not know if the future was going to be like the past, so in order not to be biased we assigned each possibility a 50% chance. It might rise or it might not. Now let’s suppose that it rises the next morning. We now have some new evidence for S. What is our updated probability? According to Bayes’ theorem, our new probability will be:

P(S|E) = P(E|S)P(S)/P(E) = p/P(E) = 1%/50% = 2%, because given that the sun will rise every day for the next 10,000 days, it will certainly rise tomorrow, so that P(E|S) = 1. So our new probability is greater than the original p. It is easy enough to show that if the sun continues to rise for many more days, the probability of S will soon rise to 99% and higher. This is left as an exercise to the reader. Note that none of this process depends upon assuming that the future will be like the past, or that the future will probably be like the past. The only way out for Hume is to say that the probability of S is either 0 or infinitesimal; in order to reject this argument, he must assert that he is absolutely certain that the sun will not continue to rise for a long time, and in general that he is absolutely certain that the future will resemble the past in no way.
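The repeated updating is easy to sketch numerically. The only assumption beyond the above is that we keep modeling the chance of a sunrise, on the supposition that S is false, as the same unbiased 50% each morning:

```python
def posterior_after(n_sunrises, prior=0.01, p_rise_if_not_S=0.5):
    """Posterior probability of S ("the sun will rise every day for the
    next 10,000 days") after observing n consecutive sunrises."""
    p_S, p_not = prior, 1 - prior
    for _ in range(n_sunrises):
        p_S *= 1.0                 # P(sunrise | S) = 1
        p_not *= p_rise_if_not_S   # assumed 50% chance of a sunrise otherwise
    return p_S / (p_S + p_not)     # normalize (Bayes' theorem)

print(posterior_after(1))    # about 2%, matching the hand computation
print(posterior_after(14))   # above 99% after only two weeks of sunrises
```

Starting from a skeptical 1% prior, a mere fourteen sunrises push the posterior above 99%, without ever assuming that the future will be like the past.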

Evidence and Implication

Evidence and logical implication can be compared; we can say that logical implication is conclusive evidence, or that evidence is a sort of weak implication.

Evidence is symmetric: if A is evidence for B, then B is evidence for A. But logical implication is not symmetric; if A implies B, B does not necessarily imply A. However, even in the case of implication we can say that if A implies B, then B is evidence for A.

Implication is transitive. If A implies B, and B implies C, then A implies C. We might be tempted to think that evidence will be transitive as well, so that if A is evidence for B, and B is evidence for C, A will be evidence for C. But this is not necessarily the case; this sort of thinking can lead to believing that the evidence can change sides. Attempting to make evidence transitive is like trying to draw a conclusion from a syllogism without any universal terms; if A is evidence for B, then some B cases are A cases, but not necessarily all of them; if every B case were an A case, then A would imply B, not merely be evidence for it. So if A is evidence for B, and B for C, then some C cases are B cases, and some B cases are A cases; but we cannot conclude that any C cases are A cases. A and C may very well be entirely disjoint. Thus the theory of evolution, taken as given, is evidence that transitional fossils between man and ape can be found; and the finding of such a transitional fossil is evidence for the (completely implausible) theory that some fossils have been preserved from every kind of organism that has ever inhabited the earth. But the theory of evolution taken as given does not provide evidence for the completely implausible theory; rather, the theory of evolution and the completely implausible theory would refute one another, at least if we are also given a little bit of background knowledge.
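That evidence is not transitive can be exhibited in a tiny probability space. The particular events below are an invented example of mine (not the fossils case), with eight equally likely outcomes:

```python
from fractions import Fraction

# Eight equally likely outcomes; events are sets of outcomes.
outcomes = set(range(1, 9))
A, B, C = {1, 2}, {2, 3}, {3, 4}

def P(event, given=None):
    """P(event | given), by counting equally likely outcomes."""
    given = outcomes if given is None else given
    return Fraction(len(event & given), len(given))

print(P(B, A), ">", P(B))   # A raises B's probability: A is evidence for B
print(P(C, B), ">", P(C))   # B raises C's probability: B is evidence for C
print(P(C, A), "<", P(C))   # yet A makes C impossible: no transitivity
```

Here some B cases are A cases and some C cases are B cases, but the A cases and the C cases are entirely disjoint, just as the argument above allows.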

Ray Kurzweil’s Myth of Progress

I have taken an optimistic view of progress on this blog, but it is possible to take any position to an extreme. Ray Kurzweil’s position on progress, as expressed in his 2005 book The Singularity is Near, is wild enough to seem almost a caricature.

He constantly asserts that nearly every kind of change is accelerating exponentially:

The key idea underlying the impending Singularity is that the pace of change of our human-created technology is accelerating and its powers are expanding at an exponential pace. Exponential growth is deceptive. It starts out almost imperceptibly and then explodes with unexpected fury— unexpected, that is, if one does not take care to follow its trajectory. (pp. 7-8)

We are now in the early stages of this transition. The acceleration of paradigm shift (the rate at which we change fundamental technical approaches) as well as the exponential growth of the capacity of information technology are both beginning to reach the “knee of the curve,” which is the stage at which an exponential trend becomes noticeable. Shortly after this stage, the trend quickly becomes explosive. Before the middle of this century, the growth rates of our technology— which will be indistinguishable from ourselves— will be so steep as to appear essentially vertical. From a strictly mathematical perspective, the growth rates will still be finite but so extreme that the changes they bring about will appear to rupture the fabric of human history. That, at least, will be the perspective of unenhanced biological humanity. (p. 9)

In other words, Kurzweil believes that the ends of the ages have come upon us, although in a new, secular way. However, he cannot say that we have reached the “explosive” point quite yet, because if that were true, everyone would notice. In order to explain the fact that people haven’t noticed it yet, he has to say that we are just about to reach that point. It should be noted that this was written 10 years ago, so it is pretty reasonable to say that it has already been falsified, since we still haven’t noticed any explosion.

He uses the “exponential” idea to make definite claims about how much progress is made or will be made in various periods of time, as for example here:

Most long-range forecasts of what is technically feasible in future time periods dramatically underestimate the power of future developments because they are based on what I call the “intuitive linear” view of history rather than the “historical exponential” view. My models show that we are doubling the paradigm-shift rate every decade, as I will discuss in the next chapter. Thus the twentieth century was gradually speeding up to today’s rate of progress; its achievements, therefore, were equivalent to about twenty years of progress at the rate in 2000. We’ll make another twenty years of progress in just fourteen years (by 2014), and then do the same again in only seven years. To express this another way, we won’t experience one hundred years of technological advance in the twenty-first century; we will witness on the order of twenty thousand years of progress (again, when measured by today’s rate of progress), or about one thousand times greater than what was achieved in the twentieth century. (p.11)

If this is not clear, he is claiming here that the amount of change in the world between the year 1900 and the year 2000 was the same as the amount of change between the year 2000 and the year 2014. He could make this claim because he was writing in the year 2005 and expected impossible changes in the next 10 years. But if someone from the year 1900 were to use a time machine to travel to the year 2000 and then to the year 2014, there is simply no way they would claim that the two periods contained an equal amount of change. I’m not sure how one would mathematically formalize this, but Kurzweil’s claim here would be a lot like saying that the difference between blue and pink is about the same as the difference between two shades of pink.

He is quite definite about what he expects to happen by various dates:

The current disadvantages of Web-based commerce (for example, limitations in the ability to directly interact with products and the frequent frustrations of interacting with inflexible menus and forms instead of human personnel) will gradually dissolve as the trends move robustly in favor of the electronic world. By the end of this decade, computers will disappear as distinct physical objects, with displays built in our eyeglasses, and electronics woven in our clothing, providing full-immersion visual virtual reality. Thus, “going to a Web site” will mean entering a virtual-reality environment— at least for the visual and auditory senses— where we can directly interact with products and people, both real and simulated. (pp. 104-105)

“By the end of this decade” refers to the year 2010.

The full-immersion visual-auditory virtual-reality environments, which will be ubiquitous during the second decade of this century, will hasten the trend toward people living and working wherever they wish. Once we have full-immersion virtual-reality environments incorporating all of the senses, which will be feasible by the late 2020s, there will be no reason to utilize real offices. Real estate will become virtual. (p. 105)

This is not yet completely disproved, but there is not much more time left for the “ubiquitous” virtual-reality environments.

Returning to the limits of computation according to physics, the estimates above were expressed in terms of laptop-size computers because that is a familiar form factor today. By the second decade of this century, however, most computing will not be organized in such rectangular devices but will be highly distributed throughout the environment. Computing will be everywhere: in the walls, in our furniture, in our clothing, and in our bodies and brains. (p. 136)

No comment on this prediction is necessary. Along the same lines:

Early in the second decade of this century, the Web will provide full immersion visual-auditory virtual reality with images written directly to our retinas from our eyeglasses and lenses and very high-bandwidth wireless Internet access woven in our clothing. These capabilities will not be restricted just to the privileged. Just like cell phones, by the time they work well they will be everywhere. (p. 472)

Apart from particular predictions, there are obvious general problems with his claims about exponentially accelerating change. A day lasts 24 hours, and that isn’t going to change. It takes a human being 18 years (or more, depending on how you define it) to grow to adulthood, and that isn’t going to change. Ray apparently believes that such things make no difference:

Each example of information technology starts out with early-adoption versions that do not work very well and that are unaffordable except by the elite. Subsequently the technology works a bit better and becomes merely expensive. Then it works quite well and becomes inexpensive. Finally it works extremely well and is almost free. The cell phone, for example, is somewhere between these last two stages. Consider that a decade ago if a character in a movie took out a portable telephone, this was an indication that this person must be very wealthy, powerful, or both. Yet there are societies around the world in which the majority of the population were farming with their hands two decades ago and now have thriving information-based economies with widespread use of cell phones (for example, Asian societies, including rural areas of China). This lag from very expensive early adopters to very inexpensive, ubiquitous adoption now takes about a decade. But in keeping with the doubling of the paradigm-shift rate each decade, this lag will be only five years a decade from now. In twenty years, the lag will be only two to three years. (p. 469)

In the first place, his description of what happened with cell phones is not historically accurate. The use of cell phones in the USA in 1995 was indeed rare, but it was already quite a bit more common in Europe, and certainly did not indicate that someone must be wealthy or powerful (e.g. in 1996 one of my many European acquaintances who possessed cell phones was a teenage girl from a broken family). In general he shortens various actual time frames in this way in order to create the appearance of greater acceleration; the actual process of cell phone adoption would be better assigned a 20-year period. It is also a fallacy to confuse movement which people see as being within a single technology, e.g. from cell phones in general to smartphones, with the adoption of a new technology. And it does not matter here whether there really is a new technology or not; what matters is whether people see themselves as adopting something new, because people are much more unwilling to adopt a new technology than to upgrade one they already use. As one example, Ray was right to predict the coming of virtual reality technologies, even though his time frame was wrong, and these are currently being developed. But according to him, their widespread adoption should take less than five years, and it is already obvious that this will turn out to be entirely false.

Ray’s book is hundreds of pages long, and one could easily write an entire book in refutation. In addition to being wrong in his specific expectations, there are many religious, philosophical, and moral issues that are raised by many of his proposals, which I have not discussed. However, it should be noted that for the most part his predictions are probably physically possible, although exaggerated, and may actually happen sooner or later, with the exception of some of the more extreme claims. But I suspect that there is more going on than simply exaggerating and predicting that various technologies will arrive sooner than they actually will.

Rather, it seems that Ray’s motives are quasi-religious; as I stated above, he believes that the end of the ages is upon us. He discusses this comparison himself:

George Gilder has described my scientific and philosophical views as “a substitute vision for those who have lost faith in the traditional object of religious belief.” Gilder’s statement is understandable, as there are at least apparent similarities between anticipation of the Singularity and anticipation of the transformations articulated by traditional religions. But I did not come to my perspective as a result of searching for an alternative to customary faith. The origin of my quest to understand technology trends was practical: an attempt to time my inventions and to make optimal tactical decisions in launching technology enterprises. Over time this modeling of technology took on a life of its own and led me to formulate a theory of technology evolution. It was not a huge leap from there to reflect on the impact of these crucial changes on social and cultural institutions and on my own life. So, while being a Singularitarian is not a matter of faith but one of understanding, pondering the scientific trends I’ve discussed in this book inescapably engenders new perspectives on the issues that traditional religions have attempted to address: the nature of mortality and immortality, the purpose of our lives, and intelligence in the universe. (p. 370)

But the fact that someone recognizes the possibility of undue influences upon his beliefs and claims to be free of them does not mean that he is actually free of them. Ray Kurzweil is currently 67 years old. He will likely die within 25 years. It is perfectly clear that one of his main preoccupations is how to prevent this from happening:

Biotechnology will extend biology and correct its obvious flaws. The overlapping revolution of nanotechnology will enable us to expand beyond the severe limitations of biology. As Terry Grossman and I articulated in Fantastic Voyage: Live Long Enough to Live Forever, we are rapidly gaining the knowledge and the tools to indefinitely maintain and extend the “house” each of us calls his body and brain. Unfortunately the vast majority of our baby-boomer peers are unaware of the fact that they do not have to suffer and die in the “normal” course of life, as prior generations have done— if they take aggressive action, action that goes beyond the usual notion of a basically healthy lifestyle. (p. 323)

And this is not merely some vague hope, but a belief that he acts upon:

I have been very aggressive about reprogramming my biochemistry. I take 250 supplements (pills) a day and receive a half-dozen intravenous therapies each week (basically nutritional supplements delivered directly into my bloodstream, thereby bypassing my GI tract). As a result, the metabolic reactions in my body are completely different than they would otherwise be. Approaching this as an engineer, I measure dozens of levels of nutrients (such as vitamins, minerals, and fats), hormones, and metabolic by-products in my blood and other body samples (such as hair and saliva). Overall, my levels are where I want them to be, although I continually fine-tune my program based on the research that I conduct with Grossman. Although my program may seem extreme, it is actually conservative— and optimal (based on my current knowledge). Grossman and I have extensively researched each of the several hundred therapies that I use for safety and efficacy. I stay away from ideas that are unproven or appear to be risky (the use of human-growth hormone, for example). (pp. 211-212)

In other words, it is very likely that Kurzweil’s predictions are as ridiculous as they are because he insists on a time frame to his “Singularity” that will allow him personally to avoid death. It won’t help him, for example, if all his predictions come to pass, but everything happens after he is dead.

But it won’t work, Ray. You are going to die.