Statistical Laws of Choice

I noted in an earlier post the necessity of statistical laws of nature. This will necessarily apply to human actions as a particular case, as I implied there in mentioning the amount of food humans eat in a year.

Someone might object. It was said in the earlier post that this will happen unless there is a deliberate attempt to evade this result. But since we are speaking of human beings, there might well be such an attempt. So for example, if we ask someone to choose repeatedly between raising their right hand and their left hand, the frequency of each choice might converge to an average, such as 50% each, or perhaps the right hand 60% of the time, or something of this kind. But presumably someone who starts out with the deliberate intention of avoiding such an average will be able to do so.

Unfortunately, such an attempt may succeed in the short run, but will necessarily fail in the long run, because although it is possible in principle, it would require an infinite knowing power, which humans do not have. As I pointed out in the earlier discussion, attempting to prevent convergence requires longer and longer strings on one side or the other. But if you need to raise your right hand a few trillion times before switching again to your left, you will surely lose track of your situation. Nor can you remedy this by writing things down, or by other technical aids: you may succeed in doing things trillions of times with this method, but if you do it forever, the numbers will also become too large to write down. Naturally, at this point we are only making a theoretical point, but it is nonetheless an important one, as we shall see later.
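
To illustrate the theoretical point with a rough sketch of my own (not taken from the earlier post): suppose someone tries to keep the running frequency of right-hand choices swinging between 40% and 60% forever, so that it never settles. A little arithmetic shows that each swing requires a run on one side roughly half as long as everything that has gone before, so the runs grow without bound.

```python
# A rough sketch (my own, not from the earlier post): how long a run of
# same-side choices is needed to keep the running frequency of "right hand"
# swinging between 40% and 60% forever, so that it never converges.

def run_length_needed(n, current_freq, target_freq):
    """Consecutive same-side choices needed to move the running frequency
    from current_freq to target_freq, given n choices made so far."""
    if target_freq > current_freq:
        # add k "right" choices: (current_freq*n + k) / (n + k) = target_freq
        return (target_freq - current_freq) * n / (1 - target_freq)
    else:
        # add k "left" choices: (current_freq*n) / (n + k) = target_freq
        return (current_freq - target_freq) * n / target_freq

n, freq = 100, 0.4
for swing in range(1, 11):
    target = 0.6 if freq == 0.4 else 0.4
    k = run_length_needed(n, freq, target)
    print(f"swing {swing}: after {n:.0f} choices, a run of about {k:.0f} is needed")
    n, freq = n + k, target
```

Since the totals grow by about half with every swing, the required runs soon reach the trillions mentioned above, and eventually exceed anything that could be tracked or written down.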

In any case, in practice people do not tend even to make such attempts, and consequently it is far easier to predict their actions in a roughly statistical manner. Thus for example it would not be hard to discover the frequency with which an individual chooses chocolate ice cream over vanilla.

Telephone Game

Victor Reppert says at his blog,

1. If the initial explosion of the big bang had differed in strength by as little as one part in 10\60, the universe would have either quickly collapsed back on itself, or expanded [too] rapidly for stars to form. In either case, life would be impossible.
2. (An accuracy of one part in 10 to the 60th power can be compared to firing a bullet at a one-inch target on the other side of the observable universe, twenty billion light years away, and hitting the target.)

The claim seems a bit strong. Let x be a measurement in some units of the strength of “the initial explosion of the big bang.” Reppert seems to be saying that if x were increased or decreased by x / (10^60), then the universe would have either collapsed immediately, or it would have expanded without forming stars, so that life would have been impossible.

It’s possible that someone could make a good argument for that claim. But the most natural argument for that claim would be to say something like this, “We know that x had to fall between y and z in order to produce stars, and y and z are so close together that if we increased or decreased x by one part in 10^60, it would fall outside y and z.” But this will not work unless x is already known to fall between y and z. And this implies that we have measured x to a precision of 60 digits.

I suspect that no one, ever, has measured any physical thing to a precision of 60 digits, using any units or any form of measurement. This suggests that something about Reppert’s claim is a bit off.

In any case, the fact that 10^60 is expressed by “10\60”, and the fact that Reppert omits the word “too” mean that we can trace his claim fairly precisely. Searching Google for the exact sentence, we get this page as the first result, from November 2011. John Piippo says there:

1. If the initial explosion of the big bang had differed in strength by as little as one part in 10\60, the universe would have either quickly collapsed back on itself, or expanded [too] rapidly for stars to form. In either case, life would be impossible. (An accuracy of one part in 10 to the 60th power can be compared to firing a bullet at a one-inch target on the other side of the observable universe, twenty billion light years away, and hitting the target.)

Reppert seems to have accidentally or deliberately divided this into two separate points; number 2 in his list does not make sense except as an observation on the first, as it is found here. Piippo likewise omits the word “too,” strongly suggesting that Piippo is the direct source for Reppert, although it is also possible that both borrowed from a third source.

We find an earlier form of the claim here, made by Robin Collins. It appears to date from around 1998, given the statement, “This work was made possible in part by a Discovery Institute grant for the fiscal year 1997-1998.” Here the claim stands thus:

1. If the initial explosion of the big bang had differed in strength by as little as 1 part in 10^60, the universe would have either quickly collapsed back on itself, or expanded too rapidly for stars to form. In either case, life would be impossible. [See Davies, 1982, pp. 90-91. (As John Jefferson Davis points out (p. 140), an accuracy of one part in 10^60 can be compared to firing a bullet at a one-inch target on the other side of the observable universe, twenty billion light years away, and hitting the target.)]

Here we still have the number “1.”, and the text is obviously the source for the later claims, but the word “too” is present in this version, and the claims are sourced. He refers to The Accidental Universe by Paul Davies. Davies says on page 88:

It follows from (4.13) that if p > p_crit then k > 0, the universe is spatially closed, and will eventually contract. The additional gravity of the extra-dense matter will drag the galaxies back on themselves. For p < p_crit, the gravity of the cosmic matter is weaker and the universe ‘escapes’, expanding unchecked in much the same way as a rapidly receding projectile. The geometry of the universe, and its ultimate fate, thus depends on the density of matter or, equivalently, on the total number of particles in the universe, N. We are now able to grasp the full significance of the coincidence (4.12). It states precisely that nature has chosen p to have a value very close to that required to yield a spatially flat universe, with k = 0 and p = p_crit.

Then, at the end of page 89, he says this:

At the Planck time – the earliest epoch at which we can have any confidence in the theory – the ratio was at most an almost infinitesimal 10^-60. If one regards the Planck time as the initial moment when the subsequent cosmic dynamics were determined, it is necessary to suppose that nature chose p to differ from p_crit by no more than one part in 10^60.

Here we have our source. “The ratio” here refers to (p – p_crit) / p_crit. In order for the ratio to be this small, p has to be almost equal to p_crit. In fact, Davies says that this ratio is proportional to time. If we set time = 0, then we would get a ratio of exactly 0, so that p = p_crit. Davies rightly states that the physical theories in question cannot work this way: under the theory of the Big Bang, we cannot discuss the state of the universe at t = 0 and expect to get sensible results. Nonetheless, this suggests that something is wrong with the idea that anything has been calibrated to one part in 10^60. Rather, two values have started out basically equal and grown apart throughout time, so that if you choose an extremely small value of time, you get an extremely small difference in the two values.

This also verifies my original suspicion. Nothing has been measured to a precision of 60 digits, and a determination made that the number measured could not vary by one iota. Instead, Davies has simply taken a ratio that is proportional to time, and calculated its value with a very small value of time.
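
As a rough back-of-envelope check of my own (taking at face value the statement that the ratio is proportional to time, and using only order-of-magnitude figures, so this is an illustration rather than a real cosmological calculation): a ratio that is of order one today and proportional to time would indeed have been fantastically small at the Planck time.

```python
# Rough order-of-magnitude sketch (my own, not Davies' calculation): take at
# face value the statement that (p - p_crit) / p_crit is proportional to time,
# assume it is of order 1 today, and evaluate it at the Planck time.

planck_time = 5.4e-44       # seconds, approximate
age_of_universe = 4.3e17    # seconds, approximately 13.8 billion years

ratio_today = 1.0           # assumed order of magnitude, for illustration only
ratio_at_planck_time = ratio_today * planck_time / age_of_universe

print(f"ratio at the Planck time ~ {ratio_at_planck_time:.1e}")
# prints roughly 1e-61, within an order of magnitude of the quoted 10^-60
```

The exact proportionality depends on the details of the cosmological model, but the sketch shows where a number like 10^-60 comes from: not from any measurement of sixty digits, but from evaluating a time-dependent ratio at an extremely early time.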


There is a real issue here, and it is the question, “Why is the universe basically flat?” But whatever the answer to this question may be, the question, and presumably its answer, are quite different from the claim that physics contains constants that are constrained to the level of “one part in 10^60.” To put this another way: if you answer the question, “Why is the universe flat?” with a response of the form, “Because a certain constant is equal to 1892592714.2256399288581158185662151865333331859591, and if it had been the slightest amount more or less than this, the universe would not have been flat,” then your answer is very likely wrong. There is likely to be a simpler and more general answer to the question.

Reppert in fact agrees, and that is the whole point of his argument. For him, the simpler and more general answer is that God planned it that way. That may be, but it should be evident that there is nothing that demands either this answer or an answer of the above form. There could be any number of potential answers.

Playing the telephone game and expecting to get a sensible result is a bad idea. If you take a statement from someone else and restate it without a source, and your source itself has no source, it is quite possible that your statement is wrong and that the original claim was quite different. Even apart from this, however, Reppert is engaging in a basically mistaken enterprise. In essence, he is making a philosophical argument, but attempting to give the appearance of supporting it with physics and mathematics. This is presumably because these topics are less remote from the senses. If Reppert can convince you that his argument is supported by physics and mathematics, you will be likely to think that reasonable disagreement with his position is impossible. You will be less likely to be persuaded if you recognize that his argument remains a philosophical one.

There are philosophical arguments for the existence of God, and this blog has discussed such arguments. But these arguments belong to philosophy, not to science.

Bias vs. Variance

Scott Fortmann-Roe explains the difference between error due to bias and error due to variance:

  • Error due to Bias: The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict. Of course you only have one model so talking about expected or average prediction values might seem a little strange. However, imagine you could repeat the whole model building process more than once: each time you gather new data and run a new analysis creating a new model. Due to randomness in the underlying data sets, the resulting models will have a range of predictions. Bias measures how far off in general these models’ predictions are from the correct value.
  • Error due to Variance: The error due to variance is taken as the variability of a model prediction for a given data point. Again, imagine you can repeat the entire model building process multiple times. The variance is how much the predictions for a given point vary between different realizations of the model.

Later in the essay, he suggests that there is a natural tendency to overemphasize minimizing bias:

A gut feeling many people have is that they should minimize bias even at the expense of variance. Their thinking goes that the presence of bias indicates something basically wrong with their model and algorithm. Yes, they acknowledge, variance is also bad but a model with high variance could at least predict well on average, at least it is not fundamentally wrong.

This is mistaken logic. It is true that a high variance and low bias model can perform well in some sort of long-run average sense. However, in practice modelers are always dealing with a single realization of the data set. In these cases, long run averages are irrelevant, what is important is the performance of the model on the data you actually have and in this case bias and variance are equally important and one should not be improved at an excessive expense to the other.

Fortmann-Roe is concerned here with bias and variance in a precise mathematical sense, relative to the project of fitting a curve to a set of data points. However, his point can be extended to apply much more generally, to interpreting and understanding the world overall. Tyler Cowen makes such a generalized point:

Arnold Kling summarizes Robin’s argument:

If you have a cause, then other people probably disagree with you (if nothing else, they don’t think your cause is as important as you do). When other people disagree with you, they are usually more right than you think they are. So you could be wrong. Before you go and attach yourself to this cause, shouldn’t you try to reduce the chances that you are wrong? Ergo, shouldn’t you work on trying to overcome bias? Therefore, shouldn’t overcoming bias be your number one cause?

Here is Robin’s very similar statement.  I believe these views are tautologically true and they simply boil down to saying that any complaint can be expressed as a concern about error of some kind or another.  I cannot disagree with this view, for if I do, I am accusing Robin of being too biased toward eliminating bias, thus reaffirming that bias is in fact the real problem.

I find it more useful to draw an analogy with statistics.  Biased estimators are one problem but not the only problem.  There is also insufficient data, lazy researchers, inefficient estimators, and so on.  Then I don’t see why we should be justified in holding a strong preference for overcoming bias, relative to other ends.

Tyler is arguing, for example, that someone may be in error because he is biased, but he can also be in error because he is too lazy to seek out the truth, and it may be more important in a particular case to overcome laziness than to overcome bias.

This is true, no doubt, but we can make a stronger point: In the mathematical discussion of bias and variance, insisting on a completely unbiased model will result in a very high degree of variance, with the nearly inevitable consequence of a higher overall error rate. Thus, for example, we can create a polynomial which will go through every point of the data exactly. Such a method of predicting data is completely unbiased. Nonetheless, such a model tends to be highly inaccurate in predicting new data due to its very high variance: the exact curve is simply too sensitive to the exact points found in the original data. In a similar way, even in the more general non-mathematical case, we will likely find that insisting on a completely unbiased method will result in greater error overall: the best way to find the truth may be to adopt a somewhat simplified model, just as in the mathematical case it is best not to try to fit the data exactly. Simplifying the model will introduce some bias, but it will also reduce variance.
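
A small numerical illustration of the mathematical point (my own sketch, not taken from Fortmann-Roe's essay): fit noisy samples of a simple curve both with a polynomial that passes through every training point and with a low-degree polynomial, and compare their average errors on fresh points.

```python
# Sketch of the bias-variance point above (my own example, not Fortmann-Roe's):
# a polynomial that passes through every training point has no bias but huge
# variance; a low-degree polynomial accepts some bias and predicts far better.
import numpy as np

rng = np.random.default_rng(0)
true_f = np.sin                          # the underlying pattern
x_train = np.linspace(0, 3, 10)          # ten observation points
x_test = np.linspace(0, 3, 200)          # fresh points from the same curve

def average_test_error(degree, trials=200):
    """Average squared error on fresh points, over many re-drawn noisy data sets."""
    errors = []
    for _ in range(trials):
        y_train = true_f(x_train) + rng.normal(0, 0.2, x_train.size)
        coeffs = np.polyfit(x_train, y_train, degree)
        errors.append(np.mean((np.polyval(coeffs, x_test) - true_f(x_test)) ** 2))
    return np.mean(errors)

print("degree 9 (goes through every training point):", round(average_test_error(9), 3))
print("degree 3 (simplified, slightly biased):      ", round(average_test_error(3), 3))
```

Averaged over many re-drawn training sets, the interpolating polynomial does far worse: its error is dominated by variance, while the low-degree fit accepts a little bias and ends up with a much lower total error.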

To the best of my knowledge, no one has a demonstrably perfect method of adopting the best model, even in the mathematical case. Much less, therefore, can we come up with a perfect trade-off between bias and variance in the general case. We can simply use our best judgment. But we have some reason for thinking that there must be some such trade-off, just as there is in the mathematical case.

The Actual Infinite

There are good reasons to think that actual infinities are possible in the real world. In the first place, while the size and shape of the universe are not settled issues, the generally accepted theory fits better with the idea that the universe is physically infinite than with the idea that it is finite.

Likewise, the universe is certainly larger than the observable universe, which is about 93 billion light years in diameter. Suppose you have a probability distribution which assigns a non-zero probability to the claim that the universe is physically infinite. Then no consistent probability distribution can avoid having the probability of an infinite universe go to 100% in the limit, as smaller finite sizes are progressively excluded. But if someone had assigned a reasonable probability distribution before modern physical science existed, it would very likely have been one that made the probability of an infinite universe quite high by the time the universe was confirmed to be at least its present known size. Therefore we too should think that the universe is very probably infinite. In principle, this argument is capable of refuting even purported demonstrations of the impossibility of an actual infinite, since there is at least some small chance that these purported demonstrations are all wrong.
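
A toy numerical version of this limit argument (my own sketch, with an arbitrary made-up prior used purely for illustration): give "infinite" some small prior probability, spread the rest over finite sizes, and then condition on the universe being at least a given size; as that size grows, the probability of "infinite" approaches 100%.

```python
# Toy sketch of the limit argument (my own, with an arbitrary illustrative
# prior, not a real cosmological model): prior probability 1% that the
# universe is infinite, with the remaining 99% spread over finite diameters D
# (measured in some arbitrary unit, D >= 1) with weight proportional to 1/D^2.

p_infinite = 0.01

def posterior_infinite(min_size):
    """P(infinite | diameter >= min_size). For the 1/D^2 prior over finite
    sizes, the fraction of finite weight at or above min_size is 1/min_size."""
    p_finite_and_at_least = (1 - p_infinite) / min_size
    return p_infinite / (p_infinite + p_finite_and_at_least)

for size in [1, 10, 1_000, 1_000_000, 1_000_000_000]:
    print(f"finite sizes below {size:>13} excluded: P(infinite) = {posterior_infinite(size):.6f}")
```

The particular prior does not matter: so long as "infinite" starts with any non-zero probability and the finite possibilities are spread out consistently, excluding ever larger finite sizes drives the conditional probability of an infinite universe toward 100%.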

Likewise, almost everyone accepts the possibility of an infinite future. Even the heat death of the universe would not prevent the passage of infinite time, and a religious view of the future also generally implies the passage of infinite future time. Even if heaven is supposed to be outside time in principle, in practice there would still be an infinite number of future human acts. If eternalism or something similar is true, then an infinite future in itself implies an actual infinite. And even if such a theory is not true, it is likely that a potentially infinite future implies the possibility of an actual infinite, because any problematic or paradoxical results from an actual infinite can likely be imitated in some way in the case of an infinite future.

On the other hand, there are good reasons to think that actual infinities are not possible in the real world. Positing infinities results in paradoxical or contradictory results in very many cases, and the simplest and therefore most likely way to explain this is to admit that infinities are simply impossible in general, even in the cases where we have not yet verified this fact.

An actual infinite also seems to imply an infinite regress in causality, and such a regress is impossible. We can see this by considering the material cause. Suppose the universe is physically infinite, and contains an infinite number of stars and planets. Then the universe is composed of the solar system together with the rest of the universe. But the rest of the universe will be composed of another stellar system together with the remainder, and so on. So there will be an infinite regress of material causality, which is just as impossible with material causality as with any other kind of causality.

Something similar is implied by St. Thomas’s argument against an infinite multitude:

This, however, is impossible; since every kind of multitude must belong to a species of multitude. Now the species of multitude are to be reckoned by the species of numbers. But no species of number is infinite; for every number is multitude measured by one. Hence it is impossible for there to be an actually infinite multitude, either absolute or accidental.

We can look at this in terms of our explanation of defining numbers. This explanation works only for finite numbers, and an infinite number could not be defined in such a way, precisely because it would result in an infinite regress. This leads us back to the first argument above against infinities: an infinity is intrinsically undefined and unintelligible, and for that reason leads to paradoxes. Someone might say that something unintelligible cannot be understood but is not impossible; but this is no different from Bertrand Russell saying that there is no reason for things not to come into being from nothing, without a cause. Such a position is unreasonable and untrue.

Spinoza’s Geometrical Ethics

Benedict Spinoza, admiring the certainty of geometry, writes his Ethics Demonstrated in Geometrical Order in a manner imitating that of Euclid’s Elements.

Omitting his definitions and axioms for the moment, we can look at his proofs. Thus we have the first:

1: A substance is prior in nature to its states. This is evident from D3 and D5.

The two definitions are of “substance” and “mode,” which latter he equates with “state of a substance.” However, neither definition explains “prior in nature,” nor is this found in any of the other definitions and axioms.

Thus his argument does not follow. But we can grant that the claim is fairly reasonable in any case, and would follow according to many reasonable definitions of “prior in nature,” and according to reasonable axioms.

He proceeds to his second proof:

2: Two substances having different attributes have nothing in common with one another. This is also evident from D3. For each ·substance· must be in itself and be conceived through itself, which is to say that the concept of the one doesn’t involve the concept of the other.

D3 and D4 (which must be used here although he does not cite it explicitly in the proof) say:

D3: By ‘substance’ I understand: what is in itself and is conceived through itself, i.e. that whose concept doesn’t have to be formed out of the concept of something else. D4: By ‘attribute’ I understand: what the intellect perceives of a substance as constituting its essence.

Thus when he speaks of “substances having different attributes,” he means ones which are intellectually perceived as being different in their essence.

Once again, however, “have nothing in common” is not found in his definitions. It does occur once in his axioms, namely in A5:

A5: If two things have nothing in common, they can’t be understood through one another—that is, the concept of one doesn’t involve the concept of the other.

The axiom is pretty reasonable, at least taken in a certain way. If there is no idea common to the ideas of two things, the idea of one won’t be included in the idea of the other. But Spinoza is attempting to draw the conclusion that “if two substances have different attributes, i.e. are different in essence, then they have nothing in common.” But this does not seem to follow from a reasonable understanding of D3 and D4, nor from the definitions together with the axioms. “Dog” and “cat” might be substances, and the idea of dog does not include that of cat, nor cat the idea of dog, but they have “animal” in common. So his conclusion is not evident from the definition, nor does it follow logically from his definitions and axioms, nor does it seem to be true.

And this is only the second supposed proof out of 36 in part 1 of his book.

I would suggest that there are at least two problems with his whole project. First, Spinoza knows where he wants to get, and it is not somewhere good. Among other things, he is aiming for proposition 14:

14: God is the only substance that can exist or be conceived.

This is closely related to proposition 2, since if it is true that two different things have nothing in common, then it is impossible for more than one thing to exist, since otherwise existence would be something in common to various things.

Proposition 14 is absolutely false taken in any reasonable way. Consequently, since Spinoza is absolutely determined to arrive at a false proposition, he will necessarily employ falsehoods or logical mistakes along the way.

There is a second problem with his project. Geometry speaks about a very limited portion of reality. For this reason it is possible to come to most of its conclusions using a limited variety of definitions and axioms. But ethics and metaphysics, the latter of which is the actual topic of his first book, are much wider in scope. Consequently, if you want to say much that is relevant about them, it is impossible in principle to proceed from a small number of axioms and definitions. A small number of axioms and definitions will necessarily include only a small number of terms, and speaking about ethics and metaphysics requires a large number of terms. For example, suppose I wanted to prove everything on this blog using the method of definitions and axioms. Since I have probably used thousands of terms, hundreds or thousands of definitions and axioms would be required. There would simply be no other way to get the desired conclusions. In a similar way, we saw even in the first few proofs that Spinoza has a similar problem; he wants to speak about a very broad subject, but he wants to start with just a few definitions and axioms.

And if you do employ hundreds of axioms, of course, there is very little chance that anyone is going to grant all of them. They will at least argue that some of them might be mistaken, and thus your proofs will lose the complete certainty that you were looking for from the geometrical method.

Numbering The Good

The book Theory of Games and Economic Behavior, by John Von Neumann and Oskar Morgenstern, contains a formal mathematical theory of value. In the first part of the book they discuss some objections to such a project, as well as explaining why they are hopeful about it:

1.2.2. It is not that there exists any fundamental reason why mathematics should not be used in economics. The arguments often heard that because of the human element, of the psychological factors etc., or because there is allegedly no measurement of important factors, mathematics will find no application, can all be dismissed as utterly mistaken. Almost all these objections have been made, or might have been made, many centuries ago in fields where mathematics is now the chief instrument of analysis. This “might have been” is meant in the following sense: Let us try to imagine ourselves in the period which preceded the mathematical or almost mathematical phase of the development in physics, that is the 16th century, or in chemistry and biology, that is the 18th century. Taking for granted the skeptical attitude of those who object to mathematical economics in principle, the outlook in the physical and biological sciences at these early periods can hardly have been better than that in economics, mutatis mutandis, at present.

As to the lack of measurement of the most important factors, the example of the theory of heat is most instructive; before the development of the mathematical theory the possibilities of quantitative measurements were less favorable there than they are now in economics. The precise measurements of the quantity and quality of heat (energy and temperature) were the outcome and not the antecedents of the mathematical theory. This ought to be contrasted with the fact that the quantitative and exact notions of prices, money and the rate of interest were already developed centuries ago.

A further group of objections against quantitative measurements in economics, centers around the lack of indefinite divisibility of economic quantities. This is supposedly incompatible with the use of the infinitesimal calculus and hence (!) of mathematics. It is hard to see how such objections can be maintained in view of the atomic theories in physics and chemistry, the theory of quanta in electrodynamics, etc., and the notorious and continued success of mathematical analysis within these disciplines.

This project requires the possibility of treating the value of things as a numerically measurable quantity. Calling this value “utility”, they discuss the difficulty of this idea:

3.1.2. Historically, utility was first conceived as quantitatively measurable, i.e. as a number. Valid objections can be and have been made against this view in its original, naive form. It is clear that every measurement, or rather every claim of measurability, must ultimately be based on some immediate sensation, which possibly cannot and certainly need not be analyzed any further. In the case of utility the immediate sensation of preference, of one object or aggregate of objects as against another, provides this basis. But this permits us only to say when for one person one utility is greater than another. It is not in itself a basis for numerical comparison of utilities for one person nor of any comparison between different persons. Since there is no intuitively significant way to add two utilities for the same person, the assumption that utilities are of non-numerical character even seems plausible. The modern method of indifference curve analysis is a mathematical procedure to describe this situation.

They note however that the original situation was no different with the idea of quantitatively measuring heat:

3.2.1. All this is strongly reminiscent of the conditions existent at the beginning of the theory of heat: that too was based on the intuitively clear concept of one body feeling warmer than another, yet there was no immediate way to express significantly by how much, or how many times, or in what sense.

Beginning the derivation of their particular theory, they say:

3.3.2. Let us for the moment accept the picture of an individual whose system of preferences is all-embracing and complete, i.e. who, for any two objects or rather for any two imagined events, possesses a clear intuition of preference.

More precisely we expect him, for any two alternative events which are put before him as possibilities, to be able to tell which of the two he prefers.

It is a very natural extension of this picture to permit such an individual to compare not only events, but even combinations of events with stated probabilities.

By a combination of two events we mean this: Let the two events be denoted by B and C and use, for the sake of simplicity, the probability 50%-50%. Then the “combination” is the prospect of seeing B occur with a probability of 50% and (if B does not occur) C with the (remaining) probability of 50%. We stress that the two alternatives are mutually exclusive, so that no possibility of complementarity and the like exists. Also, that an absolute certainty of the occurrence of either B or C exists.

To restate our position. We expect the individual under consideration to possess a clear intuition whether he prefers the event A to the 50-50 combination of B or C, or conversely. It is clear that if he prefers A to B and also to C, then he will prefer it to the above combination as well; similarly, if he prefers B as well as C to A, then he will prefer the combination too. But if he should prefer A to, say B, but at the same time C to A, then any assertion about his preference of A against the combination contains fundamentally new information. Specifically: If he now prefers A to the 50-50 combination of B and C, this provides a plausible base for the numerical estimate that his preference of A over B is in excess of his preference of C over A.

If this standpoint is accepted, then there is a criterion with which to compare the preference of C over A with the preference of A over B. It is well known that thereby utilities, or rather differences of utilities, become numerically measurable. That the possibility of comparison between A, B, and C only to this extent is already sufficient for a numerical measurement of “distances” was first observed in economics by Pareto. Exactly the same argument has been made, however, by Euclid for the position of points on a line – in fact it is the very basis of his classical derivation of numerical distances.

It is important to note that the things being assigned values are described as events. They should not be considered to be actions or choices, or at any rate, only insofar as actions or choices are themselves events that happen in the world. This is important because a person might very well think, “It would be better if A happened than if B happened. But making A happen is vicious, while making B happen is virtuous, so I will make B happen.” He prefers A as an outcome, but the actions which cause these events do not line up, in their moral value, with the external value of the outcomes. Of course, just as the person says that A happening is a better outcome than B happening, he can say that “choosing to make B happen” is a better outcome than “choosing to make A happen.” So in this sense there is nothing to exclude actions from being included in this system of value. But they can only be included insofar as actions themselves are events that happen in the world.

Von Neumann and Morgenstern continue:

The introduction of numerical measures can be achieved even more directly if use is made of all possible probabilities. Indeed: Consider three events, C, A, B, for which the order of the individual’s preferences is the one stated. Let a be a real number between 0 and 1, such that A is exactly equally desirable with the combined event consisting of a chance of probability 1 – a for B and the remaining chance of probability a for C. Then we suggest the use of a as a numerical estimate for the ratio of the preference of A over B to that of C over B.

So for example, suppose that C is an orange (or as an event, eating an orange). A is eating a plum, and B is eating an apple. The person prefers the orange to the plum, and the plum to the apple. The person prefers a combination of a 20% chance of an apple and an 80% chance of an orange to a plum, while he prefers a plum to a combination of a 40% chance of an apple and a 60% chance of an orange. Since this indicates that his preference changes sides at some point, we suppose that this happens at a 30% chance of an apple and a 70% chance of an orange. All the combinations giving more than a 70% chance of the orange, he prefers to the plum; and he prefers the plum to all the combinations giving less than a 70% chance of the orange. The authors are suggesting that if we assign numerical values to the plum, the apple, and the orange, we should do this in such a way that the difference between the values of the plum and the apple, divided by the difference between the values of the orange and the apple, should be 0.7.

The basic intuition here is that since the combinations of various probabilities of the orange and apple vary continuously from (100% orange, 0% apple) to (0% orange, 100% apple), the various combinations should go continuously through every possible value between the value of the orange and the value of the apple. Since we are passing through those values by changing a probability, they are suggesting mapping that probability directly onto a value. Thus if the value of the orange is 1 and the value of the apple is 0, we say that the value of the plum is 0.7, because the plum is basically equivalent in value to a combination of a 70% chance of the orange and a 30% chance of the apple.
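
Put as a small sketch (my own, just restating the apple/plum/orange example above in code): fix the utility of the apple at 0 and the orange at 1, and read the plum's utility off as the probability of the orange at which the person becomes indifferent between the plum and the gamble.

```python
# Sketch of the Von Neumann-Morgenstern calibration in the example above.
# By convention the apple gets utility 0 and the orange utility 1; the plum's
# utility is the probability of the orange at which the person is indifferent
# between the plum and the orange/apple gamble.

u_apple, u_orange = 0.0, 1.0

def gamble_value(p_orange):
    """Expected utility of: orange with probability p_orange, otherwise apple."""
    return p_orange * u_orange + (1 - p_orange) * u_apple

# He prefers (80% orange, 20% apple) to the plum, but the plum to
# (60% orange, 40% apple); the switch-over point is taken to be 70% orange.
indifference_point = 0.7
u_plum = gamble_value(indifference_point)      # = 0.7

# The ratio the authors mention:
ratio = (u_plum - u_apple) / (u_orange - u_apple)
print(u_plum, ratio)                           # 0.7 0.7
```

The choice of 0 and 1 is just a convention; rescaling all three numbers by the same positive linear transformation would represent the same preferences.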

Working this out formally in the later parts of the book, they show that, given a person whose preferences satisfy certain fairly reasonable axioms, it is possible to assign values to each of his preferences, and these values are necessarily uniquely determined up to a linear transformation.

I will not describe the axioms themselves here, although they are described in the book, as well as perhaps more simply elsewhere.

Note that according to this system, if you want to know the value of a combination, e.g. (60% chance of A and 40% chance of B), the value will always be 0.6(value of A)+0.4(value of B). The authors comment on this result:

3.7.1. At this point it may be well to stop and to reconsider the situation. Have we not shown too much? We can derive from the postulates (3:A)-(3:C) the numerical character of utility in the sense of (3:2:a) and (3:1:a), (3:1:b) in 3.5.1.; and (3:1:b) states that the numerical values of utility combine (with probabilities) like mathematical expectations! And yet the concept of mathematical expectation has been often questioned, and its legitimateness is certainly dependent upon some hypothesis concerning the nature of an “expectation.” Have we not then begged the question? Do not our postulates introduce, in some oblique way, the hypotheses which bring in the mathematical expectation?

More specifically: May there not exist in an individual a (positive or negative) utility of the mere act of “taking a chance,” of gambling, which the use of the mathematical expectation obliterates?

The objection is this: according to this system of value, if something has a value v, and something else has the double value 2v, the person should consider getting the thing with value v to be completely equal with a deal where he has an exactly 50% chance of getting the thing with value 2v, and a 50% chance of getting nothing. That seems objectionable because many people would prefer a certainty of getting something, to a situation where there is a good chance of getting nothing, even if there is also a chance of getting something more valuable. So for example, if you were now offered the choice of $100,000 directly, or $200,000 if you flip a coin and get heads, and nothing if you get tails, you would probably not only prefer the $100,000, but prefer it to a very high degree.

Morgenstern and Von Neumann continue:

How did our axioms (3:A)-(3:C) get around this possibility?

As far as we can see, our postulates (3:A)-(3:C) do not attempt to avoid it. Even that one which gets closest to excluding a “utility of gambling” (3:C:b) (cf. its discussion in 3.6.2.), seems to be plausible and legitimate, unless a much more refined system of psychology is used than the one now available for the purposes of economics. The fact that a numerical utility, with a formula amounting to the use of mathematical expectations, can be built upon (3:A)-(3:C), seems to indicate this: We have practically defined numerical utility as being that thing for which the calculus of mathematical expectations is legitimate. Since (3:A)-(3:C) secure that the necessary construction can be carried out, concepts like a “specific utility of gambling” cannot be formulated free of contradiction on this level.

“We have practically defined numerical utility as being that thing for which the calculus of mathematical expectations is legitimate.” In other words, the reason for the strange result is that calling a value “double” very nearly simply means that a 50% chance of that value, and a 50% chance of nothing, is considered equal to the original value which was to be doubled.

Considering the case of the $100,000 and $200,000, perhaps it is not so strange after all, even if we think of value in the terms of Von Neumann and Morgenstern. You are benefited if you receive $100,000. But if you receive $100,000, and then another $100,000, how much benefit do you get from the second gift? Just as much? Not at all. The first gift will almost certainly make a much bigger change in your life than the second gift. So even by ordinary standards, getting $200,000 is not twice as valuable as getting $100,000, but less than twice as valuable.

There might be something such that it would have exactly twice the value of $100,000 for you in the Von Neumann-Morgenstern sense. If you care about money enough, perhaps $300,000, or $1,000,000. If so, then you would consider the deal where you flip a coin for this amount of money just as good (considered in advance) as directly receiving $100,000. If you don’t care enough about money for such a thing to be true, there will be something else that you do consider to have twice the value, or more, in this sense. For example, if you have a brother dying of cancer, you would probably prefer that he have a 50% chance of survival, to receiving the $100,000. This means that in the relevant sense, you consider the survival of your brother to have more than double the value of $100,000.

This system of value does not in fact prevent one from assigning a “specific utility of gambling,” even within the system, as long as the fact that I am gambling or not is considered as a distinct event which is an additional result. If the only value that matters is money, then it is indeed a contradiction to speak of a specific utility of gambling. But if I care both about money and about whether I am gambling or not, there is no contradiction.

Something else is implied by all of this, something which is frequently not noticed. Suppose you have a choice of two events in this way. One of them is something that you would want or would like, as small or big as you like. It could be having a nice day at the beach, or $100, or whatever you please. The other is a deal where you have a virtual certainty of getting nothing, and a very small probability of some extremely large reward. For example, it may be that your brother dying of cancer is also on the road to hell. The second event is to give your brother a chance of one in a googolplex of attaining eternal salvation.

Of course, the second event here is worthless. Nobody is going to do anything or give up anything for the sake of such a deal. What this implies is this: if a numerical value is assigned to something in the Von Neumann-Morgenstern manner, no matter what that thing is, that value must be low enough (in comparison to other values) that it won’t have any significant value after it is divided by a googolplex.

In other words, even eternal salvation does not have an infinite value, but a finite value (measured in this way), and low enough that it can be made worthless by enough division.

If we consider the value to express how much we care about something, then this actually makes intuitive sense, because we do not care infinitely about anything, not even about things which might be themselves infinite.

Pascal, in his wager, assumes a probability of 50% for God and for the truth of religious beliefs, and seems to assume a certainty of salvation, given that you accept those beliefs and that they happen to be true. He also seems to assume a certain loss of salvation, if you do not accept those beliefs and they happen to be true, and that nothing in particular will happen if the beliefs are not true.

These assumptions are not very reasonable, considered as actual probability assignments and actual expectations of what is going to happen. However, some set of assignments will be reasonable, and this will certainly affect the reasonableness of the wager. If the probability of success is too low, the wager will be unreasonable, just as above we noted that it would be unreasonable to accept the deal concerning your brother. On the other hand, if the probability of success is high enough, it may well be reasonable to take the deal.

Erroneous Responses to Pascal

Many arguments which are presented against accepting Pascal’s wager are mistaken, some of them in obvious ways. For example, the argument is made that the multiplicity of religious beliefs or potential religious beliefs invalidates the wager:

But Pascal’s argument is seriously flawed. The religious environment that Pascal lived in was simple. Belief and disbelief only boiled down to two choices: Roman Catholicism and atheism. With a finite choice, his argument would be sound. But on Pascal’s own premise that God is infinitely incomprehensible, then in theory, there would be an infinite number of possible theologies about God, all of which are equally probable.

First, let us look at the more obvious possibilities we know of today – possibilities that were either unknown to, or ignored by, Pascal. In the Calvinistic theological doctrine of predestination, it makes no difference what one chooses to believe since, in the final analysis, who actually gets rewarded is an arbitrary choice of God. Furthermore we know of many more gods of many different religions, all of which have different schemes of rewards and punishments. Given that there are more than 2,500 gods known to man, and given Pascal’s own assumptions that one cannot comprehend God (or gods), then it follows that, even the best case scenario (i.e. that God exists and that one of the known Gods and theologies happen to be the correct one) the chances of making a successful choice is less than one in 2,500.

Second, Pascal’s negative theology does not exclude the possibility that the true God and true theology is not one that is currently known to the world. For instance it is possible to think of a God who rewards, say, only those who purposely step on sidewalk cracks. This sounds absurd, but given the premise that we cannot understand God, this possible theology cannot be dismissed. In such a case, the choice of what God to believe would be irrelevant as one would be rewarded on a premise totally distinct from what one actually believes. Furthermore as many atheist philosophers have pointed out, it is also possible to conceive of a deity who rewards intellectual honesty, a God who rewards atheists with eternal bliss simply because they dared to follow where the evidence leads – that given the available evidence, no God exists! Finally we should also note that given Pascal’s premise, it is possible to conceive of a God who is evil and who punishes the good and rewards the evil.

Thus Pascal’s call for us not to consider the evidence but to simply believe on prudential grounds fails.

There is an attempt here to base the response on Pascal’s mistaken claim that the probability of the existence of God (and of Catholic doctrine as a whole) is 50%. This would presumably be because we can know nothing about theological truth. According to this, the website reasons that all possible theological claims should be equally probable, and consequently one will be in any case very unlikely to find the truth, and therefore very unlikely to attain the eternal reward, using Pascal’s apparent assumption that only believers in a specific theology can attain the reward.

The problem with this is that it reasons from Pascal’s mistaken assumptions (as well as changing them in unjustified ways), while in reality the effectiveness of the wager does not precisely depend on these assumptions. If there is a 10% chance that God exists, and the rest is true as Pascal states it, it would still seem to be a good bet that God exists, in terms of the practical consequences. You will probably be wrong, but the gain if you are right will be so great that it will almost certainly outweigh the probable loss.

In reality different theologies are not equally probable, and there will be one which is most probable. Theologies such as the “God who rewards atheism”, which do not have any actual proponents, have very little evidence for them, since they do not even have the evidence resulting from a claim. One cannot expect that two differing positions will randomly have exactly the same amount of evidence for them, so one theology will have more evidence than any other. And even if it did not have overall a probability of more than 50%, it could still be a good bet, given the possibility of the reward, and better than any of the other potential wagers.

The argument is also made that once one admits an infinite reward, it is not possible to distinguish between actions with differing values. This is described here:

If you regularly brush your teeth, there is some chance you will go to heaven and enjoy infinite bliss. On the other hand, there is some chance you will enjoy infinite heavenly bliss even if you do not brush your teeth. Therefore the expectation of brushing your teeth (infinity plus a little extra due to oral health = infinity) is the same as that of not brushing your teeth (infinity minus a bit due to cavities and gingivitis = infinity), from which it follows that dental hygiene is not a particularly prudent course of action. In fact, as soon as we allow infinite utilities, decision theory tells us that any course of action is as good as any other (Duff 1986). Hence we have a reductio ad absurdum against decision theory, at least when it’s extended to infinite cases.

As actually applied, someone might argue that even if the God who rewards atheism is less probable than the Christian God, the expected utility of being Christian or atheist will be infinite in each case, and therefore one will not be a more reasonable choice than the other. Some people actually seem to believe that this is a good response, but it is not. The problem here is that decision theory is a mathematical formalism and does not have to correspond precisely with real life. The mathematics does not work when infinity is introduced, but this does not mean there cannot be such an infinity in reality, nor that the two choices would be equal in reality. It simply means you have not chosen the right mathematics to express the situation. To see this clearly, consider the following situation.

You are in a room with two exits, a green door and a red door. The green door has a known probability of 99% of leading to an eternal heaven, and a known probability of 1% of leading to an eternal hell. The red door has a known probability of 99% of leading to an eternal hell, and a known probability of 1% of leading to an eternal heaven.

The point is that if your mathematics says that going out the red door is just as good as going out the green door, your mathematics is wrong. The correct solution is to go out the green door.
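
To see the failure of the formalism in a sketch of my own: represent heaven and hell by finite values, however large, and expected utility straightforwardly prefers the green door; represent them by literal infinities and the two expectations do not come out equal so much as undefined.

```python
# Sketch of the green door / red door point (my own). With large finite
# stand-in values for heaven and hell, expected utility gives the obvious
# answer; with literal infinities the arithmetic produces nothing usable,
# since inf - inf is undefined.

def expected_value(p_heaven, heaven, hell):
    return p_heaven * heaven + (1 - p_heaven) * hell

heaven, hell = 1e12, -1e12                     # large finite stand-ins
print("green door:", expected_value(0.99, heaven, hell))   # strongly positive
print("red door:  ", expected_value(0.01, heaven, hell))   # strongly negative

heaven, hell = float("inf"), float("-inf")     # literal infinities
print("green door:", expected_value(0.99, heaven, hell))   # nan
print("red door:  ", expected_value(0.01, heaven, hell))   # nan
```

The arithmetic breaks down because infinity minus infinity has no value, which is just another way of saying that the wrong mathematics has been chosen for the situation.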

I would consider all such arguments, namely that all religious beliefs are equally probable, that being rewarded for atheism is as probable as being rewarded for Christianity, or that all infinite expectations are equal, to be examples of not very serious thinking. These arguments are not only wrong. They are obviously wrong, and obviously motivated by the desire not to believe. Earlier I quoted Thomas Nagel on the fear of religion. After the quoted passage, he continues:

My guess is that this cosmic authority problem is not a rare condition and that it is responsible for much of the scientism and reductionism of our time. One of the tendencies it supports is the ludicrous overuse of evolutionary biology to explain everything about life, including everything about the human mind. Darwin enabled modern secular culture to heave a great collective sigh of relief, by apparently providing a way to eliminate purpose, meaning, and design as fundamental features of the world. Instead they become epiphenomena, generated incidentally by a process that can be entirely explained by the operation of the nonteleological laws of physics on the material of which we and our environments are all composed. There might still be thought to be a religious threat in the existence of the laws of physics themselves, and indeed the existence of anything at all— but it seems to be less alarming to most atheists.

This is a somewhat ridiculous situation.

This fear of religion is very likely the cause of such unreasonable responses. Scott Alexander notes in this comment that such explanations are mistaken:

I find all of the standard tricks used against Pascal’s Wager intellectually unsatisfying because none of them are at the root of my failure to accept it. Yes, it might be a good point that there could be an “atheist God” who punishes anyone who accepts Pascal’s Wager. But even if a super-intelligent source whom I trusted absolutely informed me that there was definitely either the Catholic God or no god at all, I feel like I would still feel like Pascal’s Wager was a bad deal. So it would be dishonest of me to say that the possibility of an atheist god “solves” Pascal’s Wager.

The same thing is true for a lot of the other solutions proposed. Even if this super-intelligent source assured me that yes, if there is a God He will let people into Heaven even if their faith is only based on Pascal’s Wager, that if there is a God He will not punish you for your cynical attraction to incentives, and so on, and re-emphasized that it was DEFINITELY either the Catholic God or nothing, I still wouldn’t happily become a Catholic.

Whatever the solution, I think it’s probably the same for Pascal’s Wager, Pascal’s Mugging, and the Egyptian mummy problem I mentioned last month. Right now, my best guess for that solution is that there are two different answers to two different questions:

Why do we believe Pascal’s Wager is wrong? Scope insensitivity. Eternity in Hell doesn’t sound that much worse, to our brains, than a hundred years in Hell, and we quite rightly wouldn’t accept Pascal’s Wager to avoid a hundred years in Hell. Pascal’s Mugger killing 3^^^3 people doesn’t sound too much worse than him killing 3,333 people, and we quite rightly wouldn’t give him a dollar to get that low a probability of killing 3,333 people.

Why is Pascal’s Wager wrong? From an expected utility point of view, it’s not. In any particular world, not accepting Pascal’s Wager has a 99.999…% chance of leading to a higher payoff. But averaged over very large numbers of possible worlds, accepting Pascal’s Wager or Pascal’s Mugging will have a higher payoff, because of that infinity going into the averages. It’s too bad that doing the rational thing leads to a lower payoff in most cases, but as everyone who’s bought fire insurance and not had their house catch on fire knows, sometimes that happens.

I realize that this position commits me, so far as I am rational, to becoming a theist. But my position that other people are exactly equal in moral value to myself commits me, so far as I am rational, to giving almost all my salary to starving Africans who would get a higher marginal value from it than I do, and I don’t do that either.

While a far more reasonable response, there is wishful thinking going on here as well, with the assumption that the probability that a body of religious beliefs is true as a whole is extremely small. This will not generally speaking be the case, or at any rate the probability will not be as small as he suggests, once the evidence derived from the claim itself is taken into account. In the same way, it is not extremely improbable that a particular book is mostly historical, even though, if one considered the statements contained in the book as a random conjunction, one would suppose this to be very improbable.