Lies, Religion, and Miscalibrated Priors

In a post from some time ago, Scott Alexander asks why it is so hard to believe that people are lying, even in situations where it should be obvious that they made up the whole story:

The weird thing is, I know all of this. I know that if a community is big enough to include even a few liars, then absent a strong mechanism to stop them those lies should rise to the top. I know that pretty much all of our modern communities are super-Dunbar sized and ought to follow that principle.

And yet my System 1 still refuses to believe that the people in those Reddit threads are liars. It’s actually kind of horrified at the thought, imagining them as their shoulders slump and they glumly say “Well, I guess I didn’t really expect anyone to believe me”. I want to say “No! I believe you! I know you had a weird experience and it must be hard for you, but these things happen, I’m sure you’re a good person!”

If you’re like me, and you want to respond to this post with “but how do you know that person didn’t just experience a certain coincidence or weird psychological trick?”, then before you comment take a second to ask why the “they’re lying” theory is so hard to believe. And when you figure it out, tell me, because I really want to know.

The strongest reason for this effect is almost certainly a moral one. In an earlier post, I discussed St. Thomas’s explanation for why one should give a charitable interpretation to someone’s behavior, and in a follow-up, I explained the problem of applying that reasoning to the situation of judging whether a person is lying or not. St. Thomas assumes that the bad consequences of being mistaken about someone’s moral character will be minor, and most of the time this is true. But if we are asking the question, “are they telling the truth or are they lying?”, the consequences can sometimes be very serious if we are mistaken.

Whether or not one is correct in making this application, it is not hard to see that this is the principal answer to Scott’s question. It is hard to believe the “they’re lying” theory not because of the probability that they are lying, but because we are unwilling to risk injuring someone with our opinion. This is without doubt a good motive from a moral standpoint.

But if you proceed to take this unwillingness as a sign of the probability that they are telling the truth, this would be a demonstrably miscalibrated probability assignment. Consider a story on Quora which makes a good example of Scott’s point:

I shuffled a deck of cards and got the same order that I started with.

No I am not kidding and its not because I can’t shuffle.

Let me just tell the story of how it happened. I was on a trip to Europe and I bought a pack of playing cards at the airport in Madrid to entertain myself on the flight back to Dallas.

It was about halfway through the flight after I’d watched Pixels twice in a row (That’s literally the only reason I even remembered this) And I opened my brand new Real Madrid Playing Cards and I just shuffled them for probably like 30 minutes doing different tricks that I’d learned at school to entertain myself and the little girl sitting next to me also found them to be quite cool.

I then went to look at the other sides of the cards since they all had a picture of the Real Madrid player with the same number on the back. That’s when I realized that they were all in order. I literally flipped through the cards and saw Nacho-Fernandes, Ronaldo, Toni Kroos, Karim Benzema and the rest of the team go by all in the perfect order.

Then a few weeks ago when we randomly started talking about Pixels in AP Statistics I brought up this story and my teacher was absolutely amazed. We did the math and the amount of possibilities when shuffling a deck of cards is 52! Meaning 52 x 51 x 50 x 49 x 48….

There were 8.0658175e+67 different combinations of cards that I could have gotten. And I managed to get the same one twice.

The lack of context here might make us more willing to say that Arman Razaali is lying, compared to Scott’s particular examples. Nonetheless, I think a normal person will feel somewhat unwilling to say, “he’s lying, end of story.” I certainly feel that myself.

It does not take many shuffles to essentially randomize a deck. Consequently if Razaali’s statement that he “shuffled them for probably like 30 minutes” is even approximately true, 1 in 52! is probably a good estimate of the chance of the outcome that he claims, if we assume that it happened by chance. It might be some orders of magnitude less since there might be some possibility of “unshuffling.” I do not know enough about the physical process of shuffling to know whether this is a real possibility or not, but it is not likely to make a significant difference: e.g. the difference between 10^67 and 10^40 would be a huge difference mathematically, but it would not be significant for our considerations here, because both are simply too large for us to grasp.
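The arithmetic behind the story’s figure is easy to check directly. A quick sketch:

```python
import math

# Number of possible orderings of a standard 52-card deck
orderings = math.factorial(52)

print(orderings)
print(f"{orderings:.7e}")  # about 8.0658175e+67, matching the figure in the story
```

So the “8.0658175e+67” in the Quora post is correct as stated; the question is only what to conclude from it.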

People demonstrably lie at far higher rates than 1 in 10^67 or 1 in 10^40. This will remain the case even if you ask about the rate of “apparently unmotivated flat out lying for no reason.” Consequently, “he’s lying, period,” is far more likely than “the story is true, and happened by pure chance.” Nor can we fix this by pointing to the fact that an extraordinary claim is a kind of extraordinary evidence. In the linked post I said that the case of seeing ghosts, and similar things, might be unclear:

Or in other words, is claiming to have seen a ghost more like claiming to have picked 422,819,208, or is it more like claiming to have picked 500,000,000?

That remains undetermined, at least by the considerations which we have given here. But unless you have good reasons to suspect that seeing ghosts is significantly more rare than claiming to see a ghost, it is misguided to dismiss such claims as requiring some special evidence apart from the claim itself.

In this case there is no such unclarity – if we interpret the claim as “by pure chance the deck ended up in its original order,” then it is precisely like claiming to have picked 500,000,000, except that it is far less likely.

Note that there is some remaining ambiguity. Razaali could defend himself by saying, “I said it happened, I didn’t say it happened by chance.” Or in other words, “but how do you know that person didn’t just experience a certain coincidence or weird psychological trick?” But this is simply to point out that “he’s lying” and “this happened by pure chance” are not exhaustive alternatives. And this is true. But if we want to estimate the likelihood of those two alternatives in particular, we must say that it is far more likely that he is lying than that it happened, and happened by chance. And so much so that if one of these alternatives is true, it is virtually certain that he is lying.

As I have said above, the inclination to doubt that such a person is lying primarily has a moral reason. This might lead someone to say that my estimation here also has a moral reason: I just want to form my beliefs in the “correct” way, they might say, and it is not really about whether Razaali’s story happened or not.

Charles Taylor, in chapter 15 of A Secular Age, gives a similar explanation of the situation of former religious believers who apparently have lost their faith due to evidence and argument:

From the believer’s perspective, all this falls out rather differently. We start with an epistemic response: the argument from modern science to all-around materialism seems quite unconvincing. Whenever this is worked out in something closer to detail, it seems full of holes. The best examples today might be evolution, sociobiology, and the like. But we also see reasonings of this kind in the works of Richard Dawkins, for instance, or Daniel Dennett.

So the believer returns the compliment. He casts about for an explanation why the materialist is so eager to believe very inconclusive arguments. Here the moral outlook just mentioned comes back in, but in a different role. Not that, failure to rise to which makes you unable to face the facts of materialism; but rather that, whose moral attraction, and seeming plausibility to the facts of the human moral condition, draw you to it, so that you readily grant the materialist argument from science its various leaps of faith. The whole package seems plausible, so we don’t pick too closely at the details.

But how can this be? Surely, the whole package is meant to be plausible precisely because science has shown . . . etc. That’s certainly the way the package of epistemic and moral views presents itself to those who accept it; that’s the official story, as it were. But the supposition here is that the official story isn’t the real one; that the real power that the package has to attract and convince lies in it as a definition of our ethical predicament, in particular, as beings capable of forming beliefs.

This means that this ideal of the courageous acknowledger of unpalatable truths, ready to eschew all easy comfort and consolation, and who by the same token becomes capable of grasping and controlling the world, sits well with us, draws us, that we feel tempted to make it our own. And/or it means that the counter-ideals of belief, devotion, piety, can all-too-easily seem actuated by a still immature desire for consolation, meaning, extra-human sustenance.

What seems to accredit the view of the package as epistemically-driven are all the famous conversion stories, starting with post-Darwinian Victorians but continuing to our day, where people who had a strong faith early in life found that they had reluctantly, even with anguish of soul, to relinquish it, because “Darwin has refuted the Bible”. Surely, we want to say, these people in a sense preferred the Christian outlook morally, but had to bow, with whatever degree of inner pain, to the facts.

But that’s exactly what I’m resisting saying. What happened here was not that a moral outlook bowed to brute facts. Rather we might say that one moral outlook gave way to another. Another model of what was higher triumphed. And much was going for this model: images of power, of untrammelled agency, of spiritual self-possession (the “buffered self”). On the other side, one’s childhood faith had perhaps in many respects remained childish; it was all too easy to come to see it as essentially and constitutionally so.

But this recession of one moral ideal in face of the other is only one aspect of the story. The crucial judgment is an all-in one about the nature of the human ethical predicament: the new moral outlook, the “ethics of belief” in Clifford’s famous phrase, that one should only give credence to what was clearly demonstrated by the evidence, was not only attractive in itself; it also carried with it a view of our ethical predicament, namely, that we are strongly tempted, the more so, the less mature we are, to deviate from this austere principle, and give assent to comforting untruths. The convert to the new ethics has learned to mistrust some of his own deepest instincts, and in particular those which draw him to religious belief. The really operative conversion here was based on the plausibility of this understanding of our ethical situation over the Christian one with its characteristic picture of what entices us to sin and apostasy. The crucial change is in the status accorded to the inclination to believe; this is the object of a radical shift in interpretation. It is no longer the impetus in us towards truth, but has become rather the most dangerous temptation to sin against the austere principles of belief-formation. This whole construal of our ethical predicament becomes more plausible. The attraction of the new moral ideal is only part of this, albeit an important one. What was also crucial was a changed reading of our own motivation, wherein the desire to believe appears now as childish temptation. Since all incipient faith is childish in an obvious sense, and (in the Christian case) only evolves beyond this by being child-like in the Gospel sense, this (mis)reading is not difficult to make.

Taylor’s argument is that the arguments for unbelief are unconvincing; consequently, in order to explain why unbelievers find them convincing, he must find some moral explanation for why they do not believe. This turns out to be the desire to have a particular “ethics of belief”: they do not want to have beliefs which are not formed in such and such a particular way. This is much like the theoretical response above regarding my estimation of the probability that Razaali is lying, and how that might be considered a moral estimation, rather than being concerned with what actually happened.

There are a number of problems with Taylor’s argument, which I may or may not address in the future in more detail. For the moment I will take note of three things:

First, neither in this passage nor elsewhere in the book does Taylor explain in any detailed way why he finds the unbeliever’s arguments unconvincing. I find the arguments convincing, and it is the rebuttals (by others, not by Taylor, since he does not attempt this) that I find unconvincing. Now of course Taylor will say this is because of my particular ethical motivations, but I disagree, and I have considered the matter exactly in the kind of detail to which he refers when he says, “Whenever this is worked out in something closer to detail, it seems full of holes.” On the contrary, the problem of detail is mostly on the other side; most religious views can only make sense when they are not worked out in detail. But this is a topic for another time.

Second, Taylor sets up an implicit dichotomy between his own religious views and “all-around materialism.” But these two claims do not come remotely close to exhausting the possibilities. This is much like forcing someone to choose between “he’s lying” and “this happened by pure chance.” It is obvious in both cases (the deck of cards and religious belief) that the options do not exhaust the possibilities. So insisting on one of them is likely motivated itself: Taylor insists on this dichotomy to make his religious beliefs seem more plausible, using a presumed implausibility of “all-around materialism,” and my hypothetical interlocutor insists on the dichotomy in the hope of persuading me that the deck might have or did randomly end up in its original order, using my presumed unwillingness to accuse someone of lying.

Third, Taylor is not entirely wrong that such an ethical motivation is likely involved in the case of religious belief and unbelief, nor would my hypothetical interlocutor be entirely wrong that such motivations are relevant to our beliefs about the deck of cards.

But we need to consider this point more carefully. Insofar as beliefs are voluntary, you cannot make one side voluntary and the other side involuntary. You cannot say, “Your beliefs are voluntarily adopted due to moral reasons, while my beliefs are imposed on my intellect by the nature of things.” If accepting an opinion is voluntary, rejecting it will also be voluntary, and if rejecting it is voluntary, accepting it will also be voluntary. In this sense, it is quite correct that ethical motivations will always be involved, even when a person’s opinion is actually true, and even when all the reasons that make it likely are fully known. To this degree, I agree that I want to form my beliefs in a way which is prudent and reasonable, and I agree that this desire is partly responsible for my beliefs about religion, and for my above estimate of the chance that Razaali is lying.

But that is not all: my interlocutor (Taylor or the hypothetical one) is also implicitly or explicitly concluding that fundamentally the question is not about truth. Basically, they say, I want to have “correctly formed” beliefs, but this has nothing to do with the real truth of the matter. Sure, I might feel forced to believe that Razaali’s story isn’t true, but there really is no reason it couldn’t be true. And likewise I might feel forced to believe that Taylor’s religious beliefs are untrue, but there really is no reason they couldn’t be.

And in this respect they are mistaken, not because anything “couldn’t” be true, but because the issue of truth is central, much more so than forming beliefs in an ethical way. Regardless of your ethical motives, if you believe that Razaali’s story is true and happened by pure chance, it is virtually certain that you believe a falsehood. Maybe you are forming this belief in a virtuous way, and maybe you are forming it in a vicious way: but either way, it is utterly false. Either it in fact did not happen, or it in fact did not happen by chance.

We know this, essentially, from the “statistics” of the situation: no matter how many qualifications we add, lies in such situations will be vastly more common than truths. But note that something still seems “unconvincing” here, in the sense of Scott Alexander’s original post: even after “knowing all this,” he finds himself very unwilling to say they are lying. In a discussion with Angra Mainyu, I remarked that our apparently involuntary assessments of things are more like desires than like beliefs:

So rather than calling that assessment a belief, it would be more accurate to call it a desire. It is not believing something, but desiring to believe something. Hunger is the tendency to go and get food; that assessment is the tendency to treat a certain claim (“the USA is larger than Austria”) as a fact. And in both cases there are good reasons for those desires: you are benefited by food, and you are benefited by treating that claim as a fact.

In a similar way, because we have the natural desire not to injure people, we will naturally desire not to treat “he is lying” as a fact; that is, we will desire not to believe it. The conclusion that Angra should draw in the case under discussion, according to his position, is that I do not “really believe” that it is more likely that Razaali is lying than that his story is true, because I do feel the force of the desire not to say that he is lying. But I resist that desire, in part because I want to have reasonable beliefs, but most of all because it is false that Razaali’s story is true and happened by chance.

To the degree that this desire feels like a prior probability, and it does feel that way, it is necessarily miscalibrated. But to the degree that this desire remains nonetheless, this reasoning will continue to feel in some sense unconvincing. And it does in fact feel that way to me, even after making the argument, as expected. Very possibly, this is not unrelated to Taylor’s assessment that the argument for unbelief “seems quite unconvincing.” But discussing that in the detail which Taylor omitted is a task for another time.




Statistical Laws of Choice

I noted in an earlier post the necessity of statistical laws of nature. This will necessarily apply to human actions as a particular case, as I implied there in mentioning the amount of food humans eat in a year.

Someone might object. It was said in the earlier post that this will happen unless there is a deliberate attempt to evade this result. But since we are speaking of human beings, there might well be such an attempt. So for example if we ask someone to choose to raise their right hand or their left hand, this might converge to an average, such as 50% each, or perhaps the right hand 60% of the time, or something of this kind. But presumably someone who starts out with the deliberate intention of avoiding such an average will be able to do so.

Unfortunately, such an attempt may succeed in the short run, but will necessarily fail in the long run, because although it is possible in principle, it would require an infinite knowing power, which humans do not have. As I pointed out in the earlier discussion, attempting to prevent convergence requires longer and longer strings on one side or the other. But if you need to raise your right hand a few trillion times before switching again to your left, you will surely lose track of your situation. Nor can you remedy this by writing things down, or by other technical aids: you may succeed in doing things trillions of times with this method, but if you do it forever, the numbers will also become too large to write down. Naturally, at this point we are only making a theoretical point, but it is nonetheless an important one, as we shall see later.
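The convergence claim can be illustrated with a toy simulation. This models only a hypothetical chooser who picks a hand at random, not real human deliberation; the numbers are arbitrary:

```python
import random

random.seed(0)

# Simulate a long run of left/right choices and check that the
# frequency settles near its long-run average
n = 100_000
right = sum(random.random() < 0.5 for _ in range(n))
print(right / n)  # close to 0.5 -- deviations shrink roughly like 1/sqrt(n)
```

The point of the essay is stronger than this, of course: even a chooser deliberately trying to *avoid* such convergence would eventually fail for lack of an infinite memory.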

In any case, in practice people do not tend even to make such attempts, and consequently it is far easier to predict their actions in a roughly statistical manner. Thus for example it would not be hard to discover the frequency with which an individual chooses chocolate ice cream over vanilla.

Telephone Game

Victor Reppert says at his blog,

1. If the initial explosion of the big bang had differed in strength by as little as one part in 10\60, the universe would have either quickly collapsed back on itself, or expanded [too] rapidly for stars to form. In either case, life would be impossible.

2. (An accuracy of one part in 10 to the 60th power can be compared to firing a bullet at a one-inch target on the other side of the observable universe, twenty billion light years away, and hitting the target.)

The claim seems a bit strong. Let x be a measurement in some units of the strength of “the initial explosion of the big bang.” Reppert seems to be saying that if x were increased or decreased by x / (10^60), then the universe would have either collapsed immediately, or it would have expanded without forming stars, so that life would have been impossible.

It’s possible that someone could make a good argument for that claim. But the most natural argument for that claim would be to say something like this, “We know that x had to fall between y and z in order to produce stars, and y and z are so close together that if we increased or decreased x by one part in 10^60, it would fall outside y and z.” But this will not work unless x is already known to fall between y and z. And this implies that we have measured x to a precision of 60 digits.

I suspect that no one, ever, has measured any physical thing to a precision of 60 digits, using any units or any form of measurement. This suggests that something about Reppert’s claim is a bit off.
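One way to get a feel for the scale: an ordinary double-precision number carries only about 16 significant digits, so a change of one part in 10^60 is not even representable in it, let alone measurable:

```python
x = 1.0
delta = x / 10**60

# A 64-bit float holds ~16 significant digits; a perturbation
# at the 60th digit simply vanishes in the addition
print(x + delta == x)  # True
```

This is only an illustration of magnitude, not a claim about how physicists do arithmetic; but it suggests how far beyond any practice of measurement a 60-digit constraint would be.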

In any case, the fact that 10^60 is expressed by “10\60”, and the fact that Reppert omits the word “too” mean that we can trace his claim fairly precisely. Searching Google for the exact sentence, we get this page as the first result, from November 2011. John Piippo says there:

1. If the initial explosion of the big bang had differed in strength by as little as one part in 10\60, the universe would have either quickly collapsed back on itself, or expanded [too] rapidly for stars to form. In either case, life would be impossible. (An accuracy of one part in 10 to the 60th power can be compared to firing a bullet at a one-inch target on the other side of the observable universe, twenty billion light years away, and hitting the target.)

Reppert seems to have accidentally or deliberately divided this into two separate points; number 2 in his list does not make sense except as an observation on the first, as it is found here. Piippo likewise omits the word “too,” strongly suggesting that Piippo is the direct source for Reppert, although it is also possible that both borrowed from a third source.

We find an earlier form of the claim here, made by Robin Collins. It appears to date from around 1998, given the statement, “This work was made possible in part by a Discovery Institute grant for the fiscal year 1997-1998.” Here the claim stands thus:

1. If the initial explosion of the big bang had differed in strength by as little as 1 part in 10^60, the universe would have either quickly collapsed back on itself, or expanded too rapidly for stars to form. In either case, life would be impossible. [See Davies, 1982, pp. 90-91. (As John Jefferson Davis points out (p. 140), an accuracy of one part in 10^60 can be compared to firing a bullet at a one-inch target on the other side of the observable universe, twenty billion light years away, and hitting the target.)

Here we still have the number “1.”, and the text is obviously the source for the later claims, but the word “too” is present in this version, and the claims are sourced. He refers to The Accidental Universe by Paul Davies. Davies says on page 88:

It follows from (4.13) that if p > p_crit then k > 0, the universe is spatially closed, and will eventually contract. The additional gravity of the extra-dense matter will drag the galaxies back on themselves. For p < p_crit, the gravity of the cosmic matter is weaker and the universe ‘escapes’, expanding unchecked in much the same way as a rapidly receding projectile. The geometry of the universe, and its ultimate fate, thus depends on the density of matter or, equivalently, on the total number of particles in the universe, N. We are now able to grasp the full significance of the coincidence (4.12). It states precisely that nature has chosen p to have a value very close to that required to yield a spatially flat universe, with k = 0 and p = p_crit.

Then, at the end of page 89, he says this:

At the Planck time – the earliest epoch at which we can have any confidence in the theory – the ratio was at most an almost infinitesimal 10^-60. If one regards the Planck time as the initial moment when the subsequent cosmic dynamics were determined, it is necessary to suppose that nature chose p to differ from p_crit by no more than one part in 10^60.

Here we have our source. “The ratio” here refers to (p – p_crit) / p_crit. In order for the ratio to be this small, p has to be almost equal to p_crit. In fact, Davies says that this ratio is proportional to time. If we set time = 0, then we would get a ratio of exactly 0, so that p = p_crit. Davies rightly states that the physical theories in question cannot work this way: under the theory of the Big Bang, we cannot discuss the state of the universe at t = 0 and expect to get sensible results. Nonetheless, this suggests that something is wrong with the idea that anything has been calibrated to one part in 10^60. Rather, two values have started out basically equal and grown apart throughout time, so that if you choose an extremely small value of time, you get an extremely small difference in the two values.

This also verifies my original suspicion. Nothing has been measured to a precision of 60 digits, and a determination made that the number measured could not vary by one iota. Instead, Davies has simply taken a ratio that is proportional to time, and calculated its value with a very small value of time.
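Davies’s point can be mimicked with a toy model. Suppose some ratio grows in simple proportion to time; this is purely an illustration, not the actual cosmological dynamics. Then evaluating it at a tiny fraction of the present age automatically yields a tiny number, with no tuning anywhere:

```python
def ratio(t, t_now=1.0):
    """Toy model: a ratio that grows in simple proportion to time."""
    return t / t_now

# Evaluate at an extremely early moment, e.g. 10^-60 of the present age
print(ratio(1e-60))  # 1e-60 -- the "precision" is an artifact of the small time chosen
```

Nothing here was set to 60 digits of accuracy; the impressive exponent falls out of the choice of t.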


There is a real issue here, and it is the question, “Why is the universe basically flat?” But whatever the answer to this question may be, the question, and presumably its answer, are quite different from the claim that physics contains constants that are constrained to the level of “one part in 10^60.” To put this another way: if you answer the question, “Why is the universe flat?” with a response of the form, “Because x = 1892592714.2256399288581158185662151865333331859591, and if it had been the slightest amount more or less than this, the universe would not have been flat,” then your answer is very likely wrong. There is likely to be a simpler and more general answer to the question.

Reppert in fact agrees, and that is the whole point of his argument. For him, the simpler and more general answer is that God planned it that way. That may be, but it should be evident that there is nothing that demands either this answer or an answer of the above form. There could be any number of potential answers.

Playing the telephone game and expecting to get a sensible result is a bad idea. If you take a statement from someone else and restate it without a source, and your source itself has no source, it is quite possible that your statement is wrong and that the original claim was quite different. Even apart from this, however, Reppert is engaging in a basically mistaken enterprise. In essence, he is making a philosophical argument, but attempting to give the appearance of supporting it with physics and mathematics. This is presumably because these topics are less remote from the senses. If Reppert can convince you that his argument is supported by physics and mathematics, you will be likely to think that reasonable disagreement with his position is impossible. You will be less likely to be persuaded if you recognize that his argument remains a philosophical one.

There are philosophical arguments for the existence of God, and this blog has discussed such arguments. But these arguments belong to philosophy, not to science.





Bias vs. Variance

Scott Fortmann-Roe explains the difference between error due to bias and error due to variance:

  • Error due to Bias: The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict. Of course you only have one model so talking about expected or average prediction values might seem a little strange. However, imagine you could repeat the whole model building process more than once: each time you gather new data and run a new analysis creating a new model. Due to randomness in the underlying data sets, the resulting models will have a range of predictions. Bias measures how far off in general these models’ predictions are from the correct value.
  • Error due to Variance: The error due to variance is taken as the variability of a model prediction for a given data point. Again, imagine you can repeat the entire model building process multiple times. The variance is how much the predictions for a given point vary between different realizations of the model.

Later in the essay, he suggests that there is a natural tendency to overemphasize minimizing bias:

A gut feeling many people have is that they should minimize bias even at the expense of variance. Their thinking goes that the presence of bias indicates something basically wrong with their model and algorithm. Yes, they acknowledge, variance is also bad but a model with high variance could at least predict well on average, at least it is not fundamentally wrong.

This is mistaken logic. It is true that a high variance and low bias model can perform well in some sort of long-run average sense. However, in practice modelers are always dealing with a single realization of the data set. In these cases, long run averages are irrelevant, what is important is the performance of the model on the data you actually have and in this case bias and variance are equally important and one should not be improved at an excessive expense to the other.

Fortmann-Roe is concerned here with bias and variance in a precise mathematical sense, relative to the project of fitting a curve to a set of data points. However, his point could be generalized to apply much more generally, to interpreting and understanding the world overall. Tyler Cowen makes such a generalized point:

Arnold Kling summarizes Robin’s argument:

If you have a cause, then other people probably disagree with you (if nothing else, they don’t think your cause is as important as you do). When other people disagree with you, they are usually more right than you think they are. So you could be wrong. Before you go and attach yourself to this cause, shouldn’t you try to reduce the chances that you are wrong? Ergo, shouldn’t you work on trying to overcome bias? Therefore, shouldn’t overcoming bias be your number one cause?

Here is Robin’s very similar statement.  I believe these views are tautologically true and they simply boil down to saying that any complaint can be expressed as a concern about error of some kind or another.  I cannot disagree with this view, for if I do, I am accusing Robin of being too biased toward eliminating bias, thus reaffirming that bias is in fact the real problem.

I find it more useful to draw an analogy with statistics.  Biased estimators are one problem but not the only problem.  There is also insufficient data, lazy researchers, inefficient estimators, and so on.  Then I don’t see why we should be justified in holding a strong preference for overcoming bias, relative to other ends.

Tyler is arguing, for example, that someone may be in error because he is biased, but he can also be in error because he is too lazy to seek out the truth, and it may be more important in a particular case to overcome laziness than to overcome bias.

This is true, no doubt, but we can make a stronger point: In the mathematical discussion of bias and variance, insisting on a completely unbiased model will result in a very high degree of variance, with the nearly inevitable consequence of a higher overall error rate. Thus, for example, we can create a polynomial which will go through every point of the data exactly. Such a method of predicting data is completely unbiased. Nonetheless, such a model tends to be highly inaccurate in predicting new data due to its very high variance: the exact curve is simply too sensitive to the exact points found in the original data. In a similar way, even in the more general non-mathematical case, we will likely find that insisting on a completely unbiased method will result in greater error overall: the best way to find the truth may be to adopt a somewhat simplified model, just as in the mathematical case it is best not to try to fit the data exactly. Simplifying the model will introduce some bias, but it will also reduce variance.

To the best of my knowledge, no one has a demonstrably perfect method of adopting the best model, even in the mathematical case. Much less, therefore, can we come up with a perfect trade-off between bias and variance in the general case. We can simply use our best judgment. But we have some reason for thinking that there must be some such trade-off, just as there is in the mathematical case.
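The cost of insisting on zero bias is easy to see concretely. The sketch below is my own illustration, not Fortmann-Roe's: a degree-9 polynomial passes through all ten noisy training points exactly, while a degree-3 polynomial accepts some bias, and the simpler model predicts the underlying curve far better on average.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(x)

# Ten sample locations and a dense, noise-free test grid.
x_train = np.linspace(0.0, 3.0, 10)
x_test = np.linspace(0.0, 3.0, 200)
y_test = true_fn(x_test)

def avg_test_mse(degree, reps=200, noise=0.3):
    """Average test error of a polynomial fit of the given degree
    over many noisy realizations of the training data."""
    errs = []
    for _ in range(reps):
        y_train = true_fn(x_train) + rng.normal(0.0, noise, size=x_train.shape)
        coeffs = np.polyfit(x_train, y_train, degree)
        errs.append(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return float(np.mean(errs))

# Degree 9 passes through every training point exactly (zero bias on the
# sample), but its high variance makes it much worse on new points than
# the simplified, slightly biased degree-3 fit.
print(f"degree 9: {avg_test_mse(9):.3f}")
print(f"degree 3: {avg_test_mse(3):.3f}")
```

The degree-9 fit is the completely "unbiased" model described above; the degree-3 fit corresponds to the simplified model that accepts some bias in exchange for lower variance.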

The Actual Infinite

There are good reasons to think that actual infinities are possible in the real world. In the first place, while the size and shape of the universe are not settled issues, the generally accepted theory fits better with the idea that the universe is physically infinite than with the idea that it is finite.

Likewise, the universe is certainly larger than the observable universe, which is about 93 billion light years in diameter. Suppose you have a probability distribution that assigns a nonzero probability to the claim that the universe is physically infinite. Then there is no consistent probability distribution that will not drive the probability of an infinite universe toward 100% in the limit, as smaller finite sizes are progressively excluded. But if someone had assigned a reasonable probability distribution before modern physical science existed, it would very likely have been one that makes the probability of an infinite universe very high by the time the universe was confirmed to be its present size. Therefore we too should think that the universe is very probably infinite. In principle, this argument is capable of refuting even purported demonstrations of the impossibility of an actual infinite, since there is at least some small chance that those purported demonstrations are all wrong.
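The structure of this argument can be checked with a toy prior. The particular numbers below (1% on an infinite universe, geometric weights on the finite size classes) are invented purely for illustration; what matters is the shape of the updating.

```python
from fractions import Fraction

# Toy prior: 1% on "the universe is infinite", the remaining 99% spread
# over finite size classes k = 1, 2, 3, ... with geometric weights.
P_INF = Fraction(1, 100)

def prior_finite(k):
    """Prior mass on 'the universe has finite size class k'."""
    return (1 - P_INF) * Fraction(1, 2) ** k

def posterior_infinite(n):
    """P(infinite | the universe is larger than size class n)."""
    # Geometric tail: sum over k > n of (1 - P_INF) * (1/2)**k.
    surviving_finite = (1 - P_INF) * Fraction(1, 2) ** n
    return P_INF / (P_INF + surviving_finite)

# Excluding ever more finite sizes drives the posterior toward 100%.
for n in (0, 10, 20):
    print(n, float(posterior_infinite(n)))
```

Any prior of this general shape behaves the same way: once the finite possibilities below the observed bound are ruled out, the fixed mass on "infinite" comes to dominate whatever finite mass survives.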

Likewise, almost everyone accepts the possibility of an infinite future. Even the heat death of the universe would not prevent the passage of infinite time, and a religious view of the future also generally implies the passage of infinite future time. Even if heaven is supposed to be outside time in principle, in practice there would still be an infinite number of future human acts. If eternalism or something similar is true, then an infinite future in itself implies an actual infinite. And even if such a theory is not true, it is likely that a potentially infinite future implies the possibility of an actual infinite, because any problematic or paradoxical results from an actual infinite can likely be imitated in some way in the case of an infinite future.

On the other hand, there are good reasons to think that actual infinities are not possible in the real world. Positing infinities results in paradoxical or contradictory results in very many cases, and the simplest and therefore most likely way to explain this is to admit that infinities are simply impossible in general, even in the cases where we have not yet verified this fact.

An actual infinite also seems to imply an infinite regress in causality, and such a regress is impossible. We can see this by considering the material cause. Suppose the universe is physically infinite, and contains an infinite number of stars and planets. Then the universe is composed of the solar system together with the rest of the universe. But the rest of the universe will be composed of another stellar system together with the remainder, and so on. So there will be an infinite regress of material causality, which is just as impossible with material causality as with any other kind of causality.

Something similar is implied by St. Thomas’s argument against an infinite multitude:

This, however, is impossible; since every kind of multitude must belong to a species of multitude. Now the species of multitude are to be reckoned by the species of numbers. But no species of number is infinite; for every number is multitude measured by one. Hence it is impossible for there to be an actually infinite multitude, either absolute or accidental.

We can look at this in terms of our explanation of defining numbers. This explanation works only for finite numbers, and an infinite number could not be defined in such a way, precisely because it would result in an infinite regress. This leads us back to the first argument above against infinities: an infinity is intrinsically undefined and unintelligible, and for that reason leads to paradoxes. Someone might say that something unintelligible cannot be understood but is not impossible; but this is no different from Bertrand Russell saying that there is no reason for things not to come into being from nothing, without a cause. Such a position is unreasonable and untrue.

Spinoza’s Geometrical Ethics

Benedict Spinoza, admiring the certainty of geometry, writes his Ethics Demonstrated in Geometrical Order in a manner imitating that of Euclid’s Elements.

Omitting his definitions and axioms for the moment, we can look at his proofs. Thus we have the first:

1: A substance is prior in nature to its states. This is evident from D3 and D5.

The two definitions are of “substance” and “mode,” which latter he equates with “state of a substance.” However, neither definition explains “prior in nature,” nor is this found in any of the other definitions and axioms.

Thus his argument does not follow. But we can grant that the claim is fairly reasonable in any case, and would follow according to many reasonable definitions of “prior in nature,” and according to reasonable axioms.

He proceeds to his second proof:

2: Two substances having different attributes have nothing in common with one another. This is also evident from D3. For each ·substance· must be in itself and be conceived through itself, which is to say that the concept of the one doesn’t involve the concept of the other.

D3 and D4 (which must be used here although he does not cite it explicitly in the proof) say:

D3: By ‘substance’ I understand: what is in itself and is conceived through itself, i.e. that whose concept doesn’t have to be formed out of the concept of something else. D4: By ‘attribute’ I understand: what the intellect perceives of a substance as constituting its essence.

Thus when he speaks of “substances having different attributes,” he means ones which are intellectually perceived as being different in their essence.

Once again, “have nothing in common” is not found in his definitions. It does, however, occur once in his axioms, namely in A5:

A5: If two things have nothing in common, they can’t be understood through one another—that is, the concept of one doesn’t involve the concept of the other.

The axiom is pretty reasonable, at least taken in a certain way. If there is no idea common to the ideas of two things, the idea of one won’t be included in the idea of the other. But Spinoza is attempting to draw the converse conclusion: “if two substances have different attributes, i.e. are different in essence, then they have nothing in common.” This does not seem to follow from a reasonable understanding of D3 and D4, nor from the definitions together with the axioms. “Dog” and “cat” might be substances, and the idea of dog does not include that of cat, nor the idea of cat that of dog, yet they have “animal” in common. So his conclusion is not evident from the definition, nor does it follow logically from his definitions and axioms, nor does it seem to be true.
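The counterexample can be put in a toy set-theoretic form, treating the “concept” of a thing as the set of ideas it involves. The particular ideas listed are, of course, invented for illustration:

```python
# Model each concept as the set of ideas it involves (invented examples).
dog = {"animal", "loyal", "barks"}
cat = {"animal", "independent", "meows"}

# Neither concept is contained in the other, matching A5's antecedent...
neither_includes_other = not (dog <= cat) and not (cat <= dog)

# ...and yet the two concepts still share something.
in_common = dog & cat

print(neither_includes_other)  # True
print(in_common)
```

Mutual non-inclusion of concepts simply does not entail an empty intersection, which is the gap in the proof of proposition 2.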

And this is only the second supposed proof out of 36 in part 1 of his book.

I would suggest that there are at least two problems with his whole project. First, Spinoza knows where he wants to get, and it is not somewhere good. Among other things, he is aiming for proposition 14:

14: God is the only substance that can exist or be conceived.

This is closely related to proposition 2, since if two different things must have nothing in common, then it is impossible for more than one thing to exist: otherwise existence would be something common to the various things.

Proposition 14 is absolutely false taken in any reasonable way. Consequently, since Spinoza is absolutely determined to arrive at a false proposition, he will necessarily employ falsehoods or logical mistakes along the way.

There is a second problem with his project. Geometry speaks about a very limited portion of reality. For this reason it is possible to come to most of its conclusions using a limited variety of definitions and axioms. But ethics and metaphysics, the latter of which is the actual topic of his first book, are much wider in scope. Consequently, if you want to say much that is relevant about them, it is impossible in principle to proceed from a small number of axioms and definitions. A small number of axioms and definitions will necessarily include only a small number of terms, and speaking about ethics and metaphysics requires a large number of terms. For example, suppose I wanted to prove everything on this blog using the method of definitions and axioms. Since I have probably used thousands of terms, hundreds or thousands of definitions and axioms would be required. There would simply be no other way to get the desired conclusions. And we saw even in the first few proofs that Spinoza faces exactly this problem: he wants to speak about a very broad subject, but he wants to start with just a few definitions and axioms.

And if you do employ hundreds of axioms, of course, there is very little chance that anyone is going to grant all of them. They will at least argue that some of them might be mistaken, and thus your proofs will lose the complete certainty that you were looking for from the geometrical method.


Numbering The Good

The book Theory of Games and Economic Behavior, by John Von Neumann and Oskar Morgenstern, contains a formal mathematical theory of value. In the first part of the book they discuss some objections to such a project, as well as explaining why they are hopeful about it:

1.2.2. It is not that there exists any fundamental reason why mathematics should not be used in economics. The arguments often heard that because of the human element, of the psychological factors etc., or because there is allegedly no measurement of important factors, mathematics will find no application, can all be dismissed as utterly mistaken. Almost all these objections have been made, or might have been made, many centuries ago in fields where mathematics is now the chief instrument of analysis. This “might have been” is meant in the following sense: Let us try to imagine ourselves in the period which preceded the mathematical or almost mathematical phase of the development in physics, that is the 16th century, or in chemistry and biology, that is the 18th century. Taking for granted the skeptical attitude of those who object to mathematical economics in principle, the outlook in the physical and biological sciences at these early periods can hardly have been better than that in economics, mutatis mutandis, at present.

As to the lack of measurement of the most important factors, the example of the theory of heat is most instructive; before the development of the mathematical theory the possibilities of quantitative measurements were less favorable there than they are now in economics. The precise measurements of the quantity and quality of heat (energy and temperature) were the outcome and not the antecedents of the mathematical theory. This ought to be contrasted with the fact that the quantitative and exact notions of prices, money and the rate of interest were already developed centuries ago.

A further group of objections against quantitative measurements in economics, centers around the lack of indefinite divisibility of economic quantities. This is supposedly incompatible with the use of the infinitesimal calculus and hence (!) of mathematics. It is hard to see how such objections can be maintained in view of the atomic theories in physics and chemistry, the theory of quanta in electrodynamics, etc., and the notorious and continued success of mathematical analysis within these disciplines.

This project requires the possibility of treating the value of things as a numerically measurable quantity. Calling this value “utility”, they discuss the difficulty of this idea:

3.1.2. Historically, utility was first conceived as quantitatively measurable, i.e. as a number. Valid objections can be and have been made against this view in its original, naive form. It is clear that every measurement, or rather every claim of measurability, must ultimately be based on some immediate sensation, which possibly cannot and certainly need not be analyzed any further. In the case of utility the immediate sensation of preference, of one object or aggregate of objects as against another, provides this basis. But this permits us only to say when for one person one utility is greater than another. It is not in itself a basis for numerical comparison of utilities for one person nor of any comparison between different persons. Since there is no intuitively significant way to add two utilities for the same person, the assumption that utilities are of non-numerical character even seems plausible. The modern method of indifference curve analysis is a mathematical procedure to describe this situation.

They note however that the original situation was no different with the idea of quantitatively measuring heat:

3.2.1. All this is strongly reminiscent of the conditions existent at the beginning of the theory of heat: that too was based on the intuitively clear concept of one body feeling warmer than another, yet there was no immediate way to express significantly by how much, or how many times, or in what sense.

Beginning the derivation of their particular theory, they say:

3.3.2. Let us for the moment accept the picture of an individual whose system of preferences is all-embracing and complete, i.e. who, for any two objects or rather for any two imagined events, possesses a clear intuition of preference.

More precisely we expect him, for any two alternative events which are put before him as possibilities, to be able to tell which of the two he prefers.

It is a very natural extension of this picture to permit such an individual to compare not only events, but even combinations of events with stated probabilities.

By a combination of two events we mean this: Let the two events be denoted by B and C and use, for the sake of simplicity, the probability 50%-50%. Then the “combination” is the prospect of seeing B occur with a probability of 50% and (if B does not occur) C with the (remaining) probability of 50%. We stress that the two alternatives are mutually exclusive, so that no possibility of complementarity and the like exists. Also, that an absolute certainty of the occurrence of either B or C exists.

To restate our position. We expect the individual under consideration to possess a clear intuition whether he prefers the event A to the 50-50 combination of B or C, or conversely. It is clear that if he prefers A to B and also to C, then he will prefer it to the above combination as well; similarly, if he prefers B as well as C to A, then he will prefer the combination too. But if he should prefer A to, say B, but at the same time C to A, then any assertion about his preference of A against the combination contains fundamentally new information. Specifically: If he now prefers A to the 50-50 combination of B and C, this provides a plausible base for the numerical estimate that his preference of A over B is in excess of his preference of C over A.

If this standpoint is accepted, then there is a criterion with which to compare the preference of C over A with the preference of A over B. It is well known that thereby utilities, or rather differences of utilities, become numerically measurable. That the possibility of comparison between A, B, and C only to this extent is already sufficient for a numerical measurement of “distances” was first observed in economics by Pareto. Exactly the same argument has been made, however, by Euclid for the position of points on a line; in fact it is the very basis of his classical derivation of numerical distances.

It is important to note that the things being assigned values are described as events. They should not be considered to be actions or choices, or at any rate, only insofar as actions or choices are themselves events that happen in the world. This is important because a person might very well think, “It would be better if A happened than if B happened. But making A happen is vicious, while making B happen is virtuous, so I will make B happen.” He prefers A as an outcome, but the actions which cause these events do not line up, in their moral value, with the external value of the outcomes. Of course, just as the person says that A happening is a better outcome than B happening, he can say that “choosing to make B happen” is a better outcome than “choosing to make A happen.” So in this sense there is nothing to exclude actions from being included in this system of value. But they can only be included insofar as actions themselves are events that happen in the world.

Von Neumann and Morgenstern continue:

The introduction of numerical measures can be achieved even more directly if use is made of all possible probabilities. Indeed: Consider three events, C, A, B, for which the order of the individual’s preferences is the one stated. Let a be a real number between 0 and 1, such that A is exactly equally desirable with the combined event consisting of a chance of probability 1 – a for B and the remaining chance of probability a for C. Then we suggest the use of a as a numerical estimate for the ratio of the preference of A over B to that of C over B.

So for example, suppose that C is an orange (or as an event, eating an orange), A is eating a plum, and B is eating an apple. The person prefers the orange to the plum, and the plum to the apple. The person prefers a combination of a 20% chance of an apple and an 80% chance of an orange to a plum, while he prefers a plum to a combination of a 40% chance of an apple and a 60% chance of an orange. Since this indicates that his preference changes sides at some point, we suppose that this happens at a 30% chance of an apple and a 70% chance of an orange. All the combinations giving more than a 70% chance of the orange, he prefers to the plum; and he prefers the plum to all the combinations giving less than a 70% chance of the orange. The authors are suggesting that if we assign numerical values to the plum, the apple, and the orange, we should do this in such a way that the difference between the values of the plum and the apple, divided by the difference between the values of the orange and the apple, should be 0.7.

The basic intuition here is that since the combinations of various probabilities of the orange and apple vary continuously from (100% orange, 0% apple) to (0% orange, 100% apple), the various combinations should go continuously through every possible value between the value of the orange and the value of the apple. Since we are passing through those values by changing a probability, they are suggesting mapping that probability directly onto a value. Thus if the value of the orange is 1 and the value of the apple is 0, we say that the value of the plum is 0.7, because the plum is basically equivalent in value to a combination of a 70% chance of the orange and a 30% chance of the apple.
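A minimal sketch of this calibration, using exact fractions and fixing the apple at 0 and the orange at 1 (an arbitrary but conventional choice of scale):

```python
from fractions import Fraction

def lottery_utility(p_best, u_best, u_worst):
    """Expected utility of a lottery: the better outcome with
    probability p_best, the worse outcome otherwise."""
    return p_best * u_best + (1 - p_best) * u_worst

# Fix the scale: apple = 0, orange = 1.
u_apple, u_orange = Fraction(0), Fraction(1)

# Indifference between the plum and a 70% orange / 30% apple lottery
# assigns the plum the value 0.7 on this scale.
a = Fraction(7, 10)
u_plum = lottery_utility(a, u_orange, u_apple)
print(u_plum)  # 7/10

# The scale is unique only up to a linear transformation: rescaling all
# the utilities leaves the indifference point untouched.
def rescale(u):
    return 2 * u + 5

print(rescale(u_plum) == lottery_utility(a, rescale(u_orange), rescale(u_apple)))  # True
```

The choice of 0 and 1 for the endpoints is what makes the plum's value coincide numerically with the indifference probability; any other linear scale would preserve all the same comparisons.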

Working this out formally in the later parts of the book, they show that, given that a person’s preferences satisfy certain fairly reasonable axioms, it is possible to assign values to each of his preferences, and these values are uniquely determined up to a linear transformation.

I will not describe the axioms themselves here, although they are described in the book, as well as perhaps more simply elsewhere.

Note that according to this system, if you want to know the value of a combination, e.g. (60% chance of A and 40% chance of B), the value will always be 0.6 × (value of A) + 0.4 × (value of B). The authors comment on this result:

3.7.1. At this point it may be well to stop and to reconsider the situation. Have we not shown too much? We can derive from the postulates (3:A)-(3:C) the numerical character of utility in the sense of (3:2:a) and (3:1:a), (3:1:b) in 3.5.1.; and (3:1:b) states that the numerical values of utility combine (with probabilities) like mathematical expectations! And yet the concept of mathematical expectation has been often questioned, and its legitimateness is certainly dependent upon some hypothesis concerning the nature of an “expectation.” Have we not then begged the question? Do not our postulates introduce, in some oblique way, the hypotheses which bring in the mathematical expectation?

More specifically: May there not exist in an individual a (positive or negative) utility of the mere act of “taking a chance,” of gambling, which the use of the mathematical expectation obliterates?

The objection is this: according to this system of value, if something has a value v, and something else has the double value 2v, the person should consider getting the thing with value v to be completely equal with a deal where he has an exactly 50% chance of getting the thing with value 2v, and a 50% chance of getting nothing. That seems objectionable because many people would prefer a certainty of getting something, to a situation where there is a good chance of getting nothing, even if there is also a chance of getting something more valuable. So for example, if you were now offered the choice of $100,000 directly, or $200,000 if you flip a coin and get heads, and nothing if you get tails, you would probably not only prefer the $100,000, but prefer it to a very high degree.

Morgenstern and Von Neumann continue:

How did our axioms (3:A)-(3:C) get around this possibility?

As far as we can see, our postulates (3:A)-(3:C) do not attempt to avoid it. Even that one which gets closest to excluding a “utility of gambling” (3:C:b) (cf. its discussion in 3.6.2.), seems to be plausible and legitimate, unless a much more refined system of psychology is used than the one now available for the purposes of economics. The fact that a numerical utility, with a formula amounting to the use of mathematical expectations, can be built upon (3:A)-(3:C), seems to indicate this: We have practically defined numerical utility as being that thing for which the calculus of mathematical expectations is legitimate. Since (3:A)-(3:C) secure that the necessary construction can be carried out, concepts like a “specific utility of gambling” cannot be formulated free of contradiction on this level.

“We have practically defined numerical utility as being that thing for which the calculus of mathematical expectations is legitimate.” In other words, the reason for the strange result is that calling a value “double” very nearly simply means that a 50% chance of that value, and a 50% chance of nothing, is considered equal to the original value which was to be doubled.

Considering the case of the $100,000 and $200,000, perhaps it is not so strange after all, even if we think of value in the terms of Von Neumann and Morgenstern. You are benefited if you receive $100,000. But if you receive $100,000, and then another $100,000, how much benefit do you get from the second gift? Just as much? Not at all. The first gift will almost certainly make a much bigger change in your life than the second gift. So even by ordinary standards, getting $200,000 is not twice as valuable as getting $100,000, but less than twice as valuable.
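This diminishing return is easy to illustrate with a stock textbook assumption, which is mine here and not the authors': suppose, purely for illustration, that utility is logarithmic in total wealth.

```python
import math

W0 = 50_000  # hypothetical starting wealth

def u(wealth):
    """Toy utility function: logarithmic in total wealth."""
    return math.log(wealth)

# The second $100,000 adds less utility than the first.
gain_first = u(W0 + 100_000) - u(W0)
gain_second = u(W0 + 200_000) - u(W0 + 100_000)
print(gain_first > gain_second)  # True

# Certain $100,000 versus a coin flip for $200,000 or nothing:
eu_sure = u(W0 + 100_000)
eu_flip = 0.5 * u(W0 + 200_000) + 0.5 * u(W0)
print(eu_sure > eu_flip)  # True
```

Under any such concave utility, preferring the certain $100,000 to the coin flip is exactly what expected-utility maximization recommends; no special "utility of gambling" is needed to explain it.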

There might be something such that it would have exactly twice the value of $100,000 for you in the Von Neumann-Morgenstern sense. If you care about money enough, perhaps $300,000, or $1,000,000. If so, then you would consider the deal where you flip a coin for this amount of money just as good (considered in advance) as directly receiving $100,000. If you don’t care enough about money for such a thing to be true, there will be something else that you do consider to have twice the value, or more, in this sense. For example, if you have a brother dying of cancer, you would probably prefer that he have a 50% chance of survival, to receiving the $100,000. This means that in the relevant sense, you consider the survival of your brother to have more than double the value of $100,000.

This system of value does not in fact prevent one from assigning a “specific utility of gambling,” even within the system, as long as the fact that I am gambling or not is considered as a distinct event which is an additional result. If the only value that matters is money, then it is indeed a contradiction to speak of a specific utility of gambling. But if I care both about money and about whether I am gambling or not, there is no contradiction.

Something else is implied by all of this, something which is frequently not noticed. Suppose you have a choice of two events in this way. One of them is something that you would want or would like, as small or big as you like. It could be having a nice day at the beach, or $100, or whatever you please. The other is a deal where you have a virtual certainty of getting nothing, and a very small probability of some extremely large reward. For example, it may be that your brother dying of cancer is also on the road to hell. The second event is to give your brother a chance of one in a googolplex of attaining eternal salvation.

Of course, the second event here is worthless. Nobody is going to do anything or give up anything for the sake of such a deal. The implication is this: if a numerical value is assigned to something in the Von Neumann-Morgenstern manner, no matter what that thing is, that value must be low enough (in comparison to other values) that it won’t have any significant value after it is divided by a googolplex.

In other words, even eternal salvation does not have an infinite value, but a finite value (measured in this way), and low enough that it can be made worthless by enough division.

If we consider the value to express how much we care about something, then this actually makes intuitive sense, because we do not care infinitely about anything, not even about things which might be themselves infinite.

Pascal, in his wager, assumes a probability of 50% for God and for the truth of religious beliefs, and seems to assume a certainty of salvation, given that you accept those beliefs and that they happen to be true. He also seems to assume a certain loss of salvation, if you do not accept those beliefs and they happen to be true, and that nothing in particular will happen if the beliefs are not true.

These assumptions are not very reasonable, considered as actual probability assignments and actual expectations of what is going to happen. However, some set of assignments will be reasonable, and this will certainly affect the reasonableness of the wager. If the probability of success is too low, the wager will be unreasonable, just as above we noted that it would be unreasonable to accept the deal concerning your brother. On the other hand, if the probability of success is high enough, it may well be reasonable to take the deal.
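On the finite-value reading above, the wager becomes an ordinary expected-value comparison. The numbers in this sketch are entirely invented: a large but finite value for salvation, a modest cost of belief, and a probability of success that we vary.

```python
def wager_value(p_success, v_salvation, cost_of_belief):
    """Expected value of accepting the beliefs, net of their cost."""
    return p_success * v_salvation - cost_of_belief

V_SALVATION = 10**9  # large but finite, in hypothetical units of value
COST = 100.0         # hypothetical cost of accepting the beliefs

# At Pascal's own 50%, the wager is overwhelmingly favorable...
print(wager_value(0.5, V_SALVATION, COST) > 0)    # True

# ...but at a sufficiently small probability of success it becomes
# worthless, like the one-in-a-googolplex deal above.
print(wager_value(1e-12, V_SALVATION, COST) > 0)  # False
```

So once salvation is assigned a finite value, the reasonableness of the wager turns entirely on where the actual probability assignment falls between these extremes.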