On the Nature of Mathematical Proof
1. Mathematical induction
The two main methods of mathematical proof are called “constructive” and “indirect”. Let me start with the constructive method of “mathematical induction” or, as it is also called, “complete induction”. It is the prototype of a constructive proof and the main principle for investigating the qualities of natural numbers: if a certain property holds for n = 1, the smallest of all natural numbers, and if from its validity for some n a similar validity for (n + 1) follows, then this assertion is true for all natural numbers. The proof may alternatively be based on a hypothesis, as is stated in Richard Courant and Herbert Robbins' classic book What is Mathematics? 1 (newly revised by Ian Stewart), to which I shall refer repeatedly in this appendix. I quote: “The fact that the proof of a theorem consists in the application of certain simple rules of logic does not dispose of the creative element in mathematics, which lies in the choice of the possibilities to be examined. The question of the origin of the hypothesis belongs to a domain in which no very general rules can be given; experiment, analogy and constructive intuition play their part here. But once the correct hypothesis is formulated, the principle of mathematical induction is often sufficient to provide the proof.”
In the examples I am going to discuss, the results (some of which are also quoted in the above‐mentioned book) are all well known. Moreover, to prove their correctness by mathematical induction is a relatively easy job, so that the authors (in certain cases) left this to their readers. I felt sufficiently challenged by one of the problems considered, namely the sums of arbitrary powers of natural numbers. One rainy weekend in my Californian refuge – such days occur anyway too rarely in Southern California – I recalled Courant's words quoted above. My ambition did not admit to contenting myself with a mere verification of the known solutions. Rather I wanted to show how certain surprising results can be visualised by geometrisation. I think that some of the problems treated can be recommended for advanced courses in high‐school mathematics. The problem I chose was the general solution for the sum of (p.652) the q‐th power of the first n natural numbers for any power q of k. The classical solution of this problem is well known. It requires a knowledge of certain finite sums such as those defining the Euler and Bernoulli numbers2 (see below). My real motivation was to learn more about n‐dimensional point spaces, which play a part in my thoughts on information as described in Chapters 4 and 5.
1.1 The sum of integers 1∑n
I start with the trivial case of q∑n for q = 1. There is an amusing story of a boy at elementary school. The teacher, tired from lecturing all morning, gave his charges the problem of adding up all the numbers from one to a hundred, hoping to get some time for a little nap. But after a few moments one of the youngsters arrived at the solution, 5050, and he had to enlighten his startled teacher as to how he had got the result that quickly. The boy's name was Carl Friedrich Gauss, who did in his head what is depicted in Figure A3.1. This graphical depiction of the trick used by Gauss is one‐dimensional in nature. I say this because the method I am going to describe is based on n‐dimensional geometry. The fact that two one‐dimensional lines are involved is solely due to the arithmetical trick that Gauss used, which in its generalisation is based on mathematical induction.
1.2 1.2 The sum of squares 2∑n
The problem for q = 2 is only slightly more complex if we use a suitable geometrical illustration, which in this case has to be two‐dimensional (see Figure A3.2).
Closer inspection of Figure A3.2 reveals:
(1) The sum of each vertical column, and hence as well of the diagonal, is given by
(2) All horizontal rows are n‐fold repeats of their respective positional number k in the vertical column. Their sum for (any) position k is n·k(p.653)
(3) 3) The total sum of all numbers in the table is n 1∑n = n2(n + 1)/2.
(4) The sum of all squares, i.e. , is then represented by all red colored numbers of the lower‐lef triangle plus the diagonal.
These are the conditions on which the construction of the diagram is based. In order to calculate the sum of squares , we need a fifth non‐trivial statement that can easily be proven by mathematical induction. This statement reads:
(5) The sum of all red numbers in the lower‐left triangle (plus diagonal) for any n is twice the sum of all black numbers in the upper‐right (shaded) triangle.
The proof by complete induction is greatly aided by the geometrical construction in Figure A3.2. First we compare the sums of each horizontal row of the lower‐left triangle with their complementary vertical sum in the upper‐right triangle. The ratio of horizontal to vertical sum starts with 2:1, followed by 6:3, 12:6 etc., the value of each ratio being two. The same holds for the ratios of the sums in corresponding triangles. The proof can now be completed by induction - in our diagram by transition from n to n + 1. Tis means for the lower‐left triangle the addition of one horizontal row, the sum of which is (n + 1)n. (Note that the diagonal does not belong to either of the two triangles.) For the upper‐right triangle the transition means the addition of one vertical column: , which holds for any number n. It proves that the corresponding horizontal sums in the lower‐lef triangle (plus (p.654) diagonal) of all red numbers are twice as large as those in the upper‐right triangle. Tus, the “geometrical” construction in this case provides a direct visualisation of “complete induction”, yielding .
However, we see clearly from this simplest case of a higher (i.e. two) dimensional representation that we are not dealing with straightforward squares (i.e. n2) but rather with “weighted” ones. The “value” of each number in the k‐th horizontal line is “k”. Tis means a kind of “stretching” of one dimension, which will also occur in all higher‐dimensional cases.
1.3 The sum of cubes 3∑n
The geometrical representation introduced with 2∑n can be generalised by any higher dimension q for calculating q∑n. It appears to be more involved because we now require some spatial imagination. For the sum of cubes 3∑n the procedure is illustrated in Figures A3.3a and b.
The cube consists of n planes, each containing n2 equal numbers k, where k runs from 1 to n. From an inspection of the geometrical model we can make the following statements:
(1) The sum of the cubes of integers 3∑n is given by the sum of all numbers in the (red) shaded planes, which include the “diagonal plane” from the front upper‐left point to the lower‐right edge (Figure A3.3a).
(2) The total sum of all (red and black) numbers in the cube is .
(3) The diagonal, the sum of which again equals 1∑n, but which does not need to be considered separately, divides the cube into two parts, i.e. the numbers in the red shaded area, the sum of which is 3∑n, and the non‐shaded rest of the numbers, the sum of which we call the “rest sum” 3∑rest.
(4) There is no fxed proportion of 3∑rest to 3∑n, as was the case for the sum of squares. It rather depends on n. We therefore proceed to calculate 3∑rest using mathematical induction, which is represented geometrically in Figure A3.3b.
Enlarging the cube from (k − 1)3 to k3 positions requires an addition of three faces, which for convenience I have called the xy, xz and yz faces. Adding the bottom face (xy) is equivalent to adding k3, i.e. k2 (shaded) positions each having the value k For the xz and yz faces we add (2k − 1) sums of integers running from 1 to (k − 1). Since the bottom face belongs to the shaded part, the sums run only from 1 to (k − 1). Furthermore, since the two faces (xz and yz) have one edge in common, there are only (2k − 1) sums running from 1 to (k − 1). Hence 3∑rest in each single step changes by: (p.655)
Now we can add up for the rest sum of the whole cube as defined above:
Knowing the sum of squares we arrive at:
This indicates to us already a symmetry, resulting from the particular geometry. Since the sum 3∑n + 3∑rest yields the “volume” of the whole space 3∑tot= n2 1∑n the sum 3∑n comes out to be .
For n → ∞ the sums 3∑n and 3∑rest converge to the same value yielding .
Let me quote once more from the book of Courant et al.1, who refer in particular to this formula: “It should be remarked that although the principle of mathematical induction sufces to prove the formula once this formula has been written down, the proof gives no indication of how this formula was arrived at in the first place; why precisely the expression [n(n + 1)/2]2 should be guessed as an expression for the sum of the first n cubes, rather than any of the infinitely many expressions of a similar type that could have been considered.”
My first reaction was, too: “Wonders will never cease”. However, geometry offers a suggestive explanation. The particular space I have called 3Stot (for which ) yields a special symmetry among the two sums 3∑n and 3∑rest that is not present for any other dimension. In order to explain this let us now consider first the generalisation to any arbitrary dimension q.
1.4 The general case q∑n
So far I have applied mathematical induction more in a “physical sense”, namely by deriving results simply from quantitative inspection of geometrical structures. Let me conclude now by using mathematical induction to generalise the rules for an application to structures of higher dimensionality that are harder to visualise because of our lack of imagination. Nevertheless, the geometrical interpretation with the help of the three sums q∑tot, q∑n and q∑rest will remain the basis of our more abstract deliberations.
The four‐dimensional hypercube in Figure A3.4a will be the model that applies in a formally analogous way to any dimension q. We define, again, one co‐ordinate where k runs from 1 to n and call it the “vertical” axis. For each value k of the (p.657)
(1) The total sum of numbers in the hypercube q∑tot is given by nq−1 ·5∑n = (nq+1 + nq)/2.
(2) The total sum consist of two parts: , i.e. the sum of powers q of integers 1 to n (the “unknown” we want to determine) and the rest sum .
(3) In Figure A3.4b q∑n is given by all red‐coloured symbols, while q∑rest is represented by all black symbols. Note that in the fgure the colours red and black distinguish two diferent classes of number arrangement.
(4) As introduced with the three‐dimensional case we use the above relation between q∑tot, q∑n and q∑rest in its variational form in order to obtain an explicit expression for q∑rest. Its geometrical interpretation, as given for 3∑rest, is still obvious for Erest, but becomes increasingly cumbersome for larger dimensions because of our inability to imagine high‐dimensional structures and their sharing of common faces with lower hyperplanes.
The variational increase of the total sum when going from (k − 1) to k is:
where (1 − k−1)q can be expressed by its (finite) binomial power series yielding:
yielding the total rest sum by adding up all δ terms:
The equation q∑n = q∑tot − q∑rest then contains as the only unknown q∑n, all other terms referring to sums with powers 〈 q:
Now let us look at hidden symmetries among the various sums.
q∑tot, which is the sum of q∑n and q∑rest,has the simple form of (nq+1 + nq)/2. The sum q∑n, according to the expression given above, is dominated by its “leading term” n(q+1)/(q + 1), which represents its limiting value for large values of n. For n = 100 the leading term is already about hundred times larger than all of the rest. We therefore write:
where q Δn includes all other terms listed above. Then q∑rest can be written as
Here we see immediately the symmetry of q∑n and q∑rest, the leading terms of which become identical for q = 3, where the sum of both equals the leading term of q∑tot, which is given by (n(q+1) + nq)/2. This symmetry is unique forq = 3 because it does not exist for any other arbitrary q value. Moreover, for q = 3, the first term in the above expression for q∑n becomes equal to n2 ·1∑n/2 while qΔn/(q + 1) is about n··1∑n /2, so that the whole expression 3∑n equals ( 1∑n ) 2. For analogous (p.660) reasons the term n·1∑n /2 has to be subtracted from the leading term, so that q∑rest becomes equal to 1∑n· 1∑n−1, again a unique symmetry for q = 3.
Now assume that any symmetry exists that allows us to write:
Representation of the sums by their “leading terms” would yield
which means that
having the only solution:
With this, the uniqueness of q = 3 is reduced to the trivial facts of the theory of natural numbers, namely 3 being the only number that is the sum of all its precursors and 4 being the only number given by x + x = x · x = xx, for x = 2.
The above rules hold also for any higher‐order products q∑= i∑· k∑· 1∑… that can always be reduced to a product of two sums. The general validity depends essentially on the fact that the leading terms for any q 〉 3 are devoid of the above symmetry, since is always (increasingly) larger than 1 for increasing q 〉 3.
In our case the solution for q∑n builds up iteratively from the solutions for smaller q values. Hence it is expressed in terms of powers of nq, nq−1, nq−2 etc. The same result was achieved in the classic solution in terms of Bernoulli numbers Bn defined by the generating function2
with B0 = 1, B1 = −1/2, B2 = 1/6, B4 = −1/30,…
It is interesting to notice certain symmetries, although none of them reaches that of 3/n = (1∑n)2 with its much simpler geometry.
My main incentive for treating the above problem is not the reproduction of expressions already known for the sums of powers of integers, but rather stressing the geometrical visualisation of the problem. Higher‐dimensional spaces are of interest in several parts of this book, for example in Sections 1.2, 1.8, 1.9, 2.9, 3.6, 4.4, 4.5 and 4.8. They are a major tool in treating the problems involved in the “generation of information”.
The final part of the proof, in which we go beyond dimension q = 4 and where our spatial imagination begins to fail, shows how the geometrical depiction converges (p.661) with the known general proof in which the Bernoulli numbers appear. My mathematician colleague and friend Andreas Dress tried to find out whether the proof given here was known in the literature, so that I could quote some reference. However, he could not find any reference, although the general opinion of his colleagues was that “geometrisation” of the problem might be familiar and considered to be “folklore”. To be serious again, I did not expect to present new mathematics in this book and I would be truly surprised if those deliberations have not been made before. Let me come to some conclusion.
1.5 Summary and assessment
The method introduced above involves geometrical visualisation. It uses the word “space” in the mathematical sense as an ordered set with a mathematically well‐defined structure. It is a geometry of sequences of natural numbers dominated by the sequence , calculated by Carl Friedrich Gauss when he - as the story goes - was a youngster at elementary school (see Figure A3.1). Dealing with discrete numbers the geometrical representation has to use point spaces, which in general are hyperspaces of arbitrary dimensionality q, denoted qStot. The index “tot” = total refers to the fact that this space consists of two subspaces: qSn and qSrest, with qSn comprising the set of the first n natural numbers of power q, i.e. and qSrest representing the set of all points o q∑tot a do not belong to q∑n. These spaces have the “seemingly simple” form of hypercubes, e.g. q∑tot = q(n−1 ) 1∑n, with qn−1 being the ((n − 1)-dimensional) hyperplane of the hypercube nq.
Why then do I call q(n−1) 1∑n a “seemingly simple” form of a hypercube? A “true” hypercube should be an analogue of a cube in three‐dimensional space. The quantity qn−1 is indeed a “true” hyperplane of the “true” hyperspace qn, but q(n−1) 1∑n is not because 1∑n = (n2 + n)/2 yields a cube with numbers valued differently in each of the n dimensions. It certainly is a hyperspace with peculiar properties and the total space qStot could be interpreted as a “hypercube” that “changes” its size in some complicated manner. “How” is seen by the calculations presented above, which involve “mathematical induction” in building up the stretched form. The calculated sums are derived from the three hyperspaces qStot, qSn and qSrest, and therefore still represent a “geometrisation” of the problem.
This can be shown in a simple way if we look at the largely dominating leading terms of the sum expressions obtained by the detailed calculations. The leading term is the one with the highest power of k that represents a good approximation at large values of n (such as 100 or more). The maximum error then may be around 1%. The leading terms for all values of q have the general form for q∑n: nq+1/(q + 1), for q∑tot: nq+1/2 and for q∑rest: ((q − 1)/2)·(nq+1/(q + 1)). If we compare the expression for q∑n with that for a “true” hypercube, such as qn, we see that with (p.662) the increase in n the hyperspace qSn undergoes inflation of its size (or volume) in the “vertical” direction in proportion to n/2. This is pictured for a four‐dimensional space (q = 4) in Figure A3.4b, c and d, where the latter two fgures are more an artistic refection on this effect. Compared with a normal hypercube nq the “volume” of which is the sum of n of its hyperplanes, i.e. n × nq−1 = nq, the hyperspace qStot consists of n of its infating hyperplanes n(nq−1n/2) = nq+1/2. Going back to q∑n and q∑rest we see that for q 〉 3 the sum q∑rest infates faster than q∑n by the factor (q − 1)/2. This is the gross behaviour, neglecting all the details of the sums which in the geometrical picture are caused by overlapping of hyperplanes (of all dimensions) planes and edges. If we want to calculate these details precisely we have to apply “mathematical induction” in each step of the iteration because the changes occur by going from k to k + 1 for all k values from 1 to n. A geometrical visualisation of this procedure (for q = 2, 3 and 4) was shown in Figures A3.2, A3.3 and A3.4. The symmetries in q∑n found for those spaces can be traced back to the peculiarities that distinguish the numbers 2, 3 and 4 from larger numbers (see Figure A3.2). I am sure that the geometrisation will uncover further secrets at higher dimensions, but this is not the place to go into more detail.
In physics we do not yet have an answer to the questions “why strings” and “why do we live in a three‐dimensional world for which our brain developed some intuitive access”? Fundamental theories of physics (such as string theory) are based on higher‐dimensional spaces. Maybe the symmetries observed in our geometrical visualisation are of some relevance in this connection. At the “big bang” they started from a singularity that could “materialise” only in the form of a one‐dimensional string infating further to higher dimensions.
In conclusion let me return to Richard Courant1, who has reminded us that “constructive intuition” as much as the “rules of logic [such as mathematical induction] play their part” in the extraordinary efciency of mathematics for the problems of physics, which lef such a deep impression on Eugene Wigner. I met Richard Courant on several visits to the USA. He always was curious to hear about Göttingen, where he lived until 1933. Afer the war, the city of Göttingen made him an honorary citizen. Since the same honour was granted on me, many years later, our pictures now hang in the city hall opposite one another, allowing us to indulge forever in “virtual communication”.
2. Indirect proof*
Cantor's diagonal argument is a famous example of an indirect method of mathematical proof. Reasoning in this case is based on assuming tentatively the contrary (p.663) of the statement to be proven and then demonstrating some absurdity or antinomy in the resulting conclusions. Hence the truth of the original statement is proven as a consequence of the logical principle of the “excluded middle” (tertium non datur). Cantor's diagonal argument3, which I mentioned repeatedly in other sections, has found numerous applications. I choose it as an example in this appendix because of its relevance for an appreciation of biological complexity and its astonishing consequences, discussed in Chapters 4 and 5. The following presentation is based on both an early text by Cantor's contemporary Felix Klein4, one of the great representatives of the Göttingen mathematical school (see Section 3.6), and the book by Courant and Robbins quoted in the first part of this appendix.
Cantor's argument, at first sight, might sound simple. However, it is anything but simple or trivial, and accordingly it found several opponents among his contemporary colleagues, many of them highly distinguished mathematicians. I would rather call the concept ingenious. What is the diference? Something trivial must be both obvious and true, while an ingenious idea cannot be obvious — otherwise anyone could have seen it. Yes, it may even be strange but, nevertheless, it must be true.
Let me therefore go back somewhat further. Cantor3 is one of the founders of set theory and thereby provided a mathematical analysis of infinity. The most prominent example — even for non‐mathematicians — is that of the infinite set of the sequence of positive integers, 1, 2, 3, … referred to in Appendix 1. It has “no end”; that is what the Latin adjective “infinitus” means. Adding 1 to any number n in this sequence defines the next larger integer. And this never reaches an end; but it establishes one of its most important characteristics. The infinite set of all positive (as well as of all negative) integers is “denumerable”.
Infinity is usually expressed by the symbol 8, but this does not mean that ∞ can be thought of being something like an ordinary number. This leads us immediately to some important consequences regarding the power or potency (German: Mächtigkeit) of a set, which is called its “cardinality”. If a set were finite, e.g. if the sequence of positive integers were to stop at any finite natural number n, then the total set of n natural numbers could not be equivalent to any of its proper subsets. But for an infinite set, Cantor showed that this is possible. For instance, there is a one‐to‐one correspondence between the set of all natural numbers and its subset of even numbers:
The two are equivalent denumerable sets.
Cantor denoted the cardinality of a set by the first symbol of the Hebrew alphabet, called aleph (אּ). The “smallest” infinite set is that of natural numbers, being assigned (p.664) the cardinality אּ0. Many other, more complicated, sets have the same cardinality because one can establish a one‐to‐one correspondence with the set of integers, and it is the fact of countability or denumerability that is expressed in the cardinality of a set. Accordingly, אּ0 has all sorts of odd properties, such as
where n is a finite number. The most surprising fact is that the set of all rational numbers also has the same cardinality אּ0. This is surprising because we are used to the fact that rationals distribute themselves (fairly) densely between any two subsequent integers. Yes, there are infinitely more rationals than integers, but they are denumerable because each is made up of a pair of integers (p, q), i.e. p/q, yielding the cardinality of אּ2 0 = אּ0. The denumerability of the set is explained in the scheme shown in Figure A3.5a and b (following Courant et al.1).
The sequence obtained (right‐hand figure) contains every rational number, showing that the total set is denumerable. The important point for the discussion below is that all points can be connected by a line which, however, covers a two‐dimensional plane — according to the unlimited variation of both p and q.
Does the countability of rational numbers mean that any infinite set is denumerable? The answer to this question is Cantor's legacy: There are infinities that are uncountable and therefore must be “bigger” than the infinity of natural numbers. And here is Cantor's proof in the words of his contemporary Felix Klein4: “Let us write the set C1, containing the real numbers 0 〈 x 〈 1, as decimal fractions, all being brought into a denumerable series where the a, b, c, … are numerals 0, 1, …, 9 of any possible choice and sequence” (Figure A3.5 is redrawn from Klein's monograph). Klein continues with a warning against the choice of the numerals 0 and 9
Cantors proof is of an indirect nature. It is shown that by assuming C1 to be denumerable contradictions are introduced. In order to demonstrate this, Cantor considered the “diagonal” consisting of the numerals 0, a1, b2, c3,…, keeping in mind that he was dealing with an infinite number of those numerals. He replaced all the numerals appearing in the diagonal of x by diferent symbols x′ = 0, a′1, b′2,c′3, again continued for the whole infinite set. Two non‐terminating decimal fractions can only equal one another if all numerals agree. This is not the case for x1 and x′1, and neither is it the case for any xi in the infinite set. Hence the decimal fraction x′ difers from all numbers (x1, x2, x3,…) of the system which we had assumed to be denumerable, proving the non‐denumerability of the continuous set C1.
Klein also explains that the x′ obtained is a true decimal fraction, which excludes 0.999…= 1 or any decimal with finite number of places. He emphasises that x′ must difer from all decimal sequences xi contained in the original scheme. I do not want to go into further detail of his exposition, except to mention that he also generalised the sequence of numerals a, b, c,… by complexes of numerals that are surrounded by zeros. Following a suggestion by J. König4 5 he called those complexes “molecules”, perhaps showing a premonition of applications to macromolecular sequences as discussed in Chapters 4 and 5.
In my introduction to this appendix I said that it would not provide any new results to the mathematician. That is not quite true. If he is going to apply his thinking to biology, he should be aware of what we consider to be the true mathematical problems in biology.
(1.) Courant, R. and Robbins, H. (1996). “What is Mathematics? An Elementary Approach to Ideas and Methods”. Revised by Ian Stewart. Oxford University Press, New York.
(2.) Gradshteyn, I. S. and Ryzhik, I. M. (1999). “Tables of Integrals, Series and Products”. Jefrey, A. (ed.). Academic Press, London.
(*) I think that this section should be included in view of the problems treated in this book, although it is a short summary of something treated in detail in the literature.3,4
(3.) Cantor, G. (1874). “Über eine Eigenschaf des Inbegrifes aller reellen algebraischen Zahlen” (“On a Characteristic Property of All Real Algebraic Numbers”). Crelles J. f. Mathematik 77: 258–262. Cantor's main work, in particular the papers on set theory between 1870 and 1900, are discussed in most major textbooks of mathematics.
(4.) Klein, F. (1924). “Elementarmathematik I” in Band XIV der “Grundlehren der Mathematischen Wissenschafen”. pp. 271–288.
(5.) König, J. Quoted in ref. 4, p. 279.