Lev Tarasov Th e World Is Built on Probability

NWA VA Vis VA VA VA VAVAV MV 4A LA

VIVA VAVA + VAVA VAVA" VAVA VAY MMA VA VIVA VM VA VA / y |/\|4

NINN XININ RN NN NN “XN NN NN AINIINININ \ ESS

NUNN NESS

AININ BININ BS NU NBNUNGN “IN UININ UONIN

The World Is Built On Probability

Lev Tarasov

2023

Lev Tarasov

The World is Built on Probability

MIR Publishers Moscow

Translated from the Russian by Michael Burov

First published 1988

Revised from the 1984 Russian edition

This completely digital version typeset in using TEX with EB Garamond font by DAMITR MAZANAV damitr@proton.me

Released on the web by http://mirtitles.org in 2023.

Access the BTEX project files gitlab.com/mirtitles/twibop

Creative Commons by SA 4.0 License

About Mir Titles Project

The Mir Titles project is an attempt to conserve the knowledge in the form of various books that came out during the Soviet era for the future generations. The collection contains books on science, mathematics, philosophy, popular science, history. The collection also has Soviet, Russian and children’s literature. The project would not have been possible without the help we received from friends and contributors from across the world.

Foreword

Iam very happy to release this completely electronic version of one of my favourite books. In this electronic edition all the figures have been reworked in the SVG format using Inkscape for a clearer presentation. in BTEX. - DAMITR

Contents

Preface 7 I Tamed Chance 1 1 Mathematics of Randomness 17 V1 ‘Probability <4 4 da ¥ hdesi aS ead Baw oe ea tok es 17 1.2 RandomNumbers...............2.020005 28 13> Randomy Events’. <<. .-6 606.8 dt ek Bo eh aes 33 1.4 DiscreteRandom Variables ..............0.-. 38 1.5 Continuous Random Variables ............... 46 2 Decision Making 53 2.1 These Difficult Decisions ..............000. 53 2.2 Random Processes with Discrete States ........... 59 2.3. Queueing Systems. 2c be ee ee ee he ee eR ES 66 2.4 Method of Statistical Testing ............000. 76 2.5 Gamesand Decision Making ............004. 84 3 Control and Self-control 97 3.1 TheProblemofControl.................0.0. 97 3.2 From the “Black Box” to Cybernetics .. 2... 2.004. 101

CONTENTS

3.3." Information: es-4. eee ae yee a YO Eee 105

3.4 Selection of Information from Noise ............ 120

3.5 On the Way to a Stochastic Model of the Brain .. 1... 125

II Fundamentality of the Probability Laws 133

4 Probability in Classical Physics 135

4.1 Thermodynamics andIts Puzzles ..... 2.0.0.0... 136

4.2 Molecules in a Gas and Probability ............. 146

4.3 Pressure and Temperature of anIdealGas.......... 158

AA. Fluctuations 4233.0.) ele 6, ol tle eo BSS ee 161

4.5 Entropy and Probability... ......... 0.0000. 169

4.6 EntropyandInformation ........... 0.0000. 177

5 Probability in the Microcosm 183

5.1 Spontaneous Micro-processes .. 6... ee ee ee 184

5.2 From Uncertainty Relations the Wave Function ...... 195

5.3 Interference and Summing Probability Amplitudes... . . 200

5:4. “Probability and Causality’ 122. s-Nt- desi kel oad 208

6 Probability in Biology 213

G:l (Introduction: 42! avue se ete hoe ee Ee ge eS 213 6.2 The Patterns After the Random Combination of Genes in

Crossbreeding es: :s0s eae -a east toas se whee eet par 8 220

G3» Mutations: 24. gk ade eee? ane Pav Sy 230

6.4 Evolution Through the Eyes of Geneticists ......... 235

A Concluding Conversation 241

Recommended Literature 249

Preface

...In nature, where chance also seems to reign, we have long ago demonstrated in each particular field the inherent necessity and regularity that asserts itself in this chance.

F. Engels

A vast concourse of events and phenomena occur in the world around us. The events are interrelated: some are effects or outcomes of others which are, in turn, the causes of still others. Gazing into this gigantic whirlpool of interrelated phenomena, we can come to two significant conclusions. One is that there are both completely determined (uniquely defined) outcomes and ambiguous outcomes. While the former can be precisely predicted, the latter can only be treated probabilistically. The second, no less essential conclusion is that ambiguous outcomes occur much more frequently than completely determined ones. Suppose you press a button and the lamp on your desk lights up. The second event (the lamp lights up) is the completely determined result of the first event (the button is pressed). Such an event is called a completely determined one. Take another example: a die is tossed. Each face of the die has a different number of dots. The die falls and the face with four dots ends up at the top. The second event in this case (four dots face-up) is not the completely determined outcome of the first event (the die is tossed). The top face may have contained one, two, three, five, or six dots. The event

PREFACE

of appearance of the number of dots on the top face after a die is tossed is an example of a random event. These examples clearly indicate the difference between random and completely determined events.

We encounter random events (and randomness of various kinds) very often, much more frequently than is commonly thought. The choice of the winning numbers in a lottery is random. The final score of a football match is random. The number of sunny days at a given geographical location varies randomly from year to year. A set of random factors underlies the completion of any service activity: delivery ambulance arrival, telephone connection, etc.

Maurice Glaymann and Tamas Varga have written an interesting book called Les Probabilités a lécole (Probability in Games and Entertainment), in which they make an interesting remark:

“When facing a chance situation, small children think that it is possible to predict its outcome. When they are a bit older, the believe that nothing can be postulated. Little by little they discover that there are patterns hiding behind the seeming chaos of the random world, and these patterns can be used to get their bearings in reality.”

There are three distinct stages here: lack of understanding of the random at first, then mere confusion, and finally a correct viewpoint. Let us forget small children for a time and try to apply this to ourselves. We shall have to recognize that frequently we stop at the first stage in a simple-minded belief that any outcome can be precisely predicted. The misconception that randomness is simply equal to chaos, or the absence of causality, has lasted a long time. And even now not everybody clearly appreciates that the abundance of random events around us conceal definite ( probabilistic) patterns.

These ideas prompted me to write this book. I want to help the reader discover for himself the probabilistic nature of the world around us, to in- troduce random phenomena and processes, and to show that it is possible to orient oneself in this random world and to operate effectively within it.

This book begins with a talk between myself and an imaginary reader about the role of chance, and ends with another talk about the relationship between randomness and symmetry. The text is divided into two major parts. The first is on the concept of probability and considers the various applica- tions of probability in practice, namely, making decisions in complicated situations, organizing queues, participating in games, optimizing the control of various processes, and doing random searches. The basic notions of cyber- netics, information theory, and such comparatively new fields as operations research and the theory of games are given. The aim of the first part is to con- vince the reader that the random world begins directly in his own living room because, in fact, all modern life is based on probabilistic methods. The second part shows how fundamental chance is in Nature using the probabilistic laws of modern physics and biology as examples. Elements of quantum mechanics are also involved, and this allows me to demonstrate how probabilistic laws are basic to microscopic phenomena. The idea was that by passing from the first part of the book to the second one, the reader would see that probability is not only around us but is at the basis of everything.

In conclusion I would like to express my gratitude to everyone who helped me when writing this book. I.I. Gurevich, Corresponding Member of the USSR Academy of Sciences, gave me the idea of writing this text and gave me a number of other provoking ideas concerning the material and structure of the book. B.V. Gnedenko, Member of the USSR Academy of Sciences, G.Ya. Myakishev, D.Sc. (Philosophy), and O.F. Kabardin. Cand. Sc. (Physics and Mathematics) read the manuscript thoroughly and made valuable remarks. V.A. Ezhiv and A.N. Tarasova rendered me constant advice and unstinting support the whole time I was preparing the text.

PREFACE

Part I

Tamed Chance

Introduction

And chance, inventor God ... A. S. Pushkin

A Discussion on the Role of Chance

AUTHOR: “You wrote some nice words about chance in the Preface. In spite of them, I still think chance plays a negative role on the whole. Naturally, there is good luck, but everybody knows it is better not to count on it. Chance interferes with our plans, so it’s better not hang on it, we should rather ward it off as much as possible.”

AUTHOR: “That is exactly the traditional attitude towards the ran- dom. However, it is an attitude we must clearly review. First of all, is it really possible to get by without the random?”

READER: “I don’t say that it’s possible. I said we should try.”

AUTHOR: “Suppose you work at an ambulance centre. Obviously, you cannot foresee when an ambulance will be needed, where it will be necessary to send it to, and how much time the patient

INTRODUCTION

will require. But a practical decision depends on all of these points. How many doctors should be on duty at anyone time? On the one hand, they should not be idle waiting for calls for long periods of time, yet on the other hand, patients should not have to remain without aid for too long. You cannot avoid chance. What I am trying to say is: we cannot e/iminate chance, and so we must fake it into account.”

READER: “True, we have to make peace with chance in this example. However, it still is a negative factor.”

AUTHOR: “Thus, we see that sometimes we have to take chance into consideration rather than control it. But we can go further. We can discover situations in which chance becomes a positive factor rather than a negative one, so that it is desirable to raise the level of the random threshold.”

READER: “I don’t understand you.”

AUTHOR: “Of course, chance occasions interfere with our plans. At the same time because it makes us new solutions and improve our ability to create”.

READER: “Do you mean an improvement is obtained by overcoming difficulties?”

AUTHOR: “The main point is that randomness can create new pos- sibilities. An American writer has written an interesting science fiction story. A group of scientists with various disciplines is officially informed that a sensational discovery has been made, but unfortunately the discoverer died in an explosion during a demonstration of the phenomenon and thus the secret was lost. In reality neither the invention nor the inventor ever existed. The scientists were presented with the evidence of a tragedy: indistinct fragments of records, a library, and an equipped labo-

ratory. In other words, the scientists were given a vast quantity of unconnected information with chance data from various fields of science and technology. The evidence could be called informational noise. The scientists were certain a discovery had been made, and therefore the target was achievable. They uti- lized all the information at their disposal and ‘revealed’ the secret of the non-existing invention. We might say that they succeeded in sifting information from the noise.”

READER: “But that’s only a science fiction story.”

AUTHOR: “True. However, the idea behind the story is far from being fiction. Any discovery is related to the use of random factors.”

READER: “I don’t think anyone can discover anything important unless he or she has a professional grasp of the subject.”

AUTHOR: “I think so too. Moreover, a discovery requires both ex- pertise on the part of the researcher and a certain level of the development within the science as a whole. And yet ..., random factors play a fundamental role in that.”

READER: “As I understand, the word fundamental means something primary, something at the basis. Can you apply the term funda- mental to something random? I admit that randomness may be useful. But can it be fundamental? In the last analysis, we deal with random variables when there is something we do not know and cannot take into account.

AUTHOR: “By believing that randomness is related to inadequate knowledge, you make it swjectzve. It follows that you believe that randomness appears, as it were, on the surface and that there is nothing random at the basis of phenomena. Is it correct?”

INTRODUCTION

INTRODUCTION

READER: “Precisely. That is why we cannot assert randomness is fundamentality. As science develops, our ability to take different factors into account increases, and the result is that the domain of random variables will gradually recede. There is sense in saying that science is the enemy of chance.

AUTHOR: “You're not quite right. Indeed, the advance of science en- hances our ability to make scientific predictions, that is, science is against the random factor. But at the same time, it turns out that while our scientific knowledge becomes deeper, or, more accurately, while we look at the molecular and atomic aspects of phenomena, randomness not only does not become less impor- tant, bit on the contrary, it reigns supreme. Its existence proves to be independent of the degree of our knowledge. Randomness reveals its fwndametality at the level of the microcosm.”

READER: “This is the first time I’ve heard someone say that. Please tell me more.”

AUTHOR: “Let me say at once that this topic has had a long history. It was first formalized in Ancient Greece with two approaches to the random being stated. The two views are associated with the names of Democritus and Epicurus. Democritus identi- fied the random with the wzknown, believing that Nature is completely deterministic. He said: People have created an idol out of the random as a cover for their inability to think things out. Epicurus considered that the random is inherent in various phenomena, and that it is, therefore, object/ve. Democritus’s point of view was preferred for a long time, but in the 20" cen- tury, the progress of science showed that Epicurus was right. In his doctoral thesis Difference Between the Democritian and Epicurian Philosophy on Nature (1841), Karl Marx positively evaluated Epicurus’s view of the random and pointed out the

deep philosophical significance of the teachings of Epicurus on the spontaneous displacement of atoms. Of course, we should not exaggerate the contribution of Epicurus to our understand- ing of the random because he could only guess.”

READER: “It turns out that I presented Democritus’s views on the random without knowing it. But I would like to have some concrete examples showing the fundamentality of the random.”

AUTHOR: “Consider, for instance, a nuclear-powered submarine. How is the engine started?”

READER: “As far as I understand it, special neutron-absorbing rods are drawn from the core of the reactor. Then a controlled chain reaction involving the fission of uranium nuclei begins ...”

AUTHOR: “(interrupting) Let us try and see how everything begins.”

READER: “After entering a uranium nucleus, a neutron triggers its g disintegration into two fragments and another neutron is re- leased. The neutrons split two more uranium nuclei; four neu- trons are then set free, which in turn split four more nuclei. The Pp process develops like an avalanche.”

AUTHOR: “All right. But where does the first neutron come from?” READER: “Who knows? Say, they come from cosmic rays.”

AUTHOR: “The submarine is deep under water. The thick layer of water protects it from cosmic rays.”

READER: “Well then, I don’t know...”

AUTHOR: “The fact is that a uranium nucleus may either split be- cause a neutron enters it or it may decay spontaneously. The process of spontaneous nuclear fission 1s random.”

INTRODUCTION

INTRODUCTION

READER: “But maybe spontaneous nuclear fission is caused by fac- tors we do not know about yet.”

AUTHOR: “This is something physicists have been trying to solve. Many attempts have been made to find the hidden parameters which govern the processes in the microcosm. It has been con- cluded that there are no such parameters, and therefore ran- domness in the microcosm is fundamental. This cornerstone problem is thoroughly treated in guantum mechanics, a theory which appeared in the early 20% century in connection with research on atomic processes.”

READER: “The only thing I know about quantum mechanics is that it describes the laws governing the behaviour of elementary particles.”

AUTHOR: “We shall talk about quantum mechanics in more detail later. Let me only note here that it demonstrates the fundamen- tal role of spontaneous processes and, therefore, demonstrates the fundamentality of the random. The operation of any ra- diation generator, from a vacuum tube to a laser, would be impossible without spontaneous processes. They are funda- mental as the trigger without which the radiation generation would not start.”

READER: “And yet, it is difficult for me to believe that randomness is fundamental. You mentioned a nuclear-powered submarine. When the captain orders that the engines be turned on, he does not rely on a lucky chance. An appropriate button is pressed, and the engines start (if they are in good condition). The same can be said when a vacuum tube is turned on. Where is the randomness here?”

AUTHOR: “Nevertheless, when we consider phenomena in the mi-

crocosm, the processes are triggered by random factors.”

READER: “However, we generally deal with processes occurring in the macrocosm.”

AUTHOR: “Firstly, while studying the world around us and trying to comprehend its cause and effect relations, we must address the atomic level, i. e., the level of microcosm phenomena. Sec- ondly, the randomness in microcosmic phenomena is essentially reflected in the processes observed at the macrocosmic scale.”

READER: “Can you give me an example when the fundamentality of randomness reveals itself at the macrocosmic scale?”

AUTHOR: Evolution, which is a continuous process in both the plant and animal kingdoms, may serve as an example. Evolution relies on mutation, i.e., random changes in the structure of genes. A random mutation may be rapidly spread by the reproduction of the organisms themselves. It is essential that selection occurs simultaneously with mutation. The organisms which contain the random gene are then selected to that those best fitted to their environment survive. In consequence, evolution requires the selection of random gene changes.”

READER: “I don’t quite understand this business of selection.”

AUTHOR: “Here’s an example. The flowers of a certain orchid look like a female wasp. They are pollinated by male wasps which take the flowers to be females. Suppose a mutation occurs, and the shape and colour of the flower are changed. The flower will then remain unpollinated. The result is that the mutation is not passed on to the new generation. It may be said that selection rejected the mutation which changed the outward appearance of the flower. There was a species of orchid which became a self- pollinator, the flowers of this species rapidly acquired diverse

INTRODUCTION

10

INTRODUCTION

shape and colour owing to the mutation.”

READER: “As far as I know, evolution progresses in the direction of the differentiation of species. Doesn’t this show that the mutations underlying evolution are not, in fact, so random?”

AUTHOR: “That argument doesn’t stand to reason. Evolution se- lects the fittest organisms rather than the more complex. Some- times a higher degree of organization is preferable, but some- times this is not the case. This is why human beings, jelly-fish, and the influenza virus can coexist in today’s world. It is essen- tial that evolution leads to the appearance of new species that are unpredictable in principle. It may be said that any species is unique because it occurred fundamentally by chance.”

READER: “I have to admit that the randomness does look to be a fundamental factor indeed.”

AUTHOR: “Since we are discussing the fundamentality of random- ness in the picture of evolution, let me draw your attention to one more important point. Modern science demonstrates that

299

chance and selection are the ‘creator’. READER: “Just as Pushkin said, ‘And chance, inventor God ...”” AUTHOR: “Precisely. This line is strikingly accurate.”

READER: “It appears that when speaking about chance and selec- tion, we should imply the selection of information from noise, shouldn’t we? The same selection that we discussed in connec- tion with the science-fiction story.”

AUTHOR: Absolutely.”

READER: “I have to agree that we should consciously recognize the existence of randomness rather than try and control it.”

11 INTRODUCTION

AUTHOR: “We could say more. Naturally, the randomness which is due to the incompleteness of our knowledge is undesirable. While studying the world, man has fought, is fighting, and will continue to fight it. It should be noted at the same time that there is an objective randomness underlying every phenomena along with the subjective randomness which is due to lack of data on a phenomenon. We should also take into account the positive, creative role of the random. And in this connection it is really necessary to recognize and control randomness. Man should be able, when necessary, to create special situations, abun- dant with the random, and utilize the situation to his own ends.”

READER: “Butis it really possible to treat randomness in such a way? Isn’t it like trying 40 control the uncontrollable?”

AUTHOR: “Both science and daily life indicate that it is possible to orient ourselves consciously in very random situations. Special calculation methods have been developed that depend on ran- domness. Special theories have been produced, such as queueing theory, the theory of games, and the theory of random search, to deal with it.”

READER: “It is hard for me to imagine a scientific theory built on randomness.”

AUTHOR: “Let me emphasize right away that randomness does not preclude scientific prediction. The fundamentality of random- ness does not mean that the world around us is chaotic and de- void of order. Randomness does not imply there are no causal relations. But we shall deal with all that later. It is interesting to try and imagine a world in which randomness as an objective factor is completely absent.”

READER: “This would be an ideally ordered world.”

12

INTRODUCTION

AUTHOR: “In sucha world, the state of any object at a given time would be unambiguously determined by its past states and, in its turn, would determine the future states just as definitely. The past would be strictly connected with the present, as would the present with the future.”

READER: “Anything occurring in such a world would be predeter- mined.”

AUTHOR: “Pierre Laplace, a great French scientist of the 17° cen- tury, suggested in this connection that we imagine a superbeing who knew the past and the future of such a world in every detail. Laplace wrote:”

‘The intellect who could know, at a given moment, every force that animates the nature and the relative positions of its every component, and would, in addition, be vast enough to analyse these data, would describe by a single formula the motions of the greatest bodies in the universe and the motions of the lightest atoms. There would be nothing uncertain for this being, and the future, like the past, be open to his gaze.’

READER: “An ideally ordered world is therefore unreal.”

AUTHOR: “As you see, it isn’t hard to feel that the real world should admit the existence of objective randomness. Now let us re- turn to the problem of causal relations. These relations are probabilistic in the real world. It is only in particular cases (for example, when solving maths problems at school) that we deal with unambiguous, strictly determined relations. Here we ap- proach one of the most essential notions of the modern science, the notion of probability.”

READER: “I’m familiar with it. If I throw a die, I can equally expect any number of dots from one to six. The probability of each

13

number is the same and equal to 1/6.”

AUTHOR: “Suppose you stand at the side of a road, with motor-cars passing by. What is the probability of the first two digits in their four digit number being equal?”

READER: “The probability equals 1/10.”

AUTHOR: “Therefore, if you’re patient and observe enough cars, about one tenth of them will have number-plates with the same first two digits, would they? Say, about thirty cars out of 300 will have such plates. Maybe, 27 or 32, but not 10 or 100.”

READER: “I think so.”

AUTHOR: “Butthen there would be no need to stand at the roadside. The result could be predicted. This is an example of probabilistic prediction. Look at how many random factors are involved in this situation. A car could turn off the road before reaching the observer, or another car could stop or even turn back. And nonetheless, both today and tomorrow, about 30 cars out of 300 would have plates with the same first two digits.”

READER: “So, in spite of numerous random factors, the situation has a certain constancy.”

AUTHOR: “This constancy iscommonly called statistical stability. It is essential that statistical stability is observed because of random factors rather than despite them.”

READER: “I hadn’t thought that we deal with probabilistic predic- tions everywhere. They include, for instance, sports predictions and weather forecasts.”

AUTHOR: “You're absolutely right. An important point is that prob- abilistic (statistical) causal relations are common, while those

INTRODUCTION

INTRODUCTION

leading to unambiguous predictions are just a special case. While definite predictions only presuppose the necessity of a phe- nomenon, probabilistic predictions are related simultaneously both with necessity and randomness. Thus, mutations are ran- dom, but the process of selection is governed by laws, that is, it is a necessary prerequisite.”

READER: “I see. The individual acts of the spontaneous fission of uranium nuclei are random, but the development of the chain reaction is unavoidable.”

AUTHOR: “Taken separately, any discovery is random. However, a situation which is favourable for the appearance of such a chance should exist. This chance is determined by the advance of science, the expertise of the researchers, and the level of mea- surement technology. A discovery is random, but the logic of the progress leading to the discovery in the long run is regular, unavoidable, and necessary.”

READER: “Now see why the fundamentality of randomness does not result in the disorder of our world. Randomness and neces- sity are always combined.”

AUTHOR: “Correct. Friedrich Engels wrote in The Origin of the Family, Private Property, and the State (1884): ‘In Nature, where chance also seems to reign, we have long ago demon- strated in each particular field the inherent necessity and regu- larity that asserts itself in this chance.’ The Hungarian mathe- matician A.Rényi wrote about the same thing in an interesting book Letters on Probability”:

‘Tcame across Contemplations by Aurelius and acciden- tally opened the page where he wrote about two possibili- ties: the world is either in vast chaos or, otherwise, order and regularity reign supreme. And although I had read

15

these lines many times, it was the first time that I thought over why Marcus Aurelius believed that the world should be dominated by either chance or order. Why did he be- lieve that these two possibilities are contradictory? The world is dominated by randomness, but order and regular- ity operate at the same time, being shaped out of the mass of random events according to the laws of the random.’

READER: “As far as I understand, order and regularity are produced.

from a mass of random events, and this leads to the concept

probability.

AUTHOR: “You're absolutely right. dividual factors vary from

case to case. At the same time, the picture as a whole remains stable. This stability is expressed in terms of probability. This is why our world proves to be flexible, dynamic, and capable of advancing.”

READER: “It follows that the world around us may justly be said to

be a world of probability.”

AUTHOR: “It is better to speak of the world as being built on prob-

ability. When we examine this world, we shall concentrate on two groups of questions. Firstly, I shall show how man, owing to his use of probability in science and technology, was able to tame randomness and thus turn it from being his enemy into an ally and friend. Secondly, using the achievements of mod- ern physics and biology, I shall demonstrate the probabilistic features of the laws of nature. In consequence, I shall show that the world around us (including both the natural and artificial

world) is really built on probability.”

INTRODUCTION

Chapter I

Mathematics of Randomness

This doctrine, combining the accuracy of mathematical proofs and the uncertainty of chance occasions and making peace between these seemingly contradictory elements has a full right to contend for the title of the mathematics of the random. Blaise Pascal

Probability

Classical definition of probability. When we toss a coin, we do not know which will land face up, heads or tails. However, there is something we do know. We know that the chances of both heads and tails are equal. We also know that the chances of any of the faces of a die landing face up are equal. That the chances are equal in both examples is due to symmetry. Both the coin

17

18

MATHEMATICS OF RANDOMNESS

and the die are symmetrical. When two or more events have equal chances of occurring, we call them equally possible outcomes. Heads or tails are equally possible outcomes. Suppose we are interested in a certain result while throwing a die, for instance, a face with a number of dots exactly divisible by three. Let us call outcomes satisfying such a requirement favourable. There are two favourable outcomes in our example, namely, a three or a six. Now let us call outcomes exclusive if the appearance of one in single trial makes it impossible for the others to appear at the same trial. A die cannot land with several faces up, so they are exclusive outcomes.

We can now formulate the classical definition of probability:

The probability of an event 1s the ratio of the number of favourable

outcomes to the total number of equally possible exclusive outcomes.

Suppose P, is the probability of an even A, m4 is the number of favourable outcomes, and 7 the total number of equally possible and exclusive outcomes. According to the classical definition of probability M4.

B, = (1.1)

n

If my, =n, then P; = 1 and the event A is a certain event (it always occurs in every outcome). If m, = 0, then P, = 0, and the event 4 is an impossible event (it never occurs). The probability of a random event lies between 0 and 1.

Let an event 4 be throwing a die and getting a number exactly divisible by three. Here my = 2 and so the probability of the event is 1/3, because n = 6. Consider one more example. We have a bag with 15 identical but differently coloured balls (seven white, two green, and six red). You draw a ball at random. What is the probability of drawing a white (red or green) ball? Drawing a white ball can be regarded as an event A, drawing a red ball is an event B, and drawing a green ball is an event C. The number of favourable

outcomes of drawing a ball of a certain colour equals the number of balls of this colour, i. e., 4 = 7, mg = 6, and mc = 2. Using (1.1) and given n = 15, we can find the probabilities:

Addition and multiplication of probabilities. What is the probability that a randomly drawn ball will be either red or green? The number of favourable outcomes is mg + mc = 6 + 2 = 8, and therefore the probability

will be Mp +m 8 Pic = ar re

We see that 23,¢ = 2 + Po. The probability of drawing either a red or a green ball is the sum of two probabilities: the probability of drawing a red ball and that of drawing a green ball. The probability of drawing a ball which is either red or green or white is the sum of three probabilities, P, + 2; + 2. It is equal to unity (7/15 +2/5 + 2/15 = 1). This stands to reason because the event in question will always occur.

The rule for adding probabilities can be formulated as follows:

The probability of one event of several exclusive events occurring 1s the sum of the probabilities of each separate event.

Suppose that two dice are thrown. What is the probability of getting two fours at the same time? The total number of equally possible exclusive outcomes is 2 = 6 x 6 = 36. Each one is listed in Figure 1.1, where the left figure in the parentheses is the number on one die, and the right figure is the number on the other. There is only one favourable outcome, and it is indicated in Figure 1.1 as (4,4). Hence, the probability of the event is 1/36. This probability is the product of two probabilities: the probability of a four

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)

Figure 1.1: Possible outcomes of rolling a die.

(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)

(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)

PROBABILITY (41) (5,1) (6,1) (4,2) (5,2) (6,2) (4,3) (5,3) (6,3) (4,4) (5,4) (6,4) (4,5) | (5,5) (6,5) (4,6) (5,6) (6,6)

20

MATHEMATICS OF RANDOMNESS

appearing on one die and that of a four on the other i.e.

1 1 6 36°

x

1 Ry=RxR=z

The rule for multiplication of probabilities can be formulated as follows:

The probability of several events occurring simultaneously equals the product of the probabilities of each separate event.

By the was, it is not necessary for the events to be simultaneous. Instead of throwing two dice at the same time, we could throw a single die twice. The probability of getting two fours at the same time when two dice are thrown is the same as the probability of getting two fours when one die is thrown twice.

In many cases both rules (addition and multiplication of probabilities) are used jointly to calculate the probability of an event. Suppose we are interested in the probability P of the same number coming up on two dice. Since it is only essential that the numbers be equal, we can apply the rule for adding probabilities,

P=Fy + Py + Py t By + Bs + Lec.

Each of the probabilities P, is, in turn, a product P. x B.. Hence

P= (P,xB) + (A xB) +4 (RXR) =6(2% 2) = 7

6 Go 6 This result can be obtained right away from Figure 1.1, where the favourable

outcomes are shown in the gray, (1,1), (2,2), (3,3), (4,4), (5,5), and (6,6). The total number of such outcomes is six. Consequently, P = 6/36 = 1/6.

Frequency and probability. The classical definition of probability

and the rules for addition and multiplication of probabilities can be used to

21

Mp/100 0.24

0.22 0.20

0.18

0.16

0.14 0.12 = 0.167

0.10

calculate the probability of a random event. However, what is the practical value of such calculations? For instance, what does it mean in practice that the probability of getting a four when a die is thrown equals 1/6? Naturally, the assertion does not imply that a four will appear once and only once in any six trials. It is possible that it will appear once, but it is also possible that it will appear two (or more) times, or that it will not appear at all. In order to discover the probability of an event in practice we should perform a large number of trials and calculate how frequently a four appears.

Let us perform several sets of trials, for instance, throwing the die 100 times in each set. Let us designate (4, to be the number of times a four appears in the first set, 4, to be the number of fours in the second set, etc. The ratios 4, / 100, 14, /100, 4, /100, ... are the frequencies with which a

PROBABILITY

Figure 1.2: Outcomes of rolling a dice many

times.

22.

Figure 1.3: Frequencies of outcomes of trials of a die as a function of number of trials. Note how the deviation of the frequency of the occurrence of an event from its probability decreases as the number of trials increases.

MATHEMATICS OF RANDOMNESS

So N rare

in each set

0.20

0.19

A

Frequency of desired outcome Oo ms Co

0.16 | 1/6 = 0.167

N

100 500 1000 1500 2000 2500 Number of trials

four appeared in each set. Having performed several sets of trials, we can see that the frequency of the appearance of a four varies from set to set in a random fashion in the vicinity of the probability of the trials, we can see that the frequency of the appearance of a four varies from set to set 12 a random fashion in the vicinity of the probability of the given event, i.e. in the vicinity of 1/6. This is clear from Figure 1.2, where the number & of sets of trials is plotted along the abscissa axis and the frequencies with which a four appears along the axis of ordinates.

Naturally, if we perform the experiment again, we will get other values of

23

the frequencies 14, /100. However, the pattern of oscillations of the frequen- cies of the event under consideration will be stable: the deviations upwards and downwards from the straight line 4.4, which is associated with the proba- bility of the event, will balance. Even though the amplitudes of the deviations will vary from set to set, they will not tend to grow or decrease. This is a consequence of the equivalence of each set of trials. The number of trials in each set is the same, and the results obtained in a given set do not depend on the results in any other set.

Let us make an important change in that we gradually increase the number of trials in each set. Using the results of our previous experiment, as presented in Figure 1.2, let us obtain a new result by adding the value of a set of trials to the result of the preceding sets. In other words, we calculate the number of fours in the first 100 trials (in our case, 14, = 22), then the number of fours in the first 200 trials (14, + 14, = 22 + 16 = 38), the in the first 300 trials (M, + M4, + M; = 22 + 16 + 18 = 56) etc. We then find the frequencies of getting a four in the each new set: 14/100 = 0.22, (4, + 4) /200 = 0.19, (M, +, +) /300 = 0.187, etc. These frequencies are plotted in Figure 1.3 against the number of trials in each set (100, 200, ..., 2500). The figure demonstrates a crucial fact: the deviation of the frequency of the occurrence of an event from its probability decreases as the number of trials increases. In other words,

frequency of the occurrence of a random event tends to its probability with increasing number of trials.

Is it possible to give a definition of probability based on frequency? Since the frequency of the occurrence of a random event tends to its prob- ability as the number of trials increases, we might well ask whether we can define the probability of an event as the limit of the ratio of the number of its occurrence to the number of trials as the number of trials lends to infinity. Suppose N is the number of trials and 4, (NV) is the number of occurrence

PROBABILITY

24

MATHEMATICS OF RANDOMNESS

of an event A. We want to know whether we can define the probability P, of the event A as

P, = lim (1.2)

Noo N

2A)

Richard von Mises (1883-1953), a German mathematician of the early 20th century, believed that equation (1.2) could be considered a definition of the probability of a random event, and he called it the frequency definition of probability. Von Mises pointed out that the classical definition of probability (1.1) only “works” when there is a fizte number of equally possible outcomes. For instance, situations involving the throwing of coins or dice.

However, we often encounter situations without the symmetry that determines whether the outcomes are equally possible. These are the cases when we cannot apply the classical definition of probability. Von Mises assumed that then the frequency definition can be used because it does not require a finite number of equally possible outcomes and, moreover, does not require any calculation of probability at all. A probability using the frequency approach is determined by experiment rather than being calculated.

However, is it possible to determine the probability of a random event in practice using (1.2)? The relationship presupposes an 77/inite number of identical trials. In practice, we must stop at a_/77zte number of trials, and it is debatable what number to stop at. Should we stop after a hundred trials, or is it necessary for there to be a thousand, a million, or a hundred million? And what is the accuracy of the probability determined in such a way? There are no answers to these questions. Besides, it is not practicable to provide the same conditions while performing a very large number of trials, to say nothing of the fact that the trials may be impossible to repeat.

Consequently, relationship (1.2) is practically useless, moreover it is pos- sible to prove (though I shall not do so) that the limit in (1.2) does not strictly

25

speaking exzst. This means that the Von Mises’s error was that he made an unwarranted generalization from a correct proposition: he concluded that the probability of a random even is the limit of the frequency of its occurrence when the number of trials tends to infinity from the correct observation that the frequency of the occurrence of a random even approaches its probability as the number of trials increases.

Geometrical definition of probability. Suppose that two people have agreed to meet at a certain place between nine and ten o'clock. They also agreed that each would wait for a quarter of an hour and, if the other didn’t arrive, would leave. What is the probability that they meet? Suppose «x is the moment one person arrives at the appointed place, and y is the moment the other arrives. Let us consider a point with coordinates (x, y) on a plane as an outcome of the rendezvous. Every possible outcome is within the area of a square each side of which corresponds to one hour ( Figure 1.4). The outcome is favourable (the two meet) for all points (x, y) such that y| < 1/4. These points are within the blue part of the square in the Figure 1.4.

All the outcomes are exclusive and equally possible, and therefore the probability of the rendezvous equals the ratio of the blue area to the area of the square. This is reminiscent of the ratio of favourable outcomes to the total number of equally possible outcomes in the classical definition of probability. It should be borne in mind that this is a case where the number of outcomes (both favourable and unfavourable) is infinite. Therefore, in- stead of calculating the ratio of the number of favourable outcomes to the total number of outcomes, it is better to consider here the ratio of the area containing favourable outcomes to the total area of the random events.

It is not difficult to use Figure 1.4 and find the favourable area; it is the difference between the area of the whole square and the unhatched area, i.e. 1 - (3/4)° = 7/164". Dividing 7/16" by 14°, we find the probability of the rendezvous to be 7/16.

This example illustrates the geometrical definition of probability:

PROBABILITY

26 MATHEMATICS OF RANDOMNESS

yh

10

Figure 1.4: Finding the probability using the geometrical method.

The probability of a random event is the ratio of the area favourable

for an event to the total area of events.

The geometrical definition of probability is a generalization of the classical definition for the case when the number of equally possible outcomes is infinite.

The development of the concept of probability. Although probabilis-

tic notions were used by ancient Greek philosophers (such as Democritus,

27

Epicurus, and Carus Lucretius), the theory of probability as a science be- gan in the mid-17" century, with the work of the French scientists Blaise Pascal and Pierre Fermat and the Dutch scientist Christian Huygens. The classical definition for the probability of a random event was formulated by the Swiss mathematician Jacob Bernoulli in Avs conjectandi (The Art of Conjectures). The definition was given its final shape later by Pierre Laplace. The geometrical definition of probability was first applied in the 13° century. Important contributions to probability theory were made by the Russian mathematical school in the 19° century (P.L. Chebyshev, A.A. Markov, and A.M. Lyapunov).

The extensive employment of probabilistic concepts in physics and tech- nology demonstrated, by the early 20 century, that there was a need for a more refined definition of probability. It was necessary, in particular, in order to eliminate the reliance of probability on “common sense”. An unsuccessful attempt to give a general definition for the probability of a random event on the basis of the limit of its frequency of occurrence was made, as we have seen, by Richard von Mises. However, an axiomatic approach rather than a frequency one resulted in more refined definition of probability. The new approach was based on a set of certain assumptions (axioms) from which all the other propositions are deduced using clearly formulated rules.

The axiomatic definition of probability now generally accepted was elaborated by the Soviet mathematician A.N. Kolmogorov, Member of the USSR Academy of Sciences, in The Basic Notions of the Probability Theory (1936, in Russian). I shall not discuss the axiomatic definition of probability because it would require set theory. Let me only remark that Kolmogorov’s axioms gave a strict mathematical substantiation to the concept of probability

and made probability theory a fully fledged mathematical discipline.

The existence of several definitions for the same notion (probability) should not worry the reader.

PROBABILITY

28

MATHEMATICS OF RANDOMNESS

As L.E. Maistrov put itin Te Development of the Notion of Probability (Nauka, Moscow, 1980):

“There are many definitions of notions, and this is an essential feature of modern science. Hence the notion of probability is no exception. Modern definitions in science represent diverse viewpoints, of which there may be very many for a fundamental notion, and each view re- flects a property of the defined notion. This includes the notion of probability.”

Let me add that new definitions for a notion appear as our understanding of it becomes deeper and its properties are made clearer.

Random Numbers

Random Number Generators. Let us put ten identical balls numbered from 0 to 9 into a box. We take out a ball at random and write down its number. Suppose it is five. Then we put the ball back into the box, stir the balls well, and take out a ball at random. Suppose this time we get a one. We write it down, put the ball back into the box, stir the balls, and take out a ball at random again. This time we get a two. Repeating this procedure many times, we obtain a disordered set of numbers, for instance: 5, 1, 2, 7, 2, 3, 0, 2, 1, 3, 9, 2, 4, 4, 1, 3, ... This sequence is disordered because each number appeared at random, since each time a ball was taken out at random from a well-stirred set of identical balls.

Having obtained a set of random digits, we can compile a set of random numbers. Let us consider, for instance, four-digit numbers. We need only separate our series of random numbers into groups of four digits and consider each group to be a random number: 5127, 2302, 1392, 4413, ...

Any device that yields random numbers is called a random number generator. There are three types of generators: urns, dice, and roulettes

29

(Figure 1.5). Our box with balls is an urn.

Dice are the simplest random number generators. An example of such a generator is a cube each of whose faces is marked with a different number. Another example is a coin (or a token). Suppose five of the faces of a cube are marked with the numbers 0, 1, 2, 3, 4, while the sixth face is unmarked. Now suppose we have a token one side of which is labelled with 0 and the other with 5. Let us throw the cube and token simultaneously and add together the numbers that appear face up, the trial being discounted when the unmarked face lands face up. This generator allows us to obtain a disordered set of numbers from 0 to 9, which can then be easily used to produce sets of random numbers. A roulette is a circle marked in sectors, each of which is marked with a different number. A roulette has a rotating arrow or rolling ball. A trial involves spinning the arrow and recording the number

A roulette is a circle marked in sectors, each of which is marked with a different number. A roulette has a rotating arrow or rolling ball. A trial involves spinning the arrow and recording the number corresponding to the sector of the roulette circle within which the arrow stops.

Note that a roulette may have any number of sectors. For instance. we could divide a circle into ten sectors and label them from 0 to 9. As a random number generator, our roulette in this case is equivalent to the two generators discussed above:

(1) an urn with ten balls and (2) adie anda token thrown at the same time.

A diagram of these equivalent random number generators is shown in Fig- ure 1.5.

Tables of Random Numbers. An example of a random number table is shown in Figure 1.6. The table consists of three hundred four-digit numbers. Each digit in the table was chosen randomly, as a result of a trial, e.g. throwing a die and a token. Therefore, it is understandable that there is no order in the

RANDOM NUMBERS

30

Figure 1.5: Three types of random number gen- erators: urns, dice and roulettes.

MATHEMATICS OF RANDOMNESS

An Urn With Balls A Roulette

i A Die A Token

31

numbers, and there is no way of predicting which digit will follow a given one. You could compile many tables after many trials. Nevertheless, there will not be even the shadow of order in the sequence of digits.

0655 8453 4467 3234 5320 0709 2523 9224 6271 2607 5255 5161 4889 7429 4647 4331 0010 8144 8638 0307 6314 8951 2335 0174 6993 6157 0063 6006 1736 3775 3157 9764 4862 5848 6919 3135 2837 9910 7791 8941 9052, 9565 «4635. =0653,— 2254 = 5704 = 8865. 2627) 7959 = 33682. 4105 4105. 33187) 4312s «1596 )— 9403 6859s 7802 3180 4499 1437-2851 = 6727, 55580 =: 0368 = 4746 (0604S 7956) 2304 = 8417 4064 4171 7013 4631 8288 4785 6560 8851 9928 2439 1037. 5765 1562 9869 0756 5761 6346 5392 2986 2018 5718 8791 = 0754 = 22222013, «0830S: 0927) Ss (04667526 ~— 6610 5127, 2302) 1392) 4413, (9651 = 8922 )=— 1023 6265S 7877 ~— 4733 9401 2423 6301 2611 0650 0400 5998 1863 9182 9032 4064 5228 4153 2544 4125 9654 6380 6650 8567 5045 5458 1402 9849 9886 5579 4171 9844 0159 2260 1314 2461 3497 9785 S678 4471 2873 3724 8900 7852 5843 4320 4553 2545 4436 9265 6675 7989 5592 3759 3431 3466 8269 9926 7429 7516 1126 6345 4576 5059 7746 9313. 7489 2464 2575 9284 1787 2391 4245 5618 0146 5179 8081 9663361 = (0109— ss 7730S G256)—s «1303S 6503S 4081 = 4754 3010 5081 3300 9979 1970 6279 6307 7935 4977 0501 9599 9828 8740 6666 6692 5590 2455 3963 6463 1609 4242 3961 6247 4911 7264 0247 0583 7679 7942 2482 3585 9123, S014 6328 = 9659S 1863S «0532 G313. 3199 7619 5950 3384 0276 4503 3333 8967 3382 3016 0639 2007 8462 3145 96582, 8605) 7300) = 6298 = 6673 6406 = S951 7427 0456 0944 3058 2545 3756 2436 2408 4477 5707 5441 0672 1281 8897 5409 0653 5519 9720 O11 4745 7979 5163 9690 0413 3043 1014 0226 5460 2835 3294 3674 4995 eS oo ae e220 SCM 5 6751 6447 4991 6458 9307 3371 3243 2958 4738 3996

This is not amazing. A chance is a chance. But a chance has a reverse aspect. For instance, try and count how many times each digit occurs in Figure 1.6. You will find that digit 0 occurs 118 times (the frequency it appears is 118/1200 = 0.099), digit 1 occurs 110 times (the frequency it appears is 0.090), digit 2 occurs 114 times (0.095), digit 3 occurs 125 times (0.104), digit

RANDOM NUMBERS

Figure 1.6: A table of random numbers.

32

MATHEMATICS OF RANDOMNESS

4 occurs 135 times (0.113), digit 5 occurs 135 times (0.113), digit 6 occurs 132 times (0.110), digit 7 occurs 116 times (0097), digit 8 occurs 93 times (0.078), and digit 9 occurs 122 times (0.102). We can see that the appearance frequency for each digit is about the same, i. e. close to 0.1. Naturally, the reader has come to a conclusion that 0.1 is the probability that a digit appears. The reader may say that the appearance frequency of a digit is close to the probability of its appearance over a long series of trials (there are 1200 trials

here).

Although this is natural, we should wonder once (again how an unordered set of random digits can have an inherent stability. This is a demonstration of the reverse aspect of chance and illustrates the determinism of probability.

I advise the reader to “work” a little with a random number table (see Figure 1.6). For instance, 32 numbers out of the three hundred ones in the table begin with zero, 20 begin with 1, 33 begin with 2, 33 begin with 3, 38 begin with 4, 34 begin with 5, 34 begin with 6, 24 begin with 7, 20 begin with 8, and 32 begin with 9. The probability that a number begins with a certain digit equals 0.1. It is easy to see that the results of our count are in a rather good keeping with this probability (one tenth of three hundred is thirty). However, the deviations are more noticeable than in the example considered earlier. But this is natural because the number of trials above was 1200 while here it is much less, only 300.

It is also interesting to count how many times a digit occurs in the second place (the number of hundreds), in the third place (tens), and the fourth place (units). It is easy to see that in every case the frequency with which a given digit appears is close to the probability, i.e. close to 0.1. Thus, zero occurs in the second place 25 times, in the third place 33 times, and in the fourth place 28 times.

An example with the number-plates of motor-cars randomly passing the observer was cited in the introduction. It was noted that the probability that the first two digits in the licence number were identical is 0.1. The probability

33

that the two last digits of the number or two middle digits or the first and the last digit are identical is the same.

In order to see this, we need not observe a sequence of cars passing by. We can simply use a random number table (see Figure 1.6). The four-digit random numbers in the table can be taken as the license numbers of cars randomly passing the observer. We can see that 40 of the 300 number’s have the same two first digits, 28 numbers have the same two last digits, 24 numbers have the same two middle digits, and 32 numbers have the same first and last digits. In other words, the frequencies with which a pair of identical digits appears actually varies around the probability, i.e. in the neighbourhood of 0.1.

Random Events

When we throw a die or take a ball out of an urn we deal with a random event. There are several interesting problems where the probability of a random event is required to be found.

A problem with coloured balls. There are three blue balls and a red ball in a box. You take two balls out of the box at random. Which is more probable: that the two balls are blue or that one is blue and one is red?

People often answer that it is more probable that two blue balls are taken out because the number of blue balls in the box is three times greater than the number of red ones. However, the probability of taking out two blue balls is equal to the probability of taking out a blue and a red ball. You can see this by considering Figure 1.7. Clearly there are three ways in which two blue balls may be chosen and three ways of choosing a blue and a red ball at the same time. Therefore, the outcomes are equally probable.

We can also calculate the probability of the outcomes. The probability

RANDOM EVENTS

( 2 e-.

A.) ie

Figure 1.7: Different ways of taking out two out of three blue and one red balls.

34

MATHEMATICS OF RANDOMNESS

of taking out two blue balls equals the product of two probabilities. The first one is the probability of taking out a blue ball from a set of four balls (three blue ones plus a red one), which is 3/4. The second probability is that of taking out a blue ball from a set of three balls (two blue ones plus a red one) which is 2/3. Consequently, the probability of taking out two blue balls simultaneously is 3/4 x 2/3 = 1/2.

The probability of taking out a blue and a red ball is the sum P,,. + Py, where P,,,, is the probability of taking out a blue ball from a set of four balls (three blue ones plus a red one) multiplied by the probability of taking out a red ball from a set of three balls (two blue ones plus it red one) and P,,,, is the probability of taking outa red ball from a set of four balls (the second all in this case must then bea blue one). In other words, P,,, is the probability of taking out a blue ball first and then a red ball while P,, is the probability of taking out a red ball first and then a blue ball. Inasmuch as P,, = 3/4 x 1/3 = 1/4 and Py, = 1/4, the probability of taking out a pair of differently coloured balls equals 1/4 + 1/4 = 1/2.

Throwing A Die: A Game. There are two players in this game, player A and player B. The die is thrown three times in succession during each turn. If a certain face turns up at least once during a turn (let it be a5), player 4 scores a point. But if the five does not turn up, a point is scored by player B. The game is played until one of them scores, say, a hundred points. Who has the chance of winning greater? Player A or player B?

In order to answer, we first calculate the probability of player 4 scoring a pointin a turn (the die is thrown three times in succession). He receives a point in any of the following three cases: if five turns up in the first trial, if five does not turn up in the first trial but turns up in the second one, and if five does not turn up in the first two trials but turns up in the third one. Let us designate the probability of these three events as P., P,, and P,, respectively. The sought probability is P = PB, + B, + Py. Note that the probability of five appearing when the die is thrown is 1/6, and the probability that five does not appear is

35

5/6. Itis clear that P, = 1/6. To find 2, we should multiply the probability of the absence of a five in the first trial by the probability of its presence in the second trial, B, = 5/6 x 1/6 = 5/36. The probability 2, is the product of the probability of the absence of a five in two trials (the first and the second) and the probability of a five in the third trial, R, = (5/6)? x 1/6 = 25/216. Consequently, P = ) +B + B = 1/64+ 5/6 + 25/216 = 91/216. Since P < 1/2, player B has more chance of winning this game. We could have reached the same conclusion in a simpler way by considering the probability of player B scoring a point after three trials. This is the probability of the absence of five in three trials: p = 5/6x5/6x5/6 = 125/216. Since p > 1/2, player B’s chances are better. Note that P + p = 91/216 + 125/216 = 1. This is natural because one of the players, A or B, must score a point in each turn.

Let us change the rules of the game a little: the die is thrown four times rather than three times in each turn. The other conditions remain the same. The probability of player B scoring a point in aturn is 5/6x5/6x5/6x5/6 = 625 /1296. This is less than 1/2, and therefore now A has a better chance of winning a game.

The Problem Of An Astrologer. A tyrant got angry with an astrologer and ordered his execution. However, at the last moment the tyrant made up his mind to give the astrologer a chance to save himself. He took two black and two white balls and told the astrologer to put them into two urns at random. The executioner was to choose an urn and pick a ball out of it at random. If the ball was white, the astrologer would be pardoned, and if the ball was black, he would be executed. How should the astrologer distribute the balls between the two urns in order to give himself the greatest chance of being saved?

Suppose the astrologer puts a white and a black ball into each urn (Fig- ure 1.8 (a)). In this case, no matter which urn the executioner chooses, he will

draw a white ball out of it with a probability of 1/2. Therefore, the probability

RANDOM EVENTS

36

Figure 1.8: Different ways of arranging two white and two black balls for different proba- bilities of drawing out a ball of a given colour.

MATHEMATICS OF RANDOMNESS

= . Q e i (c) (d)

the astrologer would be saved is 1/2.

The probability of the astrologer being saved will be the same if he puts the two white balls into one urn and the two black balls into the other (Fig- ure 1.8 (b)). His destiny will be decided by the executioner when he chooses an urn. The executioner may choose either urn with equal probability.

The best solution for the astrologer is to put a white ball into one um and a white ball and two black ones into the other urn (Figure 1.8 (c)). If the executioner chooses the first urn, the astrologer will certainly be saved, but if the executioner picks the second urn, the astrologer will be saved with a probability of 1/3. Since the executioner chooses either urn with probability 1/2, the overall probability that the astrologer will be saved is (1/2 x 1) + (1/2 x 1/3) = 2/3.

By contrast, if the astrologer puts a black ball into one urn and a black ball and two white balls into the other (Figure 1.8 (d)), the probability of him being saved will be smallest: (1/2 x 0) + (1/2 x 2/3) = 1/3.

Thus, in order to have the greatest chance of being saved, the astrologer should distribute the balls between the urns as shown in (Figure 1.8 (c)) This

37

is the best strategy. The worst strategy is to distribute the balls as shown in (Figure 1.8 (d)). Of course, the selection of the best strategy does not guarantee the desired outcome. Although the risk is decreased, it still remains.

Wandering In A Labyrinth. A labyrinth with treasure has a death trap, as shown in Figure 1.9. Unlucky treasure-hunters die in the trap. What is the Probability that they will avoid the trap and reach the treasure?

After walking away from the entrance A to point 1 (see Figure 1.9) a treasure-hunter may either go straight ahead (in which case he walks directly into the trap) or turn to the left (in which case he arrives at point 2) We shall suppose he picks either path at random, with equal probability, i.e. with probability 1/2. Alter arriving at point 2, the treasure-hunter may either go straight ahead or turn right or turn left with probability 1/3. The first two paths lead to the trap, while the third path leads to point 3. The probability of someone getting from the entrance 4 to point 3 is the product of the probability of turning left at point 1 and the probability of turning left at point 2, ie. 1/2 x 1/3. It is easy to see now that the probability of reaching point 4 from A is 1/2 x 1/3 x 1/2; the probability of reaching point 5 from A is 1/2 x 1/3 x 1/2 x 1/3; and finally, the probability of reaching the treasure from Ais P* = 1/2x1/3x1/2x1/3x1/2 = 1/72. The only way of getting from the entrance of the labyrinth to the treasure is shown in the figure by the dash line. The probability that a person will follow it is thus P* = 1/72, while the probability of walking into the trap is P~ = 71/72.

The probability P~ was calculated from the fact that P* + P™ = 1. However, we can calculate P~ directly. Let us expand P~ as the sum P™ = P+2B, +P, +P, +B where the P are the probabilities of arriving at point Z from A multiplied by the probability of walking into the trap from point

RANDOM EVENTS

Figure 1.9: The probability of finding the trea-

sure or a trap in a labyrinth.

38

MATHEMATICS OF RANDOMNESS

t = 1,2,3,4,5).

P,=1/2, B= 1/2«2/3,

P, =1/2x1/3x1/2, P,=1/2x«1/3x 1/2 2/3,

Pz = 1/2x 1/3 x 1/2 1/3 «1/2.

You can then find that? +3+2+2).+P =71/72.

Discrete Random Variables

Random Variables. Suppose there is a batch of 100 manufactured articles and 11 articles are rejected as defective, 9 articles are rejected in another batch of the same size, 10 articles are rejected in the third one, 12 articles are rejected in the fourth one, etc. We use to denote the overall number of manufactured articles in a batch and m to denote the number of rejected articles. The number 7 is constant (here z = 100) while the value of m varies from batch to batch in a random manner. Suppose there is a definite probability that there will be m rejected articles in a randomly selected batch of 7 articles.

The number of rejected articles (the variable m) is an example of a random variable. It varies randomly from one trial to another, and a certain probability ts associated with the occurrence of each value of the variable. Note that we are dealing with a discrete random variable here, i.e. it may only take a discrete set of values (the integers from 0 to 100 in this case).

There are also continuous random variables. For instance, the length and weight of newborn babies vary randomly from child to child and may take any value within a particular interval There are some special features of continuous random variables which we shall discuss later; we shall first

39

consider discrete variables.

Expected values and variance of a discrete random variable. Let x be a discrete random variable which may assume 5 values: 21, 2, 2. Xyy5 +X. These values are associated with the probabilities p,, 2), ... Pj,» +. p,- For in- stance, p,, is the probability that a variable is x,,. The sum of all the prob- abilities (p; + p, + ... + p,) is the probability that a trial will give one of the values «,, x, ....%,, (without saying which one). This probability is unity. Consequently,

Stead (13) m=1

The set of probabilities p, + p) + ... + p, (also called the distribution of the probabilities) contains all the information needed about the random variable. However, we do not need all the probabilities for many practical purposes. It is sufficient to know two most important characteristics of a random variable: its expected value (its mathematical expectation) and its variance.

The expected value is an average value of the random variable taken over a large number of trials. We shall use the letter E to denote the expected value. The expected value of a random variable x is the sum of the products of each variable and its probability, i.e.

E(x) = pyr + Pry +. + PX or using the summation sign,

E(x) = ~ pet. (1.4) m=1

We also need to know how a variable deviates from the expected value, or, in other words, how much the random variable is scattered. The expected value of the deviation from the expected value (that is the difference x E(x))

DISCRETE RANDOM VARIABLES

£ The notation 3 means that the summation is

m=1 performed over all m from 1 to s.

40 MATHEMATICS OF RANDOMNESS

cannot be used because it is equal to zero. We can show this as follows:

m=1

-_ 3 Pin *n =£() Ss Pm m= m=1

= E(x) - E(x),

This is why the expected value of the squared deviation (rather than the expected value of the deviation itself) is used, i.e.

vat = 0” = E(x—E(x))? = > Pin (Xn E(x))?. (1.5) m=1

This is the variance of a random variable and we shall use var to denote it. The square root of the variable /var is called the standard (or root-mean-square) deviation o of the random variable. It is easy to show that

var = E(x”) (E(x))’. (1.6)

Two probability distributions are shown in Figure 1.10 (a). The two random variables possess different expected values while having the same

4l

Elm) < E(x)

var % = var X2

E(x) = E(x)

vat 1 < var Xy

E(x), var x, E(x), varx, E(x,), var x, E(x), vat Xo

variance. Looking at Figure 1.10 (b), we can see a different picture: the random variables possess different variances while having the same expected values.

Bernoulli’s Binomial Distribution. Suppose a series of 2 independent identical trials is performed. The trials are independent in the sense that the results of any trial do not influence the results of any other trial. Some trials produce a desired outcome while the rest do not. Let us call the desired outcome, “event U”. This is a random event. Suppose event U occurs in in trials. This is a random variable. Let us consider the probability P, (7) that event U will occur m times in a series of x trials.

This isa commonly occurring situation. Suppose 2 manufactured articles are checked. Then event U is a rejection, and P, (7m) is the probability of in articles being rejected out of a set of in articles Suppose a hospital registers n newborn babies and the event U is the birth of a girl. Hence P, (m) is the probability that there will be m girls in a set of n newborn babies. Suppose in a lottery, 2 tickets are checked, event U is the discovery of a prize-winning ticket, and P, (m) is the probability that m prize-winning tickets will be found out of a total of 2 tickets. Suppose in a physics experiment z neutrons are recorded, the event U is the occurrence of a neutron with an energy within

DISCRETE RANDOM VARIABLES

Figure 1.10: Distributions of random variables with different parameters. (a) shows two distri- butions with different expected values but the same variance, while (b) shows two distributions with different variance but the same expected val- ues.

42

MATHEMATICS OF RANDOMNESS

a certain range, and P, (m) is the probability that m of the x neutrons will possess energies in the range. In all these examples, the probability P, (7m) is described by the same formula which is the bznomial distribution (sometimes named after a 17" century Swiss mathematician called Jacob Bernoulli).

The binomial distribution is derived by assuming that the probability that event U will occur in a single trial is known and does not vary from trial to trial. Let us call this probability ». The probability that event U does not occur in a single trial is g = 1 p. It is important that the probability that an article is rejected does not depend in any way on how many rejected articles there are in the given batch. The probability that a girl is born in any actual delivery does not depend on whether a girl or a boy was born in the previous birth (nor on how many girls have so far been born). The probability of winning a prize neither increases nor decreases as the lottery tickets are checked. The probability that a neutron has an energy in a given range does not change during the experiment.

Now, once the probability p that a certain random event will occur in a single trial 1s known, we find the probability P,(m) of m occurrences in a sertes of n independent identical trials.

Suppose the event U occurred in the first m trials but did not occur in n—m trials, then the probability of the situation would be p”g”"”. Naturally, other orders are possible. For instance, event U may not occur in the first »—m trials and occur in the rest in trials. The probability of this situation is also pq”. There are also other possible situations. There are as many situations as there are ways choosing 7 elements taken in at a time (this is written ( ;, )). The probability of each situation is identical and equals pg”. The order in which event U occurs is inessential. It is only essential that it occurs in m trials and does not occur in the remaining z m trials. The sought probability P,(m) is the sum of the probabilities of each ( ;, ) situation, i.e. the product

43

of pg” and (}, ): P,(m) = ("ena (1.7)

There is a formula for the number of combinations of z elements taken m at a time:

n\_ n! _ n(n-1)(n-2)...(n—m +1) (”) m!(n m)! m! , ae) Here mz! = 1-2-3-...: (read x! as “en factorial”), by convention 0! = 1. Substituting (1.8) into (1.7), we can find P n! mi n-m 1 (m) = y= me qo. (1.9)

This is the bznomial distribution, or the distribution of a binomial random variable. I shall explain this term below, and we shall see that

> 2, (m) =1. (1.10) m=0 A Py (m) % So err | Ponons ®% 0 4 6 8 10 12 14 16 20

By way of example, let us calculate the probability that m girls are born in a group of 20 babies. Assume that the probability of delivering a girl is 1/2,

DISCRETE RANDOM VARIABLES

Figure 1.11: The binomial distribution.

44

MATHEMATICS OF RANDOMNESS

We set p = 1/2 and = 20 in expression (1.9) and consider the integer values of variable m within the range from 0 to 20. The result can be conveniently presented as a diagram (Figure 1.11). We see that the birth of 10 girls is the most probable; the probability of delivering, for instance, 6 or 14 girls is six times smaller.

Ifa random variable has a binomial distribution, then its expected value

m=0

or the product of the number of trials and the probability of the event ina single trial,

E(m) = np. (1.11) The variance of such a random variable is the product of the number of trials, the probability of the occurrence of the event in a single trial, and the probability it does not occur:

var = E(m*) (E(m))? = npq. (1.12)

The normal (Gaussian) Distribution. Probability calculations using the binomial distribution are difficult for large m. For instance, in order to find the probability that 30 girls were delivered from 50 births, you have to calculate i

Pyo(50) = Sp 7594 (0-5) Note that even 20! is a 19—digit number. In such cases one can use a formula which is the limit of the binomial distribution at large x:

_ 1 (m-E (m))?

Ba irene | ~ Qvard?” where E (m) = np and var = npq, and exp = 2.718... is the base of natu- ral logarithms. The distribution defined in (1.13) is called the normal or Gaussian distribution.

(1.13)

45 DISCRETE RANDOM VARIABLES

The Poisson Distribution. If the probability that an event will occur in a single trial is very small (p « 1), the binomial distribution at large becomes the Podsson (rather than the normal) distribution, and is defined as

we

P,(m) =

n

exp(- np). (1.14)

This distribution is also sometimes called the /aw of rare events. It is in- teresting to note that the variance of a random variable with the Poisson distribution equals its expected value.

Two distributions are compared in Figure 1.12. The parameters of the first distribution are 2 = 30 and p = 0.3, and it is close to the normal distribution with the expected value E (m) = 9. The second distribution’s parameters aren = 30 and p = 0.05, and it is close to the Poisson distribution with

E(m) =1355. Pm) Pm) 0.3|- n=30 n=30 =0.3 pr 0.2 0.1 m 01 10 15 0

; F . nae Figure 1.12: The Poisson (right) and Gaussian A Little of Mathematics. The expression (g + p)”, where 7 is a positive (left) disttibutions. a

integer, is called a binomial (two-term) expression of degree x. You should know about the binomial expansions of second and third degrees:

(q +p) =4q +2qp + p's

(qt+p)=q +3qut+3qp +p.

46

MATHEMATICS OF RANDOMNESS

In general (for a random integer 2) the binomial expansion is

n(n 7 1) ze ~m+t 1) gp" +...-ngp" +p".

(q+p)” = q’ tng” pt..+

Using the notation given in (1.8), we can rewrite this formula as

wor (heollerrolelere coolant

Thus from (1.9), we can conclude that

csr eS (arr Sam

m=0

Consequently, the probabilities P, (7) coincide with the coefficients of the binomial expansion, and this is why the bznomial distribution is so called. The probabilities 7 and p in a binomial distribution are such that q + p = 1. Therefore, (g + p)” = 1. On the other hand,

(q+ p)” =D Alm

Hence (1.10).

Continuous Random Variables

Continuous random variables are very unlike discrete ones. A continuous variable can assume any of infinite set of values, which continuously fill a certain interval. It is impossible in principle to list every value or such a variable at the very least because there rs no such thing as two neighbouring values (just as it is impossible to mark two neighbouring points on the number

47

axis). Besides, the probability of a concrete value of a continuous random variable is zero.

Can Probability Of A Possible Event Equal To Zero? You know now that an impossible event has a zero probability. However, a possible event can also have a zero probability.

Suppose a thin needle is thrown many times at random onto a strip of paper on which a number axis is marked. We can regard the x-coordinate of the point where the needle crosses the number axis (Figure 1.13 (a)) to be a continuous random variable. This coordinate varies in a random fashion from one trial to another.

We could also use a roulette instead of throwing a needle. A strip of paper with a numbered line could be. pasted to the circumference of the roulette circle, as shown in Figure 1.13 (b). Wherever the freely rotating arrow of the roulette is pointing when it stops, it yields a number that will be a continuous random variable.

What is the probability of the arrow stopping at a certain point x? In other words, what is the probability that a concrete value x of a continuous random variable is chosen? Suppose the roulette circle’s radius R is divided into a finite number of identical sectors, e.g. 10 sectors (Figure 1.14). The

CONTINUOUS RANDOM VARIABLES

Figure 1.13: The probability that a continuous random variable will take a certain value is zero.

48

Ax

Figure 1.14: A roulette to generate continuous random variables.

A

Figure 1.15: A finite non-zero mass can be gener- ated from the sum of an infinite number of zero masses.

MATHEMATICS OF RANDOMNESS

length of the arc corresponding to the sector equals Ax = 27R/10. The probability that the arrow will stop within the sector hatched in the figure is Ax/2zR = 1/10. Thus, the probability that the random variable will take a value from x to x + Ax is Ax/27R. Let us gradually narrow the range of numbers, i. e. divide the circle into larger numbers of sectors. The probability Ax /27R that any value is in the range from x to x + Ax also will fall. In order to obtain the probability that the variable will take the value x exactly, we must find the limit as Ax 0. In this case, the probability Ax /27R becomes zero. Thus we can see that the probability that a continuous random variable will take a certain value is indeed zero.

That event may be both possible and possess a zero probability may seem paradoxical, but it is not. In fact there are parallels you are surely well aware of. Consider a body of volume V with a mass 4. Let us select a point A within the body and consider a smaller volume 4, which contains the point (Figure 1.15) and assign a mass /, to it. Let us gradually shrink the smaller volume around point 4. We obtain a sequence of volumes containing A, ie. VV, V4, V3, ..., and a corresponding sequence of decreasing masses: M, M,, M,, M;,...,. The limit of the mass vanishes as the volume around 4 contracts to zero. We can see that a body which has a finite mass consists of points which have zero masses. In other words, the nonzero mass of the body is the sum of an infinite number of zero masses of its separate points. In the same way, the nonzero probability that a roulette arrow stops within a given range Ax is the sum of an infinite number of zero probabilities that the arrow will stop at each individual value within the considered range.

The Density Of A Probability. This conceptual difficulty can be avoided by using the idea density. Although the mass of a point within a body is zero, the body’s density at the point is non-zero. If A is the mass of a volume AV within which the point in question is located (we shall describe the point in terms of its position vector r, then the density (r) at this point

49

is the limit of the ratio AM/AV as AV converges to the point at r, i.e., . AM P(e) = Jim, Fy If the volume AV is small enough, we can say that AM = p(r)AV. Using a strict approach, we should substitute AV by the differential dV’.

The mass / of a body occupying volume V is then expressed by the integral:

m= | p(e)4r,

over the volume in question.

Probability theory uses a similar approach. When dealing with continuous random variables, the probability density is used rather than the probability itself. Let f(x) be the probability density of a random variable x, and so by analogy with the mass density we have

ee eee

= i : Ax>0 Ax

Here Ap, is the probability that a random variable will take a value between x and x + Ax. The probability p that a random variable will have a value between x, and 2, is, in terms of probability density, as follows:

pe { flx)dw. (1.15)

If the integration is over the whole range of values a random variable may take, the integral (1.15) will evaluate to unity (this is the probability of a certain event). In the example with a roulette mentioned above, the whole interval is from x = 0 to x = 27R. In general, we assume the interval is infinite, when

Sf (x)dx = 1. (1.16)

CONTINUOUS RANDOM VARIABLES

MATHEMATICS OF RANDOMNESS

The integral is very simple in the roulette example because the probability the roulette arrow stops within an interval from x to x + Ax does not depend on x. Therefore, the probability density does not depend on x, and hence,

A similar situation is encountered when the density of a body is the same at every point, i.e. when the body is uniform (p = M/V). More generally, density o(r) varies from point to point, and so does the probability density

f(x).

The Expected Value And The Variance Of A Continuous Random Variable. The expected value and. variance of a discrete random variable are expressed as sums over the probability distribution (see equations (1.4) to (1.6). When the random variable is continuous, integrals are used instead of sums and the probability density distribution is used rather than the probability

distribution:

E(x) = | xf(x) dx, (1.17) var = | (x- E(x))" f(x) dx (1.18)

The Normal Distribution Of Probability Density. The normal distribution of probability density. When dealing with continuous random variables, we often encounter the zormal distribution of probability density. This distribution is defined by the following expression (compare it with

(1.13)): ; f(x) = . exp [Se ), (1.19)

oV2n 2g"

Here g is the standard deviation = Yvar) the function (1.19) is called the xormal or Gaussian distribution.

The probability density of a continuous random variable is always normal

51

if the variance of its values is due to many different equally strong factors. It has been proved in probability theory that the sum of a large enough number of independent random variables obeying any distributions tends to the normal distribution, and the larger the number of sums the more accurately the normal distribution is.

For instance, suppose we are dealing with the production of nuts and bolts. The scatter of the inside diameter of the nut is due to random de- viations in the properties of the metal, the temperature, vibration of the machine tool, changes in the voltage, wear of the cutter, etc. All of these effects act independently and approximately with the same strength. They are superimposed, and the result is that the inside diameter of the nuts is a continuous random variable with a normal distribution. The expected value of this variable should evidently be the desired inside diameter of the nuts, while the variance characterizes the scatter of the obtained diameters around the desired value.

The Three-Sigma Rule. A normal distribution is shown in Figure 1.16.

CONTINUOUS RANDOM VARIABLES

Figure 1.16: The three-sigma rule for a Gaussian distribution.

52

MATHEMATICS OF RANDOMNESS

It has a maximum at the expected value E(x). The curve (the Gaussian curve) is bell-shaped and is symmetric about E(x). The area under the entire curve, i. e. for the interval (—oo < x < +00), is given by the integral [i fo)dx. Substituting (1.19) here, it can be shown that the area is equal to unity. This agrees with (1.16), whose meaning is that the probability of a certain event is unity. Let us divide the area under the Gaussian curve using vertical lines (see Figure 1.16). Let us first consider the section corresponding to the interval E(x) <x < E(x) +c. Itcan be shown (please believe me) that

E(x)+o

{ Ff (x)dx = 0.683. x)=

in

This means that the probability of x taking a value in the interval from E(x) -—o to E(x) + 7 equals 0.683. It can also be calculated that the probability of x taking a value from E(x) to E(x) + is 0.954, and the probability of x taking a value in the range of E(x) 3a to E(x) + 3a is 0.997. Consequently, a continuous random variable with a normal distribution takes a value in the interval E(x) to E(x) + with probability 0.997. This probability is practically equal to unity. Therefore, it is natural to assume for all practical purposes that a random variable will always take a value in the interval from on the right to on the left of E(x). This is called the three-sigma rule.

Chapter 2 Decision Making

Practical demands brought forth special scientific methods that can be collected under the heading “operations research”. We shall use this term to mean the application of quantitative mathematical methods to justify decisions in every area of goal-oriented human activity.

E. S. Wentzel

These Difficult Decisions

Decision making under uncertain conditions. We often have to make decisions when not all the information is available and this uncertainty always decreases to some extent our ability to decide. For example, where to go for a vacation or holiday? This has worried me many times, since various uncertainties concerning the weather, the hotel, the entertainment at the resort, and so on, must be foreseen. We try and decide on the best variant

53

54

DECISION MAKING

from our experience and the advice of our friends, and we often act “by inspiration”. This subjective approach to decision making is justifiable when the consequences involve ourselves and relatives. However, there are many situations when a decision can affect a large number of people and therefore requires a scientific and mathematically justifiable approach rather than a subjective one.

For instance, modern society cannot function without electricity, stores of food, raw materials, etc. The stores are kept everywhere: at factories, shops, hospitals, and garages. But how large should the stores be in a particular case? It is clear that they should not be too small, otherwise the function of the enterprise would be interrupted. Neither should they also be too large because they cost money to build and maintain: they would be dead stock. Store-keeping is a problem of exceptional importance. It is so complicated because a decision must always be made in conditions of uncertainty.

Two kinds of uncertainty. How should we make decisions under condi- tions of uncertainty? First of all, we should discover which factors are causing the uncertainty and evaluate their nature. There are two kinds of uncertainty. The first kind is due to factors which can be treated using the theory of proba- bility. These are either random variables or random functions, and they have statistical properties (for instance, the expected value and variance), which are either known or can be obtained over time. Uncertainty of this kind is called probabilistic or stochastic. The second kind of uncertainty is caused by unknown factors which are not random variables (random functions) be- cause the set of realizations of these factors does not possess statistical stability and therefore the notion of probability cannot be used. We shall call this uncertainty “bad”.

“So”, the reader may say, “it would seem that not every event that cannot be predicted accurately is a random event.”

“Well, yes, in a way.” Let me explain. In the preceding chapter we dis- cussed random events, random variables, and random functions. I repeatedly

5S

emphasized that there should always be statistical stability, which is expressed in terms of probability. However, there are events, which occur from time to time, that do not have any statistical stability. The notion of probability is inapplicable to such events, and therefore, the term “random” cannot be used here too. For instance, we cannot assign a probability to the event of an individual pupil getting an unsatisfactory mark in a concrete subject. We cannot, even hypothetically, devise a set of uniform trials that might yield the event as one outcome. There would be no sense in conducting such a trial with a group of pupils because each pupil has his or her own individual abilities and level of preparation for the exam. The trials cannot be repeated with the same pupil because he will obviously get better and better in the subject from trial to trial. Similarly there is no way we can discuss the proba- bility of the outcome of a game between two equally matched chess players. In all such situations, there can be no set of uniform trials, and so there is no stability which can be expressed in terms of a probability. We have “bad” uncertainty in all such situations.

Iam afraid we do not consider the notion “statistical stability” and often use expressions such as “improbable”, “probable”, “most probable”, and “in all probability” to refer to events that cannot be assigned by any probability. We are apt to ascribe a probability to every event even though it might not be predictable. This is why it became necessary to refine the notion of probability early this century. This was done by A.N. Kolmogorov when he developed

an axiomatic definition of probability.

Options and the measure of effectiveness. When we speak of decision making, we assume that different patterns of behaviour are possible. They are called options. Let me emphasize that in the more important problems the number of options is very great. Let X be the set of options in a particular situation. A decision is made when we select one option x from this set. How do we determine which option is the most preferable or the most efficient? A quantitative criterion is needed to allow us to compare different options in terms of their effectiveness. Let us call this criterion the measure of effective-

THESE DIFFICULT DECISIONS

56

DECISION MAKING

ness. This measure is selected for each particular purpose, e.g., not to be late for school, to solve a problem correctly and quickly, or to reach the cinema. A doctor wants to find an efficient method of treating his patient. A factory manager is responsible for the fulfilment of a production plan. The most efficient option is the one that suits its purpose best.

Suppose we work in a shop and our target is to maximize the receipts. We could choose profit as the measure of effectiveness and strive to maximize this measure. The selection of the measure in this example is evident. However, there are more complicated situations, when several goals are pursued simul- taneously, for example, we wish to maximize profit, minimize the duration of the sales, and distribute the goods to the greatest number of customers. In such cases we have to have several measures of effectiveness; these problems are called multi-criterial.

Let W be a single measure of effectiveness. It would seem that our task is now to find. an option x at which W is ata maximum (or, the other way round, at a minimum). However, we should remember that decision making occurs under conditions of uncertainty. There are unknown (random) factors (let us use £ to denote them), which influence the end result and therefore affect the measure of effectiveness W. There is also always a set of factors known beforehand (let us designate them @). Therefore the measure of effectiveness is dependent on three groups of factors: known factors a, unknown (random) factors £, and the selected option x:

W =W (a, &,x).

In the sales example, the a set is goods on sale, the available premises, the season, etc. The £ factors include the number of customers per day (it varies randomly from day to day), the time customers arrive (random crowding is possible, which leads to long queues), the goods chosen by the customers (the demand for a given commodity varies randomly in time), etc.

Since the £, factors are random, the measure of effectiveness W is a ran-

57

dom variable. Now, how is it possible to maximize (minimize) a random variable? The answer quite clearly is that it is naturally impossible. Whichever option x is chosen, W remains random, and it cannot be maximized or mini- mized. This answer should not discourage the reader. It is true that under conditions of uncertainty we cannot maximize (minimize) the measure of effectiveness with a hundred per cent probability. However, an adequate selec- tion of an option is possible with a reasonably large probability. This is where we should tackle the techniques used in decision making under conditions of stochastic uncertainty.

Substitution of random factors by means. The easiest technique is merely to substitute the random factors by their means. The result is that the problem becomes completely determined and the measure of effectiveness W can be calculated precisely. It can, in particular, be either maximized or minimized. This technique has been widely used to solve problems in physics and technology. Almost every parameter encountered in these fields (e.g., temperature, potential difference, illuminance, pressure) is, strictly speak- ing, a random variable. As a rule, we neglect the random nature of physical parameters and use their mean values to solve the problems.

The technique is justified if the deviation of a parameter from its mean value is insignificant. However, it is not valid if the random factor significantly affects the outcome. For instance, when organizing the jobs in a motor-car repair shop, we may not neglect the randomness in the way cars fail, or the random nature of the failures themselves, or the random time needed to complete each repair operation. If we are dealing with the noise arising in an electronic device, we cannot neglect the random behaviour of electron flows. In these examples, the £ factors must indeed be considered as random factors, we shall say they are essentially random.

Mean value optimization. If the £ factors are essentially random, we can use a technique called mean-value optimization. What we do is to use the expected value E(W) as the measure of effectiveness, rather than the random

THESE DIFFICULT DECISIONS

58

DECISION MAKING

variable W and the expected value is maximized or minimized.

Naturally, this approach does not resolve the uncertainty. The effective- ness of an option x for concrete values of random parameters £ may be very different from the expected one. However, using mean-value optimization means that we can be sure that after many repeated operations we shall gain overall. It should be borne in mind that mean-value optimization is only ad- missible when the gains of repeated operations are totalled, so that “minuses” in some operations are compensated by the “pluses” in others. Mean-value optimization would be justified should we be trying to increase the profit obtained, for instance, in a sales department. The profit on different days would be totalled, so that random “unlucky” days would be compensated by the “lucky” days,

But here is another example. Suppose we consider the effectiveness of the ambulance service in a large city. Let us select the elapsed time between summoning help and the ambulance arriving as the measure of effectiveness. It is desirable that this parameter be minimized. We cannot apply mean-value optimization because if one patient waits too long for a doctor, he or she is not compensated by the fact that another patient received faster attention.

Stochastic constraints. Let us put forward an additional demand. Sup- pose we desire that the elapsed time W till the arrival of help after a call for an ambulance be less than some value W%. Since W is a random variable, we cannot demand that the inequality W < W be always true, we can only demand that it be true for some large probability, for instance, no less than 0.99. In order to take this into account we delete from the X set those options x, for which the requirement is not satisfied. These constraints are called stochastic. Naturally, the use of stochastic constraints noticeably complicates decision making.

59 RANDOM PROCESSES WITH DISCRETE STATES

Random Processes with Discrete States

A random process can be thought of as the transition of a system from one state to another occurring in a random fashion. We shall consider random processes with discrete states in this chapter and so our system will be sup- posed to have a set of discrete states, either finite or infinite. The random transitions of the system from one state to another are assumed to take place instantaneously.

State graphs. Random processes with discrete states can be conveniently considered using a diagram called a state graph. The diagram shows the possible states a system may be in and indicates the possible transitions using arrows.

Let us take an example. Suppose a system consists of two machine tools, each of which produces identical products. If a tool fails its repair is started immediately. Thus, our system has four states: 5, both tools are operating; S, the first tool is under repair after a failure while the second is operating; S;, the second tool is under repair while the first is operating; ,, both tools are

being repaired.

The state graph is given in Figure 2.1. The transitions S; > 5,, 5S, > 53, S, S,and S; §, occur as a result of failures in the system. The reverse transitions take place upon termination of the repairs. Failures occur at unpredictable moments and the moments when the repairs are terminated are also random. Therefore, the system’s transition from state to state is random.

Note that the figure does not show transitions 5, > S,and S, > S,. The former corresponds to the simultaneous failure of both tools and the latter to the simultaneous termination of repair of both tools. We shall assume that the probabilities of these events are zero.

Event arrival. Suppose that we have a situation in which a stream of

Figure 2.1: A state graph for system with four states.

60 DECISION MAKING

uniform events follow each other at random moments. They may be tele- phoned orders for taxi, domestic appliances being switched on, the failures in the operation of a device, etc.

Figure 2.2: A record of taxi orders at a taxi depot. Suppose the dispatcher at a taxi depot records the time each taxi order

is made over an interval of time, for instance, from 12 a.m. to 2 p.m. We can show these moments as points on the time axis, and so the dispatcher might get the pattern illustrated in Figure 2.2 (a). This is the realization of

61 RANDOM PROCESSES WITH DISCRETE STATES

the taxi-call arrivals during that interval of time. Three more such realizations are shown in Figure 2.2 (b), (c), and (d), and they are patterns recorded on different days. The moments when each taxi order is made in each realization are random. At the same time, the taxi-order arrivals possess statistical stability, that is, the total number of events in each interval of time varies only slightly from experiment to experiment (from one arrival realization to another). We can see that the number of events in the arrival realisations presented are 19, 20, 21, and 18.

In the preceding chapter, a random event in an experiment was an out- come which has a definite probability. When we are considering arrivals of events, we must have another meaning for the term “event”. There is no use speaking about the probability of an outcome (event) because each event is uniform, i.e. indistinguishable from the others. For instance, one taxi-order is a single event in a stream and is indistinguishable from another event. Now let us consider other probabilities, for instance, the probabilities that an event will occur during a given interval of time (suppose, from ¢ to ¢ + At, as shown in the figure) exactly once, twice, thrice, etc.

The notion of “event arrival” is applied to random processes in systems with discrete states. It is assumed that the transitions of a system from one state to another occur as a result of the effect of event arrivals. Once an event arrives, the system instantaneously changes state. For the state graph in Figure 2.1 transitions S$; > S, and. S; S, occur due to the arrival of events corresponding to failures in the first tool, while transitions S, S; and. S, S, occur due to failures of the second tool. The reverse transitions are caused by the arrival of events corresponding to the “terminations” of repair: transitions S, > S, and S, 53, are caused by the arrivals of repair terminations of the first tool, and transitions 5, S, and.S, S, to the arrivals of repair terminations of the second tool.

The system transfers from state S; to state S, every time the next event related to the transition arrives. The natural conclusion is that the probability

62

Figure 2.3: A state graph for system with four states with arrival rates.

DECISION MAKING

of transition 5; > S; at a definite moment in time ¢ should equal the proba- bility of an event arrival at this moment. There is no sense in speaking of the probability of a transition at a concrete moment ¢. Like the probability of any concrete value of a continuous random variable, this probability is zero, and this result follows from the continuity of time. It is therefore natural to discuss the probability of a transition (the probability of an event arrival) occurring during the interval of time from ¢ to ¢ + Az, rather than its occur- rence at time ¢. Let us designate this probability P(t, At). As Ar tends to

zero, we arrive at the notion of a transition probability density at time t, i.e.

P,, (t, At) ah) = Bye ae

(2.1)

This is also called the arrival rate of events causing the transition in question.

In the general case, the arrival rate depends on time. However, it should be remembered that the dependence of the arrival rate on time is not related to the location of “dense” or “rare” arrival realisations. For simplicity’s sake, we shall assume that the transition probability density and therefore the event arrival rate does not depend on time. i.e. we shall consider steady-state arrivals.

The Chapman-Kolmogorov equations for steady state. Let us use p; to denote the probability that a system is in state S, (since our discussion is only for steady-state arrivals, the probabilities p,, are independent of time), Let us consider the system whose state graph is given in Figure 2.1. Suppose A, is the arrival rate for failures of the first tool and A, that for the second tool; let w, be the arrival rate for repair terminations of the first tool and yz, that for the second tool. We have labelled the state graph with the appropriate arrival rates, see Figure 2.3.

Suppose there are N identical systems described by the state graph in Figure 2.3. Let N > 1. The number of systems with state S;, is Np; (this statement becomes more accurate the larger N is). Let us consider a concrete

63 RANDOM PROCESSES WITH DISCRETE STATES

state, say, S;. Transitions are possible from this state to states S, and S, with probability 2, + 2,, per unit time. (Under steady state, the probability density is the probability for the finite time interval At divided by At.) Therefore, the number of departure: from state 5,, per unit time in the considered set of systems is Np, (A, + A), We can discern a general rule here: the number of transitions 5; 5S; per unit time is the product of the number of systems with state S, (the initial state) by the probability of the transition per unit time, We have considered departures from state S,. The system arrives at this state from S, and S;, The number of arrivals at S, per unit time is Np, uy +N ps 4y- Since we are dealing with steady states, the number of departures and arrivals for each particular state should be balanced. Therefore.

Np (A, + Ay) = Np ey + Np bo.

By setting up similar balances of arrivals and departures for each of the four states and eliminating the common factor N in the equations, we obtain the

following equations for probabilities p,, p), p; and py: for state Sy: (Ay +2,) py = apo + Mop (A, + 1) ho = Api + Po for state 5, : (A, + Uy) p3 = Ay Py + La Pes (4 + #2) Ps = Afr + Arps.

for state S, :

for state S, :

It is easy to see that the fourth equation can be obtained by summing the first three. Instead of this equation, let us use the equation

Pit Prt Ps + Py = 1,

which means that the system must be in one of the four states. Therefore, we

64

DECISION MAKING

have the following system of equations:

Sy? (Ay +g) py = Hafo + ops»

Sy? (A, + fy) Po = AP + oP

53 (Ay + fy) Ps = Arp + Pr Pit Prt P3 + Py = 1.

(2.2)

These are the Chapman-Kolmogomv equations for the system whose state graph is shown in Figure 2.3.

Which innovation should be chosen? Let us analyze a concrete situa- tion using equations (2.2). The state graph (see Figure 2.3) corresponding to these equations describes a system which, we assumed, consists of two machine tools each producing identical goods Suppose the second tool is more modern an its output rate is twice that o the first tool. The first tool generates (per unit time) an income of five conventional units, while the second one generates one of ten units. Regretfully, the second tool fails, on the average, twice as frequently as does the first tool: hence 2; = Landa, = 2. The arrival rates for repair termination are assumed to be uw, = 2 and # = 3. Using these arrival rates for failure and repair termination. let us rewrite (2.2) thus

3p, = 2p, + 3ps;

4p, = pi + 3p,

4p; = 2p, + 2p) Pit Prt pst My=l.

This system of equations can be solved to yield p, = 0.4, p) = 0.2, p, = 0.27 and p, = 0.13. This means that, on the average, both tools operate simultaneously (state S, in the figure) 40 per cent of the time, the first tool operates while the second one is being repaired (state S,) 20 per cent of the time, the second tool operates while the first one is being repaired (state 5S; ) 27 per cent of the time, and both tools are simultaneously being repaired

65 RANDOM PROCESSES WITH DISCRETE STATES

(state §,) 13 per cent of the time. It is easy to calculate the income this tool system generates per unit time: (5 + 10) x 0.4+5 x 0.2 + 10 x 0.27 = 9.7 conventional units.

Suppose an innovation is suggested which would reduce the repair time of either the first or second tool by a factor of two. For technical reasons, we can only apply the innovation to one tool. Which tool should be chosen, the first or the second? Here is a concrete example of a practical situation when, using probability theory, we must justify our decision scientifically

Suppose we choose the first tool. Following the introduction of the innovation, the arrival rate of its repair termination increases by a factor of two, whence #, = 4 (the other rates remain the same, i. e. 2, = 1,4, = 2 and Ly = 3). Now equations (2.2) are

3p, = 4pr + 3p3; Pr = Pi + 3a 4p; = 2p, + 4p, Pit Prt Pst Py = 1. After solving this system, we find that p, = 0.48, p) = 0.12, p; = 0.32, and Ps = 0.08. These probabilities can be used to calculate the income our system

will now generate: (5 + 10) x 0.48 + 5 x 0.12 + 10 x 0.32 = 11 conventional units.

If we apply the innovation to the second tool, the rate 2), will be doubled. Now 4, = 1, 2, = 2, wy, = 2 and wy = 6, and equations (2.2) will be 3p, = 2pr + Ops; 4p, = py + Of, 7 Pz = 2p, + 2Pay Pit prt Pst Py = 1. This system yields: p, = 0.5, p, = 0.25, p, = 0.17, and p, = 0.08, whence the

66

DECISION MAKING

Income is (5 + 10) x 0.5 +5 x 0.25 + 10 x 0.17 = 10.45 conventional units. Therefore it is clearly more profitable to apply the innovation to the first too.

Queueing Systems

The problem of queueing. Modern society cannot exist without a whole network of queueing systems. These include telephone exchanges, shops, polyclinics, restaurants, booking offices, petrol stations, and hairdressers. Despite their diversity, these systems have several things in common and common problems.

When we seek the assistance of a doctor or service from a cafe, restaurant, or barber, we must wait for our turn in a queue, even if we telephone to make an appointment, that is, reserve our place in a queue without actually attending physically. Clearly, we wish to be served straight away and waiting can be frustrating.

It is clear that the source of the problem is the 7zdom nature of the demands for attention in queueing systems. The arrival of calls at a telephone exchange is random as is the duration of each telephone conversation. This randomness cannot be avoided. However, it can be taken into account and, as a consequence, we can rationally organize a queueing system for all practical purposes. These problems were first investigated in the first quarter of this century. The mathematical problems for simulating random processes in systems with discrete states were formulated and considered, and a new field of investigation in probability theory was started.

Historically, queueing theory originated in research on the overloading of telephone exchanges, a severe problem in the early 20th century. The initial period in the development of the queueing theory can be dated as corresponding to the work of the Danish scientist A. Erlang in 1908-1922. Interest in the problems of queueing rapidly increased. The desire for more

67

rational servicing of large numbers of people led to investigations of queue formation. It soon became evident that the problems dealt with in queueing theory went well beyond the sphere of rendering service and the results are applicable to a wider range of problems.

Suppose a workman is operating several machine tools. Failures requiring urgent repairs occur at random moments, and the duration of each repair is a random variable. The result is a situation similar to a common queueing system. However, this is a problem of servicing many tools by a worker rather than servicing many people by a queueing system.

The range of practical problems to which queueing theory can be applied is uncommonly wide. We need the theory when we want, say, to organize the efficient operation of a modern sea port, when, for instance, we analyze the servicing rate of a large berth. We apply to queueing theory when we look at the operation of a Geiger-Miiller counter. These devices are used in nuclear physics to detect and count ionizing particles. Each particle entering a tube in the counter ionizes gas in the tube, the ionization being roughly independent of the particle‘s nature and energy, and so a uniform discharge across the tube is generated. But when one discharge is under way, a new particle cannot be registered (“serviced”) by the same counter. The moment each particle enters the tube 5 random, as is the duration of the discharge (the “servicing” time). This is a situation typical for queueing systems.

Basic notions. A queueing system is set up to organize the service of a stream of requests. The request may be a new passenger in a booking office, a failure in a machine tool, a ship mooring, or a particle entering a Geiger- Miiller counter. The system may have either one or several servers. When you go to a large barbershop or hairdresser and want to know the number of barbers or hairdressers, you are in effect asking for the number of servers in the establishment. In other situations, the servers may be the number of cashiers in a booking office, the number of telephones at a post office for making trunk calls, the number of berths in a port, or the number of pumps

QUEUEING SYSTEMS

68

DECISION MAKING

at a petrol station. If, on the other hand, we wish to see a particular doctor, we are dealing with a single-server queueing system.

When we consider the operation of a queueing system, we must first take into account the number of servers, the number of requests arriving at the system per unit time, and the time needed to service a request. The number of requests arriving at the system, the moments they arrive, and the time needed to service a request are, as a rule, rzndom factors. Therefore, queueing theory isa theory of random processes.

Random processes of this type (i. e. with discrete states) were discussed in the preceding section. A system transfers from state to state when each request arrives at the system and when the requests are serviced. The latter is given by the rate at which requests can be served by a single, continuously occupied server.

Queueing systems. There are two sorts of queueing system: systems with losses and. systems with queues. If a request arrives at a system with losses when all the servers are occupied, the request is “refused” and is then lost to the system. For example, if we want to telephone someone and the number is engaged, then our request is refused and we put down the receiver. When we dial the number again, we are submitting a new request.

The more common types of system are those with queues or systems with waiting. This is why it is called the theory of queueing. In sucha system, if a request (or customer) arrives when all the servers are occupied, the customer takes a place in a queue and waits for a server to become free. There are systems with zn/finite queues (a queueing customer is eventually served and the number of places in the queue is unlimited) and systems with /nzte queues. There are different sorts of restriction, i.e. the number of customers queueing at the same time may be limited (the queue cannot be longer than a certain number of customers and any new customer is refused); the duration of a customer’s stay in the queue may be limited (after a certain length of time queueing, an unserved customer will leave the queue); or the time the

69

system operates for may be restricted (customers may only be served for a certain interval of time).

The service order is also important. Customers are commonly served “first come first served”. However, priority servicing is also possible, i.e. a newcomer to a queue is served first irrespective of the queue. A customer with a high priority may arrive at the system and interrupt the servicing of a customer with a lower priority, which may already start, or the higher priority customer may have to wait until the servicing has been completed. The priority is zbsolute in the first case and relative in the second. Queueing systems are always mu/ti-critical, that is, they have a set of measures by which their effectiveness can be estimated. These may be the average number of customers served by the system per unit time, the average number of occupied servers, the average number of customers in the queue, the average time of waiting for servicing, the average percentage of refused customers, and the probability a customer arriving at the system is immediately served. There are other measures of such systems’ effectiveness. It is quite natural that when organizing the operation of a queueing system we should strive to reduce the average number of customers in the queue, and to reduce the time of waiting for servicing. It is also desirable to maximize the probability that a customer arriving at the s stem is served immediately, to minimize the average percentage of refused customers, and so on.

This eventually means that the productivity of the system must be in- creased (i.e. the time needed to service each customer be decreased), the system’s operation be rationalized, and the number of servers made as large as possible. However, by raising the number of servers, we cannot avoid decreas- ing the average number of occupied servers. This means that the duration of the time for which a server is not occupied will increase, i.e. the server will be idle for some time. The result is that the system’s operational efficiency is lowered. Therefore we must in some way optimize the system’s operation. The number of servers should not be too small (to eliminate long queues and to keep the number of refusals small), but it should also not be too large (so

QUEUEING SYSTEMS

70

Figure 2.4: State graph of a system with losses.

DECISION MAKING

that the number and duration of idle periods for each server is small).

Systems with losses. The simplest type of queueing system is a szvgle- server system with losses. Here are some examples: a system with only one telephone line or a particle detector consisting of only one Geiger-Miiller counter. The state graph for such a system is shown in Figure 2.4 (a). When the server is unoccupied, the system is in state Sy, and when the server is occupied, it is in state S,. The customer’s arrival rate is 1, and the service completion rate is it u. This state graph is very simple. When the system is in state Sy a customer arriving at the system transfers it to state S,, and the servicing starts. Once the servicing is completed, the system returns to state So and is ready to serve a new customer.

——— (a) So Sy <____ le

We shall not go into detail on this type of system and go straight over to a re general case, an 2-serversystem with losses. An example is a system consisting of z telephone lines. Erlang, the founder of the queueing theory, considered. precisely this system. The corresponding state graph is given in Figure 2.4 (b). The states of the system are designated as follows: Sy when all servers are unoccupied, 5, when one server is occupied and the others are unoccupied, S, when two servers are occupied while the others are unoccupied, and so

71

on, and S, is the state when all x servers are occupied. As in the preceding example, J is the customer arrival rate, and is the service-completion rate.

Suppose the system is in state Sy. When a customer request arrives, one of the servers becomes occupied, and the system is transferred to state S;. If the system is in state S,, and a new customer arrives, two servers become occupied, and the system is transferred from 5, to S,. Thus, each customer (with the rate of arrivals 2) transfers the system from one state to the adjacent one from left to right (see the state graph in the figure). The arrival of events leading to transitions to adjacent states from right to left is somewhat more complicated. If the system is in the state 5; (only one server is occupied), the next service-completion event will disengage the server and transfer the system to state Sy. Let me remind you that the service-completion rate is yz. Now suppose the system is in 5}, i.e. two servers are occupied. The average time of service for each server is the same. Each sewer is disengaged with the rate it when services are completed. As to the transition of the system from S, to 5}, it is indifferent as to which of the two servers is unoccupied. Therefore. events which transfer the system from S, to S, arrive at the rate 2u. As to the transition of the system from S, to S;, it is indifferent as to which of the three occupied servers is disengaged. Events which transfer the system from 5; to S, arrive at the rate 3, and so forth. It is easy to see that the rate of event arrival which transfers the system from S, to S,_, is ku.

Let us assume that the system is in a steady state. Applying the rule from the preceding section and using the state graph in Figure 2.4 (b), we can compile the Chapman-Kolmogorov equations for the, probabilities Po> Pr> Pas Pas (recall that p, is the probability that the system is in the state S,). We obtain the following system of equations:

QUEUEING SYSTEMS

72

DECISION MAKING

Apo = LPL (A+ &)p1 = Apo + 2uepr (A + 2u)p, = Ap, + 3 ups,

(A+ ke) pp = APea + (R+ le Perr [A + (n = 1)u]p,-1 = MPn-2 + NEPn> Pot Prt pot. +p, = 1.

This set of equations can be solved easily, Using the first equation, we can express p, in terms of fy and substitute it into the second equation. Then we can express p, in the second equation in terms of p, and substitute it into the third one, and so forth. At the last but one stage, we express p, in terms of py. And finally, the results obtained at each stage can be substituted into the last equation to find the expression for py. Thus

(alw)” 1 a ley . Alu)? 7 Po= Bea =: + - tee ;

2 k Pe = Ne (k = 1, 2, 3n).

(2.4)

A customer’s request is refused if it arrives when all 7 servers are engaged, i.e. when the system is in state S,. The probability that the system is in 5, equals Py This is the probability that a customer arriving at the system is refused and the service‘is not rendered. We can find the probability that a customer arriving at the system will he served,

Po: (2.5)

By multiplying Q by 2, we obtain the service-completion rate of the system. Each occupied server serves customers per unit time, so we can divide Q by

73

and find the average number of occupied servers in the system,

E(N) = (1 7 ae): (2.6)

How many servers are required? Let us consider a concrete example. Suppose a telephone exchange receives 1.5 requests per minute on the average, and the service completion rate is 0.5 request per minute (the average service time for one customer is two minutes). Therefore, A/“ = 3. Suppose the exchange has three servers (three telephone lines). Using formulas (2.4)—(2.6) for A/u = 3. and xn = 3, we can calculate that the probability of servicing the arriving customers is only 65 per cent. The average number of engaged lines is 1.96, which is 65 per cent of the total number of lines. Thus, 35 per cent of the customers are refused and not served. This is too much.

We may decide on increasing the number of servers. Suppose we add one more, a fourth line. Now the probability of a customer being served increases to 79 per cent (the probability of being turned away decreases to 21 per cent). The average number of engaged lines becomes 2.38, which is 60 per cent of the total number of lines. It would appear that the decision to install a fourth line is reasonable because a relatively small reduction in the percentage of occupied servers (from 65 to 60 per cent) results in a significant rise in the probability to be served, from 65 to 79 per cent. Any further increase in the number of lines may become unprofitable because the effectiveness of the system may fall due to the increasing idleness of the lines. A more detailed analysis would then be required to allow for the cost of installing each new line. Let me remark that at 2 = 5 we get Q = 89 per centand E(N)/n = 53 per cent, while for 2 = 6, Q = 94 per cent and E(N)/z = 47 per cent.

Single-server systems with finite queues. Suppose the number of queueing customers is restricted, and the queue may only accommodate m customers. If all places in the queue are occupied, a newcomer is turned away. For example, a petrol station with only one pump (only one server)

QUEUEING SYSTEMS

74 DECISION MAKING

and a parking area for no more than m cars. If all the places at the station are occupied, the next car arriving at the station will not stop and will go on to the next. The state graph for this system is shown in Figure 2.5 (a). Here Sy means

Figure 2.5: State graph for a system (a) with fi-

siite queassand (b)-yidh infinite qusues, the server is unoccupied, 5S, the server is occupied, S, the server is occupied

and there is one customer in the queue, S the server is occupied and there are two customers in the queue, ...,.5,,,; means the server is occupied and there are m customers in the queue. As before, J is the customer arrival rate and uw is the service completion rate. The Chapman-Kolmogorov equations for steady state are

Apo = LPL (A+ “)p, = Ap + 2upr, Sic tis tees. Aes (2.7) A+“) Pm = Pm + HPm+1> Pot Prt pot + Pn t+ Pmoy = 1.

By solving this system and introducing the designation p = 2/u we obtain

1 1l-p

k 7 = > = . 2.8 Ltptptpt+..t pet 1— pm Pe =F Po (2.8)

Po

75

A customer is turned away if the server is engaged and there are m customers in the queue, i.e. when the system is in the state S,,,,. Therefore, the probability a customer is turned away is p,,,;. The average number of customers in the queue is evidently

E(r) = > kPes1 k=1

(f,1 is the probability of k customers being in the queue). The average waiting time in the queue is the ratio E(v)/A.

Suppose one car arrives at the petrol station per minute (2 = 1 customer per minute) and a car is filled, on average, within two minutes (u = 1/2). Therefore, p = A/u = 2. If the number of places in the queue m = 3, it is easy to calculate that the probability of a customer being refused is 51.6 per cent while the average waiting time in the queue is 2.1 min. Suppose that in order to decrease the probability of a customer being refused we double the number of places in the queue. It turns out that at m = 6 the probability of refusal is 50.2 per cent, i. e. it is, in fact, the same, but the waiting time in the queue noticeably increases to 5 min. It is clear from (2.8) that if p > 1, the probability of being refused stabilizes with increasing m and tends, to (p 1)/,. In order to reduce the probability of being refused significantly, it

is necessary (if it is not possible to decrease ) to use multi-server systems.

Single-server systems with infinite queues. This sort of queueing system is rather common: for example, a doctor receiving patients, a single public telephone, or a port with only one berth at which a single ship can unload. The state graph for the system is given in Figure 2.5 (b). Here So means that the server is unoccupied, 5, the server is occupied, 5S, the server is occupied and there is one customer in the queue, S; the server is occupied and there are two customers in the queue, and 5, means that the server is occupied and there are & 1 customers in the queue, and so on.

Up till now, we considered graphs with a finite number of states. However, here is a system with an infinite number of discrete states. Is it possible to

QUEUEING SYSTEMS

76

DECISION MAKING

discuss a steady state for such a system? In fact we can. It is only necessary that the inequality p < 1 holds true. If so, then the sum 1+ + p + e +..+ Yi in (2.8) can be substituted by the sum of the decreasing geometric progression

l+ptp tpt... =1/(1-,/). The result is

Po=zl-p and p, = p* ty. (2.9)

If p > 1, then the system does not have a steady state, i.e. the queue increases infinitely as t > 0,

Method of Statistical Testing

A statistical testing involves numerous repetitions of uniform trials. The result of any individual trial is random and is not of much interest. However, a large number of results is very useful. It shows some stability ( statistical stability) and so the phenomenon being investigated in the trials can be de- scribed quantitatively. Let us consider a special method for investigating a random process based on statistical testing. The technique is commonly called the Monte Carlo method.

In fact neither the city of Monte Carlo, the capital of the independent principality of Monaco nor its inhabitants nor guests are in any way related to the considered method. Instead, the city is known for its casinos where tourists pay good money playing roulette, and a roulette wheel could be the city’s emblem. At the same time, a roulette is a generator of random numbers and this is what is involved when the Monte Carlo method is used.

Two examples indicating the usefulness of statistical testing.

First example. Look at Figure 2.6. It contains a square with side 7 in which a quarter circle of radius 7 is inscribed. The ratio of the yellow area to the area of the square is (xr*) / 4 =7 /4. This ratio and, therefore, the value of can be obtained using the following statistical test. Let us place a sheet

77

of paper with the figure on a horizontal surface and let us throw small grains on this paper. We should not aim so that any grain can fall on any part of the paper with equal probability. It is possible, for instance, to blindfold the person throwing the grains. The grains will be distributed over the surface of the paper in a random fashion (Figure 2.6 (b)). Some will land outside the square, but we shall not consider them. We now count the number of grains within the square (and call this number N,) and count the grains within the yellow area (calling it N,). Since any grain may land with equal probability on any part of the figure, the ratio N, /N, when the number of trials is large, will approximate the ratio of the yellow area to the area of the square, i.e. the number 2/4. This approximation will become more accurate as the number of trials increases. This example is interesting because a definite number (the

(a)

number 7) can be found following a statistical testing. It can be said that randomness is used here to obtain a deterministic result, an approximation of the real number z.

Second example. Statistical testing is used much more commonly to investigate random events and random processes. Sappose someone assembles

METHOD OF STATISTICAL TESTING

Figure 2.6: Finding out value of z using a ran- dom distribution.

78

DECISION MAKING

a device consisting of three parts (4, B, and C). The assembler has three boxes containing parts A, B, and C, respectively. Suppose half the parts of each type are larger than the standard and the other half are smaller. The device cannot operate when all three parts are larger than the norm. The assembler takes the parts from the boxes at random. What is the probability that a normally operating device will be assembled?

Naturally, this example is rather simple and the probability can easily be calculated. The probability of assembling a device that does not work is the probability that all three parts will be larger than the norm, and this equals 1/2x 1/2 x 1/2 = 1/8. Therefore, the probability that a normally operating device will be assembled is 1 - 1/8 = 0.875.

Let us forget for a time that we can calculate the probability and instead use statistical testing. We should choose trials such that each one has equally probable outcomes, for instance, tossing a coin. Let us take three coins: A, B, and C’. Each coin corresponds to a part used to assemble the device. Heads will mean that the respective part is larger than the norm while tails will mean that it is smaller. Having agreed on this, let us start the statistical testing. Each trial involves tossing all three coins. Suppose after N trials (NV > 1) three heads were recorded in x trials. It is easy to see that the ratio (N 2)/N is the approximation of the probability in question.

Naturally, we could use any other random number generator instead of coins. It would also be possible, for instance, to throw three dice, having agreed to relate three faces of each die with larger than normal parts and three faces with smaller parts.

Let me emphasize that the randomness in these examples was a positive factor rather than a negative one, and was a tool which allowed us to obtain a needed quantity. Here chance works for us rather than against us.

Random number tables come into play. Nobody uses statistical testing in simple practical situations like the ones described above. It is used when it

79 METHOD OF STATISTICAL TESTING

is difficult or even impossible to calculate the probability in question. Nat- urally you might ask whether a statistical testing would be too complicated and cumbersome. We threw grains or three coins in the examples. What will be required in complicated situations? Maybe, there will be practically unsurmountable obstacles?

In reality, it is not necessary to stage a statistical experiment with random trials. Instead of real trials (throwing grains, dice, etc.), we need only use random number tables. Let me show how this can be done in the above two examples.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 x

Figure 2.7: Finding out value of z using a ran-

Bes <i : feos dom number table. First example. Let us again discuss the picture in Figure 2.6. We now

plot two coordinate axes along the sides of the square and select the scales

80

Table 2.1: Coordinates of fifteen random num- bers shown in red in the Figure 2.7.

(0.0655, 0.5255) (0.6314, 0.3157) (0.9052, 0.4105) (0.1437, 0.4064) (0.1037, 0.5718) (0.5127, 0.9401) (0.4064, 0.5458) (0.2461, 0.4320) (0.3466, 0.9313) (0.5179, 0.3010) (0.9599, 0.4242) (0.3585, 0.5950) (0.8462, 0.0456) (0.0672, 0.5163) (0.4995, 0.6751)

DECISION MAKING

such that the side of the square equals unity (Figure 2.7). Now instead of throwing grains, we take the random number table in Figure 1.6 and divide each number by 10 000 so that we obtain a set of random numbers between 0 and 1. We take the numbers in the odd lines as x-coordinates and the ones directly below as the y-coordinates of random points. We plot the points onto the diagram, systematically moving along the random number table (for instance, first down the first column from top to bottom, and then down the second column, and so on). The first fifteen random points are shown in the picture in red, and they have the coordinates as shown in Table 2.1. The figure contains 85 random points in black. From the diagram, it is easy to calculate that using the first fifteen points N,/N, = 13/15 and therefore 7 = 3.47 while for a hundred points N, /N, = 78/100 and therefore z = 3.12.

Second example. Instead of tossing coins, we can use the same random number table (see Figure 1.6). Each number over 5000 can be replaced by a “+” sion and the rest replaced by a “—” sign. The result is a table consisting of a random set of pluses and minuses. We divide these signs into triples as shown in Figure 2.8. Each triple corresponds to a set of three parts. A “+” sign means that a part is larger than the norm while a “—” sign means it is smaller. The approximation of the sought probability is the ratio (NV 2) /N, where WN is the total number of triples and z is the number of triples with three pluses (they are shaded in the figure). It can be seen that (N 2)/N = 0.9 in this

case, and this is close enough to the accurate value 0.875.

Thus, we have reduced statistical testing to operations on a random num- ber table and used our desk instead of an experimental bench. Rather than performing very many trials, we just look at a random number table.

Computers come into play. Instead of sweating over a random number table, we could program a computer to do the job. We place a random num- ber table in the computer’s memory and program it to search the random numbers and sort them as necessary. In our two examples, we would do the following.

81

DOES Sd ese === De RSE Ca ns = RE 2 ey SE DE

+ = + = + + + + = + +

me be ES ee

senses of Cee ES Os ae OO aC ct

eae a > ES Cem a CEE

iB + [RA aca + ] —- = || ele or + + zt ia +|4

° CE +R + ] =|5 mo cE + Re

First example. The computer has to check the coordinates of each ran- dom point to see whether x* + y” < 1. It counts the number of points for which this is true (the number is N,) and the number of points for which it is false (this number of points will be the difference N, N).

Second example. Allrandom numbers in the computer’s memory must be divided into triples and the triples checked to find ones in which all three numbers are over 5000. The number of such triples is 7.

METHOD OF STATISTICAL TESTING

Figure 2.8: Using a random number table in- stead of coin tosses for statistical testing.

82

DECISION MAKING

The Monte Carlo method. The world changed when the computer came into play. By processing a random number table the computer simu- lates the statistical testing and it can do this many times faster than could be done either experimentally or by working manually with a random number table. And now we come to the Monte Carlo method, a very useful and efhi- cient method of probabilistic calculation which is applied to many problems, primarily those that cannot be solved analytically.

Let me emphasize two points. Firstly, the Monte Carlo method utilizes randomness not chance. We do not try to analyze the complicated random processes, nor even simulate them. Instead, we use randomness, as it were, to deal with the complications chance has engendered.

Chance complicates our investigation and so randomness is used to in- vestigate it. Secondly, this method is universal because it is not restricted by any assumption, simplification, or model. There are two basic applications. The first is the investigation of random processes which cannot be dealt with analytically due to their complexity. The second is to verify the correctness and accuracy of an analytical model applied in concrete situations.

The Monte Carlo method was first widely used in operations research, in looking for optimal decisions under conditions of uncertainty, and in treating complicated multi-criterial problems. The method is also successfully used in modern physics to investigate complex processes involving many random events.

A Monte Carlo simulation of a physical process. Let us consider the flow of neutrons through the containment shield of a nuclear reactor. Uranium nuclei split in the core of the reactor and this is accompanied by the creation of high-energy neutrons (of the order of several million electron volts). The reactor is surrounded by a shield to protect the working areas (and therefore, the personnel) from the radiation. The wall is bombarded by an intense flow of neutrons from the reactor core. The neutrons penetrate into the wall and collide with the nuclei of the atoms of the wall. The result is that

83

the neutrons may either be absorbed or scattered. If scattered, they give up some of their energy to the scattering nuclei.

This is a complicated physical process nvolving many random events. The energy and the direction of a neutron when it leaves the reactor core and enters the wall are random, the length of the neutron path before it first collides is random, the nature of collision (absorption or scattering) is random, the energy and the direction of the scattered neutron are random, etc. Let me show in general how the Monte Carlo method is applied to analyze the process. Obviously the computer is first programmed with data on the elementary collisions between neutrons and the wall nuclei (the probabilities of absorption and scattering) the parameters of the neutron flow into the wall, and the properties of the wall. The computer model simulates a neutron with a randomly selected energy and direction (when it leaves the reactor core and enters the wall) in line with appropriate probabilities. Then it simulates (bearing in mind the relevant probabilities) the flight of the neutron until it first collides. Then the first collision is simulated. If the neutron is not absorbed, subsequent events are simulated, i.e. the neutron’s flight until its second collision, the collision itself, and so on. The “history” of the neutron is determined from the moment it penetrates the wall until it is either absorbed, scattered back into the reactor core, or scattered into the working area.

The computer simulation is repeated for very many neutrons until a set of possible trajectories of neutrons within the wall is obtained (Figure 2.9). Each trajectory is the result of one statistical trial simulating the “history” of Chance complicates our investigation and so randomness is used to investi- gate it. Secondly, this method is universal because it is not restricted by any assumption, simplification, or model. There are two an individual neutron. Given an enormous set of trials the neutron flow through the containment wall as a whole can be analyzed and recommendations for the thickness of the wall and its composition can be made so as to guarantee the safety of the personnel working at the reactor.

METHOD OF STATISTICAL TESTING

84

Figure 2.9: A set of all possible trajectories for the neutron.

DECISION MAKING

Modern physics requires the Monte Carlo method on many occasions. Physicists use it to investigate cosmic-ray showers in the Earth’s atmosphere, the behaviour of large flows of electrons in electron discharge devices, and the progress of various chain reactions.

Games and Decision Making

What is the theory of games? Suppose we must make a decision when our objectives are opposed by another party, when our will is in conflict with another will. Such situations are common, and they are called conflict situations. They are typical for military actions, games, and every-day life. They often arise in economics and politics.

A hockey player makes a decision that takes into account the current situation and the possible actions of the other players. Every time a chess player makes a decision, he (or she) has to consider the counteraction of the opponent. A military decision should allow for the retaliation of the enemy. In order to decide at what price to sell a product, a salesman must think over the responses of the buyer. In any election campaign, each political party in a capitalist country tries to foresee the actions of the other parties that are

85

competing for power. In each case, there is a collision of opposing interests, and the decision must be related with overcoming a conflict.

Decision making in a conflict situation is hampered by uncertainty about the behaviour of the opponent. We know that the opponent will try to act in a way that is least advantageous for us in order to ensure the greatest advantage for himself. However, we do not know to what extent our opponent is able to evaluate the situation and the possible consequences and, in particular, how he evaluates our options and intentions. We cannot predict the actions of the opponent accurately, and the opponent cannot predict our actions. But nonetheless, we both have to make decisions.

Because some way of justifying an optimal decision was needed in conflict situations, a new mathematical discipline arose, the theory of games. The “game” here is a mathematical model of a conflict situation. Unlike a real conflict, a game has definite rules which clearly indicate the rights and duties of the participants and the possible outcomes of the game (a gain or loss for each participant). Long before the emergence of game theory, simple models of conflicts were used. widely. I mean games in the literal sense of the word: chess, checkers or draughts, dominoes, card games, etc. In fact, the name of the theory and the various terms used in it are all derived from these simple models. For instance, the conflicting parties are called players, a realization of a game is a match, the selection of an action by a player (within the rules) is a move.

There are two kinds of move, personal and chance ones. A personal move is when the player conscientiously selects an action according to the rules of the game. A chance move does not depend on the player’s will: it may be determined by tossing a coin, throwing a die, taking a card from a pack, etc. Games consisting of only chance moves are called games of chance, or games of hazard. Typical examples are lotteries and bingo. Games with personal moves are called strategic. There are strategic games consisting exclusively of personal moves, for instance, chess. There are also strategic games consisting

GAMES AND DECISION MAKING

86

DECISION MAKING

of both personal and chance moves, for instance, certain card games. Let me remark that the uncertainty in games with both personal and chance moves involve both sorts of randomness: the uncertainty of the result of the chance moves and the uncertainty of the opponent’s behaviour in his personal moves.

Game theory is not interested in gambles. It only deals with strategic games. The aim of the game theory is to determine the player’s strategy so as to maximize his chances of winning. The following basic assumption underlies the search for optimal strategies. It is assumed that the opponent is as active and as reasonable as the player, and he or she also takes attempts to succeed.

Naturally, this is not always true. Very often our actions in real conflicts are not as good as they could be when we assume reasonable behaviour from our adversary; it is often better to guess at the “soft spots” of the opponent and utilize them. Of course, we take a risk when doing so. Itis risky to rely too much on the soft spots of the opponent, and game theory does not consider risk. It only detects the most cautious, “safe” versions of behaviour in a given situation. It can be said that game theory gives wise advice. By taking this advice when we make a practical decision, we often take a conscientious risk. E.S. Wentzel writes in Operations Research:

“Game theory is primarily valuable in terms of the formulation of the problem, which teaches us never to forget that the opponent also thinks and to take into account his possible tricks and traps. The recommen- dations following from the game approach are not always concrete or realizable, but it is still useful, while taking a decision, to utilize a game model as one of several possible ones. But the conclusions proceeding from this model should not be regarded as final and indisputable.”

The payoff matrix of a game Finite two-person zero-sum games are the best investigated types in game theory. A /wo-person game is a game in which there are exactly two players or conflicting interests. A game is finite if both players have a finite number of possible strategies, i.e. a finite number of behaviours. When making a personal move, a player follows a strategy. A

87

zero-sum game is a game where the gain by one player equals the loss by the other.

Suppose there is a finite two-person zero-sum game where player 4 has m strategies and player B has 7 strategies (an mx game). We use A,, A,...,4,, to denote the strategies available to player A and B,, B,..., B, the strategies available to Payer B. Suppose player A makes a personal move and selects a strategy A,(1 < 7 < m), and player B at the same time selects strategy B(l <j <n). We use 4,; to denote the gain of player A. Let us identify ourselves with player A and consider each move from his viewpoint. The gain

4,; may be either a real gain or a loss (a loss would be a negative gain). The set of gains a;; for different values of / and / can be arranged in matrix form with the rows corresponding to player A strategies and the columns to player B strategies (Figure 2.10). This is called the payoff matrix for the game.

Consider the following game. Each player, A and B, writes, simultane- ously and independently, one of three numbers 1, 2, or 3. If the sum of the

GAMES AND DECISION MAKING

Figure 2.10: Strategies in a finite two-person zero-sum game.

88

Figure 2.11: A payoff matrix in a3 x 3 finite two- person zero-sum game.

DECISION MAKING

numbers is even, player B pays player 4 the sum, while if the sum is odd, A pays it to B. Player A has three strategies: A, to write 1, A, to write 2, and A; to write 3. Player B has the same strategies. The game is a 3 x 3 one because its payoff matrix contains three rows and three columns. This matrix is given in Figure 2.11(a). Note that a gain by player A of, for instance, —3 is a loss in reality because 4 pays 3 units to B.

Some of the elements are positive and the others are negative in the matrix in Figure 2.11(a). It is possible to make all the elements of the payoff matrix positive by adding some number, say 6, to each element of the matrix. We obtain the matrix in Figure 2.11(b). This matrix is equivalent to the initial one from the viewpoint of analyzing optimal strategies.

The minimax principle Let us analyze the game using. the payoff matrix in Figure 2.11(b). Suppose we (player 4) pick strategy 4,. Then, depending on the strategy selected by player B, our gain may be either 8 or 3 or 10. Thus, strategy A, yields a gain of 3 in the worst case. If we choose either 4, or 43, the worst gain is 1. Let us write down the minimum possible gains for each strategy A, as an additional column in the payoff matrix (Figure 2.12). It is clear that we should choose a strategy whose minimum possible gain is greatest (as compared with the other strategies). This is strategy A, in this case. Three is the largest one out of the minimum gains for each strategy (viz. 3, 1, and 1). This is called the maximin gain, or the maximin, or just the

89

maxim. Itis also sometimes called the lower value of the gain. Thus, if we

select the maximin strategy (strategy 4, in this case), our gain is guaranteed to be, whatever the behaviour of the opponent, at least the lower value of the game (a gain of 3 in this case). The opponent will reason in a similar way. If he selects strategy B,, he will have to give us a gain of 10, which is his worst case.

The same can be said of strategy B,. Strategy B, yields the worst case for the opponent corresponding to a gain of 12 for us. Numbers 10, 10, and 12 are the maximum values of our gains corresponding to the opponent’s strategies B,, By, and B3, respectively. Let us write these values as a row in the payoff matrix (see Figure 2.12). It is clear that our opponent should select the strategy which mnimizes our maximum possible gain. This is either strategy B, or B,. Both strategies are minimax ones and both guarantee that our opponent limits our gain to the n/max, or, in other words, the upper value of the game is 10.

Our maximin strategy and the minimax strategy of the opponent are the most cautious “safe” strategies. The principle of being cautious dictating that the players select such strategies is called the minimax principle.

Now let us return to the matrix in Figure 2.12 and try some reasoning. The opponent has two minimax strategies, B, and B,. Which strategy should he choose? If he knows that we are cautious and have selected the maximin strategy A, he would not select strategy B, because this would yield a gain of 8. Therefore, it is likely that he would choose strategy B,, and our gain would then be 3. But if we perceived our opponent’s ideas correctly, shouldn’t we take a risk and choose strategy 4,? If the opponent then selects strategy By, our strategy A, will give us a gain of 10. However, our deviation from the minimax principle may cost us dearly. If the opponent is even cleverer and reasons in a similar way, he would answer our strategy 4, with strategy B, rather than B,. And then, instead of a gain of 10, we would only gain 1.

Does this mean that game theory only recommends we adhere to a min-

GAMES AND DECISION MAKING

Figure 2.12: A payoff matrix in a3 x3 finite two- person zero-sum game.

90

Figure 2.13: A 3 x 3 game with saddle point.

DECISION MAKING

imax (maximin) strategy? It depends on whether the payoff matrix has a

saddle point.

A game with a saddle point. Consider the 3 x 3 game, whose payoff matrix is given in Figure 2.13. Here both the maximin and minimax gain 4. In other words, the lower and the upper value of the game coincide and both are equal to 4. A gain of 4 is simultaneously the maximum of the minimum gains for strategies A,, A,,and_A, and the minimum of the maximum gains for strategies B,, By, and B;. In geometry, the point on a surface which is at the same time a minimum along one coordinate axis and a maximum along the other is called a saddle point. Point C on the surface in Figure 2.13 isa saddle point. It is the maximum along the x-axis and the minimum along the y-axis. It is easy to see that the surface in the vicinity of this point is actually like a saddle. Just as in geometry, element a,, = 4 of the payoff matrix in question is called the saddle point of the matrix, and the game is said to have a saddle point.

We need only look through the matrix in Figure 2.13, to see that each player should adhere to his maximin (minimax) strategy. These strategies are optimal in a game with a saddle point. Any deviation from them will be

91

disadvantageous for the player who took the risk.

However, if a game does not have a saddle point (see the matrix in Fig- ure 2.11), neither of strategies 4; or B is optimal.

The necessity of a random change of strategy in a game without a saddle point. Suppose that we and our opponent repeatedly play the game whose matrix is given in Figure 2.11. If we choose a definite strategy, for instance, the maximin strategy 4,, and adhere to it turn after turn, our opponent will see it and select strategy B, each time, so that our gain will not exceed the lower value of the game, i.e. it will equal 3. However, if we suddenly (for the opponent) choose strategy 4, instead of A,, we receive a gain of 10. Having guessed our new strategy (naturally, if we later adhere to it), our opponent will go from strategy B, to strategy B, right away, thus decreasing our gain to 1. And so forth. We can see here a general rule for games without a saddle point: a player using a certain strategy will be worse off than a player who changes strategy at random.

However, the random changes in strategies should be done wisely rather than haphazardly. Suppose A,, 4, ..., A, are the possible strategies of player A (see Figure 2.10). To obtain the greatest benefit, the strategies should be chosen at random but with different (specially calculated) probabilities. Suppose strategy A, is used with probability p,, strategy 4, with probability py etc. Player A is now said to havea mixed strategy S4(Pys Pos Pm)- Unlike 54, the A, strategies are called pure strategies. By correctly selecting the probabilities pa mixed strategy may be optimal. The gain of player A will then be no less than a certain value y called the value of the game. This value is greater than the lower value of the game, but less than the upper one.

Player B should behave in a similar manner. His optimal strategy is also a mixed strategy. Let us designate it Sp (4,5 qs...» Y,)» where q; are specially selected probabilities with which player B uses strategies B i When player B selects an optimal mixed strategy, the gain of player A will be no more than game value v.

GAMES AND DECISION MAKING

92.

DECISION MAKING

The search for an optimal mixed strategy. Let us use S4(p,, fo ++ > Pm) to denote an optimal mixed strategy for player 4. We must now find proba- bilities p,, p,,..-5 p,, and calculate the game value v once the payoff matrix of the game is known (see Figure 2.10). Suppose player B selects pure strategy B,. Then the average gain of player A will be 4.) p, + 41 P) +. + 4y1P, This gain should be no less than the game value v, and hence

Ay Py + 4p Po + + Ay Py 2 Y- If player B selects strategy B,, the average gain of player 4 should also be no less than the game value v, and hence

4172p; + 47 pr F sack an2Pm 2%.

Whichever strategy player B chooses, the gain of player A should always be no less than the game value v. Therefore, we can write the following system of z inequalities (recall that 7 is the number of B’s pure strategies):

4p) + 4g) Py + + Ani Pm 2 Y> 4y2P) + 2y2Py + + An2Pm 2 Y> Aes

Ay nPy + Agy fr + + + Any Pm 2 Y-

(2.10)

Recall that Pit fy t+ Py = 1. (2.11)

Introducing designations x, = p,/¥, % = py/V,...%, = P»/v we can rewrite (2.10) and (2.11) as

Ay Xy + Ag Xy te +X Py 2A; Aya X1 + Ag Xq + + Ayo Xp, 2 1,

eas (2.12)

Ay yh + Ay Hoe F Ay Xm 21.

93

1 My +X Fo + Yi = F (2.13)

It is desirable that the game value y should be as large as possible, and hence 1/v should be as low as possible. Therefore, the search for the optimal mixed strategy is thus reduced to the solution of the following mathematical prob- lem: find non-negative values x, x, ... x,, such that they meet inequalities (2.12) and minimize the sum x, + % +... +%,,-

Airplanes against antiaircraft guns. Let us find the optimal mixed strategy for a concrete game. Suppose “player” A wants to attack “player” B. A has two airplanes each carrying a large bomb. 8 has four antiaircraft guns defending an important military base. To destroy the base, it is sufficient for at least one airplane to approach it. To approach the base the airplanes may choose one of four air corridors (Figure 2.14, where 0 is the base and I, II, III, and IV are the air corridors). A may send both airplanes along the same corridor or along different corridors. B may place his four antiaircraft guns to cover the corridors in different ways. Each gun can only shoot once, but it will hit the airplane if it is in that corridor.

A has two pure strategies: strategy 4, to send the airplanes along different corridors (no matter which ones), and 4,, to send both airplanes along the same corridor. B’s strategies are B, to put an antiaircraft gun into each corri- dor, B, to put two guns into two corridors (leaving the other two corridors unprotected), B, to put two guns into one corridor and one gun into two of the other corridors, B, to put three guns into a corridor and one gun into another corridor, and B; to put all four guns into one corridor. Strategies B, and B, are certainly bad because three or four guns in a single corridor are not needed, since A only has two airplanes. Therefore, we need only discuss strategies By, B,, and B3.

Suppose A chooses strategy A, and B chooses strategy By. It is clear that neither airplane will reach base: the 4’s gain will be zero (a,, = 0). Suppose strategies 4, and B, are chosen. Let us assume that the guns are in corridors I and II. If the aircrafts are flying along different corridors, then six variants are

GAMES AND DECISION MAKING

Figure 2.14: Strategies with aeroplanes and anti- aircraft guns.

Figure 2.15: Matrix of probable with aeroplanes and anti-aircraft guns.

94

DECISION MAKING

equally probable: they fly along corridors I and II, along corridors I and II, along corridors I and IV, along II and II, along II and IV, or along III and IV. In only one of the six cases will neither plane reach the base (when they fly along corridors I and II). Whichever corridor B chooses to place his guns in, airplanes will always have six equally probable variants and only one does not yield a winning move. Therefore, if strategies 4, and B, are chosen, the probable gain for A will be 5/6 (a, = 5/6). Reasoning in the same manner, it is easy to find the rest of the elements of the payoff matrix for this game. The resultant 2 x 3 matrix is shown in Figure 2.15. Note that the elements of the matrix are probable gains; so here even the pure strategies involve chance. The lower value of the game is 1/2, and the upper one is 3/4. The maximin strategy is 4, while the minimax strategy is B,. There is no saddle point, and the optimal solution for the game will be a mixed strategy.

In order to find the optimal mixed strategy, let us use the payoff matrix and relations (2.12) and (2.13). The relations for this case are 1

% 21, sytam 21, y+

Z 5 x, 21, (2.14)

(2.15)

The solution can be conveniently represented as a diagram. We plot the positive values x, and x, along the coordinate axes (Figure 2.16). The first inequality in (2.14) corresponds to the area above the straight line CC; the second inequality is the area above DD; and the third inequality in (2.14) is the area above EE. All three inequalities are satisfied inside the area shaded red in the figure. The equation x, + x) = const defines a family of straight lines, some of which are shown in figure as dash lines. The straight line FF has the least sum x, + x, of all the lines in the family with at least one point within the red area. Point G indicates the solution corresponding to the optimal mixed strategy. The coordinates of this point are x, = 3/5 and 4X = 1. Hence we find y = 5/8, p, = 3/8, and p, = 5/8. Thus, 4’s optimal mixed strategy would be to use strategy 4, with probability 3/8 and strategy

95

A, with probability 5/8.

F 2 mi

1 5 6, 5

How could we use this recommendation in practice? If there is only

one bombing raid in the “game”. A clearly should select strategy 4, because

po > p;. Suppose now the game has many raids (for instance, raids on many

bases). If the game is run N times (N > 1), then 4 should choose strategy A, 3N/8 times and strategy 4, 5/8 times.

We have so far only discussed the behaviour of A, allowing B to act arbi- trarily. If-A selects his optimal mixed strategy, his average gain will be between the upper game value of 3/4 and the game value v = 5/8. If B behaves un- reasonably, the 4’s gain may rise to the upper value of the game (or even greater). However, if B in turn adheres to his optimal mixed strategy, the A’s gain will equal the game value v. The optimal mixed strategy for B pre- cludes his use of strategy B, and is to use strategy B, with probability 1/4 and strategy B, with probability 3/4. That strategy B, should not be used can be seen from Figure 2.16: the straight line EE corresponding to this strategy

GAMES AND DECISION MAKING

Figure 2.16: Solution for the game of aircrafts and anti-aircraft guns.

DECISION MAKING

does not have any points in the red area. To determine the probabilities with which to apply strategies B, and B,, we use the game value »y = 5/8, and get g, x 0+ (1-4,) x 5/6 = 5/8. Itis clear from this that g, = 1/4 and h®=1-4q, =3/4.

Chapter 3

Control and Self-control

Cybernetics penetrated and continues to penetrate every area of man’s work and daily life. This is the science of the optimal control over complex processes and systems.

A.I. Berg

The Problem of Control

Control against disorganization. Although the world around us is full of chance, it nonetheless proves to be organized and ordered in many ways.

The disorganizing effect of chance ts countered by the organizing influence of control and self-control.

Suppose an airplane flies from Moscow to Leningrad. Various random factors affect it during the flight. Therefore, all three space coordinates of the airplane are random functions of time. The flight trajectory is a realization

97

98

CONTROL AND SELF-CONTROL

of these random functions. However, these “subtleties” do not bother the passengers; they fasten their belts before takeoff confident that whatever thun- derstorms might occur on the way and whichever winds affect the airplane, it will arrive at Leningrad airport. The basis for this confidence lies in the air- craft’s control system and the actions of the pilot. We met queueing systems above, and although there is a great deal of chance, they comply with their objectives. This is because the organization of the system and the control of its operation is well-designed.

Controls take on a variety of guises. Suppose we want a set of books to serve public for a long time. This is impeded by chances both purely physical in nature and those related to the attitudes of some readers. So we control matters: we take care of the binding, regulate the temperature, humidity, and illuminance in the rooms where the books are stored, give the book a library card, and set up the rules governing the use of the books.

No oneis safe from disease, and although each disease has a definite cause, the prevalence and lethality of a disease on the scale, say, of a town is governed by chance. When fighting it, we must control matters by improving working and living conditions, taking preventive medical measures, constructing sta- diums, swimming pools, sport complexes, ordering pharmacies to supply the necessary drugs, etc.

Thus, there is a confrontation of two powerful factors in the world, two basic trends. On the one hand, there is randomness, a tendency to disorganization, disorder, and destruction in the long run. On the other hand, there is control and self-control, a tendency to organization, order, development, and progress.

Choice as a prerequisite of control. Ifall the processes and phenomena in the world were strictly predetermined, it would be meaningless even to speak of the possibility of control. [x order to control something, there must be some choice. How may we make a decision if everything is predetermined in advance? Every phenomenon must have several probable lines of devel-

99

opment. One may say that a world built on probability is the only world in which control is possible.

Control acts against chance, even though the possibility of control is brought about by the existence of chance. It is random occurrences that help us avoid predetermination. We can say that randomness “brings to life” its own “grave-digger”, i. e. control. This is a manifestation of the dialectic unity of the necessary and the random in the real world.

Control and feedback. Two different control schemes are shown in Figure 3.1, where S is the controlled system, CU is the control unit, V is the input to the controlled system (the control signal), P are random pertur- bations affecting the controlled system, and w is the final output from the system. Scheme (4) differs from scheme (a) in having a feedback loop, that is the control unit receives information about the results of control.

P v Ww S (a) | p Uv Ww AS (b) Feedback

What is feedback for? In answering this question, let me remark that the

THE PROBLEM OF CONTROL

Figure 3.1: Two different control schemes.

100

CONTROL AND SELF-CONTROL

“relationship” between randomness and control is one of active confrontation. Control acts against chance, and chance acts against control. The latter fact requires flexible control, the possibility for adjustment. The control unit must be able continuously to receive data about the results of the control and correct its signals to the system appropriately.

In point of fact, any real control system supposes the presence of a feed- back loop. Control without feedback is not only ineffective, it is actually unviable.

Take for example someone driving a motor-car. Imagine for a minute that the feedback suddenly disappeared, that is, the driver stopped attending to the motion of the car. The car would continue to be controlled, but without any feedback. The car is immediately affected by a variety of random events. A small bump or bend in the road, a car moving in the opposite direction all are random and could lead to an accident in only a few seconds.

The control algorithm. Now what should be done and how should the system be controlled? It depends on the situation and the goal being pursued. In fact the answer lies in the algorithm of control.

A control algorithm 1s a sequence of actions that must be carried out to reach a set of goals.

In the example with the car and a driver, the control algorithm contains rules on how to start the engine, how to brake, how to turn, how to shift gears, and so on. The algorithm also contains the traffic regulations and good driving practice.

In some cases the control algorithm is simple. For instance, in order to use a coffee machine, only the following two actions need be carried out: put a coin in the slot, and press the appropriate buttons. This is the complete control algorithm for this machine. In other cases, the control algorithm is

101 FROM THE “BLACK BOX"! TO CYBERNETICS

much more complicated. For instance, it is more difficult to drive a car, while flying a jet is even more complicated. In very complicated cases, the control algorithm cannot even be defined in full. For instance, complete control algorithms for managing a large enterprise or industry simply do not exist.

From the “Black Box” to Cybernetics

Despite the diversity of algorithms, the processes of control can be investigated

from general positions, irrespective of the details of the considered system. A 8 P p y'

typical example is the simulation of a system using the “black box” model.

What is a “black box”? Suppose we consider a controlled system, where V,,V,,...,V,, are its inputs (control signals), P is a random perturbation, and W,,W,, ..., W, are its outputs (Figure 3.2). Now let us suppose that we do not know or do not care what is inside the system. We only need investi- gate the relationships between the inputs (4,4, ...,V,,) and the outputs (W,, W,, ..., W,). It is said in this case that the given system is a “black box”.

Figure 3.2: A black box is a system whose inputs and outputs are known, but internal structure is not.

102

CONTROL AND SELF-CONTROL

Any controlled system is a “black box” if its internal structure is not con- sidered, and only the responses of the outputs to the inputs are investigated.

Man surrounded by black boxes. The advance of science and technol- ogy has surrounded mankind by a vast number of controlled systems. As a rule, we are not a bit bothered by this because we quickly get accustomed (sometimes unconsciously) to considering these systems as black boxes. We find out how, what, and where to turn, press, or switch the buttons to obtain the desired effect. If you want to watch a TV show, there is no need to know the structure or workings of a television. We need only press the proper but- ton and select the channel. To make a telephone call, we do not have to be telephone engineers; we just pick up the receiver, wait for the call signal, and dial the telephone number. We use television, telephone, and many other systems and consider them to be black boxes. Naturally, we could learn what is inside the system and how it works if we want to, but in our modern world we often think it’s a waste of time to study what we can quite do without in practice. More and more often we prefer to use black boxes and when they fail we call in a professional technician.

We should recognize the validity of the complaints that as modern people we have become less curious, that we do not want to see things in depth because there are too many things to see, and it is not difficult to use them. However, I should not make things appear to be worse than they are. Firstly, there is a system of universal secondary education at least in the developed countries, which ensures each person has a basic minimum knowledge. Sec- ondly, from the viewpoint of the development of society, the knowledge available to the society as a whole is more important than what a single person may know.

Complex systems as black boxes. Modern systems are becoming more and more sophisticated as their functional capacities become more and more diverse. Naturally, the more we need to know about the functions of a system, the further we push our investigation of its inner structure into the back-

103 FROM THE “BLACK BOX"! TO CYBERNETICS

ground, and in many cases such a total investigation would prove infeasible because of the complexity of the system.

This shift of emphasis leads us to a qualitatively new viewpoint, in which the main aim is to investigate control and self-control as general processes irrespective of the concrete devices comprising the systems. This point of view brings about cybernetics as the science of control (self-control) in complex systems.

Curiously this point of view reveals an interesting fact and makes us look at the black-box model in another way. It turns out that we do not need under- stand every structural subtlety of a complex system, indeed its separation into component parts can obscure essential information. The black-box model becomes fundamental as the only acceptable way of analyzing a complex system.

What is cybernetics? The science of cybernetics was founded by the American scientist Norbert Wiener (1894-1964) and dates from 1948 when he published his famous book Cybernetics, or Control and Communication in the Animal and the Machine. Wiener wrote:

“We have decided to call the entire field of control and communication theory, whether in the machine or in the animal, by the name cyber- netics, which we form from the Greek xuGepryt7é, or steersman.”

It should be noted that the term “cybernetics” was not new. Plato used it meaning the art of controlling ships. The French physicist Ampére classified sciences in the first half of the 19" century and placed a science, which was the study of the methods of government, in section 83. Ampére called this science cybernetics. Today we only use the term “cybernetics” in the sense given to it by Wiener.

Cybernetics ts the science of the control and communication in complex systems, be they machines or living organisms.

104

CONTROL AND SELF-CONTROL

The Soviet scientist L.A. Rastrigin wrote a book called This Chancy, Chancy, Chancy World (Mir Publishers, Moscow, 1984), in which he re- marked:

“Until cybernetics made its appearance, control processes in an elec- tric generator were investigated by electrical engineering, control of the motion of a clock pendulum (in effect a swing) was dealt with in mechanics, and control of population dynamics in biology. Norbert Wiener was the first to point to the universal nature of control and to show that the organizing of an object (the lowering of its entropy) could be achieved by means of standard procedures, that is, by applying the methods of cybernetics independently of the physical characteristics of the object.”

L.A. Rastrigin imaginatively calls cybernetics a science which fights random- ness, thus emphasizing the idea of control counteracting disorganization and destruction caused by diverse random factors.

Cybernetics and robots. One of the central topics of cybernetics con- cerns process automation, in particular, self-control in complex systems. In- vestigations into this area resulted in the appearance of a discipline called “robotics”. Modern cybernetics literature discusses the possibility of design- ing automata that can reproduce and teach themselves. Artificial intelligence is also a topic being investigated. The following questions are being studied: Is the machine capable of creativity? Could a machine become cleverer than its designer? Could the machine think?

The more sophisticated types of robots are still in the realms of science fiction, although we often hear discussions about the possibilities of robotics, or rather whether artificial “men” might be possible. The layman now seems to believe that cybernetics is indeed simply the science of robots, automata, or thinking machines. The true purpose of cybernetics as the science of control is now masked by the fantastic technological promise.

True, cybernetics does include the problems of automation, and thus

105

contributes to scientific and technological progress. The automation of various processes, the design of automatic Lunar explorers, automatic space docking are all achievements of cybernetics. Cybernetics also investigates computer creativity and artificial intelligence. However, this is not so as to evolve an artificial person. When we programme computers to “compose” music or “write” a poem or play chess or give a “talk”, we are attempting to simulate creativity and so find out more about these processes. It could be said that we are investigating the limit of computer abilities, but not that we want to substitute them for human beings in the future: we just want to understand several important topics thus making it possible to go deeper into the control processes occurring in human beings. The reader should remember this and not consider cybernetics to be just the “science of robots”.

We may now start discussing the central notion of cybernetics, i. e. 7- formation. Let me say right away that cybernetics investigates control and self-control primarily from the viewpoint of information. It investigates the collection, conversion, transmission, storage, and retrieval of information. In a certain sense of the word, cybernetics can be regarded as the “science of information”.

Information

Let me begin with an excerpt from the immortal poem De Rerum Natura (On the Nature of Things) by Carus Lucretius (ca. 99-55 B.C.):

if things came to being from nothing,

Every kind might be born from all things,

Nought would need a seed.

First men might arise from the sea, and from the land, The race of scale creatures, and birds burst forth

The sky. Cattle and other herds, and all the tribe

Of wild beasts with no law of birth,

INFORMATION

106

CONTROL AND SELF-CONTROL

Would haunt tilth and desert ...”

It is interesting that there is here a hint of the conservation of not only matter and energy, but also of something else, which is neither matter nor energy. There is no shortage of energy and matter in the sea, but people do not appear in the sea. Nor too does the dry land produce fish. Truly, “if things came to being from nothing, ... nought would need a seed”. In the modern terminology of science, we might say that this is a hint of the conservation of information. The information needed by plants and animals to live and reproduce cannot appear “from nothing”. It is stored in “seeds” and thus handed down from generation to generation.

The term “information” is now encountered everywhere in science and everyday life. In fact, every activity is related to the collection, conversion, transmusston, storage, and retrieval of information. We live in a world filled with information, and our very existence is impossible without it. Academi- cian A.J. Berg once said: “Information penetrates every pore of the life of human beings and their societies ... Life is impossible in a vacuum of either mass-energy or information.”

The bit, the unit of information. What is information? What units is it measured in? Let us start with a simple example. A train approaches a station. By remote control, a signalman can switch a train from one track (A) to another (B). If the switch is up, the train goes along track A, and if it is down, the train goes along track B. Thus, the signalman, by moving the switch up or down, is sending a control signal containing 1 bit of information. The word “bit” is an abbreviation of “binary digit”.

To see what we mean by “binary digit”, recall how digits are used to write numbers. We commonly use the decimal number system, i.e. a system with ten digits (0, 1, 2, ..., 9). Take a number written in the decimal system, say 235. We say “two hundred and thirty five” and, as a rule, do not pause to think that this means the sum of two hundreds, three tens, and five units, i.e. 2x10? +3. x 10+5 x 10°. The same number (235) can also be in the

107

binary system, which only has two digits, 0 and 1, as 11101011, which means 1x27 41x 2°41x2° + Otimes2* +1x27+0x2? +1x 2) 4 Itimes2®. Since 2’ = 128, = 64, = 32, 2? = 8,2’ = 2, and = 1, we have our number 128 + 64+ 32+ 8+2+4 1 = 235. Any number can be written in either the decimal or the binary system. If you don’t follow this explanation try looking at Figure 3.3.

23>

1x107+3x10'+5%10°

11101011

1% 274+1«2°+1«2°-+0x24+1 x2740x274+1x2)41 x2”

Let us return to the railway example. Remember we have two choices: the switch is either up (track 4) or down (track B). We could write the digit 0 for switch up and digit 1 for switch down. It can be said that the control signal can thus be coded by one of the two binary digits, zero or unity. The signal thus contains one binary digit, or 1 bit of information.

Consider a more interesting example. The railway lines near a station are shown in Figure 3.4.

The railway switches are labelled by the letters a, b, c, d, e, f, and g. If a switch receives a control signal of 0, it opens the left-hand track, and if it receives a signal of 1, it opens the right-hand track. The signalman has three control switches: the first one sends a signal (0 or 1) to railway switch a, the

INFORMATION

Figure 3.3: Representing the number 235 in decimal and binary systems.

108 CONTROL AND SELF-CONTROL

Figure 3.4: Controlling railway lines by using switches. second one sends a signal simultaneously to switches 5 and c, and the third

one simultaneously to switches d, e, f, and g. The station has eight tracks: A,B,C,D,£,F,G, and H. To send a train along track A, all three control switches must be turned to the 0 position, i.e. send the three-digit signal 000. To direct a train to track B, it is necessary to send the three-digit signal 001. Each track thus has its own three-digit signal, i.e.

A B C D E F G H 000 O01 O10 O11 100 101 110° 111

We see that to select one of the eight outcomes requires a set of elementary

109

signals, each of which carries 1 bit of information. Therefore, to choose a track in this example requires three bits of information.

Thus, in order to select one option out of two, 1 bit of information is required; in order to select one option out of eight, 3 bits of information are required. In order to select one of N options, J bits of information are required, where

I= log, N. (3.1)

This is the Hartley formula. It was suggested in engineer Ralph Hartley, who was interested information.

The Bar Kohba game. A rebellion against Romans broke in 135 A. D. in the ancient Judea led by one Bar Kohba. As the legend has it, Bar Kohba sent a spy into the camp of Romans, and the spy discovered a great deal before being caught. He was tortured and his tongue was cut out. However, the spy managed to escape, but without his tongue he could not report what he had found out in the enemy’s camp. Bar Kohba resolved the problem by asking the spy questions that only required a “yes” or ‘no” answer (it was only necessary to nod or shake the head). Bar Kohba was able to obtain all the information he wanted from his spy, even though the spy had no tongue.

A similar situation is described in Le comte de Monte Christo by Alexan- dre Dumas pére. An old man in the novel had been paralyzed and could neither speak nor move his hands. Nonetheless, his relatives were able to communicate with him asking him questions which required only a “yes” or a “no”. If “yes”, the old man would close his eyes; if he blinked several times, it was “no”.

It turns out that any information can be transmitted in the form of “yes” and “no” answers if the questions are constructed properly. This idea underlies the Bar Kohba game, which first appeared at the turn of the century in Hungary and then spread to other countries. A player thinks of something. He may, for instance, make a wish or even think up a sentence. The other

INFORMATION

110

CONTROL AND SELF-CONTROL

player must guess the wish or sentence by asking questions, which must be honestly answered. However, the questions may only require a “yes” or “no” answer. The quantity of information needed for a correct guess can be measured by the number of questions, given that the most rational method of interrogation is used. Each answer can be enciphered by a binary digit, for instance, we could use a one for a “yes” and a zero for a “no”. Then the information needed for a correct guess would be a combination of zeroes and unities.

Let us play a Bar Kohba game with the railway signalman at the station whose tracks are given in Figure 3.4. The signalman thinks of a track along which a train should travel to the station. We want to guess the track. The game would go as follows.

QUESTION: Should switch a open the track on the right? ANSWER: No. (let us cipher this answer by digit 0). QUESTION: Should switch 4 open the track on the right? ANSWER: Yes (we cipher: 1).

QUESTION: Should switch e open the track on the right?

ANSWER: Yes (we cipher: 1).

Having asked these three questions, we see that the signalman decided on

track D. The information needed to answer was the chain of answers “no-

yes-yes” or, in other words, by the set of binary digits 011. We know that the 2 (cs

information capacity of the signalman’s “riddle” was three bits long. Each of the signalman’s three answers contained one bit of information.

Let me cite one more example of the Bar Kohba game. There are 32 pupils in a class. The teacher decides on one of them. How can we find out which one? Let us take the class register, in which the surnames of all the

11 INFORMATION

pupils are listed in alphabetical order and enumerated. Let us start asking questions.

QUESTION: Is the pupil among those listed from 17 to 32?

ANSWER: Yes (we cipher: 1).

QUESTION: Is the child among those listed from 25 to 32?

ANSWER: No(0).

QUESTION: Is the child among those listed from 21 to 24?

ANSWER: No(0).

QUESTION: Is the child among those listed either 19 or 20?

ANSWER: Yes (1).

QUESTION: Is it number 20?

ANSWER: No(0).

Consequently, the teacher meant pupil number 19 in the class register. This information required the chain of answers “yes-no-no-yes-no” or, in other words, the set of binary digits 10010. It is clear from Figure 3.5 that the area in which the surname was searched for gradually decreased with each answer. To solve the problem, it only required to ask five questions. According to the Hartley formula, the selection of the option out of 32 requires log, Z=5

bits of information. Therefore, each of the answers in this game contained 1 bit of information.

Perhaps I have created the impression that each answer in the Bar Kohba game always contains 1 bit of information. It is easy to see that this is not so. Suppose that we established that a surname was listed from 17 to 32 and then ask: Is it the surname listed from 9 to 16? It is dear that the answer to this question must be negative. The fact that the answer is obvious means that it

112 CONTROL AND SELF-CONTROL

After answering 7 32

LTTE TTC

The first question 7 24 The second question 7 20 The third question 19 20

The fourth question 19

ATTA

The fifth question

Figure 3.5: Finding out the selected pupil from

a group of 32. . . . . . . pons does not contain any information at all. Naturally, we might have a situation

without “silly” questions. QUESTION: Is the surname listed from 1 to 8? ANSWER: No. QUESTION: Is it listed from 25 to 32? ANSWER: No. QUESTION: Isit listed from 9 to 16?

ANSWER: No.

113

QUESTION: Is it listed from 17 to 24? ANSWER: Yes.

QUESTION: Is it listed either 23 or 24? ANSWER: No.

QUESTION: Is it listed either 19 or 20? ANSWER: Yes.

QUESTION: Is it listed 19?

ANSWER: Yes.

Having chosen this strategy, we extracted the needed information using eight questions rather than five. The quantity of information in the final answer equals 5 bits as before. Therefore, each individual answer in this case con- tained, on the average, 5/8 bit of information.

Thus, we see that “yes-no” answers do not always contain 1 bit of infor- mation. Running ahead of ourselves, we can note that 1 bit is the maximum information that such an answer may contain.

“Just a minute,” you might say, “if this is so, does then a binary digit not always carry one bit of information?”

“Quite true,” I would answer.

“Then how about the definition of a bit of information given above? Can we use the Hartley formula?”

All that has been said about a bit of information (and about the Hartley formula) remains valid, although with a reservation that every option should be equally probable. 1 did not want to discuss this topic too early, but now the time has come to do so.

INFORMATION

114

CONTROL AND SELF-CONTROL

Information and probability. The Shannon formula. I have empha- sized that control is only possible in a world where necessity is dialectically confronted with chance. In order to control something, there must be choice. Any situation we want to control carries with it uncertainty. This uncertainty can be compared with a shortage of information. While we control an object, we introduce information and thus decrease the uncertainty.

For instance, a train may arrive along any of the eight tracks in our example above, so there is uncertainty. By sending a control signal with three bits of information, the signalman eliminates this uncertainty, and the train is directed along one particular track. The teacher could have thought of any of his 32 pupils, so there was uncertainty which surname had been chosen. Having listened to the answers for a number of questions with an overall quantity of information of five bits, we can eliminate this uncertainty and

identify the pupil.

Now let us return to the starting point of our reasoning and to the pres- ence of choice. Until now, we assumed that each option was equally probable. The signalman could have chosen any of the eight tracks with equal proba- bility. The teacher could have picked anyone of his 32 pupils. However, we often have to choose between options that are not equally probable, and then it is necessary to pay due attention to the probability associated with each option. Suppose the answer to a question may be either “yes” or “no” and both outcomes are equally probable. The answer then will carry precisely 1 bit of information. However, if the “yes” or “no” outcomes have different probabilities, then the answer will contain less than 1 bit of information. And the greater the difference between the probabilities of the two outcomes, the smaller the quantity of information. In the limit of the probability of a “yes” (or a “no”) being unity, the answer will not contain any information at all.

Now, let us look at what happens when different outcomes (different options) have different probabilities. I do not want to cram this book with mathematics, so I shall only discuss the basic results. Suppose £ is a random

115

discrete variable that may assume the values x, , 7, x3, ... » %y with probabilities Pr» Pr P3> +> Px» Tespectively. We have N outcomes (N different values of the random variable) which appear with different probabilities. Given an observation of the variable £ and its value, how much information does this observation carry?

This problem was investigated by the American scientist Claude Shannon in the mid-1940s. He came to the conclusion that we obtain the quantity of information equal (in bits) to

N 16) = Y prlog, =. (3.2) t=1 a

This is a fundamental relation in information theory. Itis called the Shannon formula.

Suppose that the outcomes are equally probable, and the random variable may take on the values x, with the same probability p. This probability is clearly 1/N and so from (3.2) we obtain

1< 1 I= N 2_ log, N= vy NX log, N = log, N,

ie. the Hartley formula (3.1). Consequently, we see that the Hartley for- mula is a special case of the Shannon formula when all outcomes are equally

probable.

Using the Shannon formula, let us find how much information can be contained in a “yes” or “no” answer. Suppose p is the probability of a “yes”. Then the probability of a “no” answer is 1 p. According to (3.2), the information obtained from the answer to a question is

1 1 I =p log, ? ok (1 -p) log, ie (3.3)

The graph of J versus p, as defined by (3.3), is given in Figure 3.6.

INFORMATION

116

0 1/2 1

Figure 3.6: Graph of Shannon distribution.

CONTROL AND SELF-CONTROL

Maximum information (1 bit) is obtained when p = 1/2, i.e. when a “yes” and a “no” are equally probable. Now we can refine our notion of “1 bit of information”. This 7s the information contained in a digit that may take on only two values provided both values are equally probable.

It follows that the best strategy in the Bar Kohba game is to ask “yes” or “no” questions, the answers to which are nearly or equally probable. Recall the question: “Is the surname listed from 17 to 32?” Here the answers “yes” and. “no” are equally probable because there are 32 pupils and the numbers from 17 to 32 cover half of the pupils. Therefore, the answer to this question gives 1 bit of information. But for the question: “Is the surname listed from 1 to 8?” the range of numbers only covers a quarter of all the numbers and therefore the probability of a “yes” is 1/4, while that of a “no” is 3/4. The answer to this question would contain less than 1 bit of information. According to (3.3), in which we substitute P = 1/4, each answer contains 0.8 bit of information.

Once again I emphasize that control processes should be regarded in a dialectical unity with the random processes of disorganization. There is a deep relationship between information theory and probability theory. The Shannon formula (3.2) illustrates this point. The probabilistic approach pro- vides a scientific, objective notion of information that is free from a subjective substitution of the quantity of information by its significance or importance.

Information in communication channels with noise. When infor- mation is transmitted, some loss is unavoidable. This happens because of the action of random factors, which are commonly lumped together as noise. A

communication channel for transmitting information from input set 4 to output set B is represented in Figure 3.7. The information is affected by noise P as it is transmitted. Suppose that £ is an input discrete random variable which may assume values «, 2, 03, ... 5%. with probabilities p,, p), P35...» Py and y is the output variable, which may assume values y,, 7, 35 .--» Jag. With probabilities 9,, 4, 43...» 9yg- Let P(7) denote the probability that £ = Ij is the output variable if £ = x, was transmitted. The probability P.(/) is

117

determined by noise in the communication channel. It has been proved in information theory that the quantity of information about the random vari- able £ that can be obtained by observing the random variable 7 is described by the formula

M . 1,6) = >.> RU) p bog, © (3.4)

Here the information J is in terms of two types of probability, the probabili-

ties p, and g; on the one hand and the probability 2.(7) on the other. While the first two probabilities reflect the probabilistic nature of the information at the input of the communication channel and that “yes” or “no” questions, the answers to which are nearly or equally probable. Recall the question: “Is the surname listed from 17 to 32?” received at the output, probability P.(/) reflects the random nature of the noise in the channel.

Suppose there is no noise. Then the random variable values at the input and the output of the channel will be the same. Hence

N=M, p; = 4 and P(/) = 4, (3.5)

where 3, = 1 ford = jandd,, = 0 ford = 1 = 7. Substituting (3.5) into (3.4) and noting that lim z log, z = 0, we get the Shannon formula. This should have been expected because when there is no noise, there is no loss of information in its transmission.

Protection against noise in a communication channel. There are many sorts of communication channel. Information can be transmitted by

INFORMATION

Figure 3.7: A communication channel for trans- mitting information.

118

Figure 3.8: A communication channel with a

filter.

CONTROL AND SELF-CONTROL

sound waves propagating in a medium, electric signals running along wires, electromagnetic waves propagating in a medium or in vacuum, etc. Each communication channel is affected by its own sorts of noise. There are general techniques for handling noise that can be applied to any communication channel. First of all, it is desirable to minimize the level of noise and maximize the amount of information in the signals, so that the signal-to-noise ratio is large. The ratio can be increased by coding the transmitted information ap- propriately, e. g. transmitting it in terms of “symbols” (for instance, impulses of a certain shape) which can be distinctly identified against the background of noise. Coding a signal increases its “noise immunity” or performance in terms of error probability for the transmission.

A B

(a

(b) (

A special measure against noise is filtering (both smoothing and correla- tion) the information received at the output of communication channels. If

) (d)

c)

119

the characteristic noise frequency in a communication channel is substan- tially greater than the frequency typical for the time change in the signal, we could use a smoothing filter at its output to “cut out” the high-frequency os- cillations superimposed on the signal as it was transmitted. This is illustrated in Figure 3.8, in which (a) is a diagram of the communication channel with a filter (4 is the channel input, B is the channel output, P is noise, and F is a smoothing filter), (b) is the signal at the input, (c) is the signal at the output before filtering, and (d) is the signal after filtering.

GS @ P S

(a) (b) (c)

Suppose we want to find out whether the output contains a signal of a given shape. If the signal is very different (for instance, by frequency) from the noise background, it will be easily identified. The situation is worse when the signal is “masked” by noise. Correlation filtering is applied in these cases: a device is placed at the output which mu/ziplies the output signal by the known signal. If the desired signal is present in the output signal, the multiplication creates a very clear (large) final (correlation) signal; otherwise no correlation signal will appear. This is illustrated in Figure 3.9, in which (a) is a diagram of the channel (A is the signal multiplier, P is noise, and S is the signal shape to be recognized), (b) is the multiplied signal if the recognized signal S is present in the output (the correlation signal), and (c) is the multiplied signal if signal

INFORMATION

Figure 3.9: A communication channel with a filter and signal multiplier.

120

CONTROL AND SELF-CONTROL

S is absent in the output. Correlation filtering is used, for instance, in radar scanners to recognize the radiation signal emitted by the radar antenna.

Selection of Information from Noise

Where does information come from and some unsatisfactory answers. Any control signal carries certain information. The signal is formed using an algorithm which itself incorporates information, and this algorithm was compiled in turn using information contained in other algorithms. Thus we have a sort of relay race in which information is transmitted from algorithm to algorithm. This idea can be illustrated by a simple example. A teacher educates you, and in turn your teacher had a teacher, who had a teacher, and so on.

This argument leads inevitably to the questions: Whence the “original information”? Whence the first algorithm? An inability (or reluctance) to investigate scientifically the fundamental topic of where information comes from leads to serious misconceptions.

One such misguided hypothesis is that the original information was brought to the Earth by space travellers, who visited us in some long-forgotten past. This hypothesis is materialistic, but it is unsatisfactory because it begs the question of where the aliens got the information. Modern science indi- cates where the information comes from. The modern scientific answer is that there is no “original information”: the generation of information is a continuous and continuing process.

Chance at the forefront again. The idea of information being handed over like a relay baton in arace is simplistic. I pointed out that any transmission of information is accompanied by loss caused by random factors. However, random factors not only “steal” information, they also generate it.

121 SELECTION OF INFORMATION FROM NOISE

At first glance, this seems implausible. We witness the continuous creation of information as a result of human creativity. New machines are designed, spacecraft are launched, new books are published, and new drugs become available: these are all a testimony to the explosive generation of information in which everybody participates. So it would seem strange to speak of the fundamental role of chance in generating information.

However, consider the process of thinking, how a problem is solved, how an intuition appears, or how a melody or image emerges. If these examples are too philosophical, try and think at least about associative perception, that is how we recognize objects and distinguish them. Just try, and you will step into a domain of complicated links, probabilistic relationships, chance guesses, and sudden “revelations”. There are no deterministic algorithms for making discoveries or solving problems. Everything we know about the processes occurring in the brain indicates the fundamental role of random factors. Later I shall illustrate this by the example of perceptron, a cybernetic device which can recognize patterns.

Chance and selection. How can chance generate information? How can order appear from disorder? It turns out that the generation of information from noise can be easily observed. You can see this for yourself using the game of scrabble, or rather the small lettered blocks. Put one block with each letter of the alphabet into a bag, mix them, and take one out at random. Write down each randomly taken letter and return the block to the bag. Each time shake the bag. This simple generator of random letters can be used to generate along chaotic string. If you look closely, you will find some three-letter words, perhaps even words with more letters. Information is being generated from noise.

My son, for example, helped me do an experiment and in a string of 300 random letters found nine three-letter words and two four-letter. This argu- ment leads inevitably to the questions: Whence the “original information”? Whence the first algorithm? An inability (or reluctance) words. The more

122

CONTROL AND SELF-CONTROL

letters there are in a word, the smaller the probability of generating the word from “letter noise”. The generation of a sentence, let alone a line from a well-known work, is less probable. Nonetheless, the probability of doing is nonzero, and so there is the possibility of any information being generated randomly from noise.

Thus, we can say (although this sounds strange) that chance generates in- formation by chance. The greater the information, the smaller the probability of its random generation. That random information can be generated does not solve the basic problem. This randomly generated information must be detected from the enormous flow of meaningless “signals”. In other words, the znformation must be selected from the notse. In the example of taking lettered blocks out, the information is selected from the noise by the person who wrote out the letters and looked through the string.

Selection amplifier Is it possible to use chance conscientiously to gener- ate information? It is, so long as we amplify the selection.

You can do a simple experiment to demonstrate the amplification of selection using the random letter generator described above. In order to amplify the selection, we take into account the frequency with which letters appear in each word. Letter frequencies in English are often given when you buy a commercial game of scrabble. To allow for the frequencies, first eliminate the rare letters, e. g. Z, Q, J, V, X and. add extra blocks with frequent letters, e. g. four blocks with £ and T, three with A, J, O, L, N, G, R,.S, two with D, U, and one of all the rest. I cannot vouch that this selection is optimal, in a similar experiment I found 21 three-letter words, 4 four-letter words and 1 five-letter word in a succession of 300 random letters.

In order to amplify the selection still greater, we should use words rather

than /etters. It is curious that a similar device was suggested in the early ig? century by the English satirist Jonathan Swift in Gulliver’s travels. When Gulliver visited the Academy in Lagado (the capital of an imaginary kingdom), he met a professor who had an interesting apparatus. Swift wrote:

123 SELECTION OF INFORMATION FROM NOISE

“He then led me to the frame, about the sides whereof all his pupils stood in ranks. It was twenty feet square, placed in the middle of the room. The super faces were composed of several bits of wood, about the bigness of a die, but some larger than others. They were all linked together by slender wires. These bits of wood were covered on every square with papers pasted on them, and on these papers were written all the words of their language in their several moods, tenses, and declensions, but without any order. The professor then desired me to observe, for he was going to set his engine at work. The pupils at his command took, each of them, hold of an iron handle, there were forty fixed around the edges of the frame, and given then a sudden turn, the whole disposition of the word was entirely changed. He then commanded six and thirty of the lads to read the several lines softly as they appeared on the frame; and where they found three or four words together they might make part of a sentence, they dictated to the four remaining boys who were scribes. This work was repeated three or four times, and at every turn the engine was so contrived, that the words shifted into new places, as the square bits of wood moved upside down.”

True, Swift wrote satirically, laughing about such inventions. However, why should we not believe that a talented popular-science writer disguised