The concept of entropy. Randomness and uncertainty. The amount of information as a measure of reducing the uncertainty of knowledge

Information is a message or data that contains no uncertainty for the recipient. If a message does contain some uncertainty, it can be perceived only with a certain probability, so in the presence of uncertainty the recipient tries to reduce that uncertainty. If the recipient manages to eliminate the uncertainty completely, he possesses the information in full. How this can be done in practice is the subject of this chapter.

A QUANTITATIVE MEASURE OF UNCERTAINTY

To compare uncertainties, consider the following experiments a, b and g, containing uncertainties H(a), H(b) and H(g), respectively:

1. Determining the next world chess champion (experiment a).

2. Determining the number of the lottery ticket that will win the largest prize in the upcoming lottery draw (experiment b).

3. Determining the next President of the Russian Federation (experiment g).

Obviously, the degree of uncertainty of each of these experiments differs from that of the other two, and most likely the inequalities

H(b) > H(a) > H(g)

hold, where H(a), H(b) and H(g) are quantitative measures of the uncertainty (the entropy) of experiments a, b and g, respectively. In the particular case when, for some experiment d, the equality H(d) = 0 holds, the experiment d is said to be certain, i.e. it contains no uncertainty. In other words, the uncertainty of an experiment is nothing more than a lack of information, or negative information.

I. Hartley's formula. Let a be an arbitrary experiment with k equally probable outcomes A_1, A_2, ..., A_k:

Events:         A_1    A_2    ...    A_k
Probabilities:  1/k    1/k    ...    1/k

For k = 1 we have H(a) = 0, and as k grows H(a) grows as well, i.e. H(a) = f(k), where f is some function of k. On the other hand, if b is another experiment, independent of a, with l equally probable outcomes B_1, ..., B_l, then for the compound experiment ab, which consists in carrying out experiments a and b simultaneously, it is natural to require that its degree of uncertainty equal the sum of the uncertainties of a and b, i.e.

H (ab) = H (a) + H (b).

Since f must therefore satisfy f(kl) = f(k) + f(l) (the compound experiment ab has kl equally probable outcomes), the function f can be taken to be a logarithm, and we set

H(a) = log₂ k.

This is Hartley's formula. It measures the uncertainty of experiment a in units equal to the uncertainty of an experiment with two equally probable outcomes (for example, "yes" or "no"). In other words, H(a) is the amount of information (the bit is taken as the unit of measurement of the amount of information) needed to turn the uncertainty of experiment a into certainty.

So, for example, to guess a number chosen from the range 1 to 8, at most 3 bits of information are needed, i.e. three "yes"/"no" questions have to be asked.
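The bound given by Hartley's formula can be checked with a short Python sketch (added here for illustration; it is not part of the original text):

```python
from math import ceil, log2

def questions_needed(k):
    # Number of yes/no questions sufficient to identify one of k
    # equally probable outcomes: the Hartley measure log2(k), rounded up.
    return ceil(log2(k))

print(questions_needed(8))    # 3 questions for a number between 1 and 8
print(questions_needed(100))  # 7 questions suffice for 100 outcomes
```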

II. Shannon's formula. Let a be an arbitrary experiment with k outcomes A_1, A_2, ..., A_k that are not necessarily equally probable:

Events:         A_1       A_2       ...    A_k
Probabilities:  P(A_1)    P(A_2)    ...    P(A_k)

Then

H(a) = - Σ_{i=1..k} P(A_i) log₂ P(A_i)

is Shannon's measure of the uncertainty of experiment a. In particular, for P(A_i) = 1/k, Shannon's formula reduces to Hartley's formula.
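This measure is easy to compute; the following Python sketch (an illustration, not from the original text) also checks that it reduces to Hartley's formula for equally probable outcomes:

```python
from math import log2

def entropy(probs):
    # Shannon uncertainty H(a) = -sum(p_i * log2(p_i)); outcomes with p_i = 0 contribute nothing.
    return -sum(p * log2(p) for p in probs if p > 0)

k = 8
print(entropy([1 / k] * k))        # 3.0, i.e. log2(8) -- Hartley's formula
print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits for a non-uniform experiment
```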

3.1.1. Prove that H (ab) = H (a) + H (b).

3.1.2. How many questions must a teacher ask the students of an academic group in order to determine who is the head of the group, if the answers to the teacher's questions can only be "yes" or "no"?

3.1.3. Consider problem 3.1.2 in the case of a single question.

3.1.4. Let x be an element of a set M of cardinality m. How much information is needed to determine the element x?

3.1.5. Let x 1 and x 2 be two arbitrary elements of the sets M 1 and M 2 of cardinalities m 1 and m 2, respectively. How much information is needed to simultaneously determine the elements x 1 and x 2?

3.1.6. Let there be 27 gold coins, of which one is counterfeit (lighter than the genuine ones), and a balance with two pans. How many weighings are needed to find the counterfeit coin?

3.1.7. Prove that for any experiment a, H(a) ≥ 0, and that H(a) = 0 if and only if one of the probabilities equals 1 and the rest are 0.

3.1.8. Prove that H (a) ≤ log 2 k, where k is the number of outcomes of experiment a, and equality is achieved only if the outcomes are equally probable.

3.1.9. What properties does H (a) have if a has two outcomes?

CONDITIONAL UNCERTAINTY. THE AMOUNT OF INFORMATION

Let a and b be two arbitrary experiments with outcomes A_1, ..., A_k and B_1, ..., B_l, respectively. If a and b are independent, then

H (ab) = H (a) + H (b),

and if a and b are dependent, then

H (ab) = H (a) + H a (b),

where H_a(b) is the conditional uncertainty of experiment b given that experiment a has been carried out; it is defined by the equality

H_a(b) = Σ_{i=1..k} P(A_i) H_{A_i}(b).

Here H_{A_i}(b) is the conditional uncertainty of experiment b given the outcome A_i, defined by the formula

H_{A_i}(b) = - Σ_{j=1..l} P_{A_i}(B_j) log₂ P_{A_i}(B_j),   i = 1, ..., k.

Obviously, if a and b are independent, then H_a(b) = H(b), while H_a(b) ≤ H(b) if a and b are dependent.

The equality also holds

Consider the difference

I (a, b) = H (b) - H a (b),

which shows by how much the outcome of experiment a reduces the uncertainty of experiment b. The number I(a, b) is called the amount of information about experiment b contained in experiment a.

In particular, for b = a we have I(a, a) = H(a), which can be interpreted as the amount of information about experiment a contained in itself. If a and b are independent, then I(a, b) = 0; in general,

I(a, b) ≥ 0,

which can be interpreted roughly as follows: nothing taught at a university will do harm; in the worst case it will simply do no good.

Since I(a, b) = I(b, a), the quantity I(a, b) is also called the mutual information of the two experiments a and b. Combining

H(ab) = H(a) + H_a(b)

with the definition of I(a, b), we obtain

I(a, b) = H(a) + H(b) - H(ab),

and therefore

I(a, b) = Σ_{i=1..k} Σ_{j=1..l} P(A_i B_j) log₂ [P(A_i B_j) / (P(A_i) P(B_j))].

Thus, we have obtained the final formula for the amount of information I(a, b).
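As a numerical illustration (a Python sketch; the joint distribution below is made up and does not come from the text), H(a), H(b), H(ab) and I(a, b) can be computed from a table of joint probabilities P(A_i B_j):

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(A_i B_j) of two dependent experiments
joint = [[0.25, 0.15],
         [0.10, 0.50]]

p_a = [sum(row) for row in joint]        # marginal probabilities P(A_i)
p_b = [sum(col) for col in zip(*joint)]  # marginal probabilities P(B_j)

H_a  = entropy(p_a)
H_b  = entropy(p_b)
H_ab = entropy([p for row in joint for p in row])

# I(a, b) = sum over i, j of P(A_i B_j) * log2( P(A_i B_j) / (P(A_i) P(B_j)) )
I_ab = sum(joint[i][j] * log2(joint[i][j] / (p_a[i] * p_b[j]))
           for i in range(2) for j in range(2) if joint[i][j] > 0)

print(I_ab)              # mutual information, in bits
print(H_a + H_b - H_ab)  # the same value via I(a, b) = H(a) + H(b) - H(ab)
```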

3.2.1. Prove that if a and b are arbitrary experiments, then:

a) H(ab) = H(a) + H_a(b);

b) H(ab) ≤ H(a) + H(b);

c) 0 ≤ H_a(b) ≤ H(b);

d) I(a, b) = I(b, a);

e) I(a, b) ≤ H(a).

3.2.2. Derive the formula for I (a, b).

3.2.3. Let experiment b consist in drawing one ball from an urn containing m black and n white balls, and let experiment a_k consist in a preliminary extraction of k balls from the same urn (without replacement). What is the uncertainty of experiment b, and how much information about b is contained in the experiments a_k?

3.2.4. Suppose that, for sample inspection, m parts are taken from a batch of n parts made on a conveyor. Denote by a the percentage of defective parts in the entire batch and by b the percentage of defective parts in the sample. Determine I(a, b).

3.2.5. (Cities of liars and honest people.) It is known that the inhabitants of a certain city A always tell the truth, while the inhabitants of the neighbouring city B always lie. An observer N knows that he is in one of these two cities, but does not know which one. By questioning a passer-by he must determine which city he is in, or in which city his interlocutor lives (residents of A may travel to B and back), or both at once. What is the smallest number of questions N must ask, if the passer-by answers every question only with "yes" or "no"?

TRANSFER OF INFORMATION

Let us return once again to the general scheme of information transmission, regarding real messages as experiments with the corresponding probability distributions of individual letters or combinations of letters.

In particular, if x and x' are the transmitted and the distorted messages, respectively, then the amount of information I(x', x) of the channel output relative to its input is defined as

I(x', x) = H(x) - H_{x'}(x),

where H(x) and H(x') are the entropies of the messages x and x', respectively.

The quantity

C = max I(x', x)

is called the capacity of the channel: it characterizes the maximum amount of information that can be transmitted through the channel in one clock cycle. In fact, the channel capacity is the upper limit of the rate R of reliable information transmission, and this limit can be approached arbitrarily closely.
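As a concrete illustration (a Python sketch added here; it assumes that the DSC in exercises 3.3.1-3.3.2 denotes the binary symmetric channel with error probability q, and the value q = 0.1 is chosen arbitrarily), the capacity can be estimated by maximizing I(x', x) over the input distribution:

```python
from math import log2

def h2(p):
    # Entropy of two alternatives with probabilities p and 1 - p, in bits
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(q, steps=10_000):
    # Grid search of I = H(output) - H(output | input) over the probability p
    # of sending a 0 through a binary symmetric channel with error probability q
    best = 0.0
    for k in range(steps + 1):
        p = k / steps
        out0 = p * (1 - q) + (1 - p) * q   # probability of receiving a 0
        best = max(best, h2(out0) - h2(q))
    return best

q = 0.1
print(bsc_capacity(q))  # about 0.531 bits per channel use
print(1 - h2(q))        # the well-known closed form C = 1 - H(q) gives the same value
```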

Theorem 1 (the coding theorem). For any number R less than the channel capacity C and any ε > 0, there is a method of block transmission with rate not less than R and error probability P(e) not exceeding ε.

At the same time, any method of transmitting information at a rate greater than the channel capacity leads to an error probability greater than a certain fixed value.

Theorem 2 (converse of the coding theorem). If the value R exceeds the channel capacity C, then there is a constant ε₀ (depending on R and C) such that, for any method of block transmission of information at a rate not less than R, the inequality

P(e) ≥ ε₀

holds.

We denote by I (a i) the amount of information contained in the symbol a i and define it as:

I (a i) = - log 2 P (a i),

where P (a i) is the probability of the symbol a i appearing in the message text itself.

If the text of the messages is written in some natural language (which can be regarded as a natural code that detects and corrects errors), then each letter of this language has its own frequency of occurrence in texts. For example, in Russian the letters о, е, ё occur much more often (P_о = 0.09, P_{е,ё} = 0.07) than the letters э and ф (P_э = 0.003, P_ф = 0.002). Therefore the uncertainty H_L of a natural language is defined as

H_L = - Σ_{i=1..m} P(a_i) log₂ P(a_i),

and its redundancy C_L, respectively, as

C_L = 1 - H_L / log₂ m,

where m is the number of natural language letters.

Obviously, 0 ≤ C_L ≤ 1; therefore, with optimal coding, a part of the text can be omitted without prejudice to understanding.

So, for example, C_L = 0.5 for literary English, while the redundancy of other languages is somewhat lower.

Note that the redundancy of a language is not a disadvantage but rather an advantage: if, for example, C_L = 50%, this means that the entire text can be restored from half of a distorted text.
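The quantities H_L and C_L are easy to compute from a table of letter frequencies. The Python sketch below uses a small invented four-letter "language" purely for illustration; the frequencies are not taken from the text:

```python
from math import log2

def language_entropy(freqs):
    # H_L = -sum P(a_i) * log2 P(a_i) over the letters of the alphabet
    return -sum(p * log2(p) for p in freqs.values() if p > 0)

def redundancy(freqs):
    # C_L = 1 - H_L / log2(m), where m is the number of letters
    return 1 - language_entropy(freqs) / log2(len(freqs))

toy_freqs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}  # illustrative frequencies

print(language_entropy(toy_freqs))  # about 1.65 bits per letter
print(redundancy(toy_freqs))        # about 0.18, i.e. roughly 18% redundancy
```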

3.3.1. Determine the capacity of the DSC.

3.3.2. Find I(x', x) for the DSC.

3.3.3. Determine the redundancy and uncertainty of the Russian language.

3.3.4. Determine the amount of information of the letters of the English language.

3.3.5. Prove Shannon's theorems for block codes.

3.3.6. Recover text:

a) S ?? zd ts? Li ?? m? n? ln ??? eu od ?? ri? m ?? opr ?? t? i c? steam??? ? about ?? rib? ? ? a? o? om;

b)? about? ka? ae? ka? av? n ??? aet.

CONTENTS

Introduction
1. The alphabet of discrete devices. Finite fields
1.1. The prime Galois field GF(P)
1.2. The composite Galois field GF(P^n)
2. Coding of information
2.1. Basic concepts. Examples of codes
2.2. Linear codes. Methods of specifying them
2.3. Properties of linear codes. Hamming codes
2.4. Cyclic codes
2.5. BCH codes correcting two errors
2.6. Nonlinear codes. Hadamard codes
2.7. Bounds on the power of codes
3. Information and uncertainty
3.1. A quantitative measure of uncertainty
3.2. Conditional uncertainty. The amount of information
3.3. Transfer of information

Osipyan Valery Osipovich

ELEMENTS OF THE THEORY OF INFORMATION TRANSMISSION


The greater the initial uncertainty of knowledge, the more information is conveyed by the message that removes this uncertainty.

Let us give examples to illustrate this statement.

Situation 1. A new computer science teacher has been assigned to your class; to the question "Is it a man or a woman?" you were told: "Man."

Situation 2. The Dynamo and Zenit teams played at the national football championship. From the sports news on the radio, you will learn that the game ended with Zenit's victory.

Situation 3. Four candidates were nominated for the city mayor elections. After summing up the voting results, you learned that N. N. Nikitin was elected.

Question: in which of the three situations does the received message carry more information?

The uncertainty of knowledge is the number of possible answers to the question that interests you. One can also say: the number of possible outcomes of an event. Here the event is, for example, the election of the mayor; the outcome is the election of a particular candidate, for example N. N. Nikitin.

In the first situation, there are 2 possible answers: man, woman; in the second situation there are 3 options: Zenit won, draw, Dynamo won; in the third situation - 4 options: 4 candidates for the post of mayor.

According to the definition given above, the message carries the greatest amount of information in the third situation, since the uncertainty of knowledge about the outcome of the event in this case was the greatest.

In the 1940s, the problem of measuring information was solved by the American scientist Claude Shannon (1916-2001), the founder of information theory. According to Shannon, information is the removed uncertainty of a person's knowledge about the outcome of an event.

In information theory, the unit of measurement of information is defined as follows: a message that halves the uncertainty of knowledge carries 1 bit of information.

According to this definition, the message in the first of the described situations carries 1 bit of information, since one of the two possible answers was chosen.

Therefore, the amount of information received in the second and third situations is more than one bit. But how can this amount be measured?

Let's take another example.

A student has written a computer science test and asks the teacher about the grade. The grade can be anything from 2 to 5. The teacher answers: "Guess the grade in two questions, each of which I will answer only with 'yes' or 'no'." After thinking, the student asked the first question: "Is the grade higher than a three?" "Yes," the teacher replied. Second question: "Is it a five?" "No," the teacher replied. The student realized that he got a four. Whatever the grade, it can be guessed in this way!

Initially, the uncertainty of knowledge (the number of possible grades) was four. The answer to each question halved the uncertainty of knowledge and therefore, according to the definition above, conveyed 1 bit of information.

Having learned the grade (one of four possible), the student received 2 bits of information.

Let's look at one more particular example, and then deduce the general rule.

You are travelling on an electric train that has 8 carriages, and a friend is meeting you at the station. The friend called you on your mobile phone and asked which carriage you were travelling in. You suggest that he guess the carriage number by asking the smallest possible number of questions, the answers to which can only be "yes" or "no".

After a little thought, your friend began to ask:

"Is the carriage number greater than four?" - "Yes."

"Is the carriage number greater than six?" - "No."

"Is it the sixth carriage?" - "No."

"Well, now everything is clear! You are in the fifth carriage!"

Schematically, the search for the carriage number looks like this: {1, 2, 3, 4, 5, 6, 7, 8} → {5, 6, 7, 8} → {5, 6} → {5}.

Each answer halved the uncertainty of knowledge. A total of three questions were asked, so 3 bits of information were received in total. That is, the message that you are travelling in the fifth carriage carries 3 bits of information.

The method of solving the problem used in the examples with grades and carriages is called the half-division method: the answer to each question halves the uncertainty of knowledge that existed before the answer. Each such answer carries 1 bit of information.

Note that solving such problems by the half-division method is the most rational: in this way any of the eight options can always be guessed with 3 questions. If the search were carried out sequentially ("Are you in the first carriage?" "No." "In the second carriage?" "No." And so on), then the fifth carriage would be found only after five questions, and the eighth after eight.
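The half-division method can be written out as a short program. The Python sketch below is added for illustration; it counts the questions and confirms that 3 questions always suffice for 8 carriages:

```python
def guess_by_halving(secret, low, high):
    # Guess a number in [low, high] with yes/no questions of the form
    # "is the number greater than m?", halving the range each time.
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if secret > mid:   # answer "yes"
            low = mid + 1
        else:              # answer "no"
            high = mid
    return low, questions

for carriage in range(1, 9):
    found, asked = guess_by_halving(carriage, 1, 8)
    print(f"carriage {carriage}: found {found} after {asked} questions")
# Every carriage from 1 to 8 is found after exactly 3 questions,
# whereas sequential questioning may need up to eight questions.
```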

The "main formula" of informatics

Let us formulate one very important condition concerning the examples considered. In all the situations it is assumed that all possible outcomes of the event are equally probable: it is equally likely that the teacher is a man or a woman; every outcome of the football match is equally probable; the choice of any of the four candidates for mayor of the city is equally probable. The same applies to the examples with the grades and the carriages.

Then our results are described by the following formulations:

  • the message about one of two equally probable outcomes of some event carries 1 bit of information;
  • the message about one of the four equally probable outcomes of some event carries 2 bits of information;
  • the message about one of the eight equally probable outcomes of an event carries 3 bits of information.

Let us denote by the letter N the number of possible outcomes of an event, or, as we also called it, the uncertainty of knowledge. The letter i will denote the amount of information in a message about one of the N results.

In the example with the teacher: N = 2, i = 1 bit;

in the example with the grades: N = 4, i = 2 bits;

in the example with the carriages: N = 8, i = 3 bits.

It is easy to see that the relationship between these quantities is expressed by the following formula:

2^i = N. Indeed: 2^1 = 2; 2^2 = 4; 2^3 = 8.

You are already familiar with this formula from the 7th grade computer science course, and you will meet it more than once. The significance of this formula is so great that we have called it the main formula of computer science. If the value of N is known and i is unknown, the formula becomes an equation for determining i. In mathematics, such an equation is called an exponential equation.

Example. Let us return to the example with the carriages. Suppose the train has not 8 but 16 carriages. To answer the question of how much information is contained in the message about the number of the desired carriage, you need to solve the equation 2^i = 16.

Since 16 = 2^4, we get i = 4 bits.

Example. A cinema has 16 rows, and each row has 32 seats. How much information is carried by the message that you have bought a ticket for row 12, seat 10?

Solution: there are 16 · 32 = 512 seats in the cinema in total. The message about the purchased ticket unambiguously determines one of these seats. From the equation 2^i = 512 = 2^9 we get i = 9 bits.

But the same problem can be solved differently. The message about the row number carries 4 bits of information, since 2^4 = 16. The message about the seat number carries 5 bits of information, since 2^5 = 32. Altogether, the message about the row and the seat carries 4 + 5 = 9 bits of information.

This example illustrates the fulfillment of the law of additivity of the amount of information (addition rule): the amount of information in a message simultaneously about several results of independent events is equal to the sum of the amounts of information about each event separately.
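A small Python check of the cinema example and of the addition rule (added for illustration):

```python
from math import log2

rows, seats = 16, 32

i_row   = log2(rows)          # 4 bits about the row number
i_seat  = log2(seats)         # 5 bits about the seat number
i_total = log2(rows * seats)  # 9 bits about the seat in the whole cinema

print(i_row, i_seat, i_total)
print(i_row + i_seat == i_total)  # True: information about independent events adds up
```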

Let us make one important remark. We have already met the formula 2^i = N when discussing the alphabetic approach to measuring information (see § 3). In that case N was treated as the cardinality of the alphabet, and i as the information weight of each symbol of the alphabet. If we assume that all symbols of the alphabet appear in a text with the same frequency, that is, equally probably, then the information weight of a symbol i is identical to the amount of information in the message about the appearance of any symbol in the text. In this case N is the uncertainty of knowledge about which character of the alphabet stands in a given position of the text. This fact demonstrates the connection between the alphabetic and the meaningful approaches to measuring information.

Hartley's formula

If the value of N is equal to an integer power of two (4, 8, 16, 32, 64, etc.), then the exponential equation is easy to solve in your head, since i will be an integer. But what is the amount of information in the message about the result of the Dynamo - Zenit match? In this situation N = 3. One can guess that the solution of the equation 2^i = 3 will be a fractional number between 1 and 2, since 2^1 = 2 < 3 and 2^2 = 4 > 3. How can this number be found?

In mathematics there is a function that solves such an exponential equation. This function is called the logarithm, and the solution of our equation is written as follows:

i = log₂ N (in our case, i = log₂ 3).

It reads like this: "the logarithm of N to base 2". The idea is very simple: the logarithm of N to base 2 is the power to which 2 must be raised to obtain N. For example, the values you already know can be represented as follows:

log₂ 2 = 1, log₂ 4 = 2, log₂ 8 = 3.

Logarithm values are found using special logarithmic tables. You can also use an engineering calculator or a spreadsheet. Let us determine the amount of information obtained from a message about one of three equally probable outcomes of an event, using a spreadsheet. Figure 1.4 shows two modes of the spreadsheet: the formula display mode and the value display mode.

Fig. 1.4. Determining the amount of information in a spreadsheet using the logarithm function

In the Microsoft Excel spreadsheet processor, the logarithm function has the form LOG(argument; base). The argument, the value of N, is in cell A2, and the base of the logarithm is 2. As a result we get, to nine decimal places: i = log₂ 3 = 1.584962501 (bits).
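Outside a spreadsheet, the same value can be obtained, for example, with Python's standard math module (a sketch added for illustration):

```python
from math import log2

print(log2(3))  # 1.584962500721156, i.e. about 1.585 bits
```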

The formula for measuring the amount of information, i = log₂ N, was proposed by the American scientist Ralph Hartley (1888-1970), one of the founders of information theory.

This example showed that the amount of information determined using the meaningful approach can be a fractional value, while the information amount calculated using the alphabetical approach can only have an integer value.

System of basic concepts

Questions and tasks

  1. What is the uncertainty of knowledge about the outcome of an event?
  2. How is the unit of measure for the amount of information determined within the meaningful approach?
  3. Think of several situations in which a message carries 1 bit of information.
  4. In what cases and by what formula can you calculate the amount of information contained in a message using a meaningful approach?
  5. How many bits of information are carried by the message that the Queen of Spades was drawn from a deck of 32 cards?
  6. When an integer from the range 1 to N was guessed by the half-division method, 1 byte of information was obtained. What is N?
  7. There are two lotteries: "4 out of 32" and "5 out of 64". Which lottery's results message carries more information?
  8. Using Hartley's formula and spreadsheets, determine the amount of information in messages about equally probable events:

      a) a six-sided die showed the number 3;

      b) next year, school repairs will begin in February;

      c) I bought a pool pass for Wednesday;

      d) out of the 30 students in the class, Denis Skvortsov was appointed to be on duty in the school cafeteria.

  9. Using the law of additivity of the amount of information, solve the problem of a ticket to a cinema with the following additional condition: there are 4 halls in the cinema. The ticket contains the hall number, row number and seat number. How much information is on the ticket?

Randomness and uncertainty

Combinatorics is a branch of mathematics that studies combinations, permutations, placement, and enumeration of elements of a set.

What is uncertainty?

Uncertainty is a lack or absence of information about something.

Randomness is a category used to denote connections between phenomena of the real world which under some conditions are realized and under others are not. The randomness of an event lies in the fact that the realization of a particular outcome has a certain degree of uncertainty.

Randomness manifests itself in almost all areas of human activity.

An event is a phenomenon that occurs as a result of actions. Events are usually denoted by capital Latin letters: A, B, C, etc.

A random event is an event that may or may not happen.

The sum of events A and B is the event C, which consists in the occurrence of event A, or event B, or both at once: C = A + B.

The product of events A and B is the event C, which consists in the joint occurrence of events A and B (their combination): C = A · B.

The probability of an event is a measure of the objective possibility of an event occurring.

Event A is called independent of event B if the probability of event A does not depend on whether event B has occurred or not. Otherwise, event A is called dependent on event B.

Events that cannot occur simultaneously are called incompatible (mutually exclusive): the occurrence of one excludes the occurrence of the other.

Pseudo-random numbers are numbers that are used in programming to simulate random numbers.

A pseudo-random number generator is an algorithm that generates a sequence of numbers whose elements are almost independent of each other and obey a certain distribution.

A pseudo-random sequence generator is an algorithm that constructs a sequence of pseudo-random numbers by formulas, without an external source of random values (such as noise). Knowing the i-th number of the sequence, one can use the formulas to determine its (i + 1)-th element.

Algorithms for generating pseudo-random sequences are periodic.

Example 1. Determine the probability that a die shows the face with the number 6.

In this case, the total number of outcomes is 6, since the die has 6 faces. There is only one favourable outcome, since the die has only one face with the number 6; therefore P = 1/6.

Example 2. Generate a random list of the numbers from 1 to N. Three ways of doing this are described below; a code sketch follows them.

1st way. For each number, a random position in the list is generated. If that position contains "0" (it is still free), the number can be placed there. If the position is not "0", a new random position is generated for the number.

2nd way. We assign zero values to the elements of the list. We place each number at a randomly chosen position; if that position is not "0", we check the subsequent positions one after another until we find a "0".

3rd way. We assign zero values to the elements of the list. We place each number in the sequence: if the chosen position contains "0", the number can be placed there.
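One possible Python rendering of the first approach (an illustrative sketch; the original gives only verbal descriptions). The standard library shuffle achieves the same result in a single call:

```python
import random

def random_list(n):
    # Arrange the numbers 1..n in random order by putting each number
    # into a randomly chosen position that still contains 0 (i.e. is free).
    result = [0] * n                 # 0 marks a free position
    for number in range(1, n + 1):
        pos = random.randrange(n)
        while result[pos] != 0:      # occupied: generate a new random position
            pos = random.randrange(n)
        result[pos] = number
    return result

print(random_list(10))

# The same effect with the built-in shuffle (Fisher-Yates):
numbers = list(range(1, 11))
random.shuffle(numbers)
print(numbers)
```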

Claude Shannon, the founder of information theory, defined information as removed uncertainty. More precisely, obtaining information is a necessary condition for removing uncertainty. Uncertainty arises in a situation of choice. The task solved in the course of removing uncertainty is to reduce the number of options under consideration (to reduce diversity) and, in the end, to choose from among the possible options the one that corresponds to the situation. Removing uncertainty makes it possible to make informed decisions and to act. This is the guiding role of information.

Imagine that you walk into a store and ask for some chewing gum. A saleswoman who has, say, 16 brands of chewing gum is in a state of uncertainty: she cannot fulfill your request without additional information. If you specify, say, "Orbit", and out of the 16 initial options the saleswoman now considers only 8, you have halved her uncertainty (looking ahead, let us say that halving the uncertainty corresponds to receiving 1 bit of information). If you simply point your finger at the display case ("this one!"), the uncertainty is removed completely. Again looking ahead, let us say that with this gesture in this example you have told the saleswoman 4 bits of information.

The situation of maximum uncertainty presupposes several equally probable alternatives (options), i.e. none of the options is preferred. The more equally probable options there are, the greater the uncertainty, the harder it is to make an unambiguous choice, and the more information is required to make it. For N options this situation is described by the probability distribution (1/N, 1/N, ..., 1/N).

The minimum uncertainty is 0; this is a situation of complete certainty, meaning that the choice has been made and all the necessary information has been received. The probability distribution for a situation of complete certainty looks like this: {1, 0, ..., 0}.

The quantity characterizing the amount of uncertainty in information theory is denoted by the symbol H and is called entropy, more precisely, information entropy.

Entropy (H) is a measure of uncertainty expressed in bits. Entropy can also be viewed as a measure of the uniformity of the distribution of a random variable.

Figure 8 shows the behavior of the entropy for the case of two alternatives as the ratio of their probabilities (p, 1 - p) changes.

In this case the entropy reaches its maximum value when both probabilities are equal to each other and equal to 1/2; the zero value of entropy corresponds to the cases (p0 = 0, p1 = 1) and (p0 = 1, p1 = 0).
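For two alternatives, Shannon's formula gives H(p) = -p log₂ p - (1 - p) log₂ (1 - p). The Python sketch below (added for illustration) reproduces the behavior just described:

```python
from math import log2

def h(p):
    # Entropy of two alternatives with probabilities p and 1 - p, in bits
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}   H = {h(p):.3f} bits")
# The maximum, H = 1 bit, is reached at p = 0.5; H = 0 at p = 0 and at p = 1.
```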

The amount of information I and the entropy H characterize the same situation, but from qualitatively opposite sides: I is the amount of information required to remove the uncertainty H. In Leon Brillouin's definition, information is negative entropy (negentropy).

When the uncertainty is removed completely, the amount of information received, I, is equal to the uncertainty H that existed beforehand.

When the uncertainty is removed only partially, the amount of information received, I_t, and the remaining, unremoved uncertainty, H_t, add up to the original uncertainty: H_t + I_t = H.

For this reason, the formulas that will be presented below for calculating the entropy H are also formulas for calculating the amount of information I, i.e. when it comes to complete removal of uncertainty, H in them can be replaced by I.

8. A meaningful approach to measuring information

In the meaningful approach, the amount of information contained in a message is determined by the amount of knowledge that this message carries to the person who receives it.

Let us recall that, from the "human" point of view, information is the knowledge that we receive from the outside world. The amount of information contained in a message should be the greater, the more it adds to our knowledge.

You already know that 1 bit is taken as the unit of measurement of the amount of information.

1 bit - the minimum unit of measurement of the amount of information.

The problem of measuring information has been investigated in information theory, whose founder is Claude Shannon.

In information theory, a bit is defined as follows:

A message that halves the uncertainty of knowledge carries 1 bit of information.

Let us explain what the uncertainty of knowledge is with examples.

Suppose you toss a coin, wondering which side will come up: heads or tails. There are only two possible outcomes of a coin toss, and neither of them has an advantage over the other. In such a case we say that they are equally probable.

In the case of the coin, before it is tossed, the uncertainty of knowledge about the outcome is two.

A die with six faces can land on any of them with equal probability. This means that the uncertainty of knowledge about the result of the die roll is six.

Another example: before a race, skiers determine their starting numbers by drawing lots. Suppose there are 100 participants in the competition; then the uncertainty of an athlete's knowledge of his number before the draw is 100.

Therefore, we can say this:

The uncertainty of knowledge about the result of some event (tossing a coin or a die, drawing lots, etc.) is the number of possible results.

Let's go back to the coin example. After you toss a coin and look at it, you get a visual message that, for example, heads have fallen. One of two possible outcomes was determined. The uncertainty of knowledge has halved: there were two options, only one remained. So, having learned the result of tossing a coin, you received 1 bit of information.

The message about one of two equally probable results of some event carries 1 bit of information.

Let some message contain the information that one of N equally probable events has occurred.

Then the amount of information i contained in the message about which of the N equally probable events has occurred can be determined from Hartley's formula:

N = 2^i.

This formula is an exponential equation in the unknown i, with base 2.

If N is equal to an integer power of two (2, 4, 8, 16, etc.), then such an equation can be solved "in one's head".
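A tiny Python sketch (added for illustration) that solves N = 2^i for several values of N:

```python
from math import log2

for n in (2, 4, 8, 16, 100):
    i = log2(n)
    print(f"N = {n:3d}   i = {i:.3f} bits")
# For powers of two the answer is an integer (1, 2, 3, 4 bits);
# for N = 100 (the skiers example) i is fractional, about 6.644 bits.
```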