However, this characteristic alone is not enough to study a random variable. Let's imagine two shooters firing at a target. One shoots accurately and hits close to the center, while the other... is just having fun and doesn't even aim. The funny thing is that his average result will be exactly the same as the first shooter's! This situation is conventionally illustrated by the following random variables:

The “sniper's” mathematical expectation is equal to zero; for the “interesting character”, however, it is also zero!

Thus, there is a need to quantify how far the bullets (the values of the random variable) are scattered relative to the center of the target (the mathematical expectation). And scattering, translated into Latin, is none other than dispersion (in English usage, variance).

Let's see how this numerical characteristic is determined using one of the examples from the 1st part of the lesson:

There we found a disappointing mathematical expectation for this game, and now we have to calculate its variance, which is denoted by $D(X)$.

Let's find out how far the wins/losses are “scattered” relative to the average value. Obviously, for this we need to calculate the differences between the values of the random variable and its mathematical expectation:

–5 – (–0.5) = –4.5
2.5 – (–0.5) = 3
10 – (–0.5) = 10.5

Now it seems that we need to sum the results, but this will not work, because deviations to the left cancel out deviations to the right. For example, for the “amateur” shooter (see the example above) the differences, when added, give zero, so we get no estimate of how scattered his shooting is.

To get around this problem you could take the absolute values of the differences, but for technical reasons the approach that has taken root is to square them. It is more convenient to lay out the solution in a table:

What naturally suggests itself here is to calculate the weighted average of the squared deviations. And what is that? It is their expected value, which serves as the measure of scattering:

$D(X) = M\left[(X - M(X))^2\right]$

This is the definition of variance. From the definition it is immediately clear that variance cannot be negative. Take note of this for practice!

Let's recall how to find the expected value. Multiply the squared differences by the corresponding probabilities (continuation of the table):
– figuratively speaking, this is the “pulling force”,
and sum the results:

Don't you think that, compared to the winnings, the result turned out to be too big? That's right: we squared the deviations, and to return to the dimension of our game we need to take the square root. This quantity is called the root-mean-square deviation and is denoted by the Greek letter “sigma”:

This value is also called the standard deviation.
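The steps above (deviations, squares, weighting by probabilities, square root) can be sketched in a few lines of Python. This is only an illustration: the payoffs –5, 2.5 and 10 come from the text, but the probabilities 0.5, 0.4 and 0.1 are an assumption, chosen because they reproduce the expectation of –0.5 used above. The function names are hypothetical.

```python
import math

# Payoffs of the game as they appear in the deviations above.
values = [-5.0, 2.5, 10.0]
# ASSUMPTION: the probabilities are not given in this section; these ones
# are consistent with the stated mathematical expectation of -0.5.
probs = [0.5, 0.4, 0.1]


def expectation(xs, ps):
    """M(X): the probability-weighted average of the values."""
    return sum(x * p for x, p in zip(xs, ps))


def variance_by_definition(xs, ps):
    """D(X) = M[(X - M(X))^2]: expected squared deviation from M(X)."""
    m = expectation(xs, ps)
    return sum((x - m) ** 2 * p for x, p in zip(xs, ps))


mean = expectation(values, probs)
var = variance_by_definition(values, probs)
sigma = math.sqrt(var)  # back to the units of the game

print(mean, var, round(sigma, 2))
```

Under the assumed probabilities the variance is 24.75 and sigma is about 4.97, which is on the same scale as the payoffs themselves.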

What is its meaning? If we deviate from the mathematical expectation to the left and right by the standard deviation:

– then the most probable values of the random variable will be “concentrated” on this interval. Which is what we actually observe:

However, it so happens that when analyzing scatter one almost always works with the concept of variance. Let's figure out what it means in relation to games. While in the case of the shooters we were talking about the “accuracy” of hits relative to the center of the target, here variance characterizes two things:

Firstly, it is obvious that as the bets increase, the variance also increases. For example, if we increase the bets 10-fold, then the mathematical expectation will increase 10-fold, and the variance will increase 100-fold (since it is a quadratic quantity). But note that the rules of the game themselves have not changed! Only the stakes have changed: roughly speaking, before we bet 10 rubles, and now it's 100.
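A quick numerical check of the 10-fold / 100-fold claim, as a sketch. The payoffs are the ones from the game above; the probabilities are an assumption (they are not stated in this section), and `mean_and_variance` is a hypothetical helper name.

```python
import math


def mean_and_variance(values, probs):
    """Return (M(X), D(X)) for a discrete distribution."""
    m = sum(x * p for x, p in zip(values, probs))
    d = sum((x - m) ** 2 * p for x, p in zip(values, probs))
    return m, d


values = [-5.0, 2.5, 10.0]
probs = [0.5, 0.4, 0.1]  # ASSUMPTION: illustrative probabilities
factor = 10              # every bet (and so every payoff) is multiplied by 10

m_old, d_old = mean_and_variance(values, probs)
m_new, d_new = mean_and_variance([factor * x for x in values], probs)

# The expectation scales linearly, the variance quadratically.
print(math.isclose(m_new, factor * m_old))
print(math.isclose(d_new, factor ** 2 * d_old))
```

The probabilities play no role in the scaling itself: any valid distribution would give the same two ratios.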

The second, more interesting point is that variance characterizes the style of play. Mentally fix the game bets at some certain level, and let's see what's what:

A low-variance game is a cautious game. The player tends to choose the most reliable schemes, where he does not lose or win too much at one time. For example, the red/black system in roulette (see Example 4 of the article Random variables).

A high-variance game. It is often called a high-dispersion game. This is an adventurous or aggressive style of play, where the player chooses “adrenaline” schemes. Just recall the “Martingale”, in which the amounts at stake are orders of magnitude greater than in the “quiet” game of the previous point.

The situation in poker is indicative: there are so-called tight players who tend to be cautious and anxious about their gaming funds (bankroll). Not surprisingly, their bankroll does not fluctuate much (low variance). Conversely, a player with high variance is an aggressor. He often takes risks, makes large bets and can either win a huge pot or lose everything.

The same thing happens in Forex, and so on - there are plenty of examples.

Moreover, in all cases it does not matter whether the game is played for pennies or for thousands of dollars. Every level has its low- and high-variance players. And, as we remember, it is the expected value that is “responsible” for the average winnings.

You probably noticed that finding variance is a long and painstaking process. But mathematics is generous:

Formula for finding variance

$D(X) = M(X^2) - (M(X))^2$

This formula is derived directly from the definition of variance, and we will put it to use right away. I'll copy the table with our game from above:

and the mathematical expectation we found.

Let's calculate the variance in the second way. First, let's find the mathematical expectation of the square of the random variable. By the definition of mathematical expectation:

In this case:

Thus, according to the formula:

As they say, feel the difference. And in practice, of course, it is better to use the formula (unless the condition requires otherwise).
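To “feel the difference” numerically, here is a small sketch that computes the variance both ways and checks that the results agree. The payoffs are those of the game above; the probabilities are an assumption (not given in this section) that is consistent with the expectation of –0.5.

```python
import math

values = [-5.0, 2.5, 10.0]
probs = [0.5, 0.4, 0.1]  # ASSUMPTION: illustrative probabilities

# M(X) and M(X^2) by the definition of mathematical expectation.
m = sum(x * p for x, p in zip(values, probs))
m_sq = sum(x ** 2 * p for x, p in zip(values, probs))

# Second way: D(X) = M(X^2) - (M(X))^2.
d_formula = m_sq - m ** 2

# First way: D(X) = M[(X - M(X))^2].
d_definition = sum((x - m) ** 2 * p for x, p in zip(values, probs))

print(m_sq, d_formula, d_definition)
```

The second way needs only two sums over the table and no column of deviations, which is why it is less work by hand.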

Let's master the solution technique and the write-up:

Example 6

Find its mathematical expectation, variance and standard deviation.

This task is found everywhere and, as a rule, comes without any real-world meaning.
You can imagine several numbered light bulbs that light up in a madhouse with certain probabilities :)

Solution: It is convenient to summarize the basic calculations in a table. First, we write the initial data in the top two rows. Then we calculate the products $x_i p_i$, then $x_i^2 p_i$, and finally the sums in the right column:

Actually, almost everything is ready. The sum in the third row is the ready-made mathematical expectation.

We calculate the variance using the formula:

And finally, the standard deviation:
– Personally, I usually round to 2 decimal places.

All calculations can be carried out on a calculator, or even better - in Excel:

It's hard to go wrong here :)

Answer:


A couple of tasks to solve on your own:

Example 7

Calculate the variance of the random variable in the previous example by definition.

And a similar example:

Example 8

A discrete random variable is specified by its distribution law:

Yes, the values of a random variable can be quite large (an example from real work), and here, if possible, use Excel. The same goes for Example 7, by the way: it's faster, more reliable and more enjoyable.

Solutions and answers at the bottom of the page.

To conclude the 2nd part of the lesson, we will look at another typical problem, one might even say a small puzzle:

Example 9

A discrete random variable can take only two values: and , and . The probability, mathematical expectation and variance are known.

Solution: Let's start with an unknown probability. Since a random variable can take only two values, the sum of the probabilities of the corresponding events is:

and since , then .

All that remains is to find..., it's easy to say :) But oh well, here we go. By definition of mathematical expectation:
– substitute known quantities:

– and nothing more can be squeezed out of this equation, except that you can rewrite it in the usual direction:

or:

I think you can guess the next steps. Let's compose and solve the system:

Decimals are, of course, a complete disgrace; multiply both equations by 10:

and divide by 2:

That's better. From the 1st equation we express:
(this is the easier way)– substitute into the 2nd equation:


We square and simplify:

Multiply by:

The result is a quadratic equation; we find its discriminant:
– great!

and we get two solutions:

1) if , then ;

2) if , then .

The condition is satisfied by the first pair of values. With a high probability everything is correct, but, nevertheless, let’s write down the distribution law:

and perform a check, namely, find the expectation:

If the population is divided into groups according to the characteristic being studied, then the following types of variance can be calculated for this population: total, group (within-group), average of group (average of within-group), intergroup.

The coefficient of determination is calculated first; it shows what part of the total variation of the characteristic being studied is intergroup variation, i.e. is due to the grouping characteristic:

$\eta^2 = \frac{\delta^2}{\sigma^2}$

where $\delta^2$ is the intergroup variance and $\sigma^2$ is the total variance.

The empirical correlation ratio characterizes the closeness of the connection between the grouping (factor) characteristic and the resulting characteristic.

The empirical correlation ratio can take values from 0 to 1.

To assess the closeness of the connection based on the empirical correlation ratio, you can use the Chaddock scale:

Example 4. The following data is available on the performance of work by design and survey organizations of various forms of ownership:

Define:

1) total variance;

2) group variances;

3) the average of the group variances;

4) intergroup variance;

5) total variance based on the rule for adding variances;


6) coefficient of determination and empirical correlation ratio.

Draw conclusions.

Solution:

1. Let us determine the average volume of work performed by enterprises of two forms of ownership:

Let's calculate the total variance:

2. Determine group averages:

million rubles;

million rubles

Group variances:

;

3. Calculate the average of the group variances:

4. Let's determine the intergroup variance:

5. Calculate the total variance based on the rule for adding variances:

6. Let's determine the coefficient of determination:

.

Thus, 22% of the variation in the volume of work performed by design and survey organizations is explained by the form of ownership of the enterprises.

The empirical correlation ratio is calculated using the formula

.

The value of the calculated indicator indicates that the dependence of the volume of work on the form of ownership of the enterprise is small.

Example 5. As a result of a survey of the technological discipline of production areas, the following data were obtained:

Determine the coefficient of determination

Let's calculate the sample variance and standard deviation in MS EXCEL. We will also calculate the variance of a random variable whose distribution is known.

Let's first consider variance, then standard deviation.

Sample variance

Sample variance characterizes the spread of values in the array relative to the mean.

All 3 formulas are mathematically equivalent.

From the first formula it is clear that the sample variance is the sum of the squared deviations of each value in the array from the mean, divided by the sample size minus 1.

In MS EXCEL 2007 and earlier versions, the VAR() function (VARiance) is used to calculate the sample variance. Starting with MS EXCEL 2010, it is recommended to use its analogue VAR.S(), i.e. Sample VARiance. In addition, starting with MS EXCEL 2010 there is the function VAR.P(), i.e. Population VARiance, which calculates the variance for a population. The whole difference comes down to the denominator: instead of n-1 as in VAR.S(), VAR.P() has just n in the denominator. Before MS EXCEL 2010, the VARP() function was used to calculate the variance of a population.

The sample variance can also be calculated directly using the formulas below:
=DEVSQ(Sample)/(COUNT(Sample)-1)
=(SUMSQ(Sample)-COUNT(Sample)*AVERAGE(Sample)^2)/(COUNT(Sample)-1) – ordinary formula
=SUM((Sample-AVERAGE(Sample))^2)/(COUNT(Sample)-1) – array formula

The sample variance is equal to 0 only if all values are equal to each other and, accordingly, equal to the mean. Usually, the larger the variance, the greater the spread of values in the array.

The sample variance is a point estimate of the variance of the distribution of the random variable from which the sample was drawn. The construction of confidence intervals for the variance is covered in a separate article.

Variance of a random variable

To calculate the variance of a random variable, you need to know its distribution.

The variance of a random variable X is often denoted Var(X). Variance is equal to the expected value of the squared deviation from the mean E(X): Var(X)=E[(X-E(X))^2]

If the random variable has a discrete distribution, the variance is calculated by the formula:

$Var(X) = \sum_i (x_i - \mu)^2 p(x_i)$

where x i is a value that the random variable can take, μ is the mean (the expected value of the random variable), and p(x) is the probability that the random variable takes the value x.

If the random variable has a continuous distribution with density p(x), the variance is calculated by the formula:

$Var(X) = \int (x - \mu)^2 p(x)\,dx$

The dimension of the variance is the square of the unit of measurement of the original values. For example, if the values in the sample are part weight measurements (in kg), then the dimension of the variance is kg^2. This can be difficult to interpret, so to characterize the spread of values one uses the square root of the variance: the standard deviation.

Some properties of variance:

Var(X+a)=Var(X), where X is a random variable and a is a constant.

Var(aX)=a^2 Var(X)

Var(X)=E[(X-E(X))^2]=E[X^2-2*X*E(X)+(E(X))^2]=E(X^2)-E(2*X*E(X))+(E(X))^2=E(X^2)-2*E(X)*E(X)+(E(X))^2=E(X^2)-(E(X))^2

This property of variance is used in the article about linear regression.

Var(X+Y)=Var(X)+Var(Y)+2*Cov(X;Y), where X and Y are random variables and Cov(X;Y) is the covariance of these random variables.

If the random variables are independent, then their covariance is equal to 0, and therefore Var(X+Y)=Var(X)+Var(Y).

Let us show that for independent quantities Var(X-Y)=Var(X+Y). Indeed, Var(X-Y)=Var(X+(-Y))=Var(X)+Var(-Y)=Var(X)+(-1)^2 Var(Y)=Var(X)+Var(Y)=Var(X+Y).
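The identity for independent variables can be checked exactly on a small example by enumerating every pair of outcomes. The sketch below uses exact fractions rather than simulation; the two distributions and the helper names `variance` and `combine` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())


def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y) when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        z = op(x, y)
        # Independence: the joint probability is the product px * py.
        out[z] = out.get(z, 0) + px * py
    return out


# Hypothetical distributions.
X = {0: Fraction(1, 2), 1: Fraction(1, 2)}
Y = {1: Fraction(1, 3), 2: Fraction(1, 3), 6: Fraction(1, 3)}

var_sum = variance(combine(X, Y, lambda x, y: x + y))
var_diff = variance(combine(X, Y, lambda x, y: x - y))

print(variance(X), variance(Y), var_sum, var_diff)
```

Because the arithmetic is exact, the three quantities Var(X)+Var(Y), Var(X+Y) and Var(X-Y) can be compared with `==` rather than with a tolerance.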

Sample standard deviation

The sample standard deviation is a measure of how widely the values in a sample are scattered relative to their mean.

By definition, the standard deviation is equal to the square root of the variance.

The standard deviation does not take into account the magnitude of the values in the sample, only the degree to which they are scattered around their mean. To illustrate this, let's give an example.

Let's calculate the standard deviation for 2 samples: (1; 5; 9) and (1001; 1005; 1009). In both cases, s=4. Obviously, the ratio of the standard deviation to the values of the array differs significantly between the samples. For such cases the coefficient of variation (CV) is used: the ratio of the standard deviation to the arithmetic mean, expressed as a percentage.
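The two samples from this paragraph are easy to verify with Python's standard `statistics` module, where `stdev` is the sample standard deviation (denominator n-1). The helper `coefficient_of_variation` is a hypothetical name introduced for this sketch.

```python
import math
import statistics


def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the arithmetic mean."""
    return statistics.stdev(sample) / statistics.mean(sample) * 100


a = [1, 5, 9]
b = [1001, 1005, 1009]

s_a = statistics.stdev(a)
s_b = statistics.stdev(b)
cv_a = coefficient_of_variation(a)
cv_b = coefficient_of_variation(b)

print(s_a, s_b)
print(round(cv_a, 2), round(cv_b, 2))
```

Both standard deviations equal 4, while the coefficient of variation is 80% for the first sample and roughly 0.4% for the second, which is exactly the contrast the text describes.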

In MS EXCEL 2007 and earlier versions, the function =STDEV() (STandard DEViation) is used to calculate the sample standard deviation. Starting with MS EXCEL 2010, it is recommended to use its analogue =STDEV.S(), i.e. Sample STandard DEViation.

In addition, starting with MS EXCEL 2010 there is the function STDEV.P(), i.e. Population STandard DEViation, which calculates the standard deviation for a population. The whole difference comes down to the denominator: instead of n-1 as in STDEV.S(), STDEV.P() has just n in the denominator.

The standard deviation can also be calculated directly using the formulas below (see the example file)
=SQRT(DEVSQ(Sample)/(COUNT(Sample)-1))
=SQRT((SUMSQ(Sample)-COUNT(Sample)*AVERAGE(Sample)^2)/(COUNT(Sample)-1))

Other measures of scatter

The DEVSQ() function calculates the sum of the squared deviations of the values from their mean. This function returns the same result as the formula =VAR.P(Sample)*COUNT(Sample), where Sample is a reference to a range containing an array of sample values. Calculations in the DEVSQ() function are made according to the formula:

$\sum_i (x_i - \bar{x})^2$

The AVEDEV() function is also a measure of the spread of a data set. AVEDEV() calculates the mean of the absolute deviations of the values from their mean. This function returns the same result as the formula =SUMPRODUCT(ABS(Sample-AVERAGE(Sample)))/COUNT(Sample), where Sample is a reference to a range containing an array of sample values.

Calculations in the AVEDEV() function are made according to the formula:

$\frac{1}{n}\sum_i |x_i - \bar{x}|$
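As a rough cross-check of these two measures of scatter, the sketch below reimplements them in Python. The functions `devsq` and `avedev` are hypothetical helpers that only mirror what the text says the worksheet functions compute; the sample is the small array (1; 5; 9) used earlier.

```python
import math
import statistics


def devsq(sample):
    """Sum of squared deviations of the values from their mean."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)


def avedev(sample):
    """Mean of the absolute deviations of the values from their mean."""
    mean = sum(sample) / len(sample)
    return sum(abs(x - mean) for x in sample) / len(sample)


sample = [1, 5, 9]

total_sq = devsq(sample)   # (-4)^2 + 0^2 + 4^2
mean_abs = avedev(sample)  # (4 + 0 + 4) / 3

# The sum of squared deviations equals population variance times n.
same_as_pvariance = math.isclose(
    total_sq, statistics.pvariance(sample) * len(sample)
)

print(total_sq, mean_abs, same_as_pvariance)
```

Unlike the variance, the mean absolute deviation stays in the original units without taking a square root, but it is less convenient algebraically.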

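Conversely, if $f$ is an a.e. non-negative function such that $\int_{\mathbb{R}^n} f(x)\,dx = 1$, then there exists an absolutely continuous probability measure $\mathbb{P}$ on $\mathbb{R}^n$ such that $f$ is its density.

Change of measure in the Lebesgue integral:

$\int_{\mathbb{R}^n} \varphi(x)\,\mathbb{P}(dx) = \int_{\mathbb{R}^n} \varphi(x) f(x)\,dx,$

where $\varphi$ is any Borel function that is integrable with respect to the probability measure $\mathbb{P}$.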

Variance: types and properties

The concept of variance

Variance in statistics is defined as the mean of the squared deviations of the individual values of a characteristic from the arithmetic mean. Depending on the initial data, it is determined using the simple or the weighted variance formula:

1. Simple variance (for ungrouped data) is calculated using the formula:

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}$

where n is the number of observations.

2. Weighted variance (for a variation series):

$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 n_i}{\sum n_i}$

where $n_i$ is the frequency (the number of times the value $x_i$ occurs)

An example of finding variance


Example 4. The following data is available for a group of 20 correspondence students. It is necessary to construct an interval distribution series for the characteristic, calculate its mean value and study its variance.

Let's build an interval grouping. We determine the width of the interval using the formula:

$h = \frac{X_{max} - X_{min}}{n}$

where X max is the maximum value of the grouping characteristic; X min is the minimum value of the grouping characteristic; n is the number of intervals.

We take n=5. The step is: h = (192 - 159)/5 = 6.6
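The interval boundaries that follow from this step can be generated mechanically. The sketch below uses only the numbers given in the text (minimum 159, maximum 192, five intervals); `interval_edges` is a hypothetical helper name.

```python
import math


def interval_edges(x_min, x_max, n):
    """Return the step h and the n + 1 boundaries of n equal intervals."""
    h = (x_max - x_min) / n
    edges = [x_min + k * h for k in range(n + 1)]
    return h, edges


h, edges = interval_edges(159, 192, 5)

# Midpoint of each interval, used later as its representative value.
midpoints = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]

for lo, hi, mid in zip(edges, edges[1:], midpoints):
    print(f"{lo:.1f} - {hi:.1f}: midpoint {mid:.1f}")
```

The first interval comes out as 159.0 to 165.6 with midpoint 162.3, matching the example in the text.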

Let's create an interval grouping

For further calculations, we will build an auxiliary table:

$x'_i$ is the midpoint of the interval (for example, the midpoint of the interval 159 – 165.6 is 162.3).

We determine the average height of students using the weighted arithmetic average formula:

Let's determine the variance using the formula:

The formula can be transformed like this:

$\sigma^2 = \overline{x^2} - \bar{x}^2$

From this formula it follows that the variance is equal to the difference between the mean of the squares of the values and the square of the mean.

Variance in variation series with equal intervals can be calculated by the method of moments, using the second property of variance (dividing all values by the width of the interval). Determining the variance by the method of moments with the following formula is less laborious:

$\sigma^2 = i^2 (m_2 - m_1^2)$

where i is the width of the interval; A is a conventional zero, for which it is convenient to use the midpoint of the interval with the highest frequency; $m_1^2$ is the square of the first-order moment, $m_1 = \frac{\sum \frac{x - A}{i} n}{\sum n}$; $m_2$ is the second-order moment, $m_2 = \frac{\sum \left(\frac{x - A}{i}\right)^2 n}{\sum n}$
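To see that the method of moments gives the same number as the ordinary weighted variance, here is a small sketch. The midpoints follow from the intervals of width 6.6 starting at 159; the frequencies are hypothetical (the source table is not reproduced in this section), so the resulting figures are illustrative only.

```python
import math

midpoints = [162.3, 168.9, 175.5, 182.1, 188.7]
freqs = [2, 5, 8, 4, 1]  # HYPOTHETICAL frequencies for 20 students
i = 6.6                  # width of the interval
A = 175.5                # conventional zero: midpoint with the highest frequency
total = sum(freqs)

# Coded values u = (x - A) / i are small integers: -2, -1, 0, 1, 2.
u = [(x - A) / i for x in midpoints]
m1 = sum(uk * f for uk, f in zip(u, freqs)) / total       # first-order moment
m2 = sum(uk ** 2 * f for uk, f in zip(u, freqs)) / total  # second-order moment

var_moments = i ** 2 * (m2 - m1 ** 2)

# Ordinary weighted variance for comparison.
mean = sum(x * f for x, f in zip(midpoints, freqs)) / total
var_direct = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs)) / total

print(m1, m2, var_moments, var_direct)
```

The saving comes from working with the coded values instead of the original midpoints; the mean is recovered the same way, as A + i * m1.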

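The variance of an alternative characteristic (if in a statistical population a characteristic varies in such a way that there are only two mutually exclusive options, such variability is called alternative) can be calculated using the formula:

$\sigma^2 = p \cdot q$

where p is the proportion of units that have the characteristic and q is the proportion of units that do not.

Substituting q = 1 - p into this variance formula, we get:

$\sigma^2 = p(1 - p)$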
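A short sketch shows that p(1 - p) is just the ordinary variance of a 0/1 variable. The value p = 0.3 is hypothetical, and `alternative_variance` is a name introduced for illustration.

```python
import math


def alternative_variance(p):
    """Variance of an alternative (0/1) characteristic with share p."""
    return p * (1 - p)


p = 0.3  # hypothetical share of units that have the characteristic

# Direct computation: value 1 with probability p, value 0 with 1 - p.
# The mean of such a variable is p itself.
direct = (1 - p) ** 2 * p + (0 - p) ** 2 * (1 - p)

print(alternative_variance(p), direct)
print(alternative_variance(0.5))  # the largest value the formula can give
```

The formula peaks at p = 0.5, where the variance is 0.25, and falls to zero as p approaches 0 or 1.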

Types of variance

Total variance measures the variation of a characteristic across the entire population under the influence of all the factors that cause this variation. It is equal to the mean square of the deviations of the individual values of the characteristic x from the overall mean of x and can be calculated as a simple or a weighted variance.

Within-group variance characterizes random variation, i.e. the part of the variation that is due to the influence of unaccounted factors and does not depend on the factor characteristic on which the grouping is based. It is equal to the mean square of the deviations of the individual values of the characteristic within a group from the arithmetic mean of that group and can be calculated as a simple or a weighted variance.

Thus, within-group variance measures the variation of a characteristic within a group and is determined by the formula:

$\sigma_i^2 = \frac{\sum (x - \bar{x}_i)^2}{n_i}$

where $\bar{x}_i$ is the group mean; $n_i$ is the number of units in the group.

For example, the within-group variances that need to be determined when studying the influence of workers' qualifications on the level of labor productivity in a workshop show the variation in output in each group caused by all possible factors (technical condition of equipment, availability of tools and materials, age of workers, labor intensity, etc.) except for differences in qualification category (within a group all workers have the same qualifications).

The average of the within-group variances reflects random variation, that is, the part of the variation that arose under the influence of all factors other than the grouping factor. It is calculated using the formula:

$\overline{\sigma^2} = \frac{\sum \sigma_i^2 n_i}{\sum n_i}$

Intergroup variance characterizes the systematic variation of the resulting characteristic, which is due to the influence of the factor characteristic on which the grouping is based. It is equal to the mean square of the deviations of the group means from the overall mean. Intergroup variance is calculated using the formula:

$\delta^2 = \frac{\sum (\bar{x}_i - \bar{x})^2 n_i}{\sum n_i}$
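The relationship between these kinds of variance (total variance equals the average of the within-group variances plus the intergroup variance) can be checked on a toy data set. Everything below is a sketch: the two groups are hypothetical, and `pvar` is a helper name introduced here for the population-style variance with n in the denominator.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pvar(xs):
    """Mean square of deviations from the mean (denominator n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


# HYPOTHETICAL data split into two groups by some grouping characteristic.
groups = {"A": [10, 12, 14], "B": [20, 22, 30]}

all_values = [x for g in groups.values() for x in g]
n_total = len(all_values)
grand_mean = mean(all_values)

total_var = pvar(all_values)
# Average of within-group variances, weighted by group size.
within_avg = sum(pvar(g) * len(g) for g in groups.values()) / n_total
# Intergroup variance: spread of group means around the overall mean.
between = sum(
    (mean(g) - grand_mean) ** 2 * len(g) for g in groups.values()
) / n_total

eta_squared = between / total_var  # coefficient of determination
eta = math.sqrt(eta_squared)       # empirical correlation ratio

print(total_var, within_avg, between, round(eta_squared, 3), round(eta, 3))
```

For this toy data most of the variation is between the groups, so the coefficient of determination is high; with real data the split depends entirely on how strongly the grouping factor matters.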

The variance of a random variable is a measure of the spread of the given random variable, that is, of its deviation from the mathematical expectation. In statistics, the notation $\sigma^2$ (sigma squared) is often used to denote variance. The square root of the variance, equal to $\sigma$, is called the standard deviation or standard spread. The standard deviation is measured in the same units as the random variable itself, and the variance is measured in the squares of that unit.

Although it is very convenient to use only one value (such as the mean or mode and median) to estimate the entire sample, this approach can easily lead to incorrect conclusions. The reason for this situation lies not in the value itself, but in the fact that one value does not in any way reflect the spread of data values.

For example, in the sample:

the average value is 5.

However, in the sample itself there is not a single element with a value of 5. You may need to know how close each element of the sample is to its mean value. In other words, you will need to know the variance of the values. Knowing the degree of variation in the data, you can better interpret the mean, median and mode. The degree to which sample values vary is determined by calculating their variance and standard deviation.



The variance and its square root, called the standard deviation, characterize the average deviation from the sample mean. Of these two quantities, the standard deviation is the more important. It can be thought of as the average distance of the elements from the mean of the sample.

Variance is difficult to interpret meaningfully. However, the square root of this value is the standard deviation and can be easily interpreted.

Standard deviation is calculated by first determining the variance and then taking the square root of the variance.

For example, for the data array shown in Figure 1, the following values are obtained:

Figure 1

Here the average value of the squared differences is 717.43. To get the standard deviation, all that remains is to take the square root of this number.

The result will be approximately 26.78.

Remember that standard deviation is interpreted as the average distance that items are from the sample mean.

The standard deviation measures how well the mean describes the entire sample.

Let's say you are the head of a PC assembly production department. The quarterly report states that production for the last quarter was 2,500 PCs. Is this good or bad? You asked (or there is already this column in the report) to display the standard deviation for this data in the report. The standard deviation figure, for example, is 2000. It becomes clear to you, as the head of the department, that the production line requires better management (too large deviations in the number of PCs assembled).

Recall that when the standard deviation is large, the data are widely scattered around the mean, and when the standard deviation is small, they cluster close to the mean.

The four statistical functions VAR(), VARP(), STDEV() and STDEVP() are designed to calculate the variance and standard deviation of the numbers in a range of cells. Before you calculate the variance and standard deviation of a data set, you need to determine whether the data represents a population or a sample from a population. For a sample from a population, use the functions VAR() and STDEV(); for a population, use the functions VARP() and STDEVP():

Data type     Variance    Standard deviation
Population    VARP()      STDEVP()
Sample        VAR()       STDEV()
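The same population/sample distinction exists in Python's standard `statistics` module, which can serve as an independent check of a worksheet: `pvariance` and `pstdev` divide by n, while `variance` and `stdev` divide by n-1. The data below is hypothetical.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical measurements
n = len(data)

pop_var = statistics.pvariance(data)  # population variance, denominator n
pop_std = statistics.pstdev(data)     # population standard deviation
smp_var = statistics.variance(data)   # sample variance, denominator n - 1
smp_std = statistics.stdev(data)      # sample standard deviation

# Both variances come from the same sum of squared deviations.
same_sum_of_squares = math.isclose(pop_var * n, smp_var * (n - 1))

print(pop_var, smp_var, same_sum_of_squares)
print(round(pop_std, 3), round(smp_std, 3))
```

The sample versions are always slightly larger for the same data, and the gap shrinks as n grows.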

Dispersion (as well as standard deviation), as we noted, indicates the extent to which the values ​​included in the data set are scattered around the arithmetic mean.

A small value of variance or standard deviation indicates that all data is concentrated around the arithmetic mean, and a large value of these values ​​indicates that the data is scattered over a wide range of values.

Variance is quite difficult to interpret meaningfully (what does a small value mean, or a large one?). Completing Task 3 will let you show the meaning of the variance of a data set visually, on a graph.

Tasks

Task 1.

1.1. Define the concepts of variance and standard deviation and give their symbolic notation for statistical data processing.

1.2. Complete the worksheet in accordance with Figure 1 and make the necessary calculations.

1.3. Give the basic formulas used in the calculations.

1.4. Explain all designations ( , , )

1.5. Explain the practical meaning of the concepts of variance and standard deviation.

Task 2.

2.1. Define the concepts: general population and sample; mathematical expectation and arithmetic mean; give their symbolic notation for statistical data processing.

2.2. In accordance with Figure 2, prepare a worksheet and make the calculations.

2.3. Provide the basic formulas used in the calculations (for the general population and the sample).

Figure 2

2.4. Explain why it is possible to obtain arithmetic mean values such as 46.43 and 48.78 in samples (see the Appendix file). Draw conclusions.

Task 3.

There are two samples with different sets of data, but the average for them will be the same:

Figure 3

3.1. Complete the worksheet in accordance with Figure 3 and make the necessary calculations.

3.2. Give the basic calculation formulas.

3.3. Construct graphs in accordance with Figures 4, 5.

3.4. Explain the obtained dependencies.

3.5. Carry out similar calculations for the data of two samples.

Original sample: 1, 1, 1, 1, 9, 9, 9, 9

Select the values of the second sample so that the arithmetic mean for the second sample is the same, for example:

Select the values ​​for the second sample yourself. Arrange calculations and graphs similar to Figures 3, 4, 5. Show the basic formulas used in the calculations.

Draw appropriate conclusions.

Complete all tasks in the form of a report with all the necessary drawings, graphs, formulas and brief explanations.

Note: the construction of graphs must be explained with drawings and brief explanations.


