An equivalent approach to interpreting test results is the following: assuming that the null hypothesis is true, we calculate the probability of obtaining a t statistic equal to or greater than the value actually computed from the available sample data. If this probability turns out to be less than a pre-specified significance level (for example, P < 0.05), we are entitled to reject the null hypothesis being tested. This is the approach used most often today: researchers report in their papers the P-value, which is easily calculated with statistical software. Let us see how this can be done in R.

Suppose we have data on the daily energy intake from food (kJ/day) for 11 women (example taken from the book Altman D. G. (1991) Practical Statistics for Medical Research, Chapman & Hall, London):


The average for these 11 observations is:


Question: Is this sample average different from the established norm of 7725 kJ/day? The difference between our sample value and this standard is quite large: 7725 − 6753.6 = 971.4. But is this difference statistically significant? A one-sample t-test will help answer this question. Like the other variants of the t-test, the one-sample t-test is performed in R using the t.test() function:
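For readers without R at hand, the same one-sample computation can be sketched in pure Python. The 11 values below are the ones commonly reproduced for this Altman example (their mean matches the 6753.6 quoted above); treat them as an assumption if your copy of the data differs:

```python
import math
import statistics

# Daily energy intake (kJ/day) for 11 women -- the values commonly
# reproduced for this Altman example (an assumption here)
intake = [5260, 5470, 5640, 6180, 6390, 6515,
          6805, 7515, 7515, 8230, 8770]

n = len(intake)
mean = statistics.mean(intake)
s = statistics.stdev(intake)  # sample standard deviation (n - 1)

# One-sample t statistic against the norm of 7725 kJ/day
t = (mean - 7725) / (s / math.sqrt(n))
df = n - 1

print(round(mean, 1), round(t, 2), df)  # 6753.6 -2.82 10
```

R's `t.test(intake, mu = 7725)` reports the same t and df together with a p-value of about 0.018, i.e. the sample mean differs from the norm at the 5% level.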


Question: Are these averages statistically different? Let's test the hypothesis that there is no difference using the t-test:

But in such cases, how can we assess the effect of an intervention statistically? In general form, Student's t-test can be represented as follows.

Student's t-test is the general name for a class of methods for statistical hypothesis testing (statistical tests) based on the Student distribution. The most common use of the t-test is testing the equality of the means of two samples.

1. History of the development of the t-test

This criterion was developed by William Sealy Gosset to assess the quality of beer at the Guinness company. Because of obligations to the company regarding non-disclosure of trade secrets, Gosset's article was published in 1908 in the journal Biometrika under the pseudonym "Student".

2. What is the Student's t-test used for?

Student's t-test is used to determine the statistical significance of differences between means. It can be applied both when comparing independent samples (for example, a group of diabetic patients and a group of healthy subjects) and when comparing related populations (for example, average heart rate in the same patients before and after taking an antiarrhythmic drug).

3. In what cases can the Student’s t-test be used?

To apply Student's t-test, it is necessary that the original data have a normal distribution. When the two-sample test for independent samples is applied, it is also necessary to satisfy the condition of equality (homoscedasticity) of the variances.

If these conditions are not met, similar methods of nonparametric statistics should be used to compare sample means; the best known of these are the Mann-Whitney U test (a two-sample test for independent samples), and the sign test and the Wilcoxon test (used in cases of dependent samples).

4. How to calculate Student's t-test?

To compare average values, Student's t-test is calculated using the following formula:

t = (M1 − M2) / √(m1² + m2²)

where M1 is the arithmetic mean of the first compared population (group), M2 is the arithmetic mean of the second compared population (group), m1 is the standard error of the first arithmetic mean, and m2 is the standard error of the second arithmetic mean.

5. How to interpret the Student's t-test value?

The resulting Student's t-test value must be interpreted correctly. To do this, we need to know the number of subjects in each group (n1 and n2) and find the number of degrees of freedom f using the following formula:

f = (n1 + n2) − 2

After this, we determine the critical value of Student's t-test for the required significance level (for example, p = 0.05) and for the given number of degrees of freedom f from the table (see below).

We compare the critical and calculated values of the criterion:

  • If the calculated value of Student's t-test is equal to or greater than the critical value found from the table, we conclude that the differences between the compared values are statistically significant.
  • If the calculated value of Student's t-test is less than the tabulated one, the differences between the compared values are not statistically significant.

6. Example of calculating Student's t-test

To study the effectiveness of a new iron preparation, two groups of patients with anemia were selected. In the first group, patients received the new drug for two weeks, and in the second group they received a placebo. After this, hemoglobin levels in peripheral blood were measured. In the first group the average hemoglobin level was 115.4±1.2 g/l, and in the second 103.7±2.3 g/l (data are presented in the M±m format); the populations being compared are normally distributed. The first group comprised 34 patients, the second 40. It is necessary to draw a conclusion about the statistical significance of the differences obtained and about the effectiveness of the new iron preparation.

Solution: To assess the significance of the differences, we use Student's t-test, calculated as the difference of the means divided by the square root of the sum of the squared standard errors:

After performing the calculations, the t-test value turned out to be 4.51. We find the number of degrees of freedom as (34 + 40) − 2 = 72. We compare the resulting Student's t-test value of 4.51 with the critical value at p = 0.05 given in the table: 1.993. Since the calculated value of the criterion is greater than the critical one, we conclude that the observed differences are statistically significant (p < 0.05).
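The arithmetic of this example can be checked with a short Python sketch of the formula from section 4:

```python
import math

# Summary statistics from the anemia example (M +/- m format)
m1, se1, n1 = 115.4, 1.2, 34   # new iron preparation
m2, se2, n2 = 103.7, 2.3, 40   # placebo

# t = (M1 - M2) / sqrt(m1^2 + m2^2); f = (n1 + n2) - 2
t = (m1 - m2) / math.sqrt(se1 ** 2 + se2 ** 2)
f = (n1 + n2) - 2

print(round(t, 2), f)  # 4.51 72
```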

The method allows you to test the hypothesis that the means of the two general populations from which the compared dependent samples are drawn differ from each other. The assumption of dependence most often means that the characteristic is measured in the same sample twice, for example before an intervention and after it. In the general case, each member of one sample is matched with a member of the other sample (they are combined into pairs) so that the two data series are positively correlated with each other. Weaker types of sample dependence: sample 1 consists of husbands, sample 2 of their wives; sample 1 consists of one-year-old children, sample 2 of their twins; and so on.

The statistical hypothesis tested is, as in the previous case, H0: M1 = M2 (the means in samples 1 and 2 are equal). If it is rejected, the alternative hypothesis is accepted that M1 is greater (less) than M2.

Initial assumptions for statistical testing:

Each representative of one sample (from one general population) is associated with a representative of another sample (from another general population);

The data from the two samples are positively correlated (form pairs);

The distribution of the studied characteristic in both samples corresponds to the normal law.

Source data structure: there are two values ​​of the studied feature for each object (for each pair).

Restrictions: the distribution of the characteristic in both samples should not differ significantly from normal; the data of two measurements corresponding to one and the other sample are positively correlated.

Alternatives: the Wilcoxon T test, if the distribution in at least one sample differs significantly from normal; Student's t-test for independent samples, if the data of the two samples do not correlate positively.

The formula for the empirical value of Student's t-test reflects the fact that the unit of analysis of differences is the difference (shift) between the attribute values for each pair of observations. Accordingly, for each of the N pairs of attribute values, the difference di = x1i − x2i is calculated first.

t = Md / (σd / √N)    (3)

where Md is the mean of the differences and σd is the standard deviation of the differences; df = N − 1.

Calculation example:

Let’s assume that during testing the effectiveness of the training, each of the 8 members of the group was asked the question “How often does your opinion coincide with the opinion of the group?” - twice, before and after the training. A 10-point scale was used for responses: 1 - never, 5 - half the time, 10 - always. The hypothesis was tested that as a result of the training, the self-esteem of conformity (the desire to be like others in the group) of the participants would increase (α = 0.05). Let's create a table for intermediate calculations (Table 3).


Table 3

The arithmetic mean for the difference M d = (-6)/8 = -0.75. Subtract this value from each d (the penultimate column of the table).

The formula for the standard deviation differs only in that d appears in it instead of X. Substituting all the necessary values, we get:

σd = 0.886.

Step 1. Calculate the empirical value of the criterion using formula (3): mean difference Md = −0.75; standard deviation σd = 0.886; te = 2.39; df = 7.
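Formula (3) can be checked with the summary values just computed (only the mean and standard deviation of the differences are needed):

```python
import math

# Paired example: N = 8 participants, with the differences summarized
# by their mean and standard deviation (values from the text)
Md = -0.75    # mean of the differences
sd = 0.886    # standard deviation of the differences
N = 8

# Formula (3): t = Md / (sd / sqrt(N)), df = N - 1
t = Md / (sd / math.sqrt(N))
df = N - 1

print(round(abs(t), 2), df)  # 2.39 7
```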

Step 2. Using the table of critical values of Student's t-test, we determine the p-level of significance. For df = 7 the empirical value lies between the critical values for p = 0.05 and p = 0.01. Hence, p < 0.05.

df    p = 0.05    p = 0.01    p = 0.001
7     2.365       3.499       5.408

Step 3. We make a statistical decision and formulate a conclusion. The statistical hypothesis of equality of the means is rejected. Conclusion: the participants' self-assessed conformity increased statistically significantly after the training (at significance level p < 0.05).

Parametric methods also include the comparison of the variances of two samples using Fisher's F-test. Sometimes this method leads to valuable substantive conclusions, and in the case of comparing means for independent samples, comparing the variances is a mandatory procedure.

To calculate Fem, you need to take the ratio of the variances of the two samples, such that the larger variance is in the numerator and the smaller one in the denominator.

Comparison of variances. The method tests the hypothesis that the variances of the two general populations from which the compared samples are drawn differ from each other. The statistical hypothesis tested is H0: σ1² = σ2² (the variance in sample 1 equals the variance in sample 2). If it is rejected, the alternative hypothesis is accepted that one variance is greater than the other.

Initial assumptions: two samples are drawn randomly from different populations with a normal distribution of the characteristic being studied.

Source data structure: the characteristic being studied is measured in objects (subjects), each of which belongs to one of the two samples being compared.

Restrictions: the distributions of the trait in both samples do not differ significantly from normal.

Alternative method: Levene's test, the use of which does not require checking the assumption of normality (used in the SPSS program).

The formula for the empirical value of Fisher's F-test:

F = σ1² / σ2²    (4)

where σ1² is the larger variance and σ2² the smaller variance. Since it is not known in advance which variance is larger, the table of critical values for non-directional alternatives is used to determine the p-level. If Fe > Fcr for the corresponding numbers of degrees of freedom, then p < 0.05 and the statistical hypothesis of equality of variances can be rejected (for α = 0.05).

Calculation example:

The children were given ordinary arithmetic problems, after which one randomly selected half of the students were told that they had failed the test, and the rest were told the opposite. Each child was then asked how many seconds it would take them to solve a similar problem. The experimenter calculated the difference between the time the child named and the result of the completed task (in seconds). It was expected that the message of failure would cause some inadequacy in the child's self-esteem. The hypothesis tested (at the α = 0.05 level) was that the variance of self-esteem does not depend on reports of success or failure (H0: σ1² = σ2²).

The following data was obtained:

Step 1. Calculate the empirical value of the criterion and the number of degrees of freedom using formulas (4):

Step 2. From the table of critical values of Fisher's F-test for non-directional alternatives, we find the critical value for dfnum = 11, dfden = 11. However, the table contains critical values only for dfnum = 10 and dfnum = 12. A larger number of degrees of freedom cannot be taken, so we take the critical value for dfnum = 10: for p = 0.05, Fcr = 3.526; for p = 0.01, Fcr = 5.418.

Step 3. We make a statistical decision and a substantive conclusion. Since the empirical value exceeds the critical value for p = 0.01 (and all the more so for p = 0.05), in this case p < 0.01 and the alternative hypothesis is accepted: the variance in group 1 exceeds the variance in group 2 (p < 0.01). Consequently, after a message about failure, the inadequacy of self-esteem is higher than after a message about success.
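Since the raw data of this experiment are not reproduced above, here is a sketch of the F-test mechanics on hypothetical numbers (the two groups below are invented purely for illustration):

```python
import statistics

# Hypothetical data: time-estimate errors (seconds) in two groups.
# These numbers are invented for illustration, not the original data.
group1 = [12, 5, 21, 8, 30, 2, 25, 4, 18, 6, 28, 9]    # told "failure"
group2 = [10, 12, 9, 11, 13, 8, 12, 10, 11, 9, 12, 9]  # told "success"

v1 = statistics.variance(group1)  # sample variance (n - 1 denominator)
v2 = statistics.variance(group2)

# Formula (4): the larger variance goes in the numerator
F = max(v1, v2) / min(v1, v2)
df_num = (len(group1) if v1 >= v2 else len(group2)) - 1
df_den = (len(group2) if v1 >= v2 else len(group1)) - 1

print(round(F, 2), df_num, df_den)  # 40.44 11 11
```

Here F far exceeds the tabulated critical value, so for these invented numbers the hypothesis of equal variances would be rejected.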

Student's t-test for independent samples

Student's t-test (or simply the "t-test") is used when only two groups need to be compared on a quantitative characteristic with a normal distribution (a special case of analysis of variance). Note: this criterion cannot be used to compare several groups pairwise; in that case, analysis of variance must be used. Erroneous use of Student's t-test increases the likelihood of "revealing" differences that do not exist; for example, instead of recognizing several treatments as equally effective (or ineffective), one of them is declared better.

Two events are called independent if the occurrence of one of them does not in any way affect the occurrence of the other. Similarly, two collections can be called independent if the properties of one of them are in no way related to the properties of the other.

An example of performing the t-test in the STATISTICA program.

Women are on average shorter than men; however, this is not the result of men having any influence on women — it is a matter of the genetic characteristics of sex. Using the t-test, we need to check whether there is a statistically significant difference between the mean height values in the groups of men and women. (For educational purposes, we assume that height data follow a normal distribution and therefore the t-test is applicable.)

Figure 1. An example of data layout for performing the t-test

Pay attention to how the data are formatted in Figure 1. As when constructing graphs such as a Whisker plot or Box-whisker plot, there are two variables in the table: one of them is the grouping variable ("Gender"), which contains codes ("male" and "female") that allow the program to determine which of the height values belongs to which group; the second is the so-called dependent variable ("Height"), which contains the actual data being analyzed. However, when performing the t-test for independent samples in STATISTICA, another layout is also possible: the data for each group ("Men" and "Women") can be entered in separate columns (Figure 2).

Figure 2. Another option for laying out the data for the t-test for independent samples

To perform the t-test for independent samples, you must do the following:

1-a. Launch the t-test module from the menu Statistics > Basic statistics/Tables > t-test, independent, by groups (if the data table contains a grouping variable; see Figure 3)

OR

1-b. Launch the t-test module from the menu Statistics > Basic statistics/Tables > t-test, independent, by variables (if the data are entered in independent columns; see Figure 4).

Below is a version of the test in which there is a grouping variable in the data table.

2. In the window that opens, click the Variables button and tell the program which of the spreadsheet variables is the grouping variable and which is the dependent one (Figures 5-6).

Figure 5. Selecting variables to include in the t-test

Figure 6. Window with the selected variables for the t-test

3. Press the Summary: T-tests button.

Figure 7. Results of the t-test for independent samples

As a result, the program will produce a workbook (Workbook) containing a table with the results of the t-test (Figure 7). This table has several columns:

  • Mean (male) - average height in the "Men" group;
  • Mean (female) - average height in the "Women" group;
  • t-value - the value of Student's t-test calculated by the program;
  • df - the number of degrees of freedom;
  • P - the probability of validity of the hypothesis that the compared means do not differ. This is in fact the most important result of the analysis, since it is the P value that tells us whether the tested hypothesis is true. In our example P > 0.05, from which we conclude that there are no statistically significant differences between the heights of men and women;
  • Valid N (male) - the size of the "Men" sample;
  • Valid N (female) - the size of the "Women" sample;
  • Std. dev. (male) - the standard deviation of the "Men" sample;
  • Std. dev. (female) - the standard deviation of the "Women" sample;
  • F-ratio, Variances - the value of Fisher's F-test, with the help of which the hypothesis of equal variances in the compared samples is tested;
  • P, Variances - the probability of validity of the hypothesis that the variances of the compared samples do not differ.

Statistical hypothesis testing allows us to make strong inferences about the characteristics of a population based on sample data. There are different hypotheses. One of them is the hypothesis about the average (mathematical expectation). Its essence is to draw a correct conclusion, based only on the available sample, about where the general average may or may not be located (we will never know the exact truth, but we can narrow down the search).

The general approach to testing hypotheses has already been described, so let's get straight to the point. Assume first that the sample is drawn from a normal population of random variables X with population mean μ and variance σ² (I know, I know that this never happens, but don't interrupt me!). The arithmetic mean X̄ of this sample is obviously itself a random variable: if we extracted many such samples and calculated their means, those means would also have mathematical expectation μ and standard error σX̄ = σ/√n.

Then the random variable

z = (X̄ − μ) / σX̄

has a standard normal distribution. The question arises: if σX̄ is replaced by its sample estimate sX̄ = s/√n, will the general average still lie within ±1.96·sX̄ with 95% probability? In other words, are the distributions of the random variables

(X̄ − μ) / σX̄ and (X̄ − μ) / sX̄

equivalent?

This question was first posed (and solved) by a chemist who worked at the Guinness brewery in Dublin, Ireland. The chemist's name was William Sealy Gosset, and he took samples of beer for chemical analysis. At some point, apparently, William began to be tormented by vague doubts about the distribution of the means: it turned out to be a little more spread out than a normal distribution should be.

Having assembled the mathematical groundwork and calculated the values of the distribution function he had discovered, the Dublin chemist William Gosset wrote a note that was published in the March 1908 issue of the journal Biometrika (editor-in-chief — Karl Pearson). Because Guinness strictly forbade giving away brewing secrets, Gosset signed it with the pseudonym Student.

Despite the fact that K. Pearson's family of curves already included this distribution, the general idea of normality still dominated: no one was prepared to think that the distribution of sample estimates might not be normal. Therefore, W. Gosset's article remained practically unnoticed and forgotten. Only Ronald Fisher appreciated Gosset's discovery: he used the new distribution in his work and gave it the name Student's t-distribution; the criterion for testing hypotheses accordingly became Student's t-test. This is how a "revolution" occurred in statistics, which stepped into the era of analysis of sample data. That was a short excursion into history.

Let's see what W. Gosset could have seen. We generate 20 thousand normal samples of 6 observations each, with mean (μ) 50 and standard deviation (σ) 10. Then we normalize the sample means using the population variance:

z = (X̄ − 50) / (10 / √6)

We will group the resulting 20 thousand averages into intervals of length 0.1 and calculate the frequencies. Let us depict on the diagram the actual (Norm) and theoretical (ENorm) frequency distribution of sample means.

The points (observed frequencies) practically coincide with the line (theoretical frequencies). This is understandable, because the data is taken from the same general population, and the differences are only sampling errors.

Let's conduct a new experiment. We normalize the averages using sample variance.

Let's count the frequencies again and plot them on the diagram in the form of points, leaving a line of the standard normal distribution for comparison. Let us denote the empirical frequency of the averages, say, by the letter t.

It can be seen that this time the distributions do not quite coincide. Close, yes, but not the same. The tails have become "heavier".

Gosset-Student did not have the latest version of MS Excel, but this is exactly the effect he noticed. Why does this happen? The explanation is that the random variable

t = (X̄ − μ) / sX̄

depends not only on the sampling error (the numerator) but also on the standard error of the mean (the denominator), which is itself a random variable.
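Gosset's observation is easy to reproduce today. A minimal Python sketch of the simulation described above (the seed and the use of |statistic| > 1.96 as the tail cut-off are my choices):

```python
import random
import statistics

random.seed(42)

def tail_fraction(use_sample_sd, n=6, trials=20_000, mu=50, sigma=10):
    """Share of normalized sample means falling beyond +/-1.96."""
    se_pop = sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        m = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5 if use_sample_sd else se_pop
        if abs(m - mu) / se > 1.96:
            hits += 1
    return hits / trials

frac_z = tail_fraction(False)  # normalized by the population sigma
frac_t = tail_fraction(True)   # normalized by the sample s

# About 5% of the z statistics exceed +/-1.96, but noticeably more
# of the t statistics do (roughly 10-11% for df = 5): heavier tails
print(frac_z, frac_t)
```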

Let's take a little look at what distribution such a random variable should have. First, you will have to remember (or learn) something from mathematical statistics. There is Fisher's theorem, which states that in a sample from a normal distribution:

1. the sample mean X̄ and the sample variance s² are independent quantities;

2. the ratio of the sample variance to the population variance, multiplied by the number of degrees of freedom, has a χ² (chi-square) distribution with the same number of degrees of freedom, i.e.

k·s²/σ² ~ χ²k

where k is the number of degrees of freedom (in English, degrees of freedom, d.f.).

Many other results in the statistics of normal models are based on this law.

Let's return to the distribution of the mean. Divide the numerator and denominator of the expression

(X̄ − μ) / sX̄

by σX̄. We get

[(X̄ − μ) / σX̄] / (sX̄ / σX̄)

The numerator is a standard normal random variable (denote it ξ (xi)). Express the denominator from Fisher's theorem: sX̄/σX̄ = s/σ = √(χ²k/k). Then the original expression takes the form

t = ξ / √(χ²k / k)

This is the Student ratio in general form. Its distribution function can be derived directly, because the distributions of both random variables in this expression are known. Let's leave this pleasure to the mathematicians.

The Student t-distribution function has a formula that is quite difficult to understand, so there is no point in analyzing it here. Nobody uses it directly anyway, because the probabilities are given in special tables of the Student distribution (sometimes called tables of Student coefficients) or are built into software functions.

So, armed with this new knowledge, you can understand the official definition of the Student distribution.
A random variable subject to the Student distribution with k degrees of freedom is the ratio of independent random variables

t = ξ / √(χ²k / k)

where ξ is distributed according to the standard normal law, and χ²k obeys the χ² distribution with k degrees of freedom.

Thus, Student's t-test formula for the arithmetic mean,

t = (X̄ − μ) / (s / √n),

is a special case of the Student ratio.

From the formula and definition it follows that the distribution of Student’s t-test depends only on the number of degrees of freedom.

For k > 30 the t-distribution is practically indistinguishable from the standard normal distribution.

Unlike the chi-square test, the t-test can be one-tailed or two-tailed. Usually the two-tailed version is used, assuming that the deviation can occur in either direction from the mean. But if the problem allows a deviation in only one direction, it is reasonable to use a one-tailed test. This slightly increases the power, because at a fixed significance level the critical value moves slightly closer to zero.

Conditions for using Student's t-test

Despite the fact that Student's discovery in its time revolutionized statistics, the t-test is still rather limited in its applicability, because it rests on the assumption that the original data are normally distributed. If the data are not normal (which is usually the case), the t statistic no longer follows the Student distribution. However, due to the central limit theorem, the mean even of non-normal data quickly acquires a bell-shaped distribution.

Consider, for example, data that is clearly skewed to the right, such as a chi-square distribution with 5 degrees of freedom.

Now let’s create 20 thousand samples and observe how the distribution of averages changes depending on their volume.

The difference is quite noticeable in small samples of up to 15-20 observations. But then it quickly disappears. Thus, the non-normality of the distribution is, of course, not good, but not critical.

Most of all, the t-test is “afraid” of outliers, i.e. abnormal deviations. Let's take 20 thousand normal samples of 15 observations each and add one random outlier to some of them.

The picture turns out to be bleak. The actual frequencies of the averages are very different from the theoretical ones. Using the t-distribution in such a situation becomes a very risky undertaking.

So, in samples that are not too small (from 15 observations), the t-test is relatively robust to non-normality of the original data. But outliers in the data greatly distort the distribution of the t statistic, which, in turn, can lead to errors in statistical inference, so anomalous observations should be removed. Often, all values deviating from the mean by more than ±2 standard deviations are removed from the sample.

An example of testing a hypothesis about mathematical expectation using Student's t-test in MS Excel

Excel has several functions related to the t-distribution. Let's look at them.

T.DIST – the "classical" left-tailed Student t-distribution. The inputs are the t value, the number of degrees of freedom, and an option (0 or 1) that determines what is calculated: the density or the cumulative probability. The output is, respectively, the density or the probability that the random variable is less than the t value given in the argument.

T.DIST.2T – the two-tailed distribution. The arguments are the absolute value of t and the number of degrees of freedom. The result is the probability of obtaining the same or an even greater |t| value, i.e. the actual significance level (p-level).

T.DIST.RT – the right-tailed t-distribution. Thus, 1 − T.DIST(2;5;1) = T.DIST.RT(2;5) = 0.05097. If the t value is positive, the resulting probability is the p-level.

T.INV – used to calculate the left-tailed inverse of the t-distribution. The arguments are a probability and the number of degrees of freedom. The output is the t value corresponding to that probability. The probability is counted from the left, so for the left tail you supply the significance level α itself, and for the right tail 1 − α.

T.INV.2T – the inverse of the two-tailed Student distribution, i.e. the t value (in absolute terms). The significance level α is again supplied as input, but this time the probability is split between the two tails. Thus, T.INV(1 − 0.025;5) = T.INV.2T(0.05;5) = 2.57058.

T.TEST – a function for testing the hypothesis of equal mathematical expectations in two samples. It replaces a pile of calculations: it is enough to specify just two data ranges and a couple more parameters. The output is the p-level.

CONFIDENCE.T – calculates the confidence interval of the mean, taking the t-distribution into account.
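The right-tail probability quoted above (0.05097 for t = 2 with 5 degrees of freedom) can be cross-checked outside Excel by numerically integrating the t density; a minimal Python sketch:

```python
import math

def t_density(x, k):
    """Density of Student's t-distribution with k degrees of freedom."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + x * x / k) ** (-(k + 1) / 2)

def t_right_tail(t, k, steps=100_000):
    """P(T > t) for t >= 0: one half minus the integral of the density
    from 0 to t (midpoint rule; accurate enough for a sanity check)."""
    h = t / steps
    area = sum(t_density((i + 0.5) * h, k) for i in range(steps)) * h
    return 0.5 - area

p = t_right_tail(2, 5)
print(round(p, 5))  # 0.05097
```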

Let's consider this training example. At a factory, cement is packed in 50-kg bags. Due to randomness, some deviation from the expected weight is allowed in a single bag, but the general average should remain 50 kg. The quality control department randomly weighed 9 bags and obtained the following results: the average weight (X̄) was 50.3 kg, the standard deviation (s) 0.5 kg.

Is this result consistent with the null hypothesis that the general mean is 50 kg? In other words, is it possible to obtain such a result by pure chance if the equipment is working properly and produces an average filling of 50 kg? If the hypothesis is not rejected, then the resulting difference fits into the range of random fluctuations, but if the hypothesis is rejected, then most likely there was a malfunction in the settings of the machine that fills the bags. It needs to be checked and configured.

A short condition in generally accepted notation looks like this.

H0: μ = 50 kg

H1: μ ≠ 50 kg

There is reason to assume that the distribution of bag fills follows a normal distribution (or is not very different from it). This means that to test the hypothesis about the mathematical expectation, you can use the Student t-test. Random deviations can occur in any direction, which means a two-sided t-test is needed.

First, we will use old-fashioned means: calculating the t value manually and comparing it with the critical table value. The calculated t statistic:

t = (X̄ − μ0) / (s/√n) = (50.3 − 50) / (0.5/√9) = 1.8

Now let’s determine whether the resulting number exceeds the critical level at the significance level α = 0.05. Let's use the Student's t-distribution table (available in any statistics textbook).

The columns of the table give the probability in the right tail of the distribution, and the rows give the number of degrees of freedom. We need a two-tailed t-test at significance level 0.05, which corresponds to the cumulative probability 1 − 0.05/2 = 0.975 (i.e. 0.025 in the right tail). The number of degrees of freedom is the sample size minus 1, i.e. 9 − 1 = 8. At the intersection we find the tabulated value of the t-test: 2.306. If we were using the standard normal distribution, the critical point would be 1.96, but here it is larger, because in small samples the t-distribution is flatter, with heavier tails.

Let's compare the actual value (1.8) with the tabulated one (2.306). The calculated criterion turned out to be less than the tabulated one; consequently, the available data do not contradict the hypothesis H0 that the general average is 50 kg (but they do not prove it either). That is all we can learn using tables. One can, of course, also try to find the p-level, but it will be approximate. And since, as a rule, it is the p-level that is used to test hypotheses, we move next to Excel.

There is no ready-made function in Excel for calculating the t value itself. But this is no problem, because the Student's t-test formula is quite simple and can easily be built right in an Excel cell.

We got the same 1.8. Let us first find the critical value. We take alpha = 0.05; the test is two-tailed. We need the inverse t-distribution function for the two-tailed hypothesis, T.INV.2T.

The resulting value cuts off the critical region. The observed t-test does not fall into it, so the hypothesis is not rejected.

However, this is the same approach as testing a hypothesis against a table value. It is more informative to calculate the p-level, i.e. the probability of obtaining a deviation from the average of 50 kg as large as or larger than the observed one, if the hypothesis is correct. You will need the two-tailed Student distribution function T.DIST.2T.

The p-level is 0.1096, which is greater than the accepted significance level of 0.05, so we do not reject the hypothesis. But now we can judge the strength of the evidence: the p-level turned out to be quite close to the rejection threshold, and this suggests, for example, that the sample may have been too small to detect a significant deviation.

After some time, the control department again decided to check how the bag filling standard was being maintained. This time, for greater reliability, not 9, but 25 bags were selected. It is intuitively clear that the spread of the average will decrease, and, therefore, the chances of finding a failure in the system become greater.

Suppose the same sample mean and standard deviation were obtained as the first time (50.3 and 0.5, respectively). Let's calculate the t-test.


The critical value for 24 degrees of freedom and α = 0.05 is 2.064. The picture below shows that the t-test falls within the hypothesis-rejection region.
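The larger-sample calculation can be sketched the same way in Python; the only change from the first example is n = 25. All numbers below are from the example in the text.

```python
from math import sqrt
from scipy import stats

mean, mu0, sd, n = 50.3, 50.0, 0.5, 25

t_stat = (mean - mu0) / (sd / sqrt(n))   # the same Excel cell formula
t_crit = stats.t.ppf(0.975, n - 1)       # analogue of T.INV.2T(0.05, 24)
p = 2 * stats.t.sf(t_stat, n - 1)        # analogue of T.DIST.2T(t, 24)

print(round(t_stat, 1))  # 3.0
print(round(t_crit, 3))  # 2.064
print(round(p, 3))       # 0.006
```

With five times the sample, the same 0.3 kg deviation now produces a t-statistic well beyond the critical value.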

We can conclude that, with a confidence level of more than 95%, the population mean differs from 50 kg. To be more convincing, let's look at the p-level (the last line in the table). The probability of obtaining a mean with the same or an even greater deviation from 50, if the hypothesis is correct, is 0.0062, or 0.62%, which is highly unlikely in a single trial. Overall, we reject the hypothesis as improbable.

Calculating a Confidence Interval Using the Student's t-Distribution

Closely related to hypothesis testing is another statistical method: the calculation of confidence intervals. If the resulting interval contains the value corresponding to the null hypothesis, this is equivalent to the null hypothesis not being rejected; otherwise, the hypothesis is rejected at the corresponding confidence level. In some cases, analysts do not test hypotheses in the classical form at all but only calculate confidence intervals. This approach allows you to extract even more useful information.

Let's calculate confidence intervals for the mean for 9 and 25 observations. To do this, we will use the Excel function CONFIDENT.STUDENT (CONFIDENCE.T in English versions of Excel). Here, oddly enough, everything is quite simple: in the function arguments you only need to specify the significance level α, the sample standard deviation, and the sample size. The output is the half-width of the confidence interval, that is, the value laid off on both sides of the mean. After carrying out the calculations and drawing a diagram, we get the following.
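The half-width that CONFIDENCE.T returns is just the critical t-value times the standard error, so both intervals can be sketched in a few lines of Python. The inputs (sd = 0.5, mean = 50.3, n = 9 and 25, α = 0.05) are the figures from the examples above.

```python
from math import sqrt
from scipy import stats

sd, alpha = 0.5, 0.05

for n, mean in [(9, 50.3), (25, 50.3)]:
    # half-width of the interval: the analogue of Excel's CONFIDENCE.T
    half = stats.t.ppf(1 - alpha / 2, n - 1) * sd / sqrt(n)
    print(n, round(mean - half, 3), round(mean + half, 3))
    # 9  49.916 50.684  -> contains 50, hypothesis not rejected
    # 25 50.094 50.506  -> excludes 50, hypothesis rejected
```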

As you can see, with a sample of 9 observations the value 50 falls within the confidence interval (the hypothesis is not rejected), while with 25 observations it does not (the hypothesis is rejected). Moreover, in the experiment with 25 bags we can state that, with a probability of 97.5%, the population mean exceeds 50.09 kg (the lower limit of the confidence interval is 50.094 kg). And this is quite valuable information.

Thus, we solved the same problem in three ways:

1. The classical approach: comparing the calculated and tabulated values of the t-test.
2. A more modern one: calculating the p-level, which adds a degree of confidence when rejecting the hypothesis.
3. An even more informative one: calculating the confidence interval and obtaining the minimum value of the population mean.

It is important to remember that the t-test is a parametric method, because it is based on the normal distribution (which has two parameters: mean and variance). Therefore, for its successful application, at least approximate normality of the initial data and the absence of outliers are important.

Finally, I suggest watching a video on how to carry out calculations related to the Student t-test in Excel.



