Testing statistical hypotheses about the equality of means

8.1. The concept of dependent and independent samples.

The choice of a criterion for testing a hypothesis is determined primarily by whether the samples under consideration are dependent or independent. Let us introduce the corresponding definitions.

Def. Two samples are called independent if the procedure for selecting units for the first sample is in no way connected with the procedure for selecting units for the second sample.

An example of two independent samples would be the samples discussed above of men and women working at the same enterprise (in the same industry, etc.).

Note that the independence of two samples does not at all mean that there is no requirement for a certain kind of similarity of these samples (their homogeneity). Thus, when studying the income level of men and women, we are unlikely to allow a situation where men are selected from among Moscow businessmen, and women from the aborigines of Australia. Women should also be Muscovites and, moreover, “businesswomen.” But here we are not talking about the dependence of samples, but about the requirement of homogeneity of the studied population of objects, which must be satisfied both when collecting and when analyzing sociological data.

Def. The samples are called dependent, or paired, if each unit of one sample is “linked” to a specific unit of the second sample.

This last definition will probably become clearer if we give an example of dependent samples.

Suppose we want to find out whether the father's social status is, on average, lower than the social status of his son (we assume that we can measure this complex and ambiguously understood social characteristic of a person). It seems obvious that in such a situation it is advisable to select pairs of respondents (father, son) and assume that each element of the first sample (one of the fathers) is “tied” to a certain element of the second sample (his son). These two samples will be called dependent.

8.2. Hypothesis testing for independent samples

For independent samples, the choice of criterion depends on whether we know the general variances σ₁² and σ₂² of the characteristic under consideration for the populations being studied. We will consider this problem solved, assuming that the sample variances coincide with the general ones. In this case, the criterion is the statistic

z = (x̄₁ - x̄₂) / √(σ₁²/n₁ + σ₂²/n₂). (8.1)
Before moving on to discussing the situation when the general variances (or at least one of them) are unknown to us, we note the following.

The logic of using criterion (8.1) is similar to the one we described when considering the “Chi-square” criterion (7.2). There is only one fundamental difference. Speaking about the meaning of criterion (7.2), we considered an infinite number of samples of size n “drawn” from our general population. Here, analyzing the meaning of criterion (8.1), we move on to considering an infinite number of pairs of samples of sizes n₁ and n₂. For each pair, a statistic of the form (8.1) is calculated. The totality of the obtained values of such statistics, in accordance with our notation, corresponds to a normal distribution (as we agreed, the letter z denotes a criterion to which the normal distribution corresponds).

So, if the general variances are unknown to us, then we are forced to use their sample estimates s₁² and s₂² instead. In this case, strictly speaking, the normal distribution should be replaced by the Student distribution, and z should be replaced by t (as was the case in a similar situation when constructing a confidence interval for the mathematical expectation). However, with sufficiently large sample sizes (n₁, n₂ ≥ 30), as we already know, the Student distribution practically coincides with the normal one. In other words, for large samples we can continue to use the criterion

z = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂). (8.2)
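For large samples, criterion (8.2) is easy to compute directly. Below is a minimal sketch in Python, assuming NumPy and SciPy are available; the function name and the simulated data are purely illustrative:

```python
# A sketch of criterion (8.2) for two large independent samples.
import numpy as np
from scipy import stats

def z_test_large_samples(x, y, alpha=0.05):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    s1, s2 = x.var(ddof=1), y.var(ddof=1)      # sample estimates of the variances
    z = (x.mean() - y.mean()) / np.sqrt(s1 / n1 + s2 / n2)
    z_crit = stats.norm.ppf(1 - alpha / 2)     # two-sided critical point
    p_value = 2 * stats.norm.sf(abs(z))
    return z, z_crit, p_value

# Illustrative data with n1, n2 >= 30:
rng = np.random.default_rng(0)
x = rng.normal(100, 15, 60)
y = rng.normal(105, 15, 80)
z, z_crit, p = z_test_large_samples(x, y)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p = {p:.4f}")
```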

The situation is more complicated when the variances are unknown and the size of at least one sample is small. Then another factor comes into play: the form of the criterion depends on whether we may consider the unknown variances of the characteristic under consideration in the two analyzed samples to be equal. To find out, we need to test the hypothesis:

H₀: σ₁² = σ₂². (8.3)

To test this hypothesis, the Fisher criterion is used:

F = s_max² / s_min², (8.4)

where s_max² is the larger and s_min² the smaller of the two corrected sample variances.

The specifics of using this criterion will be discussed below, but now we will continue to discuss the algorithm for selecting a criterion used to test hypotheses about the equality of mathematical expectations.

If hypothesis (8.3) is rejected, then the criterion of interest to us takes the form

t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂) (8.5)

(i.e., it differs from criterion (8.2), which was used for large samples, in that the corresponding statistic has a Student distribution rather than a normal one). If hypothesis (8.3) is accepted, then the form of the criterion changes:

t = (x̄₁ - x̄₂) / ( √( ((n₁ - 1)s₁² + (n₂ - 1)s₂²) / (n₁ + n₂ - 2) ) · √(1/n₁ + 1/n₂) ). (8.6)

Let us summarize how a criterion is selected for testing the hypothesis about the equality of the general mathematical expectations on the basis of two independent samples:

- the general variances are known: criterion (8.1);
- the general variances are unknown, but both samples are large: criterion (8.2);
- the variances are unknown, at least one sample is small, and the hypothesis H₀: σ₁² = σ₂² is rejected: criterion (8.5);
- the variances are unknown, at least one sample is small, and the hypothesis H₀: σ₁² = σ₂² is accepted: criterion (8.6).
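As an illustration of this selection logic for small samples with unknown variances, here is a minimal sketch in Python (assuming NumPy and SciPy are available; the data and the helper name compare_means_small_samples are illustrative):

```python
# F-test for (8.3), then the pooled (8.6) or separate-variance (8.5) t-test.
import numpy as np
from scipy import stats

def compare_means_small_samples(x, y, alpha=0.05):
    x, y = np.asarray(x, float), np.asarray(y, float)
    s1, s2 = x.var(ddof=1), y.var(ddof=1)
    # Criterion (8.4): larger corrected variance over the smaller one.
    if s1 >= s2:
        f, df1, df2 = s1 / s2, len(x) - 1, len(y) - 1
    else:
        f, df1, df2 = s2 / s1, len(y) - 1, len(x) - 1
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    equal_var = f < f_crit                      # is hypothesis (8.3) accepted?
    # Pooled t-test (8.6) if variances may be considered equal, otherwise (8.5).
    t, p = stats.ttest_ind(x, y, equal_var=equal_var)
    return f, f_crit, equal_var, t, p

x = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3]
y = [11.2, 11.9, 11.4, 11.0, 11.7]
print(compare_means_small_samples(x, y))
```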

8.3. Hypothesis testing for dependent samples

Let's move on to considering dependent samples. Let the sequences of numbers

X₁, X₂, …, X_n;

Y₁, Y₂, …, Y_n

be the values of the random variable under consideration for the elements of two dependent samples. Let us introduce the notation:

Dᵢ = Xᵢ - Yᵢ, i = 1, …, n.

For dependent samples, the criterion that allows us to test the hypothesis H₀: M(D) = 0 (i.e., the hypothesis that the general means of X and Y are equal) looks as follows:

t = D̄ / (s_D / √n) (with n - 1 degrees of freedom), where D̄ = ΣDᵢ / n and s_D = √( (ΣDᵢ² - n·D̄²) / (n - 1) ).

Note that the expression just given for s_D is nothing more than a rewritten form of the well-known formula for the standard deviation; in this case we are talking about the standard deviation of the values Dᵢ. A similar formula is often used in practice as a simpler way of calculating the variance, compared with the “head-on” calculation of the sum of squared deviations of the values of the variable under consideration from the corresponding arithmetic mean.

If we compare the above formulas with those that we used when discussing the principles of constructing a confidence interval, it is easy to notice that testing the hypothesis of equality of means for the case of dependent samples essentially amounts to testing that the mathematical expectation of the values Dᵢ equals zero. The quantity

s_D / √n

is the standard error of D̄. Therefore, the value of the criterion just described is essentially the value of D̄ expressed as a fraction of its standard error. As we said above (when discussing methods for constructing confidence intervals), this indicator can be used to judge how probable the observed value of D̄ is. The difference is that above we were talking about a simple arithmetic mean, which is normally distributed, while here we are talking about a mean of differences, and such means have a Student distribution. But the reasoning about the relationship between the probability of the sample mean deviating from zero (when the mathematical expectation equals zero) and how many standard errors this deviation constitutes remains valid.
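A minimal sketch of this paired criterion in Python (assuming NumPy and SciPy; the paired values below are invented solely for illustration):

```python
# Dependent-sample (paired) t criterion with n - 1 degrees of freedom.
import numpy as np
from scipy import stats

x = np.array([4.0, 5.5, 3.0, 6.0, 4.5, 5.0, 3.5, 6.5])   # e.g. first members of the pairs
y = np.array([5.0, 5.0, 4.5, 6.5, 5.5, 5.5, 4.0, 7.0])   # their paired counterparts

d = x - y
n = len(d)
d_mean = d.mean()
s_d = np.sqrt((np.sum(d**2) - n * d_mean**2) / (n - 1))   # computational formula for s_D
t = d_mean / (s_d / np.sqrt(n))
p = 2 * stats.t.sf(abs(t), df=n - 1)
print(f"t = {t:.3f}, p = {p:.4f}")

# The same result via the library routine:
print(stats.ttest_rel(x, y))
```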

Example. The income of pharmacies in one of the city's microdistricts for a certain period amounted to 128; 192; 223; 398; 205; 266; 219; 260; 264; 98 (conventional units). In the neighboring microdistrict for the same time they were equal to 286; 240; 263; 266; 484; 223; 335.
For both samples, calculate the mean, the corrected variance, and the standard deviation. Find the range of variation, the average absolute (linear) deviation, the coefficient of variation, the linear coefficient of variation, and the coefficient of oscillation.
Assuming that the random variable in question has a normal distribution, determine the confidence interval for the general mean (in both cases).
Using the Fisher criterion, test the hypothesis of the equality of the general variances. Using the Student's test, test the hypothesis of the equality of the general means (the alternative hypothesis being their inequality).
In all calculations, the significance level is α = 0.05.

Solution.
1. Find the variation indicators for the first sample.

x      |x - x̄|    (x - x̄)²
98     127.3      16205.29
128    97.3       9467.29
192    33.3       1108.89
205    20.3       412.09
219    6.3        39.69
223    2.3        5.29
260    34.7       1204.09
264    38.7       1497.69
266    40.7       1656.49
398    172.7      29825.29
Σ 2253  573.6     61422.1


Simple arithmetic mean:

x̄ = Σxᵢ / n = 2253 / 10 = 225.3.

Variation indicators.
The range of variation is the difference between the maximum and minimum values of the characteristic:

R = X max - X min
R = 398 - 98 = 300
Average linear deviation:

d̄ = Σ|xᵢ - x̄| / n = 573.6 / 10 = 57.36.

On average, each value of the series differs from the mean by 57.36.
Variance:

D = Σ(xᵢ - x̄)² / n = 61422.1 / 10 = 6142.21.

Unbiased variance estimator (corrected variance):

s² = Σ(xᵢ - x̄)² / (n - 1) = 61422.1 / 9 ≈ 6824.68.

Standard deviation:

σ = √D = √6142.21 ≈ 78.37.

On average, each value of the series differs from the mean value 225.3 by 78.37.

Estimate of the standard deviation:

s = √s² = √6824.68 ≈ 82.61.

The coefficient of variation (a measure of the relative dispersion of the values of the population, showing what proportion of the mean value this spread constitutes):

v = σ / x̄ · 100% = 78.37 / 225.3 · 100% ≈ 34.8%.

Since v > 30% (but v < 70%), the variation is moderate and the population cannot be considered strictly homogeneous.

Linear coefficient of variation (relative linear deviation):

K_d = d̄ / x̄ · 100% = 57.36 / 225.3 · 100% ≈ 25.5%.

Oscillation coefficient:

K_R = R / x̄ · 100% = 300 / 225.3 · 100% ≈ 133.2%.


Confidence interval for the general mean: x̄ ± t·s/√n.

Using the Student's table we find:
t(n-1; α/2) = t(9; 0.025) = 2.262.

ε = t·s/√n = 2.262 · 82.61 / √10 ≈ 59.09.

(225.3 - 59.09; 225.3 + 59.09) = (166.21; 284.39)

2. Find the variation indicators for the second sample.
Let's rank the series: to do this, we sort its values in ascending order.
Table for calculating indicators.

x      |x - x̄|    (x - x̄)²
223    76.57      5863.18
240    59.57      3548.76
263    36.57      1337.47
266    33.57      1127.04
286    13.57      184.18
335    35.43      1255.18
484    184.43     34013.9
Σ 2097  439.71    47329.71

To evaluate the distribution series, we find the following indicators:
Distribution center indicators.
Simple arithmetic mean:

x̄ = Σxᵢ / n = 2097 / 7 ≈ 299.57.
Variation indicators.
Absolute variations.
The range of variation is the difference between the maximum and minimum values ​​of the primary series characteristic.
R = X max - X min
R = 484 - 223 = 261
Average linear deviation (calculated in order to take into account the differences of all units of the population under study):

d̄ = Σ|xᵢ - x̄| / n = 439.71 / 7 ≈ 62.82.

On average, each value of the series differs from the mean by 62.82.
Variance (characterizes the measure of spread around the average value, i.e. the deviation from the mean):

D = Σ(xᵢ - x̄)² / n = 47329.71 / 7 ≈ 6761.39.

Unbiased variance estimator (corrected variance):

s² = Σ(xᵢ - x̄)² / (n - 1) = 47329.71 / 6 ≈ 7888.29.

Standard deviation:

σ = √D = √6761.39 ≈ 82.23.

On average, each value of the series differs from the mean value 299.57 by 82.23.

Estimate of the standard deviation:

s = √s² = √7888.29 ≈ 88.82.
Relative variation measures.
The relative indicators of variation include the coefficient of oscillation, the linear coefficient of variation, and the coefficient of variation.
The coefficient of variation (a measure of the relative dispersion of the values of the population, showing what proportion of the mean value this spread constitutes):

v = σ / x̄ · 100% = 82.23 / 299.57 · 100% ≈ 27.4%.

Since v ≤ 30%, the population is homogeneous and the variation is weak. The results obtained can be trusted.
Linear coefficient of variation (relative linear deviation, characterizing the proportion of the mean value constituted by the mean of the absolute deviations):

K_d = d̄ / x̄ · 100% = 62.82 / 299.57 · 100% ≈ 21.0%.

Oscillation coefficient (reflects the relative fluctuation of the extreme values of the characteristic around the mean):

K_R = R / x̄ · 100% = 261 / 299.57 · 100% ≈ 87.1%.
Interval estimation of the population center.
Confidence interval for the general mean: x̄ ± t·s/√n.

Using the Student's table we find:
t(n-1; α/2) = t(6; 0.025) = 2.447.

ε = t·s/√n = 2.447 · 88.82 / √7 ≈ 82.14.

(299.57 - 82.14; 299.57 + 82.14) = (217.43; 381.71)
With a probability of 0.95, it can be stated that the average value with a larger sample size will not fall outside the found interval.
We test the hypothesis of equality of variances:
H₀: D_x = D_y;
H₁: D_x ≠ D_y.
Let's find the observed value of the Fisher criterion. Since s_y² > s_x², we take s_b² = s_y² = 7888.29 and s_m² = s_x² = 6824.68, so

F_obs = s_b² / s_m² = 7888.29 / 6824.68 ≈ 1.16.

Numbers of degrees of freedom:
f₁ = n_y - 1 = 7 - 1 = 6,
f₂ = n_x - 1 = 10 - 1 = 9.
Using the table of critical points of the Fisher–Snedecor distribution at the significance level α = 0.05 and the given numbers of degrees of freedom, we find F_cr(6; 9) = 3.37.
Since F_obs < F_cr, there is no reason to reject the hypothesis of equality of the general variances.
We now test the hypothesis about the equality of the general means:
H₀: the general means are equal; H₁: they are not equal.


Let's find the experimental value of the Student's criterion:

t_obs = (ȳ - x̄) / ( s_p · √(1/n_x + 1/n_y) ), where s_p² = ((n_x - 1)s_x² + (n_y - 1)s_y²) / (n_x + n_y - 2) = (9 · 6824.68 + 6 · 7888.29) / 15 ≈ 7250.1;

t_obs = (299.57 - 225.3) / √(7250.1 · (1/10 + 1/7)) ≈ 74.27 / 41.96 ≈ 1.77.
Number of degrees of freedom f = n x + n y – 2 = 10 + 7 – 2 = 15
Using the table of critical points of the Student distribution at the significance level α = 0.05 and the given number of degrees of freedom, we find
t_cr(15; 0.025) = 2.131.
Since |t_obs| = 1.77 < t_cr = 2.131, there is no reason to reject the hypothesis about the equality of the general means.
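The whole example can be re-checked programmatically. Below is a sketch using SciPy with the pharmacy data exactly as given above (the printed values should reproduce F_obs ≈ 1.16, F_cr ≈ 3.37, t_obs ≈ 1.77 and t_cr ≈ 2.131, up to rounding):

```python
# Re-check of the pharmacy example: F-test for variances, then pooled t-test.
import numpy as np
from scipy import stats

x = np.array([128, 192, 223, 398, 205, 266, 219, 260, 264, 98])   # first microdistrict
y = np.array([286, 240, 263, 266, 484, 223, 335])                 # second microdistrict
alpha = 0.05

s2_x, s2_y = x.var(ddof=1), y.var(ddof=1)        # corrected variances
F = max(s2_x, s2_y) / min(s2_x, s2_y)
F_crit = stats.f.ppf(1 - alpha, 6, 9)            # larger variance has 6 d.f., smaller 9 d.f.
print(f"F_obs = {F:.2f}, F_cr = {F_crit:.2f}")   # variances may be considered equal

t, p = stats.ttest_ind(y, x, equal_var=True)     # pooled (Student) test, df = 15
t_crit = stats.t.ppf(1 - alpha / 2, 15)
print(f"t_obs = {t:.2f}, t_cr = {t_crit:.3f}, p = {p:.3f}")
```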

Among the most important generalizing characteristics about which hypotheses are most often put forward is the average value. In order to test the hypothesis about the equality of means in two populations, it is necessary to formulate a null hypothesis. As a rule, it is assumed that both samples are taken from normally distributed populations with the same mathematical expectation and the same variance. If this assumption is correct, the difference of the sample means should be close to zero. In fact, the sample means x̄₁ and x̄₂ will not coincide exactly because of the randomness of sampling. It is therefore necessary to find out whether the difference between x̄₁ and x̄₂ is significant, i.e. whether it lies within the limits of possible random variation or goes beyond these limits. The task of testing the hypothesis thus comes down to checking the significance of this difference.

Each sample mean has its own mean error:

μ₁ = s₁ / √n₁, μ₂ = s₂ / √n₂.

Having determined the variances and the mean errors of the sample means, one can calculate the actual value of the criterion and compare it with the critical (tabular) value at the appropriate significance level and number of degrees of freedom of variation (for samples with n > 30 the z-criterion of the normal distribution is used, and for samples of size n < 30 the Student t-criterion).

The actual value of the t-criterion is determined by the formula

t_fact = (x̄₁ - x̄₂) / √(μ₁² + μ₂²).

If the sample value of the criterion falls into the critical region (t_fact ≥ t_α), the null hypothesis of the equality of means is rejected; if the sample value of the criterion falls within the region of acceptable values (t_fact < t_α), the null hypothesis is accepted.

The null hypothesis that the means of two populations are equal can also be tested by comparing the actual difference between the sample means (x̄₁ - x̄₂) with the maximum random error at a given significance level (ε_α). If the actual difference between the sample means lies within the limits of the random error (|x̄₁ - x̄₂| < ε_α), the null hypothesis is accepted. If the actual difference between the means goes beyond the limits of the random error (|x̄₁ - x̄₂| > ε_α), the null hypothesis is rejected.

When solving specific problems of testing statistical hypotheses regarding averages, it is necessary to take into account the following points: 1) the sampling scheme (independent and dependent samples); 2) equality or inequality of sample sizes; 3) equality or inequality of variances in general populations.

The algorithm for testing the hypothesis regarding two means changes slightly if the sample variances (s₁² and s₂²) differ significantly. In this case a correction is introduced when determining the number of degrees of freedom:

When, with unequal variances across the samples, the sample sizes (n₁ and n₂) are also unequal, the tabulated value of the Student's t-test should be calculated using the formula

t_α ≈ (t₁ · s₁²/n₁ + t₂ · s₂²/n₂) / (s₁²/n₁ + s₂²/n₂),

where t₁ and t₂ are the tabulated values of the Student's t-test taken with n₁ - 1 and n₂ - 1 degrees of freedom, respectively.
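Since the original formula is not reproduced above, the sketch below assumes the weighted (Cochran–Cox-type) form just described; all numeric inputs are illustrative:

```python
# Weighted tabular value of the t-criterion for unequal variances and sample sizes.
from scipy import stats

def weighted_t_critical(s1_sq, n1, s2_sq, n2, alpha=0.05):
    w1, w2 = s1_sq / n1, s2_sq / n2             # squared mean errors of the two means
    t1 = stats.t.ppf(1 - alpha / 2, n1 - 1)     # tabular t with n1 - 1 d.f.
    t2 = stats.t.ppf(1 - alpha / 2, n2 - 1)     # tabular t with n2 - 1 d.f.
    return (t1 * w1 + t2 * w2) / (w1 + w2)

print(weighted_t_critical(s1_sq=14.4, n1=12, s2_sq=39.7, n2=7))
```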

Let's consider an example of testing a statistical hypothesis about the equality of two means for independent samples of equal size (n₁ = n₂) and equal variances (σ₁² = σ₂²).

Thus, there are data on the live weight of calves at birth in two groups of black-and-white cows of the same age. The first group of cows had a normal lactation duration (305 days), while the second group was milked for 320 days. Each group included 5 cows. The observation data are given in Table 7.2.

Table 7.2. Live weight of calves at birth by groups of cows with different lactation durations

A comparison of the live weights of calves in the two groups of cows shows that a higher live weight of calves is observed in the cows of group I, which had a normal duration of lactation. However, because the sample sizes are small (n = 5), the possibility cannot be excluded that the differences between the live weights resulted from random causes.

It is necessary to statistically evaluate the difference between the averages for the two groups of cows.

Based on the results of testing the hypothesis, we must conclude either that the difference between the means lies within the range of random fluctuations, or that this difference is so significant that it is not consistent with the null hypothesis about the random nature of the differences between the means.

If the second position is proven and the first is rejected, it can be argued that the duration of lactation affects the live weight of calves.

The condition of the problem assumes that both samples are taken from a normally distributed population. The formation of groups is random (independent), so the difference between means must be assessed.

Let's determine the average live weight of calves for two groups of cows:

The actual difference between the averages is:

The significance of this difference must be assessed. To do this, it is necessary to test the hypothesis that the two averages are equal.

Let us consider in detail all the stages of the hypothesis-testing scheme.

1. Let us formulate the null and alternative hypotheses: H₀: the general means are equal; H₁: they are not equal.

2. Let us accept the significance level α = 0.05, guaranteeing acceptance or rejection of the hypothesis with a probability of error in only 5 cases out of 100.

3. The most powerful criterion for testing this kind of hypothesis H0 is the Student’s t-test.

4. Let us formulate the decision rule based on the results of testing H₀. Since, according to the alternative hypothesis, x̄₁ may be either less than or greater than x̄₂, the critical region must be established on two sides: t ≤ -t_α and t ≥ t_α, or, in short, |t| ≥ t_α.

This form of specifying the criterion is called a two-sided critical region. The critical region at α = 0.05 will contain all values above the upper 2.5% point and below the lower 2.5% point of the Student's t-distribution.

Taking the above into account, the conclusion for testing H₀ can be formulated as follows: hypothesis H₀ will be rejected if the actual value of the t-criterion turns out to be greater in absolute value than the tabular value, that is, if |t_fact| > t_α. Otherwise H₀ must be accepted.

5. To check H0, you need to determine the actual value of the Student’s T-test and compare it with the table value.

To determine the actual value of the Student's T-test, we perform the following calculations.

6. Let us calculate the variances for each sample, corrected for the loss of degrees of freedom. To do this, let us first square the values x₁ᵢ and x₂ᵢ:

7. Calculate the squared average errors for each sample and the generalized average error of the difference between the means:

8. Let us calculate the actual value of the Student's t-test (the computation gives t_fact = 2.14):

9. Let us determine the tabular value of the Student's t-criterion based on the significance level α = 0.05 and the number of degrees of freedom for the two samples: k = n₁ + n₂ - 2 = 5 + 5 - 2 = 8.

Using the table "Critical points of the Student's distribution" (Appendix 3), we find for α = 0.05 and k = 8: t_0.05 = 2.31.

10. Let’s compare the actual and tabulated values ​​of the Student’s t-test:

Since t_fact < t_0.05 (the sample value of the criterion lies in the region of acceptable values), the null hypothesis about the equality of the general means is accepted.

So, with the two-sided test the effect of lactation duration on the live weight of calves at birth remains unproven.

However, attention should be paid to an important point: the live weight of calves at birth in all experimental observations is higher in the first group of cows, which have a normal lactation duration. Therefore, instead of the alternative hypothesis H₁: x̄₁ ≠ x̄₂, another can be taken. Since there is no reason to believe that with a normal duration of lactation the live weight of calves will be lower, it is obvious that a more appropriate form of the alternative hypothesis is H₁: x̄₁ > x̄₂.

Then the critical region, which makes up 0.05 of the entire area under the distribution curve, will be located only on one (right-hand) side, since negative deviations of the difference between the means are considered incompatible with the conditions of the problem. In this connection, when a two-sided table is used, the tabular criterion value should be determined at twice the significance level (i.e. at 2α = 2 · 0.05 = 0.10). The hypothesis-testing rule is formulated as follows: the null hypothesis is rejected if t_fact > t_2α.

This form of specifying the critical region is called one-sided. The one-sided criterion is more sensitive with respect to errors of the second kind, but its use is permissible only if the validity of this alternative hypothesis is justified.

Using the tables (Appendix 3), we establish the tabular value of the criterion at α = 0.10 and k = 8: t_0.10 = 1.86.

So, when the one-sided test is used, the null hypothesis is rejected, i.e. the criterion falls into the critical region (t_fact > t_0.10; 2.14 > 1.86). Thus, the live weight of calves at birth in the group of cows with normal lactation duration is significantly higher. This conclusion is more accurate than the one obtained with the two-sided test, since additional information is used to justify the appropriateness of the one-sided test.

We obtain the same conclusion by comparing the possible maximum error of the two samples ε_α with the actual difference between the means.

Let us calculate the possible maximum error of the difference between the means for the two samples: ε_0.10 = t_0.10 · μ_d = 1.86 · 1.87 ≈ 3.48 kg, and compare it with the actual difference between the means:

By comparing the maximum possible error with the actual difference in means, we can draw a similar conclusion that the hypothesis put forward about the equality of means is not consistent with the results obtained.
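The two critical values used in this example are easy to verify; a short sketch (assuming SciPy, with t_fact = 2.14 taken from the text):

```python
# Two-sided vs. one-sided critical values for k = 8 and alpha = 0.05.
from scipy import stats

k, alpha, t_fact = 8, 0.05, 2.14
t_two_sided = stats.t.ppf(1 - alpha / 2, k)   # ~2.31: H0 is retained, since 2.14 < 2.31
t_one_sided = stats.t.ppf(1 - alpha, k)       # ~1.86: H0 is rejected, since 2.14 > 1.86
print(t_two_sided, t_one_sided)
```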

We will consider testing the hypothesis for the case of dependent samples with equal numbers and equal variances using the following example.

Thus, there are sample observation data on the productivity of mother cows and their daughter cows (Table 7.3).

Table 7.3. Productivity of mother cows and daughter cows

It is necessary to test a statistical hypothesis regarding the mean difference between pairs of related observations in a population.

Since the observations in the two samples are pairwise interrelated (dependent samples), it is necessary to compare not the difference between the means but the average value of the differences between the pairs of observations (d̄). Let us consider all stages of the hypothesis-testing procedure. 1. Let us formulate the null and alternative hypotheses: H₀: the mean difference in the general population equals zero; H₁: it does not equal zero.

With this alternative, it is necessary to apply a two-sided test.

2. Let us take the significance level to be a = 0.05.

3. The most powerful criterion for checking H0 is the Student’s t-test.

4. Calculate the average difference

5. Calculate the adjusted variance of the mean difference:

6. Determine the average error of the average difference:

7. Let’s calculate the actual value of the Student’s t-test:

8. Let us establish the number of degrees of freedom based on the number of pairs of interrelated differences: k = n - 1 = 5 - 1 = 4.

9. Let us find the tabular value of the Student's t-criterion at k = 4 and α = 0.05: t_0.05 = 2.78 (Appendix 3).

10. Let’s compare the actual and tabulated value of the criterion:

The actual value of the criterion exceeds the tabular one. Therefore, the magnitude of the average difference between the milk yields of the two samples is significant, and the null hypothesis is rejected.

We get the same conclusions by comparing the possible marginal error with the actual average difference:

The maximum error shows that, as a result of random variation, the average difference could reach 2.4 centners. The actual average difference is higher:

So, based on the results of the study, it can be stated with a high degree of probability that the differences between the average milk yields of mother cows and daughter cows are significant.

Sometimes it turns out that the average result of one series of experiments differs from the average result of another series. It is necessary to determine whether this difference is random or not, i.e., whether the results of the experiments can be considered samples from two independent general populations with the same means, or whether the means of these populations are not equal.

The formal formulation of this problem is as follows: there are two random variables distributed according to the normal law,

X ~ N(m_x, σ_x), Y ~ N(m_y, σ_y),

where σ denotes the standard deviation. It is assumed that the variances are known but the mathematical expectations are unknown.

Let there be two series of observations of the quantities Χ and Υ.

X: x₁, x₂, …, x_{n₁};

Y: y₁, y₂, …, y_{n₂}.

We put forward the hypothesis that m_x = m_y. Based on the observations, it is necessary to confirm or refute this hypothesis. If the null hypothesis is confirmed, then we can say that the differences between the average values in the two samples are statistically insignificant, i.e. can be explained by random error.

A z-test is used to test this hypothesis. For this purpose the z-statistic is calculated, which is defined as follows:

Z = (x̄ - ȳ) / √(σ_x²/n₁ + σ_y²/n₂),

where x̄ and ȳ are the arithmetic means of the corresponding series of observations. If the null hypothesis is true, the statistic Z is normally distributed with zero mathematical expectation and unit variance.

The null hypothesis that the means are equal: H₀: m_x = m_y.

The alternative hypothesis that the means are not equal: H₁: m_x ≠ m_y.

Under the alternative hypothesis the following options are possible: either m_x < m_y or m_x > m_y. Accordingly, we must apply a two-sided test, so there are two critical points: z_left and z_right.

These points are selected from the conditions:

(1) P(Z < z_left) = α/2,

(2) P(Z > z_right) = α/2.

From the value of α we determine the left and right critical points:

z_left = F⁻¹(α/2), z_right = F⁻¹(1 - α/2),

where F(z) is the cumulative distribution function of the random variable Z, and F -1 (...) is the inverse function.

Definition: Let the function y = f(x) be defined on a segment [a, b], and let the set of values of this function be the segment [α, β]. Suppose, further, that to each y from the segment [α, β] there corresponds only one value x from [a, b] for which f(x) = y. Then on the segment [α, β] we can define the function x = f⁻¹(y) by assigning to each y from [α, β] the value x from [a, b] for which f(x) = y. The function x = f⁻¹(y) is called the inverse of the function y = f(x).

The values of the critical points can be found in EXCEL using the function =NORMSINV(), by specifying in the dialog box the probability value α/2 (to find z_left) or the value 1 - α/2 (to find z_right).
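Outside of EXCEL, the same critical points can be obtained, for example, with SciPy, where norm.ppf plays the role of NORMSINV (a minimal sketch; the α value is the one used throughout this example):

```python
# Critical points of the standard normal distribution for a two-sided test.
from scipy.stats import norm

alpha = 0.05
z_left = norm.ppf(alpha / 2)        # left critical point,  approximately -1.96
z_right = norm.ppf(1 - alpha / 2)   # right critical point, approximately +1.96
print(z_left, z_right)
```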

The quantity Z is normally distributed with parameters N(0; 1), and its distribution is symmetric. Geometric interpretation: the probability of falling into the region where the hypothesis is rejected is equal to the sum of the shaded tail areas (for example, 0.05 in total, i.e. α/2 in each tail).

Testing sequence:

1. Calculate statistics Z.

2. We set the level of significance.

3. We determine the critical points based on conditions (1) and (2).

4. Compare the value of Z calculated in step 1 with the critical points:

If the value of the Z-statistic is greater in absolute value than the critical point, then the null hypothesis is rejected at the given level of significance. This means that the two populations from which the samples are drawn are different and, therefore, the mathematical expectations for these populations are not equal. Otherwise, the hypothesis of equality of means is accepted, and the two populations can be considered as a single one with the same mathematical expectation.
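A compact sketch of this testing sequence for the case of known variances (assuming NumPy and SciPy; the data and variances are illustrative):

```python
# Two-sample z-test with known variances, following steps 1-4 above.
import numpy as np
from scipy.stats import norm

def two_sample_z_test(x, y, var_x, var_y, alpha=0.05):
    x, y = np.asarray(x, float), np.asarray(y, float)
    z = (x.mean() - y.mean()) / np.sqrt(var_x / len(x) + var_y / len(y))  # step 1
    z_crit = norm.ppf(1 - alpha / 2)                                      # steps 2-3
    reject = abs(z) > z_crit                                              # step 4
    return z, z_crit, reject

x = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2]
y = [20.9, 20.5, 21.1, 20.7, 20.8]
print(two_sample_z_test(x, y, var_x=0.04, var_y=0.06))
```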

There is an analysis tool in EXCEL called "Two-Sample z-Test for Means" (Service → Data Analysis → Two-Sample z-Test for Means). It serves to test the hypothesis about the difference between the means (mathematical expectations) of two normal distributions with known variances.

When this tool is called, a dialog box appears in which the following parameters are set:

* Hypothetical mean difference: the expected difference between the means of the general populations being studied is entered. To test the hypothesis of equality of means, you must enter the value zero.

* Variance of variable 1 (known): a known value of the variance of the random variable X is introduced.

* Variance of variable 2 (known): a known value of the variance of the random variable Y is introduced.

* Labels: if activated, the first line is perceived as a heading and is not counted.

* Alpha: the significance level is set equal to the probability of making a type I error.

EXERCISE 1:

Sample data on the diameters of rollers (in millimetres) produced by machines 1 and 2 are known.

Variance for machine 1: σ₁² = 5 mm².

Variance for machine 2: σ₂² = 7 mm².

Significance level: α = 0.05.

1. Using the two-sample z-test for means, test the hypothesis of the equality of the average values for your variant.

2. Check the same hypothesis using calculation formulas.
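A possible solution template for item 2: the sketch below assumes SciPy, and the roller diameters are placeholders, since the data "for your option" are not reproduced here; only the variances 5 and 7 mm² and α = 0.05 are taken from the exercise:

```python
# Two-sample z-test with the known variances from Exercise 1 (placeholder data).
import numpy as np
from scipy.stats import norm

machine1 = np.array([50.2, 49.8, 50.5, 50.1, 49.7, 50.3, 50.0, 49.9])  # placeholder data
machine2 = np.array([50.8, 50.4, 51.0, 50.6, 50.9, 50.5, 50.7])        # placeholder data
var1, var2, alpha = 5.0, 7.0, 0.05

z = (machine1.mean() - machine2.mean()) / np.sqrt(var1 / len(machine1) + var2 / len(machine2))
z_crit = norm.ppf(1 - alpha / 2)
print(f"z = {z:.3f}, critical points = ±{z_crit:.3f}")
print("reject H0" if abs(z) > z_crit else "accept H0")
```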

Let's consider the same problem as in the previous Section 3.4, but under the condition that the sample sizes n₁ and n₂ are small (less than 30). In this case, replacing the general variances entering (3.15) with the corrected sample variances s₁² and s₂² can lead to a large error in the value of the statistic and, consequently, to a large error in establishing the region of acceptance of the hypothesis H₀. However, if there is confidence that the unknown general variances are equal (for example, if the average sizes of two batches of parts manufactured on the same machine are compared), then it is possible, using the Student distribution, to construct a criterion for testing the hypothesis H₀ about the equality of the mathematical expectations of X and Y. To do this, introduce the random variable

T = (X̄ - Ȳ) / ( S_p · √(1/n₁ + 1/n₂) ), (3.16)

where

S_p² = ((n₁ - 1)s₁² + (n₂ - 1)s₂²) / (n₁ + n₂ - 2) (3.17)

is the weighted mean of the corrected sample variances s₁² and s₂², serving as a point estimate of the common unknown general variance of X and Y. As it turns out (see the reference, p. 180), if the null hypothesis H₀ is true, the random variable T has a Student distribution with k = n₁ + n₂ - 2 degrees of freedom regardless of the sample sizes. If the hypothesis H₀ is true, then the difference X̄ - Ȳ should be small; that is, the experimental value T_exp of the quantity T should be small, namely, it must lie within certain boundaries. If it goes beyond these boundaries, we will consider this a refutation of the hypothesis H₀, and the probability of such an erroneous rejection of a true H₀ equals the specified significance level α.

Thus, the region of acceptance of the hypothesis H₀ will be a certain interval into which the values of the random variable T must fall with probability 1 - α:

P(-t_{α,k} < T < t_{α,k}) = 1 - α. (3.18)

The value t_{α,k} determined by equality (3.18) for different significance levels α and different numbers k of degrees of freedom of the quantity T can be found in the table of critical points of the Student distribution (Table 4 of the Appendix). Thus, the interval (-t_{α,k}; t_{α,k}) of acceptance of hypothesis H₀ will be found. If the experimental value T_exp of T falls into this interval, hypothesis H₀ is accepted; if it does not, H₀ is rejected.

Note 1. If there is no reason to consider the general variances of X and Y equal, it is still permissible to use the Student's t-criterion stated above to test the hypothesis H₀ about the equality of the mathematical expectations of X and Y. Only now the number k of degrees of freedom of the quantity T should be taken not equal to n₁ + n₂ - 2, but equal to (see the reference)

k = (n₁ + n₂ - 2) · ( 0.5 + s₁²·s₂² / (s₁⁴ + s₂⁴) ). (3.19)

If the corrected sample variances differ greatly, then the second term in the bracket of (3.19) is small compared with 0.5, so that expression (3.19), compared with k = n₁ + n₂ - 2, reduces the number of degrees of freedom of the random variable T almost by half. This leads to a significant widening of the interval of acceptance of hypothesis H₀ and, accordingly, to a significant narrowing of the critical region where this hypothesis is rejected. And this is quite fair, since the degree of spread of possible values of the difference X̄ - Ȳ will mainly be determined by the spread of values of whichever of the quantities X and Y has the larger variance. That is, the information from the sample with the smaller variance is, in effect, partially lost, which leads to greater uncertainty in conclusions about the hypothesis H₀.
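A short sketch of this adjustment (assuming the form of (3.19) reconstructed above; the Welch–Satterthwaite number of degrees of freedom is printed alongside for comparison, and the numeric inputs are illustrative):

```python
# Adjusted degrees of freedom per the (3.19)-style correction vs. Welch-Satterthwaite.
def df_3_19(s1_sq, n1, s2_sq, n2):
    # Assumed form of (3.19): equals n1 + n2 - 2 when the variances coincide,
    # and roughly half of that when they differ greatly.
    return (n1 + n2 - 2) * (0.5 + s1_sq * s2_sq / (s1_sq**2 + s2_sq**2))

def df_welch(s1_sq, n1, s2_sq, n2):
    w1, w2 = s1_sq / n1, s2_sq / n2
    return (w1 + w2) ** 2 / (w1**2 / (n1 - 1) + w2**2 / (n2 - 1))

print(df_3_19(14.44, 10, 17.64, 8))   # close to n1 + n2 - 2 when variances are similar
print(df_welch(14.44, 10, 17.64, 8))
```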

Example 4. Using the data given in the table, compare the average milk yields of cows fed different rations. When testing the null hypothesis H₀ about the equality of the average milk yields, take the significance level α = 0.05.

Table columns: number of cows receiving the ration (heads); average daily milk yield in terms of base fat content (kg/head); standard deviation of the daily milk yield of the cows (kg/head).

Solution. Since the tabular data presented were obtained from small samples of sizes n₁ = 10 and n₂ = 8, to compare the mathematical expectations of the average daily milk yields of cows receiving the two feed rations we must use the theory outlined in this section. To do this, we first find out whether the corrected sample variances s₁² = (3.8)² = 14.44 and s₂² = (4.2)² = 17.64 allow us to consider the general variances equal. For this we use the Fisher–Snedecor criterion (see Section 3.3). We have:

F_exp = 17.64 / 14.44 ≈ 1.22.

According to the table of critical points of the Fisher–Snedecor distribution for α = 0.05, k₁ = 8 - 1 = 7 and k₂ = 10 - 1 = 9, we find F_cr ≈ 3.29.

And since F_exp < F_cr, we have no basis at the significance level α = 0.05 to reject the hypothesis H₀ about the equality of the general variances.

Now, in accordance with (3.17) and (3.16), let us calculate the experimental value of the quantity T:

Further, we find the number k of degrees of freedom of the quantity T: k = 10 + 8 - 2 = 16. Then, according to (3.16), the experimental value of T is T_exp = -0.79. For α = 0.05 and k = 16, from the table of critical points of the Student distribution (Table 4 of the Appendix) we find t_{α,k} = 2.12. Thus, the interval of acceptance of hypothesis H₀ about the equality of the average milk yields of cows receiving rations No. 1 and No. 2 is (-2.12; 2.12). And since T_exp = -0.79 falls into this interval, we have no reason to reject the hypothesis H₀. That is, we have the right to assume that the difference in feed rations does not affect the average daily milk yield of the cows.
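Example 4 can be partially re-checked from the summary statistics quoted in the text (the group means themselves are in the table that is not reproduced here, so T_exp = -0.79 is taken from the text rather than recomputed); a sketch assuming SciPy:

```python
# Re-check of Example 4 from summary statistics: F ratio and acceptance interval.
from scipy import stats

n1, n2 = 10, 8
s1_sq, s2_sq = 3.8**2, 4.2**2
alpha = 0.05

F_exp = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
F_cr = stats.f.ppf(1 - alpha, n2 - 1, n1 - 1)     # d.f. 7 and 9 (larger variance first)
print(f"F_exp = {F_exp:.2f}, F_cr = {F_cr:.2f}")  # variances may be considered equal

k = n1 + n2 - 2
t_cr = stats.t.ppf(1 - alpha / 2, k)              # about 2.12 for k = 16
T_exp = -0.79                                     # value quoted in the text
print("accept H0" if abs(T_exp) < t_cr else "reject H0")
```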

Note 2. In Sections 3.4 and 3.5 discussed above, the null hypothesis H₀ about the equality M(X) = M(Y) was considered against the alternative hypothesis H₁ about their inequality, M(X) ≠ M(Y). But the alternative hypothesis H₁ may be different, for example, M(Y) > M(X). In practice, this case occurs when some improvement (a positive factor) is introduced that allows us to count on an increase in the average value of a normally distributed random variable Y compared with the values of a normally distributed quantity X. For example, a new feed additive has been introduced into the diet of cows, which makes it possible to expect an increase in the average milk yield; additional fertilizing has been applied to a crop, allowing us to expect an increase in its average yield, etc. We would like to find out whether this introduced factor is significant or insignificant. Then, in the case of large sample sizes n₁ and n₂ (see Section 3.4), as a criterion for testing the hypothesis H₀ we consider the normally distributed random variable

Z = (Ȳ - X̄) / √(s_X²/n₁ + s_Y²/n₂).

At a given significance level α, the hypothesis H₀ about the equality of M(X) and M(Y) will be rejected if the experimental value of Z is positive and greater than the critical value z_cr determined from the condition P(Z > z_cr) = α (a one-sided critical region). Indeed, if the hypothesis H₀ is true, then M(Z) = 0, and large positive values of Z are unlikely.


