Confidence intervals for mathematical expectation, variance, probability. Problem solving


Confidence intervals: theory and problems

Understanding Confidence Intervals

Let us briefly introduce the concept of a confidence interval, which
1) estimates a parameter of the population directly from the data of a numerical sample drawn from it;
2) covers the true value of this parameter with probability γ.

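A confidence interval for a parameter θ (with confidence level γ) is an interval of the form $(\theta_1, \theta_2)$ such that $P(\theta_1 < \theta < \theta_2) = \gamma$, where the bounds $\theta_1$ and $\theta_2$ are computed in some way from the sample $x_1, x_2, \dots, x_n$.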

In applied problems the confidence level is usually taken to be γ = 0.9, 0.95, or 0.99.
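The phrase "covers the value of this parameter with probability γ" refers to repeated sampling, and a small simulation can make it concrete. The sketch below is a minimal illustration, assuming a normal population with known σ and using the known-variance interval $\bar{x} \pm t\sigma/\sqrt{n}$ with $2\Phi(t) = \gamma$; the function name `coverage_rate` and the values a = 10, σ = 2, n = 25 are illustrative choices, not taken from the text. Roughly a fraction γ of the simulated intervals should contain the true a.

```python
import random
import statistics


def coverage_rate(a, sigma, n, gamma, trials=2000, seed=0):
    """Share of simulated samples whose known-sigma interval covers the true mean a."""
    rng = random.Random(seed)
    # t solves 2*Phi(t) = gamma, i.e. it is the standard normal quantile of order (1 + gamma)/2
    t = statistics.NormalDist().inv_cdf((1 + gamma) / 2)
    half_width = t * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(a, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - half_width < a < x_bar + half_width:
            hits += 1
    return hits / trials


print(coverage_rate(a=10.0, sigma=2.0, n=25, gamma=0.95))  # close to 0.95
```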

Consider a sample of size n drawn from a population that is presumably normally distributed, $N(a, \sigma)$. Let us show the formulas for the confidence intervals for the distribution parameters: the mathematical expectation and the variance (standard deviation).

Confidence interval for mathematical expectation

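Case 1. The variance of the distribution is known and equal to $\sigma^2$. Then the confidence interval for the parameter a is:
$$\bar{x} - \frac{t\sigma}{\sqrt{n}} < a < \bar{x} + \frac{t\sigma}{\sqrt{n}},$$
where $\bar{x}$ is the sample mean and t is determined from the table of the Laplace function Φ by the relation $2\Phi(t) = \gamma$.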

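Case 2. The variance of the distribution is unknown; a point estimate $s^2$ of the variance has been calculated from the sample. Then the confidence interval for the parameter a is:
$$\bar{x} - \frac{ts}{\sqrt{n}} < a < \bar{x} + \frac{ts}{\sqrt{n}},$$
where $\bar{x}$ is the sample mean calculated from the sample, s is the corrected sample standard deviation, and the parameter $t = t(\gamma, n-1)$ is determined from the table of Student's distribution with n − 1 degrees of freedom.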

Example. Based on 7 measurements of a certain quantity, the mean of the measurement results was found to be 30 and the sample variance 36. Find the bounds that contain the true value of the measured quantity with reliability 0.99.

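Solution. From the Student's distribution table for γ = 0.99 and n − 1 = 6 degrees of freedom we find t = 3.707. Then the confidence limits for the interval containing the true value of the measured quantity are given by the formula:
$$\bar{x} - \frac{ts}{\sqrt{n}} < a < \bar{x} + \frac{ts}{\sqrt{n}},$$
where $\bar{x} = 30$ is the sample mean and $s^2 = 36$ is the sample variance, so s = 6. Plugging in all the values, we get:
$$30 - \frac{3.707 \cdot 6}{\sqrt{7}} < a < 30 + \frac{3.707 \cdot 6}{\sqrt{7}}, \quad \text{i.e.} \quad 21.59 < a < 38.41.$$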
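The Case 2 interval can be sketched in Python with the standard library only. Because the standard library has no Student quantile function, this sketch takes t as an argument, read from a Student's distribution table exactly as in the text; the function name `mean_ci_unknown_variance` is illustrative.

```python
import math


def mean_ci_unknown_variance(x_bar, s, n, t):
    """Interval x_bar ± t*s/sqrt(n).

    t is Student's coefficient read from a table for the chosen
    reliability gamma and n - 1 degrees of freedom."""
    d = t * s / math.sqrt(n)  # accuracy (half-width) of the estimate
    return x_bar - d, x_bar + d


# Example above: n = 7, mean 30, sample variance 36 (s = 6), t(0.99; 6) = 3.707
low, high = mean_ci_unknown_variance(30, math.sqrt(36), 7, 3.707)
print(f"({low:.2f}; {high:.2f})")  # (21.59; 38.41)
```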

Confidence interval for variance

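We assume that, in general, the mathematical expectation is unknown and only a point unbiased estimate $s^2$ of the variance is known. Then the confidence interval is:
$$\frac{(n-1)s^2}{\chi^2_{(1+\gamma)/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{(1-\gamma)/2}},$$
where $\chi^2_{(1-\gamma)/2}$ and $\chi^2_{(1+\gamma)/2}$ are the quantiles of orders (1 − γ)/2 and (1 + γ)/2 of the χ² distribution with n − 1 degrees of freedom, determined from tables.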

Example. Based on 7 trials, the estimate of the standard deviation was found to be s = 12. Find, with probability 0.9, the width of the confidence interval constructed to estimate the variance.

Solution. The confidence interval for the unknown population variance can be found using the formula:

Substitute and get:


Then the width of the confidence interval is 465.589-71.708=393.881.
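A minimal Python sketch of the χ² interval for σ² follows, assuming the quantiles are read from a χ² table and passed in (the standard library has no χ² quantile function). The function name `variance_ci` and the numbers in the usage line (n = 11, s = 2, γ = 0.9, table quantiles 3.940 and 18.307 for 10 degrees of freedom) are purely illustrative and not taken from the example above.

```python
def variance_ci(s, n, chi2_low, chi2_high):
    """Interval ((n-1)s^2 / chi2_high, (n-1)s^2 / chi2_low) for sigma^2.

    chi2_low and chi2_high are the chi-square quantiles of orders
    (1 - gamma)/2 and (1 + gamma)/2 with n - 1 degrees of freedom."""
    ss = (n - 1) * s ** 2  # (n - 1) s^2
    return ss / chi2_high, ss / chi2_low


low, high = variance_ci(2, 11, 3.940, 18.307)  # hypothetical data
print(f"({low:.2f}; {high:.2f}), width {high - low:.2f}")
```

The width of the interval is simply the difference of the two bounds, as in the example.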

Confidence interval for probability (percentage)

Case 1. Suppose the sample size n and the sample fraction (relative frequency) w are known. Then the confidence interval for the general fraction (true probability) p is:
$$w - t\sqrt{\frac{w(1-w)}{n}} < p < w + t\sqrt{\frac{w(1-w)}{n}},$$
where the parameter t is determined from the Laplace function table by the relation $2\Phi(t) = \gamma$.

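Case 2. If, in addition, the size N of the population from which the sample was taken is known, the confidence interval for the general fraction (true probability) can be found using the adjusted formula:
$$w - t\sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)} < p < w + t\sqrt{\frac{w(1-w)}{n}\left(1 - \frac{n}{N}\right)}.$$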
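Both cases can be sketched in one Python function, assuming the normal approximation to the binomial and the finite-population factor $(1 - n/N)$ under the square root for Case 2. `NormalDist` from Python's `statistics` module (3.8+) replaces the Laplace table; the function name `proportion_ci` is illustrative.

```python
import math
from statistics import NormalDist


def proportion_ci(w, n, gamma, N=None):
    """Normal-approximation interval for the general fraction p.

    w: sample fraction (relative frequency); n: sample size;
    gamma: reliability; N: population size, if known (Case 2)."""
    t = NormalDist().inv_cdf((1 + gamma) / 2)  # solves 2*Phi(t) = gamma
    var = w * (1 - w) / n
    if N is not None:
        var *= 1 - n / N  # adjustment for a finite population
    d = t * math.sqrt(var)
    return w - d, w + d


print(proportion_ci(0.1, 100, 0.95))          # no population size
print(proportion_ci(0.1, 100, 0.95, N=1000))  # narrower, N known
```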

Example. Given the sample data, find the bounds that contain the general fraction with a given probability.

Solution. We use the formula:

Let us find the parameter t from the condition $2\Phi(t) = \gamma$ and substitute it into the formula:



Let a random variable X (we can speak of the general population) be distributed according to the normal law with known variance $DX = \sigma^2$ (σ > 0). From the general population (on whose set of objects the random variable is defined), a sample of size n is drawn. The sample $x_1, x_2, \dots, x_n$ is regarded as a set of n independent random variables distributed in the same way as X (the approach explained above in the text).

The following equalities were also discussed and proved earlier:

$M x_1 = M x_2 = \dots = M x_n = MX;$

$D x_1 = D x_2 = \dots = D x_n = DX.$

It is easy to prove (we omit the proof) that in this case the random variable $\bar{x}$ is also normally distributed.

Let us denote the unknown value MX by a and choose a number d > 0 for the given reliability γ so that the following condition holds:

$$P(|\bar{x} - a| < d) = \gamma. \quad (1)$$

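Since the random variable $\bar{x}$ is normally distributed with mathematical expectation $M\bar{x} = MX = a$ and variance $D\bar{x} = DX/n = \sigma^2/n$, we get:

$$P(|\bar{x} - a| < d) = P(a - d < \bar{x} < a + d) = \Phi\left(\frac{d\sqrt{n}}{\sigma}\right) - \Phi\left(-\frac{d\sqrt{n}}{\sigma}\right) = 2\Phi\left(\frac{d\sqrt{n}}{\sigma}\right).$$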

It remains to choose d such that the equality $2\Phi\left(\frac{d\sqrt{n}}{\sigma}\right) = \gamma$ holds.

For any γ, one can find from the table a number t such that $\Phi(t) = \gamma/2$. This number t is sometimes called a quantile.

Now from the equality $\frac{d\sqrt{n}}{\sigma} = t$

we determine the value of d: $d = \frac{t\sigma}{\sqrt{n}}$.

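We obtain the final result by writing formula (1) in the form:
$$P\left(\bar{x} - \frac{t\sigma}{\sqrt{n}} < a < \bar{x} + \frac{t\sigma}{\sqrt{n}}\right) = \gamma.$$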

The meaning of the last formula is as follows: with reliability γ, the confidence interval

$$\left(\bar{x} - \frac{t\sigma}{\sqrt{n}};\ \bar{x} + \frac{t\sigma}{\sqrt{n}}\right)$$

covers the unknown parameter a = MX of the population. In other words, the point estimate $\bar{x}$ determines the parameter MX with accuracy $d = t\sigma/\sqrt{n}$ and reliability γ.

Task. Let a general population have a characteristic that is normally distributed with variance 6.25. A sample of size n = 27 was drawn, and the sample mean of the characteristic was $\bar{x} = 12$. Find the confidence interval covering the unknown mathematical expectation of the characteristic in the general population with reliability γ = 0.99.

Solution. First, using the table of the Laplace function, we find t from the equation Φ(t) = γ/2 = 0.495, which gives t = 2.58. Then we determine the accuracy of the estimate (half the length of the confidence interval), with σ = √6.25 = 2.5: $d = 2.5 \cdot 2.58/\sqrt{27} \approx 1.24$. Hence the desired confidence interval is (10.76; 13.24).
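The known-variance interval can also be computed directly. This sketch uses `statistics.NormalDist` from Python's standard library (3.8+) in place of the Laplace table: with the Laplace function Φ(x) = F(x) − 1/2, where F is the standard normal distribution function, the condition Φ(t) = γ/2 is the same as F(t) = (1 + γ)/2. The unrounded t ≈ 2.576 gives the same interval as the table value 2.58 to two decimals. The function name is illustrative.

```python
import math
from statistics import NormalDist


def mean_ci_known_sigma(x_bar, sigma, n, gamma):
    # t solves Phi(t) = gamma/2, i.e. t is the normal quantile of order (1 + gamma)/2
    t = NormalDist().inv_cdf((1 + gamma) / 2)
    d = t * sigma / math.sqrt(n)  # accuracy of the estimate
    return x_bar - d, x_bar + d


low, high = mean_ci_known_sigma(12, math.sqrt(6.25), 27, 0.99)
print(f"({low:.2f}; {high:.2f})")  # (10.76; 13.24)
```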


Confidence interval for the mathematical expectation of a normal distribution with unknown variance

Let X be a random variable distributed according to the normal law with an unknown mathematical expectation MX, which we denote by the letter a. Let us draw a sample of size n and compute the sample mean $\bar{x}$ and the corrected sample variance $s^2$ using the known formulas.

The random variable

$$T = \frac{(\bar{x} - a)\sqrt{n}}{s}$$

is distributed according to Student's law with n − 1 degrees of freedom.

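The task is to find, for the given reliability γ and number of degrees of freedom n − 1, a number t such that

$$P\left(\frac{|\bar{x} - a|\sqrt{n}}{s} < t\right) = \gamma \quad (2)$$

or, equivalently,

$$P\left(\bar{x} - \frac{ts}{\sqrt{n}} < a < \bar{x} + \frac{ts}{\sqrt{n}}\right) = \gamma. \quad (3)$$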

Here the condition in parentheses states that the unknown parameter a belongs to a certain interval, which is the confidence interval. Its bounds depend on the reliability γ and on the sample parameters $\bar{x}$ and s.

To determine t from γ, we transform equality (2) into the form:

$$P(|T| \ge t) = 1 - \gamma.$$

Now, from the table for the Student-distributed random variable T, we find t by the probability 1 − γ and the number of degrees of freedom n − 1. Formula (3) gives the answer to the problem.

Task. In control tests of 20 electric lamps, the average operating life was 2000 hours, with a standard deviation (computed as the square root of the corrected sample variance) of 11 hours. The operating life of a lamp is known to be a normally distributed random variable. Determine, with reliability 0.95, the confidence interval for the mathematical expectation of this random variable.

Solution. Here 1 − γ = 0.05. From the Student's distribution table with 19 degrees of freedom we find t = 2.093. The accuracy of the estimate is $2.093 \cdot 11/\sqrt{20} \approx 5.15$. Hence the desired confidence interval is (1994.85; 2005.15).

In statistics, there are two types of estimates: point and interval. A point estimate is a single sample statistic used to estimate a population parameter. For example, the sample mean $\bar{X}$ is a point estimate of the population mean μ, and the sample variance $S^2$ is a point estimate of the population variance $\sigma^2$. It was shown earlier that the sample mean is an unbiased estimate of the population expectation. The sample mean is called unbiased because the mean of all sample means (for the same sample size n) equals the mathematical expectation of the general population.

For the sample variance $S^2$ to be an unbiased estimator of the population variance $\sigma^2$, its denominator must be n − 1 rather than n. In other words, the population variance is then the average of all possible sample variances.
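The role of the n − 1 denominator can be checked by brute force on a tiny population. This sketch assumes sampling with replacement and uses a hypothetical population of four values; `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by n.

```python
import itertools
import statistics

population = [2, 4, 6, 8]  # hypothetical population
sigma2 = statistics.pvariance(population)  # population variance, divides by N

# All ordered samples of size 2 drawn with replacement
samples = list(itertools.product(population, repeat=2))
mean_corrected = statistics.fmean(statistics.variance(s) for s in samples)  # n - 1
mean_biased = statistics.fmean(statistics.pvariance(s) for s in samples)    # n

print(sigma2, mean_corrected, mean_biased)
```

The average of the corrected sample variances equals the population variance exactly, while dividing by n underestimates it by the factor (n − 1)/n.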

When estimating population parameters, keep in mind that sample statistics such as $\bar{X}$ and S depend on the specific sample. To take this into account, an interval estimate of the population mean is obtained by analyzing the sampling distribution of the mean. The constructed interval is characterized by a certain confidence level, which is the probability that the true population parameter is estimated correctly. Similar confidence intervals can be used to estimate the proportion p of a trait and the population total.


Construction of a confidence interval for the mathematical expectation of the general population with a known standard deviation

Building a confidence interval for the proportion of a trait in the general population

In this section, the concept of a confidence interval is extended to categorical data. This allows us to estimate the proportion p of a trait in the general population from the sample proportion $p_S = X/n$. As mentioned earlier, if np and n(1 − p) both exceed 5, the binomial distribution can be approximated by the normal distribution. Therefore, to estimate the proportion p of a trait in the general population, one can construct an interval with confidence level (1 − α)×100%:

$$p_S - Z\sqrt{\frac{p_S(1-p_S)}{n}} \le p \le p_S + Z\sqrt{\frac{p_S(1-p_S)}{n}},$$

where $p_S$ is the sample proportion of the trait, equal to X/n (the number of successes divided by the sample size), p is the proportion of the trait in the general population, Z is the critical value of the standardized normal distribution, and n is the sample size.

Example 3. Suppose that a sample of 100 invoices completed during the last month is extracted from the information system, and 10 of these invoices are incorrect. Thus, $p_S$ = 10/100 = 0.1. The 95% confidence level corresponds to the critical value Z = 1.96, so

$$0.1 \pm 1.96\sqrt{\frac{0.1 \cdot 0.9}{100}} = 0.1 \pm 1.96 \cdot 0.03 = 0.1 \pm 0.0588, \qquad 0.0412 \le p \le 0.1588.$$

Thus, with 95% confidence, between 4.12% and 15.88% of invoices contain errors.

For a given sample size, the confidence interval for the proportion of a trait in the general population tends to be wider than for a continuous random variable. This is because measurements of a continuous random variable contain more information than categorical data. In other words, categorical data that take only two values carry less information for estimating the parameters of their distribution.

Calculating estimates for samples drawn from a finite population

Estimation of the mathematical expectation. The finite population correction factor (fpc) was used earlier to reduce the standard error. When calculating confidence intervals for population parameters, the correction factor is applied when samples are drawn without replacement. Thus, the confidence interval for the mathematical expectation with confidence level (1 − α)×100% is calculated by the formula:

$$\bar{X} \pm t_{n-1}\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \quad (6)$$

Example 4. To illustrate the finite population correction, let us return to the problem of calculating the confidence interval for the average invoice amount discussed in Example 3 above. Suppose that the company issues 5,000 invoices per month, and $\bar{X}$ = $110.27, S = $28.95, N = 5000, n = 100, α = 0.05, $t_{99}$ = 1.9842. By formula (6) we get:

$$110.27 \pm 1.9842 \cdot \frac{28.95}{\sqrt{100}}\sqrt{\frac{5000 - 100}{5000 - 1}} = 110.27 \pm 5.69, \qquad 104.58 \le \mu \le 115.96.$$
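Formula (6) translates into a few lines of Python. The sketch takes the Student critical value as an argument, as in the example, since the standard library has no t quantile; the function name is illustrative.

```python
import math


def mean_ci_finite_population(x_bar, S, n, N, t):
    """x_bar ± t * S/sqrt(n) * sqrt((N - n)/(N - 1)), i.e. formula (6).

    t is the Student critical value with n - 1 degrees of freedom."""
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction
    d = t * S / math.sqrt(n) * fpc
    return x_bar - d, x_bar + d


low, high = mean_ci_finite_population(110.27, 28.95, 100, 5000, 1.9842)
print(f"{low:.2f} <= mu <= {high:.2f}")  # 104.58 <= mu <= 115.96
```

Because (N − n)/(N − 1) < 1, the corrected interval is always somewhat narrower than the uncorrected one; here the factor is about 0.99, since the sample is only 2% of the population.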

Estimation of the proportion of a trait. When sampling without replacement, the confidence interval for the proportion of a trait with confidence level (1 − α)×100% is calculated by the formula:

$$p_S \pm Z\sqrt{\frac{p_S(1-p_S)}{n}}\sqrt{\frac{N-n}{N-1}}$$

Confidence intervals and ethical issues

When sampling a population and drawing statistical inferences, ethical problems often arise. The main one concerns the relationship between confidence intervals and point estimates of sample statistics. Publishing point estimates without the corresponding confidence intervals (usually at the 95% confidence level) and the sample size from which they were derived can be misleading: it may give the reader the impression that a point estimate is exactly what is needed to predict the properties of the entire population. Thus, in any study, interval estimates rather than point estimates should be put at the forefront. In addition, particular attention should be paid to the correct choice of sample size.

Most often, the objects of statistical manipulation are the results of sociological surveys on various political issues. The survey results are placed on the front pages of newspapers, while the sampling error and the methodology of the statistical analysis are printed somewhere in the middle. To justify the point estimates obtained, one must state the sample size on which they are based, the bounds of the confidence interval, and its confidence level.


This note uses material from: Levin et al. Statistics for Managers. Moscow: Williams, 2004, pp. 448–462.

The central limit theorem states that, for a sufficiently large sample size, the sampling distribution of the mean can be approximated by a normal distribution. This property does not depend on the type of the population distribution.
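A quick simulation illustrates the theorem. This sketch assumes an exponential population with rate 1 (mean 1, standard deviation 1, strongly skewed), samples of size n = 50 and 2000 repetitions; all of these are illustrative choices. The sample means should center on 1, have spread close to σ/√n, and put about 95% of their mass within ±1.96σ/√n of the mean, as a normal distribution would.

```python
import math
import random
import statistics

rng = random.Random(1)
n, reps = 50, 2000  # sample size and number of simulated samples (illustrative)

# Exponential population with rate 1: mean 1 and standard deviation 1, strongly skewed
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = 1 / math.sqrt(n)  # sigma / sqrt(n) for the sampling distribution of the mean
share_within = sum(abs(m - 1) < 1.96 * se for m in means) / reps

print(statistics.fmean(means))  # close to the population mean 1
print(statistics.stdev(means))  # close to 1 / sqrt(50), about 0.141
print(share_within)             # close to 0.95, as for a normal distribution
```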

Let a sample be drawn from a general population obeying the normal distribution law X ∼ N(m; σ). This basic assumption of mathematical statistics is justified by the central limit theorem. Let the general standard deviation σ be known, while the mathematical expectation m of the theoretical distribution (the mean value) is unknown.

In this case, the sample mean $\bar{x}$ obtained in the experiment (section 3.4.2) is also a random variable, $\bar{x} \sim N(m; \sigma/\sqrt{n})$. Then the "normalized" deviation

$$U = \frac{\bar{x} - m}{\sigma/\sqrt{n}} \sim N(0;1)$$

is a standard normal random variable.

The problem is to find an interval estimate for m. Let us construct a two-sided confidence interval for m such that the true mathematical expectation belongs to it with a given probability (reliability) γ.

Setting such an interval for the value U means finding the maximum value u and the minimum value −u of this quantity, which are the boundaries of the critical region:

$$P(-u < U < u) = \gamma.$$

Because this probability equals $2\Phi(u)$, the root u of the equation $2\Phi(u) = \gamma$ can be found using the tables of the Laplace function (Table 3, Appendix 1).

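Then, with probability γ, it can be asserted that the random variable satisfies |U| < u, that is, the desired general mean belongs to the interval

$$\bar{x} - u\frac{\sigma}{\sqrt{n}} < m < \bar{x} + u\frac{\sigma}{\sqrt{n}}. \quad (3.13)$$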

The value

$$\delta = u\frac{\sigma}{\sqrt{n}} \quad (3.14)$$

is called the accuracy of the estimate.

The number u, a quantile of the normal distribution, can be found as the argument of the Laplace function (Table 3, Appendix 1) from the relation 2Φ(u) = γ, i.e. $F(u) = \frac{1+\gamma}{2}$, where F is the standard normal distribution function.

Conversely, given the deviation δ, one can find the probability with which the unknown general mean belongs to the interval $(\bar{x} - \delta;\ \bar{x} + \delta)$. To do this, compute

$$\gamma = 2\Phi\left(\frac{\delta\sqrt{n}}{\sigma}\right). \quad (3.15)$$
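Formula (3.15) can be evaluated without a table. This sketch assumes the relation Φ(u) = F(u) − 1/2 between the Laplace function and the standard normal distribution function F, so γ = 2F(u) − 1; `NormalDist` is from Python's standard `statistics` module, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def reliability_for_accuracy(delta, sigma, n):
    """gamma = 2*Phi(delta*sqrt(n)/sigma) with Phi the Laplace function,
    computed as 2F(u) - 1 with F the standard normal CDF."""
    u = delta * math.sqrt(n) / sigma
    return 2 * NormalDist().cdf(u) - 1


# For instance, sigma = 15, delta = 5 and n = 25 give a reliability of about 0.904
print(round(reliability_for_accuracy(5, 15, 25), 3))
```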

Let a random sample be drawn from the general population by sampling with replacement. From the equation $\delta = u\frac{\sigma}{\sqrt{n}}$ one can find the minimum sample size n required for the confidence interval with the given reliability γ not to exceed the preset accuracy δ. The required sample size is estimated by the formula:

$$n = \frac{u^2\sigma^2}{\delta^2}. \quad (3.16)$$
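Formula (3.16) gives a real number, and the sample size is its ceiling. A minimal sketch follows; the function name is illustrative, and u can be taken either from the Laplace table or from `NormalDist().inv_cdf((1 + gamma) / 2)`.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, delta, u):
    """Formula (3.16): n = u^2 sigma^2 / delta^2, rounded up to an integer."""
    return math.ceil((u * sigma / delta) ** 2)


# Data of Task 11 below: sigma = 15 m, delta = 5 m, gamma = 0.9
print(min_sample_size(15, 5, 1.65))                                 # table value u = 1.65
print(min_sample_size(15, 5, NormalDist().inv_cdf((1 + 0.9) / 2)))  # unrounded u
```

Both calls give 25: the rounded table value yields 24.5 before rounding up, the exact quantile about 24.35.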

Examining the estimation accuracy $\delta = u\frac{\sigma}{\sqrt{n}}$:

1) As the sample size n increases, δ decreases, and hence the accuracy of the estimate improves.

2) As the reliability γ of the estimate increases, the argument u increases (because F(u) increases monotonically), and hence δ increases. Thus, increasing the reliability reduces the accuracy of the estimate.
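Both observations can be seen numerically. This sketch tabulates δ for a hypothetical σ = 1 over a few sample sizes and reliability levels; the helper name `accuracy` is illustrative.

```python
import math
from statistics import NormalDist


def accuracy(sigma, n, gamma):
    """delta = u*sigma/sqrt(n), where F(u) = (1 + gamma)/2."""
    u = NormalDist().inv_cdf((1 + gamma) / 2)
    return u * sigma / math.sqrt(n)


# Hypothetical sigma = 1: delta shrinks as n grows and widens as gamma grows
for gamma in (0.9, 0.95, 0.99):
    row = [round(accuracy(1.0, n, gamma), 3) for n in (25, 100, 400)]
    print(gamma, row)
```

Each fourfold increase of n halves δ, because δ is proportional to 1/√n.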

The estimate

$$|\bar{x} - m| < t\frac{\sigma}{\sqrt{n}} \quad (3.17)$$

is called classical (where t is a parameter that depends on γ and n), because it characterizes the most frequently encountered distribution laws.

3.5.3 Confidence intervals for estimating the expectation of a normal distribution with an unknown standard deviation

Let it be known that the general population obeys the normal distribution law X ∼ N(m; σ), where the root-mean-square deviation σ is unknown.

To construct a confidence interval for the general mean in this case, the statistic

$$T = \frac{\bar{x} - m}{s/\sqrt{n}}$$

is used, which has Student's distribution with k = n − 1 degrees of freedom. This follows from the facts that $\frac{\bar{x} - m}{\sigma/\sqrt{n}} \sim N(0;1)$ (see item 3.5.2) and $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$ (see clause 3.5.3), and from the definition of Student's distribution (part 1, clause 2.11.2).

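Let us find the accuracy of the classical estimate for Student's distribution, i.e. find t from formula (3.17). Let the probability that the inequality $|\bar{x} - m| < t\frac{s}{\sqrt{n}}$ holds be given by the reliability γ:

$$P\left(|\bar{x} - m| < t\frac{s}{\sqrt{n}}\right) = \gamma. \quad (3.18)$$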

Because the TSt( n-1), it is obvious that t depends on and n, so we usually write
.

(3.19)

where
is Student's distribution function with n-1 degrees of freedom.

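Solving the inequality in (3.18) for m, we get the interval

$$\bar{x} - t_{\gamma,n-1}\frac{s}{\sqrt{n}} < m < \bar{x} + t_{\gamma,n-1}\frac{s}{\sqrt{n}},$$

which covers the unknown parameter m with reliability γ.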

The value $t_{\gamma,n-1}$, used to determine the confidence interval for a random variable T(n − 1) distributed by Student's law with n − 1 degrees of freedom, is called Student's coefficient. It is found for given n and γ from the tables "Critical points of Student's distribution" (Table 6, Appendix 1), which give the solutions of equation (3.19).

As a result, we get the following expression for the accuracy of the confidence interval for estimating the mathematical expectation (general mean) when the variance is unknown:

$$\delta = t_{\gamma,n-1}\frac{s}{\sqrt{n}}. \quad (3.20)$$

Thus, there is a general formula for constructing confidence intervals for the mathematical expectation of the general population:

$$\bar{x} - \delta < m < \bar{x} + \delta,$$

where the accuracy δ of the confidence interval is found by formula (3.14) or (3.20), depending on whether the variance is known or unknown.

Task 10. Tests were carried out, and their results are listed in the table:

x i

It is known that the results obey the normal distribution law with a known standard deviation σ. Find an estimate m* of the mathematical expectation m and construct a 90% confidence interval for it.

Solution:

So, m(2.53;5.47).

Task 11. The depth of the sea is measured by an instrument whose systematic error is 0 and whose random errors are normally distributed with standard deviation σ = 15 m. How many independent measurements must be made to determine the depth with an error of no more than 5 m at a confidence level of 90%?

Solution:

By the condition of the problem, X ∼ N(m; σ), where σ = 15 m, δ = 5 m, γ = 0.9. Let us find the sample size n.

1) For the given reliability γ = 0.9, we find from Table 3 (Appendix 1) the argument of the Laplace function u = 1.65.

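2) Knowing the required estimation accuracy $\delta = u\frac{\sigma}{\sqrt{n}} = 5$, we find

$$n = \frac{u^2\sigma^2}{\delta^2} = \frac{1.65^2 \cdot 15^2}{5^2} \approx 24.5.$$

Therefore, the number of trials must be n ≥ 25.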

Task 12. Samples of temperature values for the first 6 days of January are presented in the table:

Find the confidence interval for the expectation m of the general population with the given confidence probability γ and estimate the general standard deviation σ.

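Solution:

1) We find the sample means $\bar{x}_1 = \frac{1}{n_1}\sum x_{1i}$ and $\bar{x}_2 = \frac{1}{n_2}\sum x_{2i}$.

2) The unbiased variance estimates are found by the formula $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$:

$\sum x_{1i} = -175$, $\bar{x}_1 = -175/6 \approx -29.2$, $\sum (x_{1i} - \bar{x}_1)^2 = 234.84$, $s_1^2 = 234.84/5 \approx 46.97$, $s_1 \approx 6.85$;

$\sum x_{2i} = -192$, $\bar{x}_2 = -192/6 = -32$, $\sum (x_{2i} - \bar{x}_2)^2 = 116$, $s_2^2 = 116/5 = 23.2$, $s_2 \approx 4.8$.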

3) Since the general variance is unknown but its estimate is known, to estimate the mathematical expectation m we use Student's distribution (Table 6, Appendix 1) and formula (3.20).

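Because n1 = n2 = 6, the Student coefficient $t_{\gamma,5}$ is taken with 5 degrees of freedom. With $\bar{x}_1 \approx -29.2$ and $s_1 = 6.85$ we have $\delta_1 = t_{\gamma,5}\frac{s_1}{\sqrt{6}} \approx 4.1$, hence −29.2 − 4.1 < m1 < −29.2 + 4.1.

Therefore −33.3 < m1 < −25.1.

Similarly, with $\bar{x}_2 = -32$ and $s_2 = 4.8$ we have $\delta_2 = t_{\gamma,5}\frac{s_2}{\sqrt{6}} \approx 2.9$, so

−34.9 < m2 < −29.1. The confidence intervals are therefore m1 ∈ (−33.3; −25.1) and m2 ∈ (−34.9; −29.1).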

In applied sciences, for example in construction disciplines, the accuracy of objects is assessed using tables of confidence intervals given in the relevant reference literature.

An appraiser often has to analyze the real estate market segment in which the appraised property is located. If the market is developed, it can be difficult to analyze the entire set of listed properties, so a sample of properties is used for analysis. This sample is not always homogeneous; sometimes it must be cleared of extremes, i.e. market offers that are too high or too low. The confidence interval is used for this purpose. The purpose of this study is to compare two methods of calculating the confidence interval and to choose the best option for working with different samples in the estimatica.pro system.

Confidence interval: an interval of values of a characteristic, calculated from the sample, that contains the estimated parameter of the general population with a known probability.

The point of calculating a confidence interval is to build, from the sample data, an interval for which it can be asserted with a given probability that the value of the estimated parameter lies within it. In other words, the confidence interval contains the unknown value of the estimated quantity with a certain probability. The wider the interval, the less precise the estimate.

There are different methods for determining the confidence interval. In this article, we consider two:

  • through the median and standard deviation;
  • through the critical value of the t-statistic (Student's coefficient).

Stages of the comparative analysis of different methods for calculating the CI:

1. form a data sample;

2. process it with statistical methods: calculate the mean value, median, variance, etc.;

3. calculate the confidence interval in two ways;

4. analyze the cleaned samples and the obtained confidence intervals.

Stage 1. Data sampling

The sample was formed using the estimatica.pro system. It includes 91 offers for the sale of 1-room apartments with the "Khrushchev" layout type in the 3rd price zone.

Table 1. Initial sample

The price of 1 sq.m., c.u.

Fig.1. Initial sample



Stage 2. Processing of the initial sample

Processing the sample by statistical methods requires calculating the following values:

1. Arithmetic mean

2. Median: a number that characterizes the sample such that half of the sample elements are greater than the median and the other half are less than it; for a sample with an odd number of values sorted in ascending order, $Me = x_{(n+1)/2}$

3. Range: the difference between the maximum and minimum values in the sample

4. Variance: used to estimate the variation in the data more precisely

5. Standard deviation of the sample (hereinafter RMS): the most common indicator of the dispersion of values around the arithmetic mean

6. Coefficient of variation: reflects the degree of dispersion of values

7. Oscillation coefficient: reflects the relative fluctuation of the extreme prices in the sample around the mean
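The indicators above can be computed with Python's standard `statistics` module. This is a sketch under stated assumptions, since the article's own formulas are not reproduced here: the variance and standard deviation use the n − 1 denominator, the coefficient of variation is s/x̄·100%, and the oscillation coefficient is taken as the common textbook definition R/x̄·100%, with R the range. The function name `describe` is illustrative.

```python
import statistics


def describe(prices):
    """Indicators 1-7 for a sample of prices (e.g. price of 1 sq. m)."""
    mean = statistics.fmean(prices)
    sd = statistics.stdev(prices)  # sample standard deviation (n - 1)
    price_range = max(prices) - min(prices)
    return {
        "mean": mean,
        "median": statistics.median(prices),
        "range": price_range,
        "variance": statistics.variance(prices),          # unbiased, divides by n - 1
        "sd": sd,
        "cv_percent": sd / mean * 100,                    # coefficient of variation
        "oscillation_percent": price_range / mean * 100,  # assumed R / mean * 100%
    }


print(describe([1, 2, 3, 4, 5]))  # toy data, not the article's sample
```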

Table 2. Statistical indicators of the original sample

The coefficient of variation, which characterizes the homogeneity of the data, is 12.29%, but the coefficient of oscillation is too large. Thus, the original sample is not homogeneous, and we proceed to calculating the confidence interval.

Stage 3. Calculation of the confidence interval

Method 1. Calculation through the median and standard deviation.

The confidence interval is determined as follows: the lower bound is the median minus the standard deviation, and the upper bound is the median plus the standard deviation.

Thus, the confidence interval is (47179 CU; 60689 CU).
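Method 1 amounts to a simple filter on the sample. The sketch below assumes the sample standard deviation (n − 1 denominator) and keeps offers lying inside the closed interval [median − s, median + s]; the article does not state either choice, and the function name and toy data are hypothetical.

```python
import statistics


def clean_by_median_sd(prices):
    """Method 1: keep offers within [median - sd, median + sd]."""
    med = statistics.median(prices)
    sd = statistics.stdev(prices)  # assumption: sample standard deviation (n - 1)
    low, high = med - sd, med + sd
    kept = [p for p in prices if low <= p <= high]
    return (low, high), kept


(low, high), kept = clean_by_median_sd([10, 11, 12, 13, 14, 30])  # toy data
print(f"({low:.2f}; {high:.2f})", kept)  # the extreme offer 30 is excluded
```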

Fig. 2. Values within confidence interval 1.



Method 2. Building a confidence interval through the critical value of t-statistics (Student's coefficient)

S.V. Gribovsky, in the book "Mathematical methods for assessing the value of property", describes a method for calculating the confidence interval using Student's coefficient. With this method, the appraiser must set the significance level α, which determines the probability with which the confidence interval is constructed. Significance levels of 0.1, 0.05, and 0.01 are commonly used; they correspond to confidence probabilities of 0.9, 0.95, and 0.99. With this method, the true values of the mathematical expectation and variance are considered to be practically unknown (which is almost always the case in practical appraisal problems).

Confidence interval formula:

where n is the sample size;

$t_{\alpha,n-1}$ is the critical value of the t-statistic (Student's distribution) at significance level α with n − 1 degrees of freedom, determined from special statistical tables or in MS Excel (→ "Statistical" → STUDRASPOBR, TINV in the English version);

α is the significance level; we take α = 0.01.

Fig. 2. Values within confidence interval 2.

Stage 4. Analysis of the different ways to calculate the confidence interval

The two methods of calculating the confidence interval, through the median and through Student's coefficient, led to different intervals. Accordingly, two different cleaned samples were obtained.

Table 3. Statistical indicators for the three samples (initial sample, option 1, option 2): mean value; variance; coefficient of variation; coefficient of oscillation; number of excluded objects, pcs.

Based on the calculations performed, the confidence intervals obtained by the different methods overlap, so either method can be used at the appraiser's discretion.

However, we believe that when working in the estimatica.pro system it is advisable to choose the method for calculating the confidence interval depending on the degree of market development:

  • if the market is not developed, use the method based on the median and standard deviation, since few objects are excluded in this case;
  • if the market is developed, use the method based on the critical value of the t-statistic (Student's coefficient), since a large initial sample can be formed.

In preparing the article were used:

1. Gribovsky S.V., Sivets S.A., Levykina I.A. Mathematical methods for assessing the value of property. Moscow, 2014

2. Data from the estimatica.pro system


