Mean square (standard) sampling error: explanation. Formulas for the average sampling error

Concept and calculation of sampling error.

The task of sample observation is to give a correct picture of the summary indicators of the entire population on the basis of the part of it that is observed. The possible deviation of the sample share and sample mean from the share and mean in the general population is called the sampling error, or representativeness error. The larger this error, the more the sample indicators differ from the indicators of the general population.

A distinction is made between:

sampling errors;

registration errors.

Registration errors arise when a fact is established incorrectly during observation. They occur in both continuous (complete) observation and sample observation, but in sample observation there are fewer of them.

By nature, errors are:

Tendentious (deliberate): either the best or the worst units of the population were selected on purpose. In this case the observations lose their meaning;

Random: they remain even when deliberate selection is avoided. The basic organizational principle of sample observation is to avoid deliberate selection, i.e. to ensure strict adherence to the principle of random selection.

The general rule of random selection is that every unit of the general population must have exactly the same conditions and chance of being included in the sample. This makes the sampling result independent of the will of the observer, which is what gives rise to tendentious errors. The sampling error in random sampling is itself random. It characterizes the size of the deviations of the sample characteristics from the general characteristics.

Because the characteristic varies within the population under study, the composition of the units included in the sample may not coincide with the composition of the entire population. This means that the general share $p$ and general mean $\bar{x}$ do not coincide with the sample share $w$ and sample mean $\tilde{x}$. The possible discrepancy between these characteristics is measured by the sampling error, which is determined by the formula:

$\mu = \sqrt{\dfrac{\sigma^2}{n}}$,

where $\sigma^2$ is the general variance and $n$ is the sample size.

$\sigma^2 = S^2 \cdot \dfrac{n}{n-1}$,

where $S^2$ is the sample variance.

This shows that the general variance differs from the sample variance by the factor $n/(n-1)$.

A distinction is made between repeated and non-repeated selection. In repeated selection, each unit included in the sample is returned to the general population after observation and can be examined again. With repeated selection, the average sampling error is calculated as:

$\mu = \sqrt{\dfrac{S^2}{n}}$

For the share of an alternative characteristic, the sample variance is determined by the formula:

$S_w^2 = w(1 - w)$

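In practice, repeated selection is rarely used. With non-repeated selection, the size of the general population N decreases during sampling, and the formula for the average sampling error of a quantitative characteristic takes the form:

$\mu_{\tilde{x}} = \sqrt{\dfrac{S^2}{n}\left(1 - \dfrac{n}{N}\right)}$

For the share of an alternative characteristic the variance is $w(1 - w)$, so

$\mu_w = \sqrt{\dfrac{w(1 - w)}{n}\left(1 - \dfrac{n}{N}\right)}$

The share of the studied characteristic in the general population may then take one of the values

$p = w \pm \mu_w$,

where $\mu_w$ is the sampling error of the alternative characteristic.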

Example.

In a 10% sample of a batch of finished products, drawn by non-repeated selection, the following data on the moisture content of the samples were obtained.

Determine the average moisture content (%), the variance and the standard deviation; with a probability of 0.954, the limits within which the average moisture content of all finished products is expected to lie; and, with a probability of 0.987, the limits of the share of standard products, given that products with a moisture content below 13% or above 19% are classed as non-standard.

Only with a certain probability can we say that the general share deviates from the sample share, and the general mean from the sample mean, by no more than t times the average sampling error.

In statistics these deviations are called maximum sampling errors and are designated Δ.

The probability of the judgment can be raised or lowered by changing t: t = 1 at a probability of 0.683, t = 2 at 0.954 and t = 2.5 at 0.987. The indicators of the general population are then determined from the indicators of the sample: the general mean equals the sample mean plus or minus Δ, and the general share equals the sample share plus or minus Δ.

The discrepancy between the values of indicators obtained from the sample and the corresponding parameters of the general population is called the representativeness error. A distinction is made between systematic and random sampling errors.

Random errors arise because the various categories of units of the general population are not represented evenly enough in the sample.

Systematic errors may be associated with violation of selection rules or sampling conditions.

For example, in the survey of household budgets the sample population was built for more than 40 years on the territorial-sectoral selection principle, which followed from the main purpose of the budget survey: to characterize the standard of living of workers, employees and collective farmers. The sample population was distributed among regions and sectors of the economy of the RSFSR in proportion to the total number of employees; to form the sectoral sample, a typical sample with mechanical selection of units within groups was used.

The main selection criterion was the average monthly salary. The selection principle ensured proportional representation in the sample population of workers with different salary levels.

With the emergence of new social groups (entrepreneurs, farmers, the unemployed), the representativeness of the sample was impaired not only by differences from the structure of the general population, but also by a systematic error caused by the mismatch between the selection unit (employee) and the observation unit (household). A household with more than one working member was more likely to be selected than a household with one working member. Families not employed in the surveyed sectors were excluded from the set of selected units (households of pensioners, households living on income from individual labor activity, and so on). Assessing the accuracy of the results (limits of confidence intervals, sampling errors) was difficult, since probabilistic models were not used in constructing the sample.

In 1996–1997, a fundamentally new approach to sampling households was introduced, based on the data of the 1994 microcensus. The general population for selection included all types of households except collective ones, and the sample population was organized so as to be representative of the composition and types of households within each subject of the Russian Federation.

Measuring errors in the representativeness of sample indicators is based on the assumption of the random nature of their distribution with an infinitely large number of samples.

A quantitative assessment of the reliability of a sample indicator is used to form an idea of the general characteristic. This is done either from the sample indicator itself, taking its random error into account, or by putting forward a hypothesis about the properties of the general population (about the value of the mean, the variance, the nature of the distribution, a relationship).

To test a hypothesis, the consistency of empirical data with hypothetical data is assessed.

The magnitude of the random representativeness error depends on:

  • 1) the sample size;
  • 2) the degree of variation of the characteristic being studied in the general population;
  • 3) the method used to form the sample population.

There are average (standard) and maximum sampling errors.

Average error characterizes the measure of deviations of sample indicators from similar indicators of the general population.

The maximum error is taken to be the largest possible discrepancy between the sample and general characteristics, i.e. the maximum error for a given probability of its occurrence.

Various indicators (parameters) of the general population can be estimated from the data of the sample population. The ones estimated most often are:

  • the general mean $\bar{x}$ of the characteristic being studied (for a multi-valued quantitative characteristic);
  • the general share $p$ (for an alternative characteristic).

The basic principle of the sampling method is to give every unit of the general population an equal chance of being selected for the sample. With this approach, the requirement of random, objective selection is met, and the sampling error is therefore determined primarily by the sample size (n). As n increases, the average error decreases, and the characteristics of the sample population approach those of the general population.

With the same sample size and other conditions being equal, the sampling error will be smaller for the sample drawn from a general population with less variation in the characteristic being studied. Reducing the variation of a characteristic means reducing its variance ($\sigma^2$ for a quantitative characteristic, $w(1 - w)$ for an alternative characteristic).

The dependence of the magnitude of the sampling error on the methods of forming the sample population is determined using the formulas for the average sampling error (Table 5.2).

The indicators in Table 5.2 need the following explanations.

The sample variance $S^2$ is somewhat smaller than the general variance $\sigma^2$; it has been proven in mathematical statistics that

$\sigma^2 = S^2 \cdot \dfrac{n}{n-1}$

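Table 5.2

Formulas for calculating the average sampling error for different selection methods

Simple random sample:
  • repeated selection: $\mu_{\tilde{x}} = \sqrt{\dfrac{\sigma^2}{n}}$ for the mean; $\mu_w = \sqrt{\dfrac{w(1-w)}{n}}$ for the share;
  • non-repeated selection: $\mu_{\tilde{x}} = \sqrt{\dfrac{\sigma^2}{n}\left(1 - \dfrac{n}{N}\right)}$ for the mean; $\mu_w = \sqrt{\dfrac{w(1-w)}{n}\left(1 - \dfrac{n}{N}\right)}$ for the share.

Serial sample (with series of equal size):
  • repeated selection: $\mu_{\tilde{x}} = \sqrt{\dfrac{\delta_{\tilde{x}}^2}{r}}$ for the mean; $\mu_w = \sqrt{\dfrac{\delta_w^2}{r}}$ for the share;
  • non-repeated selection: $\mu_{\tilde{x}} = \sqrt{\dfrac{\delta_{\tilde{x}}^2}{r}\left(1 - \dfrac{r}{R}\right)}$ for the mean; $\mu_w = \sqrt{\dfrac{\delta_w^2}{r}\left(1 - \dfrac{r}{R}\right)}$ for the share.

Typical sample (selection proportional to group size):
  • repeated selection: $\mu_{\tilde{x}} = \sqrt{\dfrac{\overline{\sigma_i^2}}{n}}$ for the mean; $\mu_w = \sqrt{\dfrac{\overline{w_i(1-w_i)}}{n}}$ for the share;
  • non-repeated selection: $\mu_{\tilde{x}} = \sqrt{\dfrac{\overline{\sigma_i^2}}{n}\left(1 - \dfrac{n}{N}\right)}$ for the mean; $\mu_w = \sqrt{\dfrac{\overline{w_i(1-w_i)}}{n}\left(1 - \dfrac{n}{N}\right)}$ for the share.

Here n and N are the sizes of the sample and of the general population, r and R are the numbers of selected series and of all series, $\delta^2$ is the between-series variance, and the overline denotes the average of the within-group variances.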

If the sample population is large (i.e. n is sufficiently large), the ratio n/(n − 1) approaches unity and the sample variance practically coincides with the general variance.

A sample is considered unconditionally large when n > 100 and unconditionally small when n < 30. When evaluating the results of a small sample, this relationship between the sample and general variances should be taken into account.

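For a serial sample, the between-series variances can be calculated using the following formulas:

$\delta_{\tilde{x}}^2 = \dfrac{\sum_i (\tilde{x}_i - \tilde{x})^2}{r}$,

where $\tilde{x}_i$ is the mean of the i-th series and $\tilde{x}$ is the overall mean for the entire sample population;

$\delta_w^2 = \dfrac{\sum_i (w_i - w)^2}{r}$,

where $w_i$ is the share of units of a certain category in the i-th series, $w$ is the share of units of this category in the entire sample population, and r is the number of selected series.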
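As a rough illustration of these formulas, the sketch below computes the between-series variance of the series means and the average error of a serial sample with series of equal size. The function names and the figures are hypothetical, chosen only for the illustration; the factor (1 − r/R) for non-repeated selection is the usual one for serial sampling, with R taken as the total number of series in the general population.

```python
import math


def between_series_variance(series_means):
    """Between-series variance: mean squared deviation of the series
    means from the overall mean (series are assumed to be of equal size)."""
    r = len(series_means)
    overall = sum(series_means) / r
    return sum((m - overall) ** 2 for m in series_means) / r


def serial_sampling_error(series_means, total_series=None):
    """Average error of a serial sample.

    total_series=None -> repeated selection: sqrt(delta^2 / r)
    total_series=R    -> non-repeated selection: sqrt(delta^2 / r * (1 - r / R))
    """
    r = len(series_means)
    delta2 = between_series_variance(series_means)
    factor = 1.0 if total_series is None else 1.0 - r / total_series
    return math.sqrt(delta2 / r * factor)


# Hypothetical data: means of 4 series selected out of 20.
means = [10.0, 12.0, 14.0, 12.0]
print(between_series_variance(means))      # 2.0
print(serial_sampling_error(means))        # repeated selection
print(serial_sampling_error(means, 20))    # non-repeated selection
```

The same functions work for a share if the list holds the shares $w_i$ of each series instead of the series means.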

To determine the average error of a typical sample when units are selected in proportion to the size of each group, the average of the within-group variances is used as the measure of variation ($\overline{\sigma_i^2}$ for a quantitative characteristic, $\overline{w_i(1-w_i)}$ for an alternative characteristic). According to the rule for adding variances, the average of the within-group variances is smaller than the total variance. Therefore the average possible error of a typical sample is smaller than the error of a simple random sample.
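The effect of the rule for adding variances can be seen on a small made-up data set. The sketch below is only an illustration: the helper names and the two groups of values are hypothetical, and the within-group variances are averaged with weights equal to the group sample sizes.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Variance with divisor n."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def average_within_group_variance(groups):
    """Average of within-group variances, weighted by group sample size."""
    n = sum(len(g) for g in groups)
    return sum(variance(g) * len(g) for g in groups) / n


def typical_sampling_error(groups, population_size=None):
    """Average error of a typical sample drawn in proportion to group size;
    population_size=N switches on the factor (1 - n/N)."""
    n = sum(len(g) for g in groups)
    factor = 1.0 if population_size is None else 1.0 - n / population_size
    return math.sqrt(average_within_group_variance(groups) / n * factor)


# Hypothetical sample split into two typical groups.
groups = [[4.0, 6.0, 5.0, 5.0], [9.0, 11.0, 10.0, 10.0]]
pooled = [v for g in groups for v in g]

within = average_within_group_variance(groups)
total = variance(pooled)
print(within, total)   # the within-group average is smaller than the total variance
print(typical_sampling_error(groups), math.sqrt(total / len(pooled)))
```

The difference between the two variances is the between-group variance, which the typical sample removes from the error.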

Combined selection is often used: individual selection of units is combined with group selection, and typical selection with selection in series. With any selection method, it can be stated with a certain probability that the deviation of the sample mean (or share) from the general mean (or share) will not exceed a certain value, which is called the maximum sampling error.

The relationship between the maximum sampling error (∆), guaranteed with some probability F(t), and the average sampling error has the form $\Delta_{\tilde{x}} = t\mu_{\tilde{x}}$ or $\Delta_w = t\mu_w$, where t is the confidence factor, determined by the probability level F(t).

The values of the function F(t) and of t are taken from specially compiled mathematical tables. The most frequently used ones are:

t: 1.0; 1.96; 2.0; 2.58; 3.0
F(t): 0.683; 0.950; 0.954; 0.990; 0.997

Thus, the maximum sampling error indicates the accuracy of the sample with a certain probability, whose value depends on the confidence factor t. For example, at t = 1 the probability F(t) that the sample characteristics deviate from the general ones by no more than one average error is 0.683. Consequently, on average, out of every 1000 samples, 683 will give summary indicators (mean, share) that differ from the general ones by no more than one average error. At t = 2 the probability F(t) is 0.954, which means that out of every 1000 samples, 954 will give summary indicators that differ from the general indicators by no more than twice the average sampling error, and so on.
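The tabulated values can be reproduced without a table. The sketch below assumes that sample estimates follow the normal law, so that F(t) is the probability that a standard normal variable lies within ±t; the function name is hypothetical.

```python
import math


def confidence_probability(t):
    """F(t) = P(|Z| <= t) for a standard normal variable Z."""
    return math.erf(t / math.sqrt(2.0))


for t in (1.0, 1.96, 2.0, 2.58, 3.0):
    print(f"t = {t:<4}  F(t) = {confidence_probability(t):.3f}")
```

Under this assumption the loop gives 0.683 for t = 1 and 0.954 for t = 2, the two values used in the text.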

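Along with the absolute value of the maximum sampling error, the relative error is also calculated. It is defined as the percentage ratio of the maximum sampling error to the corresponding characteristic of the sample population:

$\Delta_{\%} = \dfrac{\Delta_{\tilde{x}}}{\tilde{x}} \cdot 100\%$ or $\Delta_{\%} = \dfrac{\Delta_w}{w} \cdot 100\%$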

In practice, it is customary to set the value of ∆, usually within 10% of the expected average level of the attribute.

Calculating the average and maximum sampling errors makes it possible to determine the limits within which the characteristics of the general population will lie:

$\tilde{x} - \Delta_{\tilde{x}} \le \bar{x} \le \tilde{x} + \Delta_{\tilde{x}}$; $w - \Delta_w \le p \le w + \Delta_w$

The limits within which the unknown value of the indicator in the general population lies with a given probability are called the confidence interval, and the probability F(t) is called the confidence probability. The larger the value of ∆, the wider the confidence interval and, therefore, the lower the accuracy of the estimate.

Consider the following example. To determine the average size of a deposit in a bank, 200 foreign currency accounts of depositors were selected by repeated random sampling. It was found that the average deposit was 60 thousand rubles and the variance was 32. Of the selected accounts, 40 were demand accounts. It is necessary to determine, with a probability of 0.954, the limits within which the average deposit in the bank's foreign currency accounts and the share of demand accounts lie.

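Let us calculate the average error of the sample mean using the formula for repeated selection:

$\mu_{\tilde{x}} = \sqrt{\dfrac{\sigma^2}{n}} = \sqrt{\dfrac{32}{200}} = 0.4$ thousand rubles.

The maximum error of the sample mean with a probability of 0.954 (t = 2) is

$\Delta_{\tilde{x}} = t\mu_{\tilde{x}} = 2 \cdot 0.4 = 0.8$ thousand rubles.

Consequently, the average deposit in foreign currency accounts at the bank lies within the following limits (thousand rubles):

$60 - 0.8 \le \bar{x} \le 60 + 0.8$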

With a probability of 0.954, it can be stated that the average deposit in foreign currency accounts in a bank ranges from 59,200 to 60,800 rubles.

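Let us determine the share of demand accounts in the sample population:

$w = \dfrac{40}{200} = 0.2$

The average error of the sample share is

$\mu_w = \sqrt{\dfrac{w(1-w)}{n}} = \sqrt{\dfrac{0.2 \cdot 0.8}{200}} \approx 0.028$

The maximum error of the share with a probability of 0.954 is

$\Delta_w = t\mu_w = 2 \cdot 0.028 = 0.056$

Thus, the share of demand accounts in the general population lies within the limits:

$0.2 - 0.056 \le p \le 0.2 + 0.056$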

With a probability of 0.954, it can be stated that the share of demand accounts in the total number of foreign currency accounts in the bank ranges from 14.4 to 25.6%.
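The arithmetic of this example can be checked with a few lines of code. The sketch uses only the figures given in the example (n = 200, mean 60 thousand rubles, variance 32, 40 demand accounts, t = 2); the variable names are arbitrary.

```python
import math

n = 200          # accounts in the sample
x_mean = 60.0    # sample mean deposit, thousand rubles
variance = 32.0  # variance from the example
m = 40           # demand accounts in the sample
t = 2.0          # confidence factor for probability 0.954

# Mean: average error, maximum error, confidence limits.
mu_x = math.sqrt(variance / n)
delta_x = t * mu_x
mean_limits = (x_mean - delta_x, x_mean + delta_x)

# Share: the variance of an alternative characteristic is w(1 - w).
w = m / n
mu_w = math.sqrt(w * (1 - w) / n)
delta_w = t * mu_w
share_limits = (w - delta_w, w + delta_w)

print(f"mean:  {mean_limits[0]:.1f} .. {mean_limits[1]:.1f} thousand rubles")
print(f"share: {share_limits[0]:.3f} .. {share_limits[1]:.3f}")
```

The limits for the share come out slightly wider than 14.4% to 25.6% in the last digit, because the text rounds the average error of the share to 0.028 before doubling it.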

In specific studies, it is important to establish the optimal relationship between the measure of reliability of the results obtained and the amount of permissible sampling error. In this regard, when organizing sample observation, the question arises related to determining the sample size necessary to obtain the required accuracy of results with a given probability. The calculation of the required sample size is carried out on the basis of formulas for the maximum sampling error in accordance with the type and method of selection (Table 5.3).

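Table 5.3

Formulas for calculating the sample size for simple random sampling

  • repeated selection: $n = \dfrac{t^2\sigma^2}{\Delta^2}$ for the mean; $n = \dfrac{t^2 w(1-w)}{\Delta^2}$ for the share;
  • non-repeated selection: $n = \dfrac{t^2\sigma^2 N}{\Delta^2 N + t^2\sigma^2}$ for the mean; $n = \dfrac{t^2 w(1-w) N}{\Delta^2 N + t^2 w(1-w)}$ for the share.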

Let's continue with the example, which presents the results of a sample survey of personal accounts of bank depositors.

It is necessary to establish how many accounts need to be examined so that, with a probability of 0.977, the error in determining the average deposit does not exceed 1.5 thousand rubles. Let us express the sample size from the formula for the maximum sampling error for repeated selection:

$n = \dfrac{t^2\sigma^2}{\Delta^2}$

When the required sample size is determined from the above formulas, a difficulty arises in finding the values of σ² and w, since they can be obtained only after the sample survey has been carried out. For this reason, approximate values are substituted for the actual ones; they can be determined from trial sample observations or from similar previous surveys.

In cases where the statistician knows the average value of the characteristic being studied (for example, from instructions or legislation) or the limits within which it varies, the variance can be estimated from approximate formulas, and the product w(1 – w) can be replaced with the value 0.25 (w = 0.5).

To get a more reliable result, the maximum possible values of these indicators are taken. If the characteristic is normally distributed in the general population, the range of variation R is approximately equal to 6σ (the extreme values lie 3σ on either side of the mean). Hence $\sigma \approx R/6$; if the distribution is obviously asymmetrical, then $\sigma \approx R/5$.

For any type of sample, the calculation of its size starts with the repeated selection formula

$n = \dfrac{t^2\sigma^2}{\Delta^2}$

If, as a result of the calculation, the selection share (n/N) exceeds 5%, the calculation is carried out again using the non-repeated selection formula.
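The two-step rule just described can be sketched as follows. The function names and the input figures are hypothetical; the non-repeated formula is obtained by solving $\Delta = t\sqrt{\frac{\sigma^2}{n}(1 - \frac{n}{N})}$ for n, and results are rounded up to a whole number of units.

```python
import math


def sample_size_repeated(t, variance, delta):
    """n = t^2 * sigma^2 / delta^2, rounded up."""
    return math.ceil(t ** 2 * variance / delta ** 2)


def sample_size_non_repeated(t, variance, delta, population_size):
    """n = t^2 * sigma^2 * N / (delta^2 * N + t^2 * sigma^2), rounded up."""
    a = t ** 2 * variance
    return math.ceil(a * population_size / (delta ** 2 * population_size + a))


def required_sample_size(t, variance, delta, population_size, threshold=0.05):
    """Start from the repeated-selection formula; switch to the
    non-repeated formula when the selection share n/N exceeds 5%."""
    n = sample_size_repeated(t, variance, delta)
    if n / population_size > threshold:
        n = sample_size_non_repeated(t, variance, delta, population_size)
    return n


# Hypothetical inputs: t = 2, variance 32, permissible error 0.5.
print(required_sample_size(2.0, 32.0, 0.5, population_size=100_000))  # 512
print(required_sample_size(2.0, 32.0, 0.5, population_size=2_000))    # 408
```

For a share, the same functions can be called with w(1 − w) in place of the variance.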

For a typical sample, it is necessary to divide the total sample size between the selected types of units. The calculation of the number of observations from each group depends on the previously mentioned organizational forms of a typical sample.

With typical selection that is not proportional to the size of the groups, the total number of selected units is divided by the number of groups; the result is the number of units to select from each typical group:

$n_i = \dfrac{n}{k}$,

where k is the number of identified typical groups.

When units are selected in proportion to the size of the typical groups, the number of observations for each group is determined by the formula

$n_i = n \cdot \dfrac{N_i}{N}$,

where $n_i$ is the sample size from the i-th group and $N_i$ is the size of the i-th group.

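When selection takes the variation of the characteristic into account, the share of the sample drawn from each group should be proportional to the standard deviation in that group ($\sigma_i$). The number of units ($n_i$) is calculated by the formula

$n_i = n \cdot \dfrac{N_i \sigma_i}{\sum_i N_i \sigma_i}$,

where for a share $\sigma_i$ is replaced by $\sqrt{w_i(1 - w_i)}$.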
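The three ways of dividing a typical sample among groups can be compared side by side. This is a sketch under stated assumptions: the function names, group sizes and standard deviations are hypothetical, and the allocation by variation uses weights $N_i\sigma_i$, the usual form of this rule. The results are left unrounded.

```python
def allocate_equal(n, group_sizes):
    """Equal allocation: n_i = n / k."""
    k = len(group_sizes)
    return [n / k for _ in group_sizes]


def allocate_proportional(n, group_sizes):
    """Proportional allocation: n_i = n * N_i / N."""
    total = sum(group_sizes)
    return [n * size / total for size in group_sizes]


def allocate_by_variation(n, group_sizes, group_sigmas):
    """Allocation by variation: n_i = n * N_i * sigma_i / sum(N_j * sigma_j)."""
    weights = [size * s for size, s in zip(group_sizes, group_sigmas)]
    total = sum(weights)
    return [n * wgt / total for wgt in weights]


# Hypothetical population of three typical groups, sample of 100 units.
sizes = [500, 300, 200]
sigmas = [2.0, 4.0, 6.0]
print(allocate_equal(100, sizes))
print(allocate_proportional(100, sizes))          # [50.0, 30.0, 20.0]
print(allocate_by_variation(100, sizes, sigmas))
```

In the last allocation the smaller but more variable groups receive a larger part of the sample than under proportional allocation.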

With serial selection, the required number of series is determined in the same way as for simple random selection:

Repeated selection: $r = \dfrac{t^2\delta^2}{\Delta^2}$

Non-repeated selection: $r = \dfrac{t^2\delta^2 R}{\Delta^2 R + t^2\delta^2}$

In this case, variances and sampling errors can be calculated for the average value or proportion of the characteristic.

When using sample observation, characterization of its results is possible based on a comparison of the obtained error limits of sample indicators with the value of the permissible error.

In this regard, the task arises of determining the probability that the sampling error will not exceed the permissible error. The solution to this problem comes down to calculating, based on the formula for the maximum sampling error, the value t.

Continuing with the example of a sample survey of personal accounts of bank clients, let us find the probability with which it can be stated that the error in determining the average deposit will not exceed 785 rubles:

$t = \dfrac{\Delta_{\tilde{x}}}{\mu_{\tilde{x}}} = \dfrac{0.785}{0.4} \approx 1.96$;

the corresponding confidence probability is 0.95.
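The same calculation in code, as a sketch: the average error 0.4 thousand rubles comes from the example above (variance 32, n = 200), and F(t) is evaluated under the assumption that the sample mean follows the normal law.

```python
import math

mu = math.sqrt(32.0 / 200)   # average error from the example, thousand rubles
delta = 0.785                # permissible error, thousand rubles

t = delta / mu
probability = math.erf(t / math.sqrt(2.0))   # F(t) under the normal law

print(f"t = {t:.2f}, F(t) = {probability:.3f}")
```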

Currently, the practice of sample observation includes statistical observations carried out by:

  • Rosstat bodies;
  • other ministries and departments (for example, monitoring of enterprises in the Bank of Russia system).

A well-known generalization of experience in organizing sample surveys of small enterprises, the population and households is presented in the Methodological Provisions on Statistics. They give a broader concept of sample observation than the one discussed above (Table 5.4).

In statistical practice, all four types of samples presented in Table 5.4 are used. However, preference is usually given to the probability (random) samples described above, which are the most objective, since the accuracy of their results can be assessed from the data of the sample itself.

Table 5.4

Sample types

In samples of the quasi-random type, probability selection is assumed on the grounds that the expert examining the sample considers this assumption acceptable. An example of quasi-random sampling in statistical practice is the “Sample survey of small enterprises to study social processes in small businesses,” conducted in 1996 in some regions of Russia. Observation units (small enterprises) were selected by experts, taking into account the representation of economic sectors, from an already formed sample of the survey of the financial and economic activities of small enterprises (form “Information on the main indicators of the financial and economic activities of a small enterprise”). When the sample data were summarized, it was assumed that the sample population had been formed by simple random selection.

Direct use of expert judgment is the most common method of deliberately including units in a sample. An example of this selection method is the monographic method, in which information is obtained from only one observation unit that the survey organizer, acting as an expert, considers typical.

Samples formed by directed selection are implemented using an objective procedure, but without a probabilistic mechanism. A widely known example is the main array method, in which the sample includes the largest (most significant) observation units, those that make the main contribution to the indicator, for example to the total value of the characteristic that is the main object of the survey.

In statistical practice, the combined method of statistical observation is often used. The combination of continuous and sample observation methods has two aspects:

  • alternation in time;
  • their simultaneous use (part of the population is observed on a continuous basis, and part is observed selectively).

Alternating periodic sample surveys with relatively rare continuous surveys or censuses is necessary to clarify the composition of the population under study. This information is then used as the statistical basis for sample observation. Examples are population censuses and the household sample surveys carried out between them.

In this case, the following tasks need to be solved:

  • determining the set of characteristics recorded by continuous observation that make it possible to organize the sample;
  • justifying the periods of alternation, i.e. establishing when the data of continuous observation lose their relevance and must be updated at additional cost.

The simultaneous use of continuous and sample observation within one survey is due to the heterogeneity of the populations encountered in statistical practice. This is especially true of surveys of the economic activity of a population of enterprises characterized by skewed distributions of the studied characteristics, when a certain number of units have values that differ sharply from the bulk of the values. In this case, such units are observed on a continuous basis, and the rest of the population is observed by sampling.

With this organization of observations, the main tasks are:

  • establishing the optimal proportion between the two kinds of observation;
  • developing methods for assessing the accuracy of the results.

A typical example of this aspect of the combined method is the general principle of conducting surveys of a population of enterprises, according to which large and medium-sized enterprises are surveyed mainly by the continuous method and small ones by the sample method.

Further development of the sampling observation methodology is carried out both in combination with the organization of continuous observation, and through the organization of special surveys, the conduct of which is dictated by the need to obtain additional information to solve specific problems. Thus, the organization of surveys in the field of living conditions and living standards of the population is provided in two aspects:

  • mandatory components;
  • additional modules within an integrated system of indicators.

Mandatory components may include annual surveys of income, expenditure and consumption (analogous to a survey of household budgets), which also include basic indicators of the living conditions of the population. Every year, according to a special plan, the mandatory components must be supplemented by one-time surveys (modules) of the living conditions of the population, aimed at in-depth study of any selected social topic from their total number (for example, household assets, health, nutrition, education, working conditions, housing conditions, leisure, social mobility, security, etc.) with varying frequency, determined by the need for indicators and resource capabilities.

In sample observation, the random selection of units must be ensured. Each unit must have an equal chance of being selected. This is what a random sample is based on.

A simple (proper) random sample is one in which units are selected from the entire general population (without first dividing it into any groups) by drawing lots (most often) or by some similar method, for example using a table of random numbers. Random selection is not haphazard selection. The principle of randomness means that the inclusion or exclusion of an object cannot be influenced by any factor other than chance. A lottery draw is an example of simple random selection: from the total number of tickets issued, a certain share of numbers is selected at random to receive winnings, and every number has an equal chance of being included in the sample. The number of units selected for the sample population is usually determined from the accepted sample share.

The sample share is the ratio of the number of units in the sample population (n) to the number of units in the general population (N):

$\dfrac{n}{N}$

Thus, with a 5% sample from a batch of 1000 parts, the sample size n is 50 units, with a 10% sample it is 100 units, and so on. With a correct, scientifically based organization of sampling, representativeness errors can be reduced to minimal values, so that sample observation becomes quite accurate.
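Selection by lot can be imitated with a pseudo-random number generator. The sketch below is only an illustration: the function name and the part numbers are hypothetical, and `random.Random.sample` from the Python standard library draws units without replacement, i.e. it corresponds to non-repeated selection.

```python
import random


def simple_random_sample(population, sample_share, seed=None):
    """Non-repeated simple random selection of round(share * N) units."""
    rng = random.Random(seed)
    n = round(sample_share * len(population))
    return rng.sample(population, n)


# Batch of 1000 parts, identified by hypothetical numbers 1..1000.
batch = list(range(1, 1001))
sample_5 = simple_random_sample(batch, 0.05, seed=1)
sample_10 = simple_random_sample(batch, 0.10, seed=1)
print(len(sample_5), len(sample_10))   # 50 100
```

Every part has the same chance of being drawn, and the choice does not depend on the person carrying out the selection.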

Proper random selection “in its pure form” is rarely used in the practice of selective observation, but it is the initial one among all other types of selection; it contains and implements the basic principles of selective observation.

Let's consider some questions of the theory of the sampling method and the error formula for a simple random sample.

When the sampling method is used in statistics, two main types of summary indicators are usually involved: the average value of a quantitative characteristic and the relative value of an alternative characteristic (the share of units in a statistical population that differ from all other units of this population only by the presence of the characteristic being studied).

The sample share (w), or relative frequency, is the ratio of the number of units possessing the characteristic being studied, m, to the total number of sample units, n:

$w = \dfrac{m}{n}$

For example, if out of 100 sampled parts (n = 100), 95 turned out to be standard (m = 95), then the sample share is

w = 95/100 = 0.95.

To characterize the reliability of sample indicators, a distinction is made between the average and the maximum sampling error.

The sampling error, or, in other words, the representativeness error, is the difference between the corresponding sample and general characteristics:

* for the mean: $\tilde{x} - \bar{x}$ (Form. 1)

* for the share: $w - p$ (Form. 2)

Sampling error is characteristic only of sample observations. The greater the value of this error, the more the sample indicators differ from the corresponding general indicators.

The sample mean and sample share are by nature random variables, which can take different values depending on which units of the population are included in the sample. Sampling errors are therefore also random variables and can take different values. For this reason the average of the possible errors is determined: the average sampling error.

What does the average sampling error depend on? If the principle of random selection is observed, the average sampling error is determined first of all by the sample size: the larger the sample, other things being equal, the smaller the average sampling error. By covering an increasing number of units of the general population with a sample survey, we characterize the entire general population more and more accurately.

The average sampling error also depends on the degree of variation of the characteristic being studied. The degree of variation, as is known, is characterized by the variance σ², or w(1 − w) for an alternative characteristic. The smaller the variation of the characteristic, and therefore the variance, the smaller the average sampling error, and vice versa. With zero variance (the characteristic does not vary), the average sampling error is zero, i.e. any unit of the general population will characterize the entire population exactly with respect to this characteristic.

The dependence of the average sampling error on the sample size and on the degree of variation of the characteristic is reflected in formulas that can be used to calculate the average sampling error in sample observation, when the general characteristics ($\bar{x}$, $p$) are unknown and it is therefore impossible to find the actual sampling error directly from formulas (Form. 1) and (Form. 2).

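With random repeated selection, the average errors are theoretically calculated using the following formulas:

* for the mean of a quantitative characteristic: $\mu_{\tilde{x}} = \sqrt{\dfrac{\sigma^2}{n}}$ (Form. 3)

* for a share (alternative characteristic): $\mu_w = \sqrt{\dfrac{p(1-p)}{n}}$ (Form. 4)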

Since the variance of the characteristic in the general population, σ², is not known exactly, in practice the variance S² calculated for the sample population is used instead. This rests on the law of large numbers, according to which a sample population of sufficiently large size reproduces the characteristics of the general population quite accurately.

Thus, the working formulas for the average sampling error with random repeated selection are as follows:

* for the mean of a quantitative characteristic: $\mu_{\tilde{x}} = \sqrt{\dfrac{S^2}{n}}$ (Form. 5)

* for a share (alternative characteristic): $\mu_w = \sqrt{\dfrac{w(1-w)}{n}}$ (Form. 6)

However, the variance of the sample population is not equal to the variance of the general population, so the average sampling errors calculated using formulas (Form. 5) and (Form. 6) are approximate. Probability theory shows that the general variance is expressed through the sample variance by the following relation:

$\sigma^2 = S^2 \cdot \dfrac{n}{n-1}$ (Form. 7)

Since n/(n − 1) is close to unity for sufficiently large n, we can assume that $\sigma^2 \approx S^2$, and therefore formulas (Form. 5) and (Form. 6) can be used in practical calculations of average sampling errors. Only for a small sample (when the sample size does not exceed 30) is it necessary to take the coefficient n/(n − 1) into account and calculate the average error of a small sample by the formula:

$\mu = \sqrt{\dfrac{S^2}{n-1}}$ (Form. 8)
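The small-sample formula follows from substituting the relation between the two variances into the formula for the error of the mean. A short derivation, with $\sigma^2$ for the general variance and $S^2$ for the sample variance:

```latex
\mu_{\tilde{x}}
  = \sqrt{\frac{\sigma^2}{n}}
  = \sqrt{\frac{S^2}{n}\cdot\frac{n}{n-1}}
  = \sqrt{\frac{S^2}{n-1}}
```

The correction matters only for small n: the factor $\sqrt{n/(n-1)}$ is about 1.054 at n = 10 but only about 1.005 at n = 100.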

With random non-repeated selection, the expression under the root in the above formulas for the average sampling error must be multiplied by 1 − (n/N), since in the process of non-repeated sampling the number of units in the general population decreases. Therefore, for non-repeated sampling the working formulas for the average sampling error take the following form:

* for the mean of a quantitative characteristic: $\mu_{\tilde{x}} = \sqrt{\dfrac{S^2}{n}\left(1 - \dfrac{n}{N}\right)}$ (Form. 9)

* for a share (alternative characteristic): $\mu_w = \sqrt{\dfrac{w(1-w)}{n}\left(1 - \dfrac{n}{N}\right)}$ (Form. 10)

Since n is always less than N, the additional factor 1 − (n/N) is always less than one. It follows that the average error with non-repeated selection is always smaller than with repeated selection. At the same time, with a relatively small sample percentage this factor is close to unity (for example, with a 5% sample it equals 0.95; with a 2% sample, 0.98, and so on). Therefore, in practice formulas (Form. 5) and (Form. 6) are sometimes used without this factor to determine the average sampling error even though the sample is organized as non-repeated. This is done when the number of units in the general population N is unknown or unlimited, or when n is very small compared with N, so that introducing an additional factor close to unity would have virtually no effect on the value of the average sampling error.
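How little the factor 1 − (n/N) changes the result for small sample shares can be seen numerically. The sketch below uses a hypothetical variance and population size; the function name is also an assumption made for the illustration.

```python
import math


def mean_error(variance, n, population_size=None):
    """Average sampling error of the mean; population_size=N applies
    the factor (1 - n/N) for non-repeated selection."""
    factor = 1.0 if population_size is None else 1.0 - n / population_size
    return math.sqrt(variance / n * factor)


variance, N = 25.0, 10_000   # hypothetical variance and population size
for n in (200, 500, 2000):   # 2%, 5% and 20% samples
    repeated = mean_error(variance, n)
    non_repeated = mean_error(variance, n, N)
    print(n, 1 - n / N, round(repeated, 4), round(non_repeated, 4))
```

Because the factor stands under the square root, the error itself shrinks by less than the factor: with a 5% sample it is multiplied by $\sqrt{0.95}$, which is about 0.975.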

Mechanical sampling is that the selection of units into a sample population from the general population, divided according to a neutral criterion into equal intervals(groups), is carried out in such a way that from each such group only one unit is selected for the sample. To avoid bias, the unit that is in the middle of each group should be selected.

When organizing mechanical selection, the units of the population are first arranged (usually in a list) in a certain order (for example, alphabetically, by location, or in ascending or descending order of the values of some indicator not related to the property being studied), after which a given number of units is selected mechanically, at a fixed interval. The size of the interval in the population is equal to the reciprocal of the sample share. So, with a 2% sample, every 50th unit is selected and checked (1 : 0.02), and with a 5% sample, every 20th unit (1 : 0.05), for example every 20th part coming off the machine.
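The interval rule can be illustrated with a short sketch. The function name `mechanical_sample` and the population of 1000 numbered units are assumptions made for the example; starting from the middle of each interval follows the recommendation given earlier for avoiding bias.

```python
def mechanical_sample(units, sample_share):
    """Select one unit from each interval of an ordered list.

    The interval (step) is the reciprocal of the sample share,
    and the unit in the middle of each interval is taken.
    """
    step = round(1 / sample_share)
    start = step // 2  # index of the middle unit of the first interval
    return units[start::step]


# Hypothetical ordered population of 1000 numbered units.
population = list(range(1, 1001))

sample_2pct = mechanical_sample(population, 0.02)  # every 50th unit
sample_5pct = mechanical_sample(population, 0.05)  # every 20th unit

print(len(sample_2pct), sample_2pct[:3])
print(len(sample_5pct), sample_5pct[:3])
```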

With a sufficiently large population, mechanical selection is close to purely random selection in terms of the accuracy of the results. Therefore, to determine the average error of mechanical sampling, the formulas for purely random non-repetitive sampling are used (Form. 9), (Form. 10).

To select units from a heterogeneous population, the so-called typical sample is used. It applies in cases where all units of the general population can be divided into several qualitatively homogeneous groups according to characteristics that influence the indicators being studied.

When surveying enterprises, such groups can be, for example, industry and sub-industry, forms of ownership. Then, from each typical group, a purely random or mechanical sample is used to individually select units into the sample population.

Typical sampling is usually used when studying complex statistical populations. For example, it is used in sample surveys of the family budgets of workers and employees in particular sectors of the economy, or of the labor productivity of enterprise workers grouped by qualification.

A typical sample gives more accurate results than other methods of selecting units for the sample population. Dividing the general population into types ensures that every typological group is represented in the sample, which makes it possible to eliminate the influence of intergroup variance on the average sampling error.

When determining the average error of a typical sample, the indicator of variation is the average of the within-group variances.

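The average sampling error is found using the formulas:

* for the average quantitative characteristic

$\mu_x = \sqrt{\dfrac{\overline{\sigma^2}}{n}}$ (repeated selection); (form. 11)

$\mu_x = \sqrt{\dfrac{\overline{\sigma^2}}{n}\left(1-\dfrac{n}{N}\right)}$ (non-repetitive selection); (form. 12)

* for a share (alternative attribute)

$\mu_w = \sqrt{\dfrac{\overline{w(1-w)}}{n}}$ (repeated selection); (form. 13)

$\mu_w = \sqrt{\dfrac{\overline{w(1-w)}}{n}\left(1-\dfrac{n}{N}\right)}$ (non-repetitive selection), (form. 14)

where $\overline{\sigma^2}$ is the average of the within-group variances for the sample population;

$\overline{w(1-w)}$ is the average of the within-group variances of the share (alternative attribute) for the sample population.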
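As an illustration of the typical-sample error for the mean, here is a minimal sketch. It assumes that the average of the within-group variances is weighted by the number of sampled units in each group, which is the usual convention but is not spelled out in the text. The function name `typical_average_error` and the two groups are hypothetical.

```python
import math


def typical_average_error(groups, N=None):
    """Average error of a typical (stratified) sample for the mean.

    groups: list of (n_i, variance_i) pairs, one per typical group,
            where n_i is the number of sampled units in the group
    N: size of the general population; None means repeated selection
    """
    n = sum(size for size, _ in groups)
    # Assumption: within-group variances are averaged with weights n_i.
    mean_within_variance = sum(size * var for size, var in groups) / n
    correction = 1.0 if N is None else 1.0 - n / N
    return math.sqrt(mean_within_variance / n * correction)


# Hypothetical sample: two typical groups of 40 and 60 units.
groups = [(40, 9.0), (60, 16.0)]

mu_typ_repeated = typical_average_error(groups)
mu_typ_non_repetitive = typical_average_error(groups, N=1000)

print(mu_typ_repeated, mu_typ_non_repetitive)
```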

Serial sampling involves random selection from the general population not of individual units, but of their equal groups (nests, series) in order to subject all units in such groups to observation without exception.

The use of serial sampling is due to the fact that many goods for their transportation, storage and sale are packaged in bundles, boxes, etc. Therefore, when monitoring the quality of packaged goods, it is more rational to check several packages (series) than to select the required amount of product from all packages.

Since within groups (series) all units without exception are examined, the average sampling error (when selecting equal series) depends only on the intergroup (interseries) dispersion.

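The average sampling error for the average quantitative characteristic in serial selection is found using the formulas:

$\mu_x = \sqrt{\dfrac{\delta_x^2}{r}}$ (repeated selection); (form. 15)

$\mu_x = \sqrt{\dfrac{\delta_x^2}{r}\left(1-\dfrac{r}{R}\right)}$ (non-repetitive selection), (form. 16)

where r is the number of selected series and R is the total number of series.

The between-group variance of a serial sample is calculated as follows:

$\delta_x^2 = \dfrac{\sum_{i=1}^{r}\left(\tilde{x}_i-\tilde{x}\right)^2}{r}$,

where $\tilde{x}_i$ is the average of the i-th series and $\tilde{x}$ is the overall average for the entire sample population.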

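The average sampling error for a share (alternative attribute) in serial selection:

$\mu_w = \sqrt{\dfrac{\delta_w^2}{r}}$ (repeated selection); (form. 17)

$\mu_w = \sqrt{\dfrac{\delta_w^2}{r}\left(1-\dfrac{r}{R}\right)}$ (non-repetitive selection). (form. 18)

The intergroup (inter-series) variance of the share in a serial sample is determined by the formula:

$\delta_w^2 = \dfrac{\sum_{i=1}^{r}\left(w_i-w\right)^2}{r}$, (form. 19)

where $w_i$ is the share of the characteristic in the i-th series and $w$ is the total share of the characteristic in the entire sample population.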
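The serial-sample calculation depends only on the variation between series, which the sketch below illustrates for the mean. It assumes series of equal size, so the overall sample mean is the simple mean of the series means. The function names and the four series means are hypothetical.

```python
import math


def between_series_variance(series_means):
    """Between-series variance: mean squared deviation of the series
    means from the overall mean (series of equal size are assumed)."""
    r = len(series_means)
    overall = sum(series_means) / r
    return sum((m - overall) ** 2 for m in series_means) / r


def serial_average_error(series_means, R=None):
    """Average error of a serial sample for the mean.

    R: total number of series in the general population;
       None means repeated selection.
    """
    r = len(series_means)
    correction = 1.0 if R is None else 1.0 - r / R
    return math.sqrt(between_series_variance(series_means) / r * correction)


# Hypothetical example: 4 series selected out of 40.
series_means = [10.0, 12.0, 14.0, 16.0]

delta2 = between_series_variance(series_means)
mu_serial_repeated = serial_average_error(series_means)
mu_serial_non_repetitive = serial_average_error(series_means, R=40)

print(delta2, mu_serial_repeated, mu_serial_non_repetitive)
```

The same functions can be applied to a share by passing the series shares in place of the series means.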

In the practice of statistical surveys, in addition to the previously discussed selection methods, a combination of them is used (combined selection).

Selective observation

The concept of sample observation

The sampling method is used when continuous observation is physically impossible because of the huge amount of data, or is not economically feasible. Physical impossibility occurs, for example, when studying passenger flows, market prices, and family budgets. Economic inexpediency occurs when assessing the quality of goods requires their destruction, for example in tasting or in testing bricks for strength. Sample observation is also used to verify the results of continuous observation.

The statistical units selected for observation form the sample population, or sample, and the entire array forms the general population (GS). The number of units in the sample is denoted by n, and in the entire general population by N. The ratio n/N is called the relative size of the sample, or the sample share.

The quality of sample observation results depends on the representativeness of the sample, i.e. on how well it represents the general population. To ensure the representativeness of the sample, it is necessary to observe the principle of random selection of units, which assumes that the inclusion of a unit of the general population in the sample cannot be influenced by any factor other than chance.

Sampling methods

1. Purely random selection: all units of the general population are numbered, and the numbers drawn by lot identify the units included in the sample; the number of numbers drawn equals the planned sample size. In practice, random number generators are used instead of drawing lots. This selection method can be repeated (each unit selected for the sample is returned to the general population after observation and can be surveyed again) or non-repetitive (surveyed units are not returned to the general population and cannot be surveyed again). With repeated selection, the probability of getting into the sample remains unchanged for each unit of the general population. With non-repetitive selection it changes (increases) from draw to draw, but at each draw it is the same for all units remaining in the general population.



2. Mechanical selection: units of the general population are selected with a constant step N/n. So, if the general population contains 100 thousand units and 1 thousand units must be selected, then every hundredth unit will be included in the sample.

3. Stratified (layered) selection is carried out from a heterogeneous general population: it is first divided into homogeneous groups, after which units from each group are selected into the sample population randomly or mechanically, in proportion to the group's size in the general population.

4. Serial (cluster) selection: not individual units but entire series (nests) are selected randomly or mechanically, and continuous observation is carried out within each selected series.
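The four selection methods listed above can be sketched with Python's standard `random` module. The population of 100 numbered units, the strata "A" and "B", the series of 10 units, and the fixed seed are all assumptions made for this illustration.

```python
import random

rng = random.Random(42)           # fixed seed so the run is reproducible
population = list(range(1, 101))  # hypothetical population, N = 100
n = 10                            # planned sample size

# 1. Purely random selection
repeated = rng.choices(population, k=n)     # repeated: units may recur
non_repetitive = rng.sample(population, n)  # non-repetitive: all distinct

# 2. Mechanical selection with constant step N/n, taking the unit
#    in the middle of each interval
step = len(population) // n
mechanical = population[step // 2::step]

# 3. Stratified selection: random selection within each group,
#    proportional to the group's size in the population
strata = {"A": population[:60], "B": population[60:]}
stratified = []
for units in strata.values():
    k = round(n * len(units) / len(population))
    stratified.extend(rng.sample(units, k))

# 4. Serial selection: choose whole series at random and observe
#    every unit inside the chosen series
series = [population[i:i + 10] for i in range(0, len(population), 10)]
serial = [unit for chosen in rng.sample(series, 2) for unit in chosen]

print(repeated, non_repetitive, mechanical, stratified, serial, sep="\n")
```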

Average sampling error

After selecting the required number of units for the sample and recording the characteristics of these units provided for by the observation program, we proceed to calculate the generalizing indicators. These include the average value of the characteristic being studied and the proportion of units that have a given value of this characteristic. However, if several samples are drawn from the general population and their generalizing characteristics are determined, these values will differ from sample to sample. They will also differ from the real value in the general population, as it would be determined by continuous observation. In other words, the generalizing characteristics calculated from the sample data will differ from their real values in the general population, so we introduce the following symbols (Table 8).

Table 8. Legend

The difference between the values of the generalizing characteristics of the sample and the general population is called the sampling error, which is divided into registration error and representativeness error. The first arises from incorrect or inaccurate information caused by a misunderstanding of the question, the inattention of the registrar when filling out questionnaires, forms, etc. It is fairly easy to detect and eliminate. The second arises from non-compliance with the principle of random selection of units for the sample. It is more difficult to detect and eliminate and is much larger than the first, so measuring it is the main task of sample observation.

To measure the sampling error, its average error is determined using formula (39) for repeated sampling and formula (40) for non-repetitive sampling:

$\mu = \sqrt{\dfrac{\sigma^2}{n}}$; (39) $\mu = \sqrt{\dfrac{\sigma^2}{n}\left(1-\dfrac{n}{N}\right)}$. (40)

From formulas (39) and (40) it is clear that the average error is smaller for non-repetitive sampling, which determines its wider use.

Based on the values of the characteristics of units in the sample population, registered in accordance with the statistical observation program, the generalizing sample characteristics are calculated: the sample mean ($\tilde{x}$) and the sample share ($w$), i.e. the proportion of units possessing the characteristic of interest in the total number of sampled units.

The difference between the indicators of the sample and the general population is called sampling error.

Sampling errors, like errors in any other type of statistical observation, are divided into registration errors and representativeness errors. The main objective of the sampling method is to study and measure random errors of representativeness.

The sample mean and sample share are random variables that can take different values depending on which population units are included in the sample. Sampling errors are therefore also random variables and can take different values, which is why the average of the possible errors is determined.

The average sampling error (µ, mu) is equal to:

for the mean $\mu_x = \sqrt{\dfrac{\sigma_x^2}{n}}$; for the share $\mu_w = \sqrt{\dfrac{p(1-p)}{n}}$,

where $p$ is the share of a certain characteristic in the general population.

In these formulas $\sigma_x^2$ and $p(1-p)$ are characteristics of the general population that are unknown during sample observation. In practice, they are replaced by the corresponding characteristics of the sample population on the basis of the law of large numbers, according to which a sufficiently large sample reproduces the characteristics of the general population quite accurately. Methods for calculating average sampling errors for the mean and for the share in repeated and non-repetitive sampling are given in Table 6.1.

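Table 6.1.

Formulas for calculating the average sampling error for the mean and for the share

* repeated selection: for the mean $\mu_x = \sqrt{\dfrac{\sigma_x^2}{n}}$; for the share $\mu_w = \sqrt{\dfrac{w(1-w)}{n}}$
* non-repetitive selection: for the mean $\mu_x = \sqrt{\dfrac{\sigma_x^2}{n}\left(1-\dfrac{n}{N}\right)}$; for the share $\mu_w = \sqrt{\dfrac{w(1-w)}{n}\left(1-\dfrac{n}{N}\right)}$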

The value 1-(n/N) is always less than one, so the average sampling error with non-repetitive sampling is smaller than with repeated sampling. In cases where the sample share is insignificant and the multiplier is close to unity, the correction can be neglected.
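For a share, the calculation can be sketched as follows, with the sample share w substituted for the unknown general share. The function name `average_error_share` and the figures (80 units with the characteristic out of 400 sampled, population of 8000) are hypothetical.

```python
import math


def average_error_share(w, n, N=None):
    """Average sampling error of a share.

    w: sample share of units possessing the characteristic
    n: sample size
    N: size of the general population; None means repeated selection
    """
    correction = 1.0 if N is None else 1.0 - n / N
    return math.sqrt(w * (1.0 - w) / n * correction)


# Hypothetical survey: 80 of 400 sampled units have the characteristic.
m, n, N = 80, 400, 8000
w = m / n  # sample share, 0.2

mu_share_repeated = average_error_share(w, n)
mu_share_non_repetitive = average_error_share(w, n, N)  # factor 0.95

print(w, mu_share_repeated, mu_share_non_repetitive)
```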

It can be asserted only with a certain degree of probability that the general mean or the general share will not deviate from the sample value by more than the average sampling error. Therefore, to characterize the sampling error, the marginal sampling error (Δ) is calculated in addition to the average error. The marginal error is tied to the probability level that guarantees it.

The probability level (P) determines the value of the normalized deviation (t), and vice versa. Values of t are given in tables of normal distribution probabilities. The most frequently used combinations of t and P are given in Table 6.2.


Table 6.2

Values of the normalized deviation t at the corresponding probability levels P

t   1.0    1.5    2.0    2.5    3.0    3.5
P   0.683  0.866  0.954  0.988  0.997  0.999

t is the confidence factor, which depends on the probability with which it can be guaranteed that the marginal error will not exceed t times the average error. It shows how many average errors are contained in the marginal error. So, if t = 1, then with a probability of 0.683 it can be stated that the difference between the sample and general indicators will not exceed one average error.
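The correspondence between t and the probability level can be reproduced from the normal distribution: the probability that a normally distributed deviation stays within t standard errors is erf(t/√2). The sketch below uses `math.erf` from the standard library and compares the result with the tabulated values; the function name `confidence_probability` is illustrative.

```python
import math


def confidence_probability(t):
    """Probability that a normally distributed error does not exceed
    t average errors in absolute value."""
    return math.erf(t / math.sqrt(2.0))


# Tabulated values of t and the corresponding probability levels.
table = {1.0: 0.683, 1.5: 0.866, 2.0: 0.954,
         2.5: 0.988, 3.0: 0.997, 3.5: 0.999}

computed = {t: confidence_probability(t) for t in table}

for t, p_table in table.items():
    print(t, p_table, round(computed[t], 4))
```

The computed values agree with the table to within rounding in the third decimal place.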

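Formulas for calculating marginal sampling errors are given in Table 6.3.

Table 6.3.

Formulas for calculating the marginal sampling error for the mean and for the share

* repeated selection: for the mean $\Delta_x = t\sqrt{\dfrac{\sigma_x^2}{n}}$; for the share $\Delta_w = t\sqrt{\dfrac{w(1-w)}{n}}$
* non-repetitive selection: for the mean $\Delta_x = t\sqrt{\dfrac{\sigma_x^2}{n}\left(1-\dfrac{n}{N}\right)}$; for the share $\Delta_w = t\sqrt{\dfrac{w(1-w)}{n}\left(1-\dfrac{n}{N}\right)}$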

After calculating the marginal sampling errors, we find confidence intervals for the general indicators. The probability adopted when calculating the error of a sample characteristic is called the confidence probability. A confidence probability of 0.95 means that the error can go beyond the established limits in only 5 cases out of 100; a probability of 0.954, in 46 cases out of 1000; and a probability of 0.999, in 1 case out of 1000.

For the general mean, the most probable boundaries within which it will lie, taking into account the marginal representativeness error, have the form:

$\tilde{x}-\Delta_x \le \bar{x} \le \tilde{x}+\Delta_x$.

The most probable boundaries within which the general share will lie are:

$w-\Delta_w \le p \le w+\Delta_w$.

Hence the general mean is $\bar{x} = \tilde{x} \pm \Delta_x$ and the general share is $p = w \pm \Delta_w$.
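A confidence interval for the general mean can be put together from the pieces described above: the average error, the confidence factor t, and the marginal error equal to t average errors. This is a minimal sketch with hypothetical figures (sample mean 50, variance 25, 100 units drawn non-repetitively from 2000); the function name `confidence_interval` is illustrative.

```python
import math


def confidence_interval(estimate, average_error, t):
    """Return (lower, upper) bounds: estimate -/+ t * average error."""
    delta = t * average_error  # marginal sampling error
    return estimate - delta, estimate + delta


# Hypothetical sample results.
sample_mean, variance, n, N = 50.0, 25.0, 100, 2000

# Average error for non-repetitive selection.
mu = math.sqrt(variance / n * (1.0 - n / N))

# t = 2 corresponds to a confidence probability of about 0.954.
lower, upper = confidence_interval(sample_mean, mu, t=2.0)

print(round(lower, 3), round(upper, 3))
```

The same function works for a share if the sample share and its average error are passed instead.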

The formulas given in Table 6.3 are used to determine the errors of samples drawn by purely random and mechanical methods.

With stratified sampling, the sample necessarily includes representatives of all groups, usually in the same proportions as in the general population. Therefore, the sampling error in this case depends mainly on the average of the within-group variances. Since, by the rule for adding variances, the total variance is the sum of the average within-group variance and the intergroup variance, the sampling error for stratified sampling will always be smaller than for purely random sampling.

With serial (clustered) selection, the measure of variability will be intergroup dispersion.


