Population: In statistical usage the term population is applied to any finite or infinite collection of individuals. It has displaced the older term universe, which is derived from the universe of discourse of logic. It is practically synonymous with aggregate and does not necessarily refer to a collection of living organisms.
Census: The complete enumeration of a population or groups at a point in time with respect to well-defined characteristics such as population, production or traffic on particular roads. In some connections the term is associated with the data collected rather than the extent of the collection, so that the term Sample Census has a distinct meaning.
The partial enumeration resulting from failure to cover the whole population, as distinct from a designed sample enquiry, may be referred to as an ‘incomplete census’.
Sample: A part of a population, or a subset from a set of units, which is provided by some process or other, usually by deliberate selection with the object of investigating the properties of the parent population or set.
Sample Survey: A survey which is carried out using a sampling method, i.e. one in which only a portion of the population, and not the whole population, is surveyed.
Sampling Unit: One of the units into which an aggregate is divided, or regarded as divided, for the purposes of sampling, each unit being regarded as individual and indivisible when the selection is made. The definition of unit may be made on some natural basis, for example, households, persons, units of product, tickets, etc., or on some arbitrary basis, e.g. areas defined by grid coordinates on a map. In the case of multi-stage sampling the units are different at different stages of sampling, being ‘large’ at the first stage and growing progressively smaller with each stage in the process of selection. The term sample unit is sometimes used in a synonymous sense.
Frame: A list, map or other specification of the units which constitute the available information relating to the population designated for a particular sampling scheme. There is a frame corresponding to each stage of sampling in a multi-stage sampling scheme. The frame may or may not contain information about the size or other supplementary information of the units, but it should have enough details so that a unit, if included in the sample, may be located and taken up for inquiry. The nature of the frame exerts a considerable influence over the structure of a sample survey. It is rarely perfect, and may be inaccurate, incomplete, inadequately described, out of date or subject to some degree of duplication. Reasonable reliability in the frame is a desirable condition for the reliability of a sample survey based on it.
In multi-stage sampling it is sometimes possible to construct the frame for the later stages during the progress of the sample survey itself. For example, certain first-stage units may be selected in the first instance, and then more detailed lists or maps may be constructed, by compiling available information or by direct observation, only for the first-stage units actually selected.
Sampling Error: The part of the difference between a population value and an estimate thereof, derived from a random sample, which is due to the fact that only a sample of values is observed; as distinct from errors due to imperfect selection, bias in response or estimation, errors of observation and recording, etc. The totality of sampling errors in all possible samples of the same size generates the sampling distribution of the statistic which is being used to estimate the parent value.
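The idea that the sampling errors over all possible samples of the same size make up the sampling distribution can be checked on a toy case. The sketch below assumes a hypothetical population of five values and enumerates every sample of size 2 drawn without replacement; the data are illustrative only.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 5 values (illustrative only).
population = [2, 4, 6, 8, 10]
mu = mean(population)  # the population value being estimated

# Every possible sample of size n = 2, drawn without replacement.
samples = list(combinations(population, 2))

# Sampling error of each sample = estimate (sample mean) - population value.
errors = [mean(s) - mu for s in samples]

for s, e in zip(samples, errors):
    print(s, "sample mean =", mean(s), "sampling error =", e)

# The set of sample means is the sampling distribution of the mean.
print("number of possible samples:", len(samples))
print("average sampling error:", mean(errors))
```

Individual samples miss the population mean of 6 by as much as 3 in either direction, yet the errors average out to zero over all ten possible samples, which is what distinguishes sampling error from bias.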
Bias: Generally, an effect which deprives a statistical result of representativeness by systematically distorting it, as distinct from a random error which may distort on any one occasion but balances out on the average.
Biased Sample: A sample obtained by a biased sampling process, that is to say, a process which incorporates a systematic component of error, as distinct from random error which balances out on the average. Non-random sampling is often, though not inevitably, subject to bias, particularly when entrusted to subjective judgment on the part of human beings.
Estimation and Testing of Hypotheses: At this stage, it is worthwhile to distinguish two objectives of sample surveys: (i) to estimate certain population parameters, and (ii) to test a hypothesis.
Estimation of a parameter refers to a situation in which the value of a certain characteristic of a given population, such as an average or a proportion, is to be estimated. For example, we may be interested in ascertaining the average annual expenditure incurred on smoking or the proportion of employees working overtime in an industrial unit, and so on. In the first example, the parameter is the average annual expenditure on smoking and in the second example, the proportion of employees working overtime. In order to estimate a parameter, first a sample is chosen, the elements in the sample are contacted and the necessary information is collected from them. From the information thus gathered, the relevant statistic (average or proportion) is calculated. This statistic is used as an estimate of the population parameter.
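The last step, computing the statistic that serves as the estimate, can be sketched as follows. The two samples below are hypothetical responses invented for illustration, one for each example in the text.

```python
from statistics import mean

# Hypothetical sample 1: annual expenditure on smoking (rupees) for 8 households.
annual_smoking_spend = [1200, 0, 3600, 2400, 0, 1800, 0, 3000]

# Hypothetical sample 2: whether each of 10 sampled employees works overtime.
works_overtime = [True, False, True, False, False, True, False, False, True, False]

# The sample average is used as the estimate of the population average.
avg_spend = mean(annual_smoking_spend)

# The sample proportion is used as the estimate of the population proportion.
prop_overtime = sum(works_overtime) / len(works_overtime)

print(f"Estimated average annual spend on smoking: {avg_spend:.2f}")
print(f"Estimated proportion working overtime: {prop_overtime:.2f}")
```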
The second objective of sample surveys may be to test a hypothesis involving a comparison of two or more numerical values. For example, we may like to test the hypothesis that at least 60 per cent of households have telephones in a town. A sample survey is undertaken and the relevant survey data reveal that this percentage is 55. The question now is whether these two percentages are significantly different.
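Whether 55 per cent differs significantly from 60 per cent depends on the sample size, which the example does not state. The sketch below uses one standard approach, the large-sample one-sided z test for a proportion, and assumes n = 400 households and a 5 per cent significance level; both values are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_proportion_z_test(p_hat, p0, n):
    """Large-sample z test of H0: p >= p0 against H1: p < p0."""
    se = sqrt(p0 * (1 - p0) / n)      # standard error of p_hat under H0
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)     # lower-tail probability
    return z, p_value

# Hypothesised 60%, observed 55%; n = 400 is an ASSUMED sample size.
z, p_value = one_sided_proportion_z_test(p_hat=0.55, p0=0.60, n=400)
alpha = 0.05  # assumed significance level
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("significantly below 60%" if p_value < alpha else "not significantly different")
```

With 400 households, z is about -2.04 and the one-sided p-value about 0.02, so the shortfall would be judged significant at the 5 per cent level. With only 100 households the same 55 per cent gives z of about -1.02 and a p-value of about 0.15, which would not be significant.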
Advantages of Sampling
The following are several advantages of sampling:
- Sampling is cheaper than a census survey. It is obviously more economical, for instance, to cover a sample of households than all the households in a territory although the cost per unit of study may be higher in a sample survey than in a census survey.
- Since the magnitude of the operations involved in a sample survey is small, both the execution of the fieldwork and the analysis of the results can be carried out speedily.
- Sampling results in greater economy of effort, as a relatively small staff is required to carry out the survey and to tabulate and process the survey data.
- A sample survey enables the researcher to collect more detailed information than would otherwise be possible in a census survey. Also, information of a more specialised type can be collected, which would not be possible in a census survey because only a limited number of specialists is available.
- Since the scale of operations involved in a sample survey is small, the quality of the interviewing, supervision and other related activities can be better than the quality in a census survey.
Limitations of Sampling
- When information is needed on every unit in the population, such as individuals, dwelling units or business establishments, a sample survey cannot be of much help, for it fails to provide information on each individual unit.
- Sampling gives rise to certain errors. If these errors are too large, the results of the sample survey may be of little use.
- While in a census survey it may be easy to check the omissions of certain units in view of complete coverage, this is not so in the case of a sample survey.
The Sampling Process: Having looked into the major advantages and limitations of sampling, we now turn to the sampling process. It is the procedure required right from defining a population to the actual selection of sample elements. There are seven steps involved in this process.
Step 1: Define the population. It is the aggregate of all the elements defined prior to selection of the sample. It is necessary to define the population in terms of (i) elements, (ii) sampling units, (iii) extent, and (iv) time. Two examples are given here. If we were to conduct a survey on the consumption of tea in Gujarat, then these specifications might be as follows:
- Element: Housewives
- Sampling units: Households, then housewives
- Extent: Gujarat State
- Time: January 1-10, 1999
If we were to monitor the sales of a product recently introduced by us, the population might be:
- Element: Our product
- Sampling units: Retail outlets, supermarkets, then our product
- Extent: Delhi and New Delhi
- Time: January 7-14, 1999
It may be emphasized that all these four specifications must be contained in the designated population. Omission of any of them would render the definition of population incomplete.
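The requirement that all four specifications be present can be expressed as a small record that refuses to be created when one is missing. This is a hypothetical illustration; the class and field names are not part of any standard tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PopulationSpec:
    """Hypothetical record holding the four specifications of a population."""
    element: str
    sampling_units: str
    extent: str
    time: str

    def __post_init__(self):
        # Omitting any one specification leaves the definition incomplete.
        for name in ("element", "sampling_units", "extent", "time"):
            if not getattr(self, name).strip():
                raise ValueError(f"population definition incomplete: '{name}' missing")

# The tea-consumption example from Step 1.
tea_survey = PopulationSpec(
    element="Housewives",
    sampling_units="Households, then housewives",
    extent="Gujarat State",
    time="January 1-10, 1999",
)
print(tea_survey)
```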
Step 2: Identify the sampling frame, which could be a telephone directory, a list of blocks and localities of a city, a map or any other list consisting of all the sampling units. It may be pointed out that if the frame is incomplete or otherwise defective, sampling will not be able to overcome these shortcomings.
The question is: how can we ensure that the frame is perfect and free from any defect? Leslie Kish has observed that a perfect frame is one where “every element appears on the list separately, once, only once, and nothing else appears on the list”. Such a perfect frame would indicate a one-to-one correspondence between frame units and sampling units. But such perfect frames are rare. Accordingly, one has to use frames with one deficiency or another, but one should ensure that the frame is not so deficient that it has to be given up altogether.
This raises a pertinent question: what are the criteria for a suitable frame? In order to examine the suitability or otherwise of a sampling frame, a number of questions need to be asked. These are:
- Does it adequately cover the population to be surveyed?
- How complete is the frame? Is every unit that should be included represented?
- Is it accurate? Is the information about each individual unit correct? Does the frame as a whole contain units which no longer exist?
- Is there any duplication? If so, then the probability of selection is disturbed as a unit can enter the sample more than once.
- Is the frame up-to-date? It could have met all the criteria when compiled but may well be deficient by the time it comes to be used. This is particularly true of frames involving the human population, as change is taking place continuously.
- How convenient is it to use? Is it readily accessible? Is it arranged in a way suitable for sampling? Can it easily be re-arranged so as to enable us to introduce stratification and to undertake multi-stage sampling?
These are demanding criteria and it is most unlikely that any frame would meet them all. Nevertheless, they are the factors to be borne in mind whenever we undertake random sampling.
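Some of these questions can be checked mechanically when a reference listing is available, for instance a fresh field listing of a few areas. The sketch below assumes such a listing exists and uses made-up household IDs; it reports duplicates, omissions (coverage) and units known to no longer exist.

```python
from collections import Counter

def audit_frame(frame, current_units, defunct=()):
    """Check a frame against an ASSUMED reference list of current units."""
    counts = Counter(frame)
    duplicates = sorted(u for u, c in counts.items() if c > 1)  # unit can enter sample twice
    missing = sorted(set(current_units) - set(frame))           # incompleteness
    obsolete = sorted(set(frame) & set(defunct))                # units that no longer exist
    coverage = 1 - len(missing) / len(set(current_units))
    return {"duplicates": duplicates, "missing": missing,
            "obsolete": obsolete, "coverage": coverage}

# Hypothetical frame and reference listing.
frame = ["H01", "H02", "H02", "H03", "H05"]
current = ["H01", "H02", "H03", "H04"]
report = audit_frame(frame, current, defunct=["H05"])
print(report)
```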
In marketing research most of the frames are from census reports, electoral registers, lists of member units of trade and industry associations, lists of members of professional bodies, lists of dwelling units maintained by local bodies, returns from an earlier survey and large scale maps.
Step 3: Specify the sampling unit. The sampling unit is the basic unit containing the elements of the target population. The sampling unit may be different from the element. For example, if one wanted a sample of housewives, it might be possible to have access to such a sample directly. However, it might be easier to select households as the sampling unit and then interview housewives in each of the selected households.
As mentioned in the preceding step, the sampling frame should be complete and accurate; otherwise the selection of the sampling unit might be defective. It is also necessary to specify the sampling unit further, both in personal interviews and in telephone interviews. Thus, in personal interviews, a pertinent question is: of the several persons in a household, who should be interviewed? If interviews are held during office hours, when the heads of families and other employed persons are away, interviewing would under-represent employed persons and over-represent elderly persons, housewives and the unemployed. In view of these considerations, it is necessary to have a random process of selection of the adult residents of each household. One method that could be used for this purpose is to list all the eligible persons living at a particular address and then select one of them at random.
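The listing-and-selection method can be sketched as below. The age cut-off of 18 for eligibility and the household members are assumptions for illustration. Because each of the k eligible adults has a 1/k chance of selection, k is also returned, since it is commonly used as a weight for that respondent.

```python
import random

def select_respondent(household_members, rng):
    """List eligible adults at an address and pick one at random."""
    # ASSUMPTION: 'eligible' means aged 18 or over.
    eligible = sorted(m["name"] for m in household_members if m["age"] >= 18)
    if not eligible:
        return None, 0
    chosen = rng.choice(eligible)  # each eligible adult has probability 1/k
    return chosen, len(eligible)

rng = random.Random(2024)  # fixed seed so the draw is reproducible
household = [
    {"name": "Asha", "age": 44},
    {"name": "Ravi", "age": 47},
    {"name": "Meena", "age": 71},
    {"name": "Kiran", "age": 12},
]
person, k = select_respondent(household, rng)
print(person, "selected from", k, "eligible adults")
```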
Step 4: Specify the sampling method. It indicates how the sample units are selected. One of the most important decisions in this regard is to determine which of the two – probability and non-probability sample – is to be chosen. Probability samples are also known as random samples and non-probability samples as non-random samples.
In case of a probability sample, the probability or chance of every unit in the population being included in the sample is known. Further, the selection of specific units in the sample depends entirely on chance. No substitution of one unit for another is permissible. This means that no human judgment is involved in the selection of a sample. In contrast, in a non-probability sample, the probability of inclusion of any unit in the population in the sample is not known. In addition, the selection of units within a sample involves human judgment rather than pure chance.
In case of a probability sample, it is possible to measure the sampling error and thereby determine the degree of precision in the estimates with the help of the theory of probability. This theory also enables us to consider, from amongst the various possible sample designs, the one that will give the maximum information per rupee. This is not possible when a non-probability sample is used.
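As an illustration of measuring sampling error from a probability sample, the sketch below draws a simple random sample without replacement (every unit has the known inclusion probability n/N) and estimates the standard error of the mean from the sample itself. The population is simulated, hypothetical data, and the 1.96 multiplier gives only an approximate 95 per cent interval.

```python
import random
from math import sqrt
from statistics import mean, stdev

def srs_estimate(population, n, rng):
    """Simple random sample without replacement plus estimated standard error."""
    N = len(population)
    sample = rng.sample(population, n)   # inclusion probability n/N for every unit
    ybar = mean(sample)
    s = stdev(sample)                    # sample standard deviation
    fpc = 1 - n / N                      # finite population correction
    se = sqrt(fpc) * s / sqrt(n)         # estimated standard error of ybar
    return ybar, se

rng = random.Random(7)
population = [rng.randint(0, 5000) for _ in range(1000)]  # hypothetical values
ybar, se = srs_estimate(population, n=100, rng=rng)
print(f"estimate = {ybar:.1f}, standard error = {se:.1f}")
print(f"approx. 95% interval: {ybar - 1.96 * se:.1f} to {ybar + 1.96 * se:.1f}")
```

No comparable standard error can be computed for a judgment sample, because the inclusion probabilities are unknown.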
Probability sampling enables us to choose representative sample designs. It also enables us to estimate the extent to which the results based on such a sample are likely to be different from what we would have obtained had we covered the population in our study. Conversely, the use of probability sampling enables us to determine the sample size for a given degree of precision, indicating that our sample results do not differ by more than a specified amount from those yielded by a study covering the entire population.
Although non-probability sampling does not yield these benefits, on account of its convenience and economy, it is often preferred to probability sampling. If the researcher is convinced that the risks involved in the use of a non-probability sample are more than offset by its being relatively cheap and convenient, his choice should be in favour of non-probability sampling.
There are various types of sample designs which can be covered under the two broad groups – random or probability samples and non-random or non-probability samples. The main types of sample designs in each of these two categories are discussed a little later.
Step 5: Determine the sample size. In other words, one has to decide how many elements of the target population are to be chosen.
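For simple random sampling, one widely used formula for estimating a proportion within a margin of error e is n0 = z²p(1 − p)/e², optionally reduced by a finite population correction. The sketch below applies it; the choice of p = 0.5 (the most conservative value), 95 per cent confidence and a town of 10,000 households are assumptions for illustration, and other designs need other formulas.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, p=0.5, confidence=0.95, N=None):
    """SRS sample size so a proportion estimate is within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z**2 * p * (1 - p) / margin**2                 # very large population
    if N is not None:
        n0 = n0 / (1 + (n0 - 1) / N)                    # finite population correction
    return ceil(n0)

print(sample_size_for_proportion(0.05))            # 385
print(sample_size_for_proportion(0.05, N=10_000))  # 370
```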
Step 6: Specify the sampling plan. This means that one should indicate how decisions made so far are to be implemented. For example, if a survey of households is to be conducted, a sampling plan should define a household, contain instructions to the interviewer as to how he should take a systematic sample of households, advise him on what he should do when no one is available on his visit to the household, and so on. These are some pertinent issues in a sampling survey to which a sampling plan should provide answers.
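The instruction to "take a systematic sample of households" might, for example, be a linear systematic sample: a random start within the first interval, then every k-th household on the list. A minimal sketch, assuming a hypothetical listing of 200 households and a sample of 20:

```python
import random

def systematic_sample(frame, n, rng):
    """Linear systematic sample: random start, then every k-th unit."""
    N = len(frame)
    k = N // n                   # sampling interval (assumes N >= n)
    start = rng.randrange(k)     # random start in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

rng = random.Random(11)
households = [f"HH-{i:03d}" for i in range(1, 201)]  # hypothetical listing
sample = systematic_sample(households, n=20, rng=rng)
print(sample)
```

Taking k = N // n keeps every selected position inside the list even when N is not an exact multiple of n.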
Step 7: Select the sample. This is the final step in the sampling process. A good deal of office and fieldwork is involved in the actual selection of the sampling elements. Most of the problems in this stage are faced by the interviewer while contacting the sample-respondents.
Principles of Sample Survey
The theory of sampling is based on the following important principles:
- Principle of statistical regularity
- Principle of validity
- Principle of optimization
Principle of statistical regularity: stresses the desirability and importance of selecting a sample at random so that each and every unit in the population has an equal chance of being selected in the sample.
An immediate consequence of this principle is the principle of inertia of large numbers, which states that “Other things being equal, as the sample size increases, the results tend to be more reliable and accurate.”
For example, in a coin tossing experiment, the results will be approximately 50% heads and 50% tails provided we perform the experiment a fairly large number of times.
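The coin-tossing example can be simulated directly. The sketch below models a fair coin with Python's pseudo-random generator (an assumption standing in for a real coin) and prints the proportion of heads for increasing numbers of tosses; the proportion typically settles closer to 0.5 as the number grows.

```python
import random

def proportion_heads(n, rng):
    """Toss a simulated fair coin n times and return the share of heads."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n

rng = random.Random(42)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"{n:>7} tosses: proportion of heads = {proportion_heads(n, rng):.4f}")
```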
Principle of validity: means the sample design should enable us to obtain valid tests and estimates about the parameters of the population. The samples obtained by the technique of probability sampling satisfy this principle.
Principle of optimization: emphasizes obtaining optimum results, in terms of the efficiency and cost of the design, with the resources at our disposal. The reciprocal of the sampling variance of an estimate provides a measure of its efficiency, while a measure of the cost of the design is provided by the total expenses incurred in terms of money and man-hours.
The principle of optimization consists in
- Achieving a given level of efficiency at minimum cost
- Obtaining maximum possible efficiency with given level of cost.
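Both forms of optimization can be illustrated with the simplest case. The sketch assumes simple random sampling with the finite population correction ignored, so the variance of the sample mean is S²/n and efficiency is n/S², and a linear cost function C = c0 + c1·n. All numbers are hypothetical.

```python
from math import ceil, floor

# Hypothetical design inputs (assumptions, not from the text).
S2 = 400.0     # population variance of the study variable
c0 = 5_000.0   # fixed overhead cost
c1 = 50.0      # cost per sampled unit

def cost(n):
    return c0 + c1 * n

def efficiency(n):
    # Efficiency = 1 / Var(ybar), with Var(ybar) = S2 / n (fpc ignored).
    return n / S2

# (1) Given level of efficiency at minimum cost: smallest n with S2 / n <= target.
target_variance = 4.0
n_min = ceil(S2 / target_variance)
print(f"n = {n_min}, cost = {cost(n_min):.0f}")

# (2) Maximum efficiency for a given cost: largest n the budget allows.
budget = 20_000.0
n_max = floor((budget - c0) / c1)
print(f"n = {n_max}, efficiency = {efficiency(n_max):.3f}")
```

Under this simple cost model both problems reduce to choosing n; with stratified or multi-stage designs the same two questions lead to allocation problems across strata or stages.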