In order to collect unbiased data, it is important that the sample be representative of the population.
When a study is done with faulty data, the results are questionable.
Usually only a part of the population can be analyzed.
How do you choose your sample?
The process is called sampling.
Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference.
Random Sample: Each member of the population has an equal chance of being selected.
Simple Random Sample: Every possible sample of size n drawn from a population of size N is equally likely to be chosen.
Simple random sampling is like pulling a number out of a hat. Every member of the population is assigned a number. However, in a large population, it can be time-consuming to write down thousands of names on slips of paper to draw from a hat.
An easier way to draw the sample is to utilize a random number table.
In fact, random numbers can be obtained from a random number table, a software program, or a calculator.
Example. There are 800 students currently enrolled in your school. You wish to form a sample of ten students to answer some survey questions.
Assign each student a unique number from 001 to 800.
On the table of random numbers, choose a starting place at random (say, the 5th column, 2nd row).
Starting there, read the digits in groups of three. Keep reading until you have 10 usable numbers, skipping any group that falls outside 001 to 800 or repeats a number already chosen.
261, 046, 731, 800, 701, 349, 866, 675, 199, 723, 596,… Ignore 866, because no student was assigned that number. The students assigned the remaining ten numbers will be sampled.
The members of the population that correspond to these numbers become the sample, and data are collected from them.
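The table-reading procedure in this example can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the second function swaps the printed table for Python's built-in pseudorandom generator, which is an assumption about how one might do this in software rather than part of the example itself.

```python
import random

def read_from_table(three_digit_groups, population_size=800, sample_size=10):
    """Mimic reading a random number table.

    Groups outside 1..population_size, or repeats of a number already
    chosen, are skipped. Reading stops once sample_size numbers are kept.
    """
    chosen = []
    for group in three_digit_groups:
        number = int(group)
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

def sample_student_ids(population_size=800, sample_size=10, seed=None):
    """Draw a simple random sample of student numbers without replacement."""
    rng = random.Random(seed)
    # range(1, population_size + 1) plays the role of the numbered list of students.
    return rng.sample(range(1, population_size + 1), sample_size)

# The groups read from the table in the example above.
table_groups = ["261", "046", "731", "800", "701", "349",
                "866", "675", "199", "723", "596"]
print(read_from_table(table_groups))
# [261, 46, 731, 800, 701, 349, 675, 199, 723, 596]

# The software alternative: ten distinct student numbers between 1 and 800.
print(sample_student_ids(seed=1))
```

Note that 866 is dropped automatically because it is larger than 800, which is why eleven groups had to be read to obtain ten students.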
Simple random sampling requires that we have a list of all the individuals within a population.
This list is called a frame.
If we do not have a frame, then a different sampling method must be used.
An unbiased random selection of individuals is important so that in the long run, the sample represents the population. This does not guarantee that a particular sample is a perfect representation of the population.
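A small simulation can make the "long run" idea concrete. The sketch below is an illustration only: it assumes a made-up population in which each of 800 individuals has the measurement 1, 2, …, 800, and the function name is hypothetical. It draws many simple random samples of size 10 and compares the individual sample means with their overall average.

```python
import random

def repeated_sample_means(population, sample_size, repetitions, seed=None):
    """Return the mean of each of `repetitions` simple random samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)  # without replacement
        means.append(sum(sample) / sample_size)
    return means

# Assumed stand-in data: one measurement per individual.
population = list(range(1, 801))
population_mean = sum(population) / len(population)

means = repeated_sample_means(population, sample_size=10, repetitions=2000, seed=0)

print("population mean:", population_mean)
print("smallest and largest sample mean:", min(means), max(means))
print("average of all sample means:", sum(means) / len(means))
```

Individual sample means can land well away from the population mean, but their average over many samples should sit close to it. That is the sense in which random selection is unbiased without guaranteeing that any one sample is a perfect representation.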
Simple random sampling best suits situations where not much information is available about the population.
There are other effective ways to collect data.
Stratified sampling involves selecting independent samples from a number of subpopulations, groups or strata within the population.
In stratified random sampling, the strata are formed based on their members sharing a specific attribute or characteristic. A simple random sample from each stratum is taken, in a number proportional to the stratum’s size when compared to the population.
Within each stratum, the individuals are likely to share a common attribute.
Between the strata, the individuals are likely to differ in that attribute.
Example – polling a population about a political issue
It is reasonable to divide the population into Democrats, Republicans, and Independents.
It is reasonable to believe that the opinions of individuals within each party are similar.
It is reasonable to believe that the opinions differ from group to group.
Therefore it makes sense to consider each stratum separately.
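Proportional stratified sampling for a poll like this can be sketched as follows. The party sizes (450, 400, and 150 voters) are invented for illustration, as are the function names. Rounding leftovers are handed out by largest remainder, which is one reasonable choice among several.

```python
import random

def proportional_allocation(stratum_sizes, total_sample_size):
    """Decide how many individuals to draw from each stratum.

    Each stratum gets a share proportional to its size. Fractions are
    rounded down, then leftover slots go to the largest remainders so
    the counts add up to total_sample_size.
    """
    population_size = sum(stratum_sizes.values())
    exact = {name: total_sample_size * size / population_size
             for name, size in stratum_sizes.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total_sample_size - sum(counts.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

def stratified_sample(strata, total_sample_size, seed=None):
    """Take a simple random sample from every stratum, sized proportionally."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    counts = proportional_allocation(sizes, total_sample_size)
    return {name: rng.sample(members, counts[name])
            for name, members in strata.items()}

# Hypothetical electorate of 1,000 voters, identified by made-up labels.
strata = {
    "Democrats": [f"D{i}" for i in range(450)],
    "Republicans": [f"R{i}" for i in range(400)],
    "Independents": [f"I{i}" for i in range(150)],
}
sample = stratified_sample(strata, total_sample_size=100, seed=3)
print({name: len(members) for name, members in sample.items()})
# {'Democrats': 45, 'Republicans': 40, 'Independents': 15}
```

Every party appears in the sample in the same proportion as in the population, so even the smallest group is represented.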
There are several major reasons why you might prefer stratified sampling over simple random sampling. First, it assures that you will be able to represent not only the overall population, but also key subgroups of the population, especially small minority groups.
If you want to draw conclusions about subgroups, stratified sampling may be the only way to ensure that each subgroup is adequately represented.
Second, stratified random sampling will generally have more statistical precision than simple random sampling. This is only true if the groups are homogeneous, that is, consistent within each group. If they are, we expect the variability within the groups to be lower than the variability of the population as a whole. Stratified sampling takes advantage of this fact.
A systematic sample is obtained when we choose every “nth” individual in a population.
It is a method of selecting a sample from a larger population using a random starting point and a fixed, periodic interval.
Typically, every “nth” member is selected from the total population for inclusion in the sample.
Systematic sampling is still thought of as being random, as long as the periodic interval is determined beforehand and the starting point is random.
Example: Suppose a supermarket wants to study the buying habits of its customers. Using systematic sampling, it can choose every 10th or 15th customer entering the store and conduct the study on this sample.
Systematic sampling works here even though we do not have a list of the customers arriving that day.
We do not even know how many customers will arrive that day.
One advantage to this technique is its simplicity.
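The simplicity shows in code as well. The sketch below (function name and customer labels are invented for illustration) picks a random starting point within the first interval and then takes every 10th arrival. It processes arrivals one at a time, so it needs neither a list of customers nor the total count in advance.

```python
import random

def systematic_sample(arrivals, interval, seed=None):
    """Yield every `interval`-th item from a stream, starting at a random offset."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start among the first `interval` arrivals
    for position, item in enumerate(arrivals):
        if position >= start and (position - start) % interval == 0:
            yield item

# Hypothetical stream of customers entering the store, in arrival order.
customers = (f"customer {i}" for i in range(1, 101))
for chosen in systematic_sample(customers, interval=10, seed=7):
    print(chosen)
```

With 100 arrivals and an interval of 10, exactly ten customers are selected, spaced ten arrivals apart.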
Cluster sampling involves selecting the sample units in groups.
Cluster sampling is a technique used when “natural” groupings are evident in a population.
In this technique, the total population is divided into these groups (or clusters) and a sample of the groups is selected.
Then the required information is collected from all the members within each selected group.
The technique works best when most of the variation in the population is within the groups, not between them.
The main difference between cluster sampling and stratified sampling is that in cluster sampling the cluster is treated as the sampling unit so analysis is done on the entire population within selected clusters.
In stratified sampling, the analysis is done on elements within each stratum.
In stratified sampling, a simple random sample is drawn from each of the strata, whereas in cluster sampling only the selected clusters are studied.
Cluster sampling is appropriate when it is very time-consuming or expensive to choose the individuals one at a time.
Example – testing the fill of bottles
It is time-consuming to pull individual bottles.
It is expensive to waste an entire carton of 12 bottles just to test one bottle.
If we would like to test 240 bottles, we could randomly select 20 cartons and test all 12 bottles within each carton. This reduces the time and expense required.
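The bottle example can be sketched as follows. The production run of 500 cartons and the bottle labels are assumptions made up for illustration, and the function name is hypothetical. The random draw is made over cartons, not bottles, and every bottle in a chosen carton is tested.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), num_clusters)
    return {cluster_id: clusters[cluster_id] for cluster_id in chosen_ids}

# Hypothetical production run: 500 cartons of 12 bottles each.
cartons = {
    carton: [f"carton {carton}, bottle {bottle}" for bottle in range(1, 13)]
    for carton in range(1, 501)
}

selected = cluster_sample(cartons, num_clusters=20, seed=42)
bottles_to_test = [bottle for bottles in selected.values() for bottle in bottles]
print(len(selected), "cartons,", len(bottles_to_test), "bottles")
# 20 cartons, 240 bottles
```

Only 20 random selections are needed to obtain 240 bottles, and no carton is opened for the sake of a single bottle.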
Convenience sampling draws the sample from the part of the population that is close at hand.
Convenience sampling is used when the researcher is interested in getting an inexpensive approximation of the truth. As the name implies, the members of the sample are selected because they are convenient to reach.
Convenience sampling often leads to a biased study since it consists of only available people.
No matter how a convenience sample is recruited, the key point is this – since there is no attempt to match the characteristics of the convenience sample to the general population, the extent to which a convenience sample represents the traits or behaviors of the general population cannot be known – and this is true regardless of how large the sample may be.
That’s why when someone tries to draw a conclusion about the broader general population by using research from a convenience sample, they run a very real danger of drawing exceptionally wrong conclusions.
Convenience sampling has little statistical validity.
The design is poor.
The results are suspect.