Showing posts with label Surveys and Sampling. Show all posts

Sunday 30 June 2013

Sampling and Generalization

             

We have seen that surveys are conducted with the goal of creating an accurate picture of the current attitudes, beliefs, or behaviors of a large group of people. In some rare cases it is possible to conduct a census, that is, to measure each person about whom we wish to know. In most cases, however, the group of people that we want to learn about is so large that measuring each person is not practical. Thus, the researcher must test some subset of the entire group of people who could have participated in the research. Sampling refers to the selection of people to participate in a research project, usually with the goal of being able to use these people to make inferences about a larger group of individuals. The entire group of people that the researcher desires to learn about is known as the population, and the smaller group of people who actually participate in the research is known as the sample.


Definition of the Population 

The population of interest to the researcher must be defined precisely. For instance, some populations of interest to a survey researcher might be “all citizens of voting age in the United States who plan to vote in the next election,” “all students currently enrolled full time at the University of Chicago,” or “all Hispanic Americans over forty years of age who live within the Baltimore city limits.” In most cases the scientist does not particularly care about the characteristics of the specific people chosen to be in the sample. Rather, the scientist uses the sample to draw inferences about the population as a whole (just as a medical researcher analyzes a sample to make inferences about blood that was not sampled).
Whenever samples are used to make inferences about populations, the researcher faces a basic dilemma: he or she will never be able to know exactly what the true characteristics of the population are because not all of the members of the population can be contacted. However, this is not really as big a problem as it might seem if the sample can be assumed to be representative of the population. A representative sample is one that is approximately the same as the population in every important respect. For instance, a representative sample of the population of students at a college or university would contain about the same proportion of men, sophomores, and engineering majors as are in the college itself, as well as being roughly equivalent to the population on every other conceivable characteristic.

Probability Sampling 

         To make the sample representative of the population, any of several probability sampling techniques may be employed. In probability sampling, procedures are used to ensure that each person in the population has a known chance of being selected to be part of the sample. As a result, the likelihood that the sample is representative of the population is increased, as is the ability to use the sample to draw inferences about the population.  

Simple Random Sampling. The most basic probability sample is drawn using simple random sampling. In this case, the goal is to ensure that each person in the population has an equal chance of being selected to be in the sample. To draw a simple random sample, an investigator must first have a complete list (known as a sampling frame) of all of the people in the population. For instance, voting registration lists may be used as a sampling frame, or telephone numbers of all of the households in a given geographic location may be used. The latter list will basically represent the population that lives in that area because almost all U.S. households now have a telephone. Recent advances in survey methodology allow researchers to include cell phone numbers in their sampling frame as well. 
         Then the investigator randomly selects from the frame a sample of a given number of people. Let’s say you are interested in studying volunteering behavior of the students at your college or university, and you want to collect a random sample of 100 students. You would begin by finding a list of all of the students currently enrolled at the college. Assume that there are 7,000 names on this list, numbered sequentially from 1 to 7,000. Then, as shown in the instructions for using Statistical Table A (in Appendix E), you could use a random number table (or a random number generator on a computer) to produce 100 numbers that fall between 1 and 7,000 and select those 100 students to be in your sample. 
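
The drawing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list positions 1 to 7,000 stand in for the names on the enrollment list, and the function name and the seed value are invented for the example.

```python
import random

POPULATION_SIZE = 7_000   # names on the enrollment list, numbered 1 to 7,000
SAMPLE_SIZE = 100         # students we want in the sample


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct list positions, each equally likely."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    positions = range(1, population_size + 1)
    # random.sample draws without replacement, so no student is picked twice
    return sorted(rng.sample(positions, sample_size))


selected = simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=2013)
print(selected[:10])  # the first few list positions that were chosen
```

The students at the selected positions on the list would then be the ones contacted.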

Systematic Random Sampling. If the list of names on the sampling frame is itself known to be in a random sequence, then a probability sampling procedure known as systematic random sampling can be used. In your case, because you wish to draw a sample of 100 students from a population of 7,000 students, you will want to sample 1 out of every 70 students (100/7,000 = 1/70). To create the systematic sample, you first draw a random number between 1 and 70 and then sample the person on the list with that number. You create the rest of the sample by taking every seventieth person on the list after the initial person. For instance, if the first person sampled was number 32, you would then sample number 102, 172, and so on. You can see that it is easier to use systematic sampling than simple random sampling because only one initial number has to be chosen at random.
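
As a rough sketch of the same procedure in Python: the function below computes the sampling interval, picks one random starting position, and then steps through the list. The function name and the optional `start` argument are assumptions made for this illustration, and the sketch assumes the population size is an exact multiple of the sample size, as it is here.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Take every k-th list position after one randomly chosen start."""
    interval = population_size // sample_size  # 7,000 // 100 = 70
    if start is None:
        # the only random choice: a starting position between 1 and 70
        start = random.Random(seed).randint(1, interval)
    return [start + step * interval for step in range(sample_size)]


# Reproducing the example in the text, where the first person drawn is number 32
example = systematic_sample(7_000, 100, start=32)
print(example[:3])   # [32, 102, 172]
print(example[-1])   # 6962, the last position still falls inside the list
```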

Stratified Sampling. Because in most cases sampling frames include such information about the population as sex, age, ethnicity, and region of residence, and because the variables being measured are frequently expected to differ across these subgroups, it is often useful to draw separate samples from each of these subgroups rather than to sample from the population as a whole. The subgroups are called strata, and the sampling procedure is known as stratified sampling.
To collect a proportionate stratified sample, frames of all of the people within each stratum are first located, and random samples are drawn from within each of the strata. For example, if you expected that volunteering rates would be different for students from different majors, you could first make separate lists of the students in each of the majors at your school and then randomly sample from each list. One outcome of this procedure is that the different majors are guaranteed to be represented in the sample in the same proportion that they are represented in the population, a result that might not occur if you had used simple random sampling. Furthermore, it can be shown mathematically that if volunteering behavior does indeed differ among the strata, a stratified sample will provide a more precise estimate of the population characteristics than will a simple random sample (Kish, 1965).
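
A minimal sketch of a proportionate stratified draw follows. The five majors and their enrollment counts are invented for the illustration (they are chosen only so that they add up to 7,000), and the rule used to hand out any leftover places after rounding is one reasonable choice rather than a prescribed method.

```python
import random

# Hypothetical enrollment by major; the counts are made up and sum to 7,000
MAJOR_SIZES = {"Psychology": 2100, "Engineering": 1750, "Biology": 1400,
               "English": 1050, "History": 700}


def proportionate_allocation(stratum_sizes, sample_size):
    """Give each stratum a share of the sample equal to its population share."""
    total = sum(stratum_sizes.values())
    exact = {name: sample_size * size / total
             for name, size in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    # Places lost to rounding down go to the strata with the largest remainders
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


def stratified_sample(frames, sample_size, seed=None):
    """Draw a separate simple random sample from the frame of each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(frame) for name, frame in frames.items()}
    allocation = proportionate_allocation(sizes, sample_size)
    return {name: rng.sample(frames[name], allocation[name]) for name in frames}


# One frame (list of student labels) per major
frames = {major: [f"{major}-{i}" for i in range(1, size + 1)]
          for major, size in MAJOR_SIZES.items()}
allocation = proportionate_allocation(MAJOR_SIZES, 100)
sample = stratified_sample(frames, 100, seed=2013)
print(allocation)
```

With these made-up counts, each major receives exactly its population share of the 100 places (for instance, 30 percent of the students and 30 of the places for the largest major).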
        
Disproportionate stratified sampling is frequently used when the strata differ in size and the researcher is interested in comparing the characteristics of the strata. For instance, in a student body of 7,000, only 10 or so might be French majors. If a random sample of 100 students were drawn, there might not be any French majors in the sample, or at least there would be too few to allow a researcher to draw meaningful conclusions about them. In this case, the researcher draws a sample in which some strata make up a larger proportion than they do in the population. This procedure is called oversampling and is used to provide large enough samples of the strata of interest to allow analysis. Mathematical formulas are used to determine the optimum size for each of the strata.
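
To see why a simple random sample is likely to miss such a small stratum, the chance of drawing no French majors at all can be worked out directly. The sketch below treats the "10 or so" French majors as exactly 10, which is an assumption made for the arithmetic, and the function name is invented for the example.

```python
import math

POPULATION_SIZE = 7_000
FRENCH_MAJORS = 10      # "10 or so" in the text, treated as exactly 10 here
SAMPLE_SIZE = 100


def prob_stratum_missing(population_size, stratum_size, sample_size):
    """Chance that a simple random sample contains nobody from the stratum."""
    others = population_size - stratum_size
    # ways to fill the sample using only non-members / all possible samples
    return math.comb(others, sample_size) / math.comb(population_size, sample_size)


p_none = prob_stratum_missing(POPULATION_SIZE, FRENCH_MAJORS, SAMPLE_SIZE)
expected_count = SAMPLE_SIZE * FRENCH_MAJORS / POPULATION_SIZE
print(f"P(no French majors in the sample) = {p_none:.3f}")
print(f"Expected French majors in the sample = {expected_count:.2f}")
```

Under these assumptions the first figure comes out at roughly 0.87, so most simple random samples of 100 would contain no French majors at all. This is the situation that oversampling is meant to avoid.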

Cluster Sampling. Although simple and stratified sampling can be used to create representative samples when there is a complete sampling frame for the population, in some cases there is no such list. For instance, there is no single list of all of the currently matriculated college students in the United States. In these cases an alternative approach known as cluster sampling can be used. The technique is to break the population into a set of smaller groups (called clusters) for which there are sampling frames and then to randomly choose some of the clusters for inclusion in the sample. At this point, every person in the cluster may be sampled, or a random sample of the cluster may be drawn.
Often the clustering is done in stages. For instance, we might first divide the United States into regions (such as East, Midwest, South, Southwest, and West). Then we would randomly select states from each region, counties from each state, and colleges or universities from each county. Because there is a sampling frame of the matriculated students at each of the selected colleges, we could draw a random sample from these lists. In addition to allowing a representative sample to be drawn when there is no sampling frame, cluster sampling is convenient. Once we have selected the clusters, we need only contact the students at the selected colleges rather than having to sample from all of the colleges and universities in the United States. In cluster sampling, the selected clusters are used to draw inferences about the nonselected ones. Although this practice loses some precision, cluster sampling is frequently used because of convenience.
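
The staged selection can be sketched as nested random draws. Everything in the example is hypothetical: the regions, states, colleges, and rosters are generated placeholders, the county stage is left out to keep the sketch short, and the number of clusters taken at each stage is an arbitrary choice.

```python
import random


def multistage_cluster_sample(regions, states_per_region, colleges_per_state,
                              students_per_college, seed=None):
    """Randomly pick states, then colleges, then students within each college."""
    rng = random.Random(seed)
    sample = []
    for states in regions.values():
        for state in rng.sample(sorted(states), states_per_region):
            colleges = states[state]
            for college in rng.sample(sorted(colleges), colleges_per_state):
                roster = colleges[college]  # a frame exists only at this level
                sample.extend(rng.sample(roster, students_per_college))
    return sample


# Placeholder hierarchy: 5 regions, 3 states each, 4 colleges each, 50 students each
regions = {
    f"Region{r}": {
        f"R{r}-State{s}": {
            f"R{r}-S{s}-College{c}": [f"R{r}-S{s}-C{c}-student{i}"
                                      for i in range(1, 51)]
            for c in range(1, 5)
        }
        for s in range(1, 4)
    }
    for r in range(1, 6)
}

students = multistage_cluster_sample(regions, states_per_region=2,
                                     colleges_per_state=2,
                                     students_per_college=10, seed=2013)
print(len(students))  # 5 regions x 2 states x 2 colleges x 10 students = 200
```

Only the rosters of the selected colleges are ever consulted, which is what makes the approach workable when no national list of students exists.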

Sampling Bias and Nonprobability Sampling 

The advantage of probability sampling methods is that their samples will be representative and thus can be used to draw inferences about the characteristics of the population. Although these procedures sound good in theory, in practice it is difficult to be certain that the sample is truly representative. Representativeness requires that two conditions be met. First, there must be one or more sampling frames that list the entire population of interest, and second, all of the selected individuals must actually be sampled. When either of these conditions is not met, there is the potential for sampling bias. This occurs when the sample is not actually representative of the population because the probability with which members of the population have been selected for participation is not known.
        Sampling bias can arise when an accurate sampling frame for the population of interest cannot be obtained. In some cases there is an available sampling frame, but there is no guarantee that it is accurate. The sampling frame may be inaccurate because some members of the population are missing or because it includes some names that are not actually in the population. College student directories, for instance, frequently do not include new students or those who requested that their name not be listed, and these directories may also include students who have transferred or dropped out. 
           In other cases there simply is no sampling frame. Imagine attempting to obtain a frame that included all of the homeless people in New York City or all of the women in the United States who are currently pregnant with their first child. In cases where probability sampling is impossible because there is no available sampling frame, nonprobability samples must be used. To obtain a sample of homeless individuals, for instance, the researcher will interview individuals on the street or at a homeless shelter. One type of nonprobability sample that can be used when the population of interest is rare or difficult to reach is called snowball sampling. In this procedure one or more individuals from the population are contacted, and these individuals are used to lead the researcher to other population members. Such a technique might be used to locate homeless individuals. Of course, in such cases the potential for sampling bias is high because the people in the sample may be different from the people in the population. Snowball sampling at homeless shelters, for instance, may include a greater proportion of people who stay in shelters and a smaller proportion of people who do not stay in shelters than are in the population. This is a limitation of nonprobability sampling, but one that the researcher must live with because there is no possible probability sampling method that can be used. 
            Even if a complete sampling frame is available, sampling bias can occur if all members of the random sample cannot be contacted or cannot be convinced to participate in the survey. For instance, people may be on vacation, they may have moved to a different address, or they may not be willing to complete the questionnaire or interview. When a questionnaire is mailed, the response rate may be low. In each of these cases the potential for sampling bias exists because the people who completed the survey may have responded differently than would those who could not be contacted. 
Nonprobability samples are also frequently found when college students are used in experimental research. Such samples are called convenience samples because the researcher has sampled whatever individuals were readily available without any attempt to make the sample representative of a population. Although such samples can be used to test research hypotheses, they may not be used to draw inferences about populations. We will discuss the use of convenience samples in experimental research designs more fully in Chapter 13.
            Whenever you read a research report, make sure to determine what sampling procedures have been used to select the research participants. In some cases, researchers make statements about populations on the basis of nonprobability samples, which are not likely to be representative of the population they are interested in. For instance, polls in which people are asked to call a 900 number or log on to a website to express their opinions on a given topic may contain sampling bias because people who are in favor of (or opposed to) the issue may have more time or more motivation to do so. Whenever the respondents, rather than the researchers, choose whether to be part of the sample, sampling bias is possible. The important thing is to remain aware of what sampling techniques have been used and to draw your own conclusions accordingly.

Saturday 29 June 2013

Surveys




A survey is a series of self-report measures administered either through an interview or a written questionnaire. Surveys are the most widely used method of collecting descriptive information about a group of people. You may have received a phone call (it usually arrives in the middle of the dinner hour when most people are home) from a survey research group asking you about your taste in music, your shopping habits, or your political preferences. 
The goal of a survey, as with all descriptive research, is to produce a “snapshot” of the opinions, attitudes, or behaviors of a group of people at a given time. Because surveys can be used to gather information on a wide variety of topics in a relatively short time, they are used extensively by businesspeople, advertisers, and politicians to help them learn what people think, feel, or do.

Interviews

Surveys are usually administered in the form of an interview, in which questions are read to the respondent in person or over the telephone. One advantage of in-person interviews is that they may allow the researcher to develop a close rapport and sense of trust with the respondent. This may motivate the respondent to continue with the interview and may lead to more honest and open responding. However, face-to-face interviews are extremely expensive to conduct, and consequently telephone surveys are now more common. In a telephone interview all of the interviewers are located in one place, the telephone numbers are generated automatically, and the questions are read from computer terminals in front of the researchers. This procedure provides such efficiency and coordination among the interviewers that many surveys can be conducted in one day.

Unstructured Interviews. Interviews may use either free-format or fixed-format self-report measures. In an unstructured interview the interviewer talks freely with the person being interviewed about many topics. Although a general list of the topics of interest is prepared beforehand, the actual interview focuses on those topics that the respondent is most interested in or most knowledgeable about. Because the questions asked in an unstructured interview differ from respondent to respondent, the interviewer must be trained to ask questions in a way that gets the most information from the respondent and allows the respondent to express his or her true feelings. One type of face-to-face unstructured interview in which a number of people are interviewed at the same time and share ideas both with the interviewer and with each other is called a focus group.
Unstructured interviews may provide in-depth information about the particular concerns of an individual or a group of people, and thus may produce ideas for future research projects or for policy decisions. It is, however, very difficult to adequately train interviewers to ask questions in an unbiased manner and to be sure that they have actually done so. And, as we have seen in Chapter 4, because the topics of conversation and the types of answers given in free-response formats vary across participants, the data are difficult to objectively quantify and analyze, and are therefore frequently treated qualitatively.

Structured Interviews. Because researchers usually want more objective data, the structured interview, which uses quantitative fixed-format items, is most common. The questions are prepared ahead of time, and the interviewer reads the questions to the respondent. The structured interview has the advantage over an unstructured interview of allowing better comparisons of the responses across different individuals because the questions, time frame, and response format are controlled to be the same for each respondent.

Questionnaires

A questionnaire is a set of fixed-format, self-report items that is completed by respondents at their own pace, often without supervision. Questionnaires are generally cheaper than interviews because a researcher can mail the questionnaires to many people or have them complete the questionnaires in large groups. Questionnaires may also produce more honest responses than interviews, particularly when the questions involve sensitive issues such as sexual activity or annual income, because respondents are more likely to perceive their responses as being anonymous than they are in interviews. In comparison to interviews, questionnaires are also likely to be less influenced by the characteristics of the experimenter. For instance, if the topic concerns race-related attitudes, how the respondent answers might depend on the race of the interviewer and how the respondent thinks the interviewer wants him or her to respond. Because the experimenter is not present when a questionnaire is completed, or at least is not directly asking the questions, such problems are less likely.

The Response Rate. Questionnaires are free of some problems that may occur in interviews, but they do have their own set of difficulties. Although people may be likely to return surveys that have direct relevance to them (for instance, a survey of college students conducted by their own university), when mailings are sent to the general population, the response rate (that is, the percentage of people who actually complete the questionnaire and return it to the investigator) may not be very high. This may lead to incorrect conclusions because the people who return the questionnaire may respond differently than those who don’t return it would have. Investigators can sometimes increase response rates by providing gifts or monetary payments for completing the survey, by making the questionnaire appear brief and interesting, by ensuring the confidentiality of all of the data, and by emphasizing the importance of the individual in the research (Dillman, 1978). Follow-up mailings can also be used to remind people that they have not completed the questionnaire, with the hope that they will then do so.
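
A small worked example may make the risk concrete. All of the numbers are invented for the illustration: suppose 30 percent of the population volunteers, that volunteers return the questionnaire 60 percent of the time, and that non-volunteers return it 30 percent of the time.

```python
def respondent_estimate(true_rate, response_if_yes, response_if_no):
    """Return the overall response rate and the rate seen among respondents."""
    responding_yes = true_rate * response_if_yes        # volunteers who reply
    responding_no = (1 - true_rate) * response_if_no    # non-volunteers who reply
    response_rate = responding_yes + responding_no
    observed_rate = responding_yes / response_rate
    return response_rate, observed_rate


# Hypothetical figures: 30% volunteer; they reply 60% of the time, others 30%
response_rate, observed_rate = respondent_estimate(0.30, 0.60, 0.30)
print(f"response rate: {response_rate:.0%}")
print(f"share of volunteers among respondents: {observed_rate:.1%}")
```

With these made-up figures, 39 percent of the questionnaires come back and about 46 percent of the respondents are volunteers, well above the 30 percent assumed for the population, even though every returned questionnaire was answered honestly.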


Question Order. Another potential problem with questionnaires that does not occur with interviews is that people may not answer the questions in the order they are written, and the researcher does not know whether or not they have. To take one example, consider these two questions:

1. “How satisfied are you with your relationships with your family?”
2. “How satisfied are you with your relationship with your spouse?”

If the questions are answered in the order that they are presented here, then most respondents interpret the word family in question 1 to include their spouse. If question 2 is answered before question 1, however, the term family in question 1 is interpreted to mean the rest of the family except the spouse. Such variability can create measurement error (Schuman & Presser, 1981; Schwarz & Strack, 1991). 


Use of Existing Survey Data

Because it is very expensive to conduct surveys, scientists often work together on them. For instance, a researcher may have a small number of questions relevant to his or her research included within a larger survey. Or researchers can access public-domain data sets that contain data from previous surveys. The U.S. Census is probably the largest such data set, containing information on family size, fertility, occupation, and income for the entire U.S. population, as well as a more extensive interview data set of a smaller group of citizens. The General Social Survey is a collection of over 1,000 items given to a sample of U.S. citizens (Davis, Smith, & Marsden, 2000). Because the same questions are asked each year the survey is given, comparisons can be made over time. Sometimes these data sets are given in comparable forms to citizens of different countries, allowing cross-cultural comparisons. One such data set is the Human Relations Area Files. Indexes of some of the most important social science databases can be found in Clubb, Austin, Geda, and Traugott (1985).

Surveys and Sampling



Now that we have reviewed the basic types of measured variables and considered how to evaluate their effectiveness at assessing the conceptual variables of interest, it is time to more fully discuss the use of these measures in descriptive research. In this chapter, we will discuss the use of self-report measures, and in Chapter 7, we will discuss the use of behavioral measures. Although these measures are frequently used in a qualitative sense (to draw a complete and complex picture in the form of a narrative), they can also be used quantitatively, as measured variables. As you read these chapters, keep in mind that the goal of descriptive research is to describe the current state of affairs but that it does not by itself provide direct methods for testing research hypotheses. However, both surveys (discussed in this chapter) and naturalistic methods (discussed in Chapter 7) are frequently used not only as descriptive data but also as the measured variables in correlational and experimental tests of research hypotheses. We will discuss these uses in later chapters.