Sunday, 30 June 2013

Current Research in the Behavioral Sciences: Detecting Psychopathy From Thin Slices of Behavior

(Naturalistic Methods) Current Research in the Behavioral Sciences: Detecting Psychopathy From Thin Slices of Behavior

Katherine A. Fowler, Scott O. Lilienfeld, and Christopher J. Patrick (2009) used a naturalistic research design to study whether personality could be reliably assessed by raters who were given only very short samples (“thin slices”) of behavior. They were particularly interested in assessing psychopathy, a syndrome characterized by emotional and interpersonal deficits that often lead a person to antisocial behavior. According to the authors’ definition, psychopathic individuals tend to be “glib and superficially charming,” giving a surface-level appearance of intelligence, but are also “manipulative and prone to pathological lying” (p. 68). Many lead a socially deviant lifestyle marked by early behavior problems, irresponsibility, poor impulse control, and proneness to boredom.
         Because the researchers felt that behavior was likely to be a better indicator of psychopathy than was self-report, they used coders to assess the disorder from videotapes. Forty raters viewed videotapes containing only very brief excerpts (either 5, 10, or 20 seconds in duration) selected from longer videotaped interviews with 96 maximum-security inmates at a prison in Florida. Each inmate’s video was rated by each rater on a variety of dimensions related to psychopathy, including overall rated psychopathy as well as antisocial, narcissistic, and avoidant characteristics. The raters also rated the prisoners on physical attractiveness and gave estimates of their violence proneness and intelligence. To help the coders understand what was to be rated, the researchers provided them with very specific descriptions of each of the dimensions to be rated.
    
  Even though the raters were not experts in psychopathy, they tended to agree on their judgments. Interrater reliability was calculated as the agreement among the raters on each item. As you can see in Table 7.2, the reliability of the codings was quite high, suggesting that the raters, even using very thin slices, could adequately assess the conceptual variables of interest.
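
The article does not report which reliability statistic underlies Table 7.2, but the sketch below illustrates one common way to index interrater agreement: the average pairwise correlation among raters. The ratings are hypothetical, not data from the Fowler et al. study, and the snippet assumes Python 3.10+ for statistics.correlation.

```python
# A minimal sketch of interrater agreement as the mean pairwise Pearson
# correlation among raters. The ratings below are hypothetical, not the
# Fowler et al. data; requires Python 3.10+ for statistics.correlation.
from itertools import combinations
from statistics import correlation, mean

# Rows are raters; columns are the same set of rated targets (inmates).
ratings = [
    [4, 2, 5, 3, 1, 4],  # rater A
    [5, 2, 4, 3, 2, 4],  # rater B
    [4, 1, 5, 2, 1, 5],  # rater C
]

pairwise = [correlation(a, b) for a, b in combinations(ratings, 2)]
print(f"Mean interrater correlation: {mean(pairwise):.2f}")
```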


Archival Research

  (Naturalistic Methods) Archival Research

As you will recall, one of the great advantages of naturalistic methods is that there are so many data available to be studied. One approach that takes full advantage of this situation is archival research, which is based on an analysis of any type of existing records of public behavior. These records might include newspaper articles, speeches and letters of public figures, television and radio broadcasts, Internet websites, or existing surveys. Because there are so many records that can be examined, the use of archival records is limited only by the researcher’s imagination. 
          Records that have been used in past behavioral research include the trash in a landfill, patterns of graffiti, wear and tear on floors in museums, litter, and dirt on the pages of library books (see Webb et al., 1981, for examples). Archival researchers have found that crimes increase during hotter weather (Anderson, 1989); that earlier-born children live somewhat longer than later-borns (Modin, 2002); and that gender and racial stereotypes are prevalent in current television shows (Greenberg, 1980) and in magazines (Sullivan & O’Connor, 1988).
      One of the classic archival research projects is the sociological study of the causes of suicide by sociologist Emile Durkheim (1951). Durkheim used records of people who had committed suicide in seven European countries between 1841 and 1872 for his data. These records indicated, for instance, that suicide was more prevalent on weekdays than on weekends, among those who were not married, and in the summer months. From these data, Durkheim drew the conclusion that alienation from others was the primary cause of suicide. Durkheim’s resourcefulness in collecting data and his ability to use the data to draw conclusions about the causes of suicide are remarkable. 
         Because archival records contain a huge amount of information, they must also be systematically coded. This is done through a technique known as content analysis. Content analysis is essentially the same as systematic coding of observational data and includes the specification of coding categories and the use of more than one rater. In one interesting example of an archival research project, Simonton (1988) located and analyzed biographies of U.S. presidents. He had seven undergraduate students rate each of the biographies on a number of predefined coding categories, including “was cautious and conservative in action,” “was charismatic,” and “valued personal loyalty.” The interrater reliability of the coders was assessed and found to be adequate.
            Simonton then averaged the ratings of the seven coders and used the data to draw conclusions about the personalities and behaviors of the presidents. For instance, he found that “charismatic” presidents were motivated by achievement and power and were more active and accomplished more while in office. Although Simonton used biographies as his source of information, he could, of course, have employed presidential speeches, information on how and where the speeches were delivered, or material on the types of appointments the presidents made, among other records.

Systematic Coding Methods

(Naturalistic Methods) Systematic Coding Methods


You have probably noticed by now that although observational research and case studies can provide a detailed look at ongoing behavior, because they represent qualitative data, they may often not be as objective as one might like, especially when they are based on recordings by a single scientist. Because the observer has chosen which people to study, which behaviors to record or ignore, and how to interpret those behaviors, she or he may be more likely to see (or at least to report) those observations that confirm, rather than disconfirm, her or his expectations. Furthermore, the collected data may be relatively sketchy, in the form of “field notes” or brief reports, and thus not amenable to assessment of their reliability or validity. However, in many cases these problems can be overcome by using systematic observation to create quantitative measured variables (Bakeman & Gottman, 1986; Weick, 1985).

Deciding What to Observe

          Systematic observation involves specifying ahead of time exactly which observations are to be made on which people and in which times and places. These decisions are made on the basis of theoretical expectations about the types of events that are going to be of interest. Specificity about the behaviors of interest has the advantage of both focusing the observers’ attention on these specific behaviors and reducing the masses of data that might be collected if the observers attempted to record everything they saw. Furthermore, in many cases more than one observer can make the observations, and, as we have discussed in Chapter 5, this will increase the reliability of the measures.
           Consider, for instance, a research team interested in assessing how and when young children compare their own performance with that of their classmates (Pomerantz et al., 1995). In this study, one or two adult observers sat in chairs adjacent to work areas in the classrooms of elementary school children and recorded in laptop computers the behaviors of the children. Before beginning the project, the researchers had defined a specific set of behavioral categories for use by the observers. These categories were based on theoretical predictions of what would occur for these children and defined exactly what behaviors were to be coded, how to determine when those behaviors were occurring, and how to code them into the computer. 

Deciding How to Record Observations 

         Before beginning to code the behaviors, the observers spent three or four days in the classroom learning, practicing, and revising the coding methods and letting the children get used to their presence. Because the coding categories were so well defined, there was good interrater reliability. And to be certain that the judges remained reliable, the experimenters frequently computed a reliability analysis on the codings over the time that the observations were being made. This is particularly important because there are some behaviors that occur infrequently, and it is important to be sure that they are being coded reliably.
        Over the course of each observation period, several types of data were collected. For one, the observers coded event frequencies—for instance, the number of verbal statements that indicated social comparison. These included both statements about one’s own performance (“My picture is the best.”) and questions about the performance of others (“How many did you get wrong?”). In addition, the observers also coded event duration—for instance, the amount of time that the child was attending to the work of others. Finally, all the children were interviewed after the observation had ended. 
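
As a rough illustration of how coded records can be turned into event frequencies and event durations, the sketch below tallies a few hypothetical observation records. The codes and record format are invented for the example and are not the actual Pomerantz et al. coding scheme.

```python
# Hypothetical sketch: tallying coded observations into event frequencies
# and total event durations. Codes and records are illustrative only.
from collections import Counter

# Each record: (child_id, code, duration_in_seconds); duration is None for
# momentary events such as a single verbal statement.
records = [
    ("child_07", "social_comparison_statement", None),
    ("child_07", "attending_to_others_work", 35),
    ("child_07", "social_comparison_question", None),
    ("child_07", "attending_to_others_work", 12),
]

frequencies = Counter(code for _, code, dur in records if dur is None)
durations = {}
for _, code, dur in records:
    if dur is not None:
        durations[code] = durations.get(code, 0) + dur

print(frequencies)  # event frequencies
print(durations)    # total event durations, in seconds
```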

Choosing Sampling Strategies

         One of the difficulties in coding ongoing behavior is that there is so much of it. Pomerantz et al. (1995) used three basic sampling strategies to reduce the amount of data they needed to record. First, as we have already seen, they used event sampling—focusing on specific behaviors that were theoretically related to social comparison. Second, they employed individual sampling. Rather than trying to record the behaviors of all of the children at the same time, the observers randomly selected one child to be the focus child for an observational period. The observers zeroed in on this child while ignoring the behavior of others during the time period. Over the entire period of the study, however, each child was observed. Finally, Pomerantz and colleagues employed time sampling. Each observer focused on a single child for only four minutes before moving on to another child. In this case, the data were coded as they were observed, but in some cases the observer might use the time periods between observations to record the responses. Although sampling only some of the events of interest may lose some information, the events that are attended to can be more precisely recorded.
        The observers’ data were then uploaded from the laptop computers for analysis. Using these measures, Pomerantz et al. found, among other things, that older children used subtler social comparison strategies and increasingly saw such behavior as boastful or unfair. These data have high ecological validity, and yet their reliability and validity are well established. Another example of a coding scheme for naturalistic research, also using children, is shown in Figure 7.1.

Case Studies


(Naturalistic Methods) Case Studies

Whereas observational research generally assesses the behavior of a relatively large group of people, sometimes the data are based on only a small set of individuals, perhaps only one or two. These qualitative research designs are known as case studies—descriptive records of one or more individuals’ experiences and behavior. Sometimes case studies involve normal individuals, as when developmental psychologist Jean Piaget (1952) used observation of his own children to develop a stage theory of cognitive development. More frequently, case studies are conducted on individuals who have unusual or abnormal experiences or characteristics or who are going through particularly difficult or stressful situations. The assumption is that by carefully studying individuals who are socially marginal, who are experiencing a unique situation, or who are going through a difficult phase in their life, we can learn something about human nature.
        Sigmund Freud was a master of using the psychological difficulties of individuals to draw conclusions about basic psychological processes. One classic example is Freud’s case study and treatment of “Little Hans,” a child whose fear of horses the psychoanalyst interpreted in terms of repressed sexual impulses (1959). Freud wrote case studies of some of his most interesting patients and used these careful examinations to develop his important theories of personality.
        Scientists also use case studies to investigate the neurological bases of behavior. In animals, scientists can study the functions of a certain section of the brain by removing that part. If removing part of the brain prevents the animal from performing a certain behavior (such as learning to locate a food tray in a maze), then the inference can be drawn that the memory was stored in the removed part of the brain. It is obviously not possible to treat humans in the same manner, but brain damage sometimes occurs in people for other reasons. “Split-brain” patients (Sperry, 1982) are individuals who have had the two hemispheres of their brains surgically separated in an attempt to prevent severe epileptic seizures. Study of the behavior of these unique individuals has provided important information about the functions of the two brain hemispheres in humans. In other individuals, certain brain parts may be destroyed through disease or accident. One well-known case study is Phineas Gage, a man who has been extensively studied by cognitive psychologists after an iron tamping rod was blasted through his skull in an accident. An interesting example of a case study in clinical psychology is described by Rokeach (1964), who investigated in detail the beliefs and interactions among three schizophrenics, all of whom were convinced they were Jesus Christ.
           One problem with case studies is that they are based on the experiences of only a very limited number of individuals, who are usually quite unusual. Although descriptions of individual experiences may be extremely interesting, they cannot usually tell us much about whether the same things would happen to other individuals in similar situations or exactly why these specific reactions to these events occurred. For instance, descriptions of individuals who have been in a stressful situation such as a war or an earthquake can be used to understand how they reacted during such a situation but cannot tell us what particular long-term effects the situation had on them. Because there is no comparison group that did not experience the stressful situation, we cannot know what these individuals would be like if they hadn’t had the experience. As a result, case studies provide only weak support for the drawing of scientific conclusions. They may, however, be useful for providing ideas for future, more controlled research.

Observational Research

(Naturalistic Methods) Observational Research

Observational research involves making observations of behavior and recording those observations in an objective manner. The observational approach is the oldest method of conducting research and is used routinely in psychology, anthropology, sociology, and many other fields. 
        Let’s consider an observational study. To observe the behavior of individuals at work, industrial psychologist Roy (1959–1960) took a job in a factory where raincoats were made. The job entailed boring, repetitive movements (punching holes in plastic sheets using large stamping machines) and went on eight hours a day, five days a week. There was nothing at all interesting about the job, and Roy was uncertain how the employees, some of whom had been there for many years, could stand the monotony. 
           In his first few days on the job Roy did not notice anything particularly unusual. However, as he carefully observed the activities of the other employees over time, he began to discover that they had a series of “pranks” that they played on and with each other. For instance, every time “Sammy” went to the drinking fountain, “Ike” turned off the power on “Sammy’s” machine. And whenever “Sammy” returned, he tried to stamp a piece before “discovering” that the power had been turned off. He then acted angrily toward “Ike,” who in turn responded with a shrug and a smirk. 
              In addition to this event, which occurred several times a day, Roy also noted many other games that the workers effectively used to break up the day. At 11:00 “Sammy” would yell, “Banana time!” and steal the banana out of “Ike’s” lunch pail, which was sitting on a shelf. Later in the morning “Ike” would open the window in front of “Sammy’s” machine, letting in freezing cold air. “Sammy” would protest and close the window. At the end of the day, “Sammy” would quit two minutes early, drawing fire from the employees’ boss, who nevertheless let the activity occur day after day. 
               Although Roy entered the factory expecting to find only a limited set of mundane observations, he actually discovered a whole world of regular, complicated, and, to the employees, satisfying activities that broke up the monotony of their everyday work existence. This represents one of the major advantages of naturalistic research methods. Because the data are rich, they can be an important source of ideas.
             In this example, because the researcher was working at a stamping machine and interacting with the other employees, he was himself a participant in the setting being observed. When a scientist takes a job in a factory, joins a religious cult (Festinger, Riecken, & Schachter, 1956), or checks into a mental institution (Rosenhan, 1973), he or she becomes part of the setting itself. Other times, the scientist may choose to remain strictly an observer of the setting, such as when he or she views children in a classroom from a corner without playing with them, watches employees in a factory from behind a one-way mirror, or observes behavior in a public restroom (Humphreys, 1975). 
             In addition to deciding whether to be a participant, the researcher must also decide whether to let the people being observed know that the observation is occurring—that is, to be acknowledged or unacknowledged to the population being studied. Because the decision about whether to be participant or nonparticipant can be independent of the decision to be acknowledged or unacknowledged, there are, as shown in Table 7.1, altogether four possible types of observational research designs. There are advantages and disadvantages to each approach, and the choice of which to use will be based on the goals of the research, the ability to obtain access to the population, and ethical principles. 

The Unacknowledged Participant

          One approach is that of the unacknowledged participant. When an observer takes a job in a factory, as Roy did, or infiltrates the life of the homeless in a city, without letting the people being observed know about it, the observer has the advantage of concealment. As a result, she or he may be able to get close to the people being observed and may get them to reveal personal or intimate information about themselves and their social situation, such as their true feelings about their employers or their reactions to being on the street. The unacknowledged participant, then, has the best chance of really “getting to know” the people being observed.
           Of course, becoming too close to the people being studied may have negative effects as well. For one thing, the researcher may have difficulty remaining objective. The observer who learns people’s names, hears intimate accounts of their lives, and becomes a friend may find his or her perception shaped more by their point of view than by a more objective, scientific one. Alternatively, the observer may dislike the people whom he or she is observing, which may create a negative bias in subsequent analysis and reporting of the data.
          The use of an unacknowledged participant strategy also poses ethical dilemmas for the researcher. For one thing, the people being observed may never be told that they were part of a research project or may find it out only later. This may not be a great problem when the observation is conducted in a public arena, such as a bar or a city park, but the problem may be greater when the observation is in a setting where people might later be identified, with potential negative consequences to them. For instance, if a researcher takes a job in a factory and then writes a research report concerning the true feelings of the employees about their employers, management may be able to identify the individual workers from these descriptions.

   Another disadvantage of the unacknowledged participant approach is that the activities of the observer may influence the process being observed. This may happen, for instance, when an unacknowledged participant is asked by the group to contribute to a group decision. Saying nothing would “blow one’s cover,” but making substantive comments would change the nature of the group itself. Often the participant researcher will want to query the people being observed in order to gain more information about why certain behaviors are occurring. Although these questions can reveal the underlying nature of the social setting, they may also alter the situation itself.

The Acknowledged Participant 

        In cases where the researcher feels that it is unethical or impossible to hide his or her identity as a scientist, the acknowledged participant approach can be used. W. F. Whyte (1993) used this approach in his classic sociological study of “street corner society.” Over a period of a year, Whyte got to know the people in, and made extensive observations of, a neighborhood in a New England town. He did not attempt to hide his identity. Rather, he announced freely that he was a scientist and that he would be recording the behavior of the individuals he observed. Sometimes this approach is necessary, for instance, when the behavior the researcher wants to observe is difficult to gain access to. To observe behavior in a corporate boardroom or school classroom, the researcher may have to gain official permission, which may require acknowledging the research to those being observed.

          The largest problem of being acknowledged is reactivity. Knowing that the observer is recording information may cause people to change their speech and behavior, limit what they are willing to discuss, or avoid the researcher altogether. Often, however, once the observer has spent some time with the population of interest, people tend to treat him or her as a real member of the group. This happened to Whyte. In such situations, the scientist may let this habituation occur over a period of time before beginning to record observations. 

Acknowledged and Unacknowledged Observers

The researcher may use a nonparticipant approach when he or she does not want to or cannot be a participant of the group being studied. In these cases, the researcher observes the behavior of interest without actively participating in the ongoing action. This occurs, for instance, when children are observed in a classroom from behind a one-way mirror or when clinical psychologists videotape group therapy sessions for later analysis. One advantage of not being part of the group is that the researcher may be more objective because he or she does not develop close relationships with the people being observed. Being out of the action also leaves the observer more time to do the job he or she came for—watching other people and recording relevant data. 
        The nonparticipant observer is relieved of the burdensome role of acting like a participant and maintaining a “cover,” activities that may take substantial effort. The nonparticipant observer may be either acknowledged or unacknowledged. Again, there are pros and cons to each, and these generally parallel the issues involved with the participant observer. Being acknowledged can create reactivity, whereas being unacknowledged may be unethical if it violates the confidentiality of the data. These issues must be considered carefully, with the researcher reviewing the pros and cons of each approach before beginning the project.

Naturalistic Research

          (Naturalistic Methods) Naturalistic Research

Naturalistic research is designed to describe and measure the behavior of people or animals as it occurs in their everyday lives. The behavior may be measured as it occurs, or it could already have been recorded by others, or it may be recorded on videotape to be coded at a later time. In any case, however, because it involves the observation of everyday behavior, a basic difficulty results—the rich and complex data that are observed must be organized into meaningful measured variables that can be analyzed. One of the goals of this chapter is to review methods for turning observed everyday behavior into measured variables.
         Naturalistic research approaches are used by researchers in a variety of disciplines, and the data that form the basis of naturalistic research methods can be gathered from many different sources in many different ways. These range from a clinical psychologist’s informal observations of his or her clients, to another scientist’s more formal observations of the behaviors of animals in the wild, to an analysis of politicians’ speeches, to a videotaping of children playing with their parents in a laboratory setting. Although these approaches frequently involve qualitative data, there are also techniques for turning observations into quantitative data, and we will discuss both types in this chapter. 
           In many cases, naturalistic research is the only possible approach to collecting data. For instance, whereas researchers may not be able to study the impact of earthquakes, floods, or cult membership using experimental research designs, they may be able to use naturalistic research designs to collect a wide variety of data that can be useful in understanding such phenomena.
         One particular advantage of naturalistic research is that it has ecological validity. Ecological validity refers to the extent to which the research is conducted in situations that are similar to the everyday life experiences of the participants (Aronson & Carlsmith, 1968). In naturalistic research the people whose behavior is being measured are doing the things they do every day, and in some cases they may not even know that their behavior is being recorded. In these cases, reactivity is minimized and the construct validity of the measures should therefore be increased.

Naturalistic Methods


As we have seen in Chapter 6, self-report measures have the advantage of allowing the researcher to collect a large amount of information from the respondents quickly and easily. On the other hand, they also have the potential of being inaccurate if the respondent does not have access to, or is unwilling to express, his or her true beliefs. And we have seen in Chapter 4 that behavioral measures have the advantage of being more natural and thus less influenced by reactivity. In this chapter, we discuss descriptive research that uses behavioral measures. As we have seen in Chapter 1, descriptive research may be conducted either qualitatively, in which case the goal is to describe the observations in detail and to use those descriptions as the results, or quantitatively, in which case the data are collected using systematic methods and analyzed using statistical techniques. Keep in mind as you read the chapter that, as with most descriptive research, the goal is not only to test research hypotheses, but also to develop ideas for topics that can be studied later using other types of research designs. However, as with survey research, naturalistic methods can also be used to create measured variables for use in correlational and experimental tests of research hypotheses.


Current Research in the Behavioral Sciences: Assessing Americans’ Attitudes Toward Health Care


(Reliability and Validity) Current Research in the Behavioral Sciences: Assessing Americans’ Attitudes Toward Health Care

Because so many opinion polls are now conducted, and many of their results are quickly put online, it is now possible to view the estimated opinions of large populations in almost real time. For instance, as I write these words in July, 2009, I can visit the CBS news website and see the results of a number of  recent polls regarding the opinions of U.S. citizens about a variety of national issues. 
        One poll, reported at http://www.cbsnews.com/htdocs/pdf/jul09b_health_care-AM.pdf, used a random sample of 1,050 adults nationwide in the United States, who were interviewed by telephone on July 24–28, 2009. The phone numbers were dialed from random digit dial samples of both standard landline and cell phones. The error due to sampling for results based on the entire sample is plus or minus three percentage points, although the error for subgroups is higher.
            The polls provide a snapshot of the current state of thinking in U.S. citizens about health care reform. Here are some findings: 
          In response to the question “Will health care reform happen in 2009?” most Americans see health care reform as likely, although just 16 percent call it “very” likely. Four in 10 think it is not likely this year. 

Very likely 16% 
Somewhat likely 43% 
Not likely 40% 

        However, many Americans don’t see how they would personally benefit from the health care proposals being considered. In response to the question, “Would the current congressional reform proposals help you?” 59 percent say those proposals—as they understand them—would not help them directly. Just under a third say current plans would.

Yes 31%
No 59%

         By a 2 to 1 margin, Americans feel President Obama has better ideas for reforming health care than Congressional Republicans. Views on this are partisan, but independents side with the President.
          The question asked was “Who has better ideas for health care reform?” Here are the results overall, as well as separately for Democrats, Republicans, and Independents:

                                        Overall         Democrats          Republicans           Independents

President Obama                 55%               81%                      27%                     48%
Republicans                          26%              10%                      52%                     26%

But, as you can see in the responses to the following question, Mr. Obama’s approval rating on handling the overall issue remains under 50 percent, and many don’t yet have a view:
          “Do you approve or disapprove of President Obama’s health care plans?”
           Approve           46%
           Disapprove       38%
           Don’t know      16%

Summarizing the Sample Data

(Reliability and Validity)  Summarizing the Sample Data

You can well imagine that once a survey has been completed, the collected data (known as the raw data) must be transformed in a way that will allow them to be meaningfully interpreted. The raw data are, by themselves, not very useful for gaining the desired snapshot because they contain too many numbers. For example, if we interview 500 people and ask each of them forty questions, there will be 20,000 responses to examine. In this section we will consider some of the statistical methods used to summarize sample data. Procedures for using computer software programs to conduct statistical analyses are reviewed in Appendix B, and you may want to read this material at this point.

Frequency Distributions

Table 6.1 presents some hypothetical raw data from twenty-five participants on five variables collected in a sort of “minisurvey.” You can see that the table is arranged such that the variables (sex, ethnic background, age, life satisfaction, family income) are in the columns and the participants form the rows. For nominal variables such as sex or ethnicity, the data can be summarized through the use of a frequency distribution. A frequency distribution is a table that indicates how many, and in most cases what percentage, of individuals in the sample fall into each of a set of categories. A frequency distribution of the ethnicity variable from Table 6.1 is shown in Figure 6.1(a). The frequency distribution can be displayed visually in a bar chart, as shown for the ethnic background variable in Figure 6.1(b). The characteristics of the sample are easily seen when summarized through a frequency distribution or a bar chart.
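
As a simple illustration of the idea, the sketch below builds a frequency distribution (counts and percentages) for a nominal variable. The values are hypothetical rather than the actual Table 6.1 data.

```python
# Minimal sketch: frequency distribution (counts and percentages) for a
# nominal variable. The values are hypothetical, not the Table 6.1 data.
from collections import Counter

ethnicity = ["White", "Black", "Hispanic", "White", "Asian",
             "White", "Black", "Hispanic", "White", "Asian"]

counts = Counter(ethnicity)
n = len(ethnicity)
for category, count in counts.most_common():
    print(f"{category:10s} {count:3d} {100 * count / n:5.1f}%")
```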

          One approach to summarizing a quantitative variable is to combine adjacent values into a set of categories and then to examine the frequencies of each of the categories. The resulting distribution is known as a grouped frequency distribution. A grouped frequency distribution of the age variable from Table 6.1 is shown in Figure 6.2(a). In this case, the ages have been grouped into five categories (less than 21, 21–30, 31–40, 41–50, and greater than 50).

The grouped frequency distribution may be displayed visually in the form of a histogram, as shown in Figure 6.2(b). A histogram is slightly different from a bar chart because the bars are drawn so that they touch each other. This indicates that the original variable is quantitative. If the frequencies of the groups are indicated with a line, rather than bars, as shown in Figure 6.2(c), the display is called a frequency curve.
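
A grouped frequency distribution like the one described can be produced by sorting each value into its category and counting, as in the sketch below; the ages are hypothetical, not the Table 6.1 values.

```python
# Sketch of a grouped frequency distribution for age, using the five
# categories described in the text; the ages themselves are hypothetical.
ages = [18, 19, 19, 22, 23, 27, 31, 33, 33, 35, 38, 41, 45, 45, 52, 63]

bins = [
    ("less than 21", lambda a: a < 21),
    ("21-30", lambda a: 21 <= a <= 30),
    ("31-40", lambda a: 31 <= a <= 40),
    ("41-50", lambda a: 41 <= a <= 50),
    ("greater than 50", lambda a: a > 50),
]

for label, in_bin in bins:
    count = sum(1 for a in ages if in_bin(a))
    print(f"{label:16s} {count}")
```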
        

             One limitation of grouped frequency distributions is that grouping the values together into categories results in the loss of some information. For instance, it is not possible to tell from the grouped frequency distribution in Figure 6.2(a) exactly how many people in the sample are twenty-three years old. A stem and leaf plot is a method of graphically summarizing the raw data such that the original data values can still be seen. A stem and leaf plot of the age variable from Table 6.1 is shown in Figure 6.3. 


Descriptive Statistics 

Descriptive statistics are numbers that summarize the pattern of scores observed on a measured variable. This pattern is called the distribution of the variable. Most basically, the distribution can be described in terms of its central tendency—that is, the point in the distribution  around which the data are centered—and its dispersion, or spread. As we will see, central tendency is summarized through the use of descriptive statistics such as the mean, the median, and the mode, and dispersion is summarized through the use of the variance and the standard deviation. Figure 6.4 shows a printout from the IBM Statistical Package for the Social Sciences (IBM SPSS) software of the descriptive statistics for the quantitative variables in Table 6.1.

Measures of Central Tendency. The arithmetic average, or arithmetic mean, is the most commonly used measure of central tendency. It is computed by summing all of the scores on the variable and dividing this sum by the number of participants in the distribution (denoted by the letter N). The sample mean is sometimes denoted with the symbol x̄ (read as “X-bar”) and may also be indicated by the letter M. As you can see in Figure 6.4, in our sample, the mean age of the twenty-five students is 33.52. In this case, the mean provides an accurate index of the central tendency of the age variable because if you look at the stem and leaf plot in Figure 6.3, you can see that most of the ages are centered at about thirty-three.
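
In standard notation (not part of the original figure), the computation just described is:

```latex
\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}
```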
          The pattern of scores observed on a measured variable is known as the variable’s distribution. It turns out that most quantitative variables have distributions similar to that shown in Figure 6.5(a). Most of the data are located near the center of the distribution, and the distribution is symmetrical and bell-shaped. Data distributions that are shaped like a bell are known as normal distributions.
          In some cases, however, the data distribution is not symmetrical. This occurs when there are one or more extreme scores (known as outliers) at one end of the distribution. For instance, because there is an outlier in the family income variable in Table 6.1 (a value of $2,800,000), a frequency curve of this variable would look more like that shown in Figure 6.5(b) than that shown in Figure 6.5(a). Distributions that are not symmetrical are said to be skewed. As shown in Figure 6.5(b) and (c), distributions are said to be either positively skewed or negatively skewed, depending on where the outliers fall. 
          Because the mean is highly influenced by the presence of outliers, it is not a good measure of central tendency when the distribution is highly skewed. For instance, although it appears from Table 6.1 that the central tendency of the family income variable should be around $40,000, the mean family income is actually $159,920. The single very extreme income has a disproportionate impact on the mean, resulting in a value that does not well represent the central tendency.
          The median is used as an alternative measure of central tendency when distributions are skewed. The median is the score in the center of the distribution, meaning that 50 percent of the scores are greater than the median and 50 percent of the scores are lower than the median. Methods for calculating the median are presented in Appendix B. In our case, the median household income ($43,000) is a much better indication of central tendency than is the mean household income ($159,920). 
       A final measure of central tendency, known as the mode, represents the value that occurs most frequently in the distribution. You can see from Table 6.1 that the modal value for the income variable is $43,000 (it occurs four times). In some cases there can be more than one mode. For instance, the age variable has modes at 18, 19, 31, 33, and 45. Although the mode does represent central tendency, it is not frequently used in scientific research. The relationships among the mean, the median, and the mode are described in Figure 6.5. 
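
The sketch below, which uses Python's statistics module on a small hypothetical list of incomes containing one extreme value, shows how an outlier pulls the mean upward while the median and mode stay near the bulk of the scores. The numbers are illustrative, not the Table 6.1 incomes.

```python
# Sketch: how an outlier pulls the mean but not the median or the mode.
# These income values are illustrative, not the actual Table 6.1 data.
from statistics import mean, median, mode

incomes = [28_000, 35_000, 43_000, 43_000, 43_000, 51_000, 62_000, 2_800_000]

print(f"Mean:   {mean(incomes):,.0f}")    # dragged upward by the outlier
print(f"Median: {median(incomes):,.0f}")  # better index of central tendency here
print(f"Mode:   {mode(incomes):,.0f}")    # most frequent value
```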

Measures of Dispersion. In addition to summarizing the central tendency of a distribution, descriptive statistics convey information about how the scores on the variable are spread around the central tendency. Dispersion refers to the extent to which the scores are all tightly clustered around the central tendency or are instead spread out over a wider range of values.

          One simple measure of dispersion is to find the largest (the maximum) and the smallest (the minimum) observed values of the variable and to compute the range of the variable as the maximum observed score minus the minimum observed score. You can check that the range of the age variable is 63 − 18 = 45.
        The standard deviation, symbolized as s, is the most commonly used measure of dispersion. As discussed in more detail in Appendix B, computation of the standard deviation begins with the calculation of a deviation score for each individual: the individual’s score on the variable minus the mean of the variable. Individuals who score above the mean have positive deviation scores, whereas those who score below the mean have negative deviation scores. The deviation scores are squared and summed to produce a statistic called the sum of squared deviations, or sum of squares. The sum of squares is divided by the sample size (N) to produce a statistic known as the variance, symbolized as s². The square root of the variance is the standard deviation, s. Distributions with a larger standard deviation have more spread. As you can see from Figure 6.4, the standard deviation of the age variable in Table 6.1 is 12.51.
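
The sketch below follows the steps just described: deviation scores, the sum of squares, the variance as the sum of squares divided by N, and the standard deviation as its square root. Note that many statistics packages divide by N − 1 rather than N when estimating from a sample; the scores here are hypothetical.

```python
# Sketch of the steps described in the text: deviation scores, sum of
# squares, variance (sum of squares divided by N, as in the text), and the
# standard deviation as its square root. The scores are hypothetical.
from math import sqrt

scores = [18, 21, 25, 33, 33, 41, 52, 63]
n = len(scores)
m = sum(scores) / n

deviations = [x - m for x in scores]              # deviation scores
sum_of_squares = sum(d ** 2 for d in deviations)  # sum of squared deviations
variance = sum_of_squares / n                     # s squared
std_dev = sqrt(variance)                          # s

print(f"range = {max(scores) - min(scores)}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```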

Sampling and Generalization

             
            (Surveys and Sampling) Sampling and Generalization

We have seen that surveys are conducted with the goal of creating an accurate picture of the current attitudes, beliefs, or behaviors of a large group of people. In some rare cases it is possible to conduct a census—that is, to measure each person about whom we wish to know. In most cases, however, the group of people that we want to learn about is so large that measuring each person is not practical. Thus, the researcher must test some subset of the entire group of people who could have participated in the research. Sampling refers to the selection of people to participate in a research project, usually with the goal of being able to use these people to make inferences about a larger group of individuals. The entire group of people that the researcher desires to learn about is known as the population, and the smaller group of people who actually participate in the research is known as the sample.


Definition of the Population 

          The population of interest to the researcher must be defined precisely. For instance, some populations of interest to a survey researcher might be “all citizens of voting age in the United States who plan to vote in the next election,” “all students currently enrolled full time at the University of Chicago,” or “all Hispanic Americans over forty years of age who live within the Baltimore city limits.” In most cases the scientist does not particularly care about the characteristics of the specific people chosen to be in the sample. Rather, the scientist uses the sample to draw inferences about the population as a whole (just as a medical researcher analyzes a blood sample to make inferences about the blood that was not sampled).
        Whenever samples are used to make inferences about populations, the researcher faces a basic dilemma—he or she will never be able to know exactly what the true characteristics of the population are because all of the members of the population cannot be contacted. However, this is not really as big a problem as it might seem if the sample can be assumed to be representative of the population. A representative sample is one that is approximately the same as the population in every important respect. For instance, a representative sample of the population of students at a college or university would contain about the same proportion of men, sophomores, and engineering majors as are in the college itself, as well as being roughly equivalent to the population on every other conceivable characteristic. 

Probability Sampling 

         To make the sample representative of the population, any of several probability sampling techniques may be employed. In probability sampling, procedures are used to ensure that each person in the population has a known chance of being selected to be part of the sample. As a result, the likelihood that the sample is representative of the population is increased, as is the ability to use the sample to draw inferences about the population.  

Simple Random Sampling. The most basic probability sample is drawn using simple random sampling. In this case, the goal is to ensure that each person in the population has an equal chance of being selected to be in the sample. To draw a simple random sample, an investigator must first have a complete list (known as a sampling frame) of all of the people in the population. For instance, voting registration lists may be used as a sampling frame, or telephone numbers of all of the households in a given geographic location may be used. The latter list will basically represent the population that lives in that area because almost all U.S. households now have a telephone. Recent advances in survey methodology allow researchers to include cell phone numbers in their sampling frame as well. 
         Then the investigator randomly selects from the frame a sample of a given number of people. Let’s say you are interested in studying volunteering behavior of the students at your college or university, and you want to collect a random sample of 100 students. You would begin by finding a list of all of the students currently enrolled at the college. Assume that there are 7,000 names on this list, numbered sequentially from 1 to 7,000. Then, as shown in the instructions for using Statistical Table A (in Appendix E), you could use a random number table (or a random number generator on a computer) to produce 100 numbers that fall between 1 and 7,000 and select those 100 students to be in your sample. 
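
In practice a random number generator does the work of the printed random number table; the sketch below draws a simple random sample of 100 IDs from a hypothetical frame numbered 1 to 7,000.

```python
# Sketch: drawing a simple random sample of 100 students from a sampling
# frame of 7,000, using a random number generator in place of a table.
import random

sampling_frame = list(range(1, 7001))          # student IDs 1 through 7,000
sample = random.sample(sampling_frame, k=100)  # each student equally likely
print(sorted(sample)[:10])                     # first few selected IDs
```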

Systematic Random Sampling. If the list of names on the sampling frame is itself known to be in a random sequence, then a probability sampling procedure known as systematic random sampling can be used. In your case, because you wish to draw a sample of 100 students from a population of 7,000 students, you will want to sample 1 out of every 70 students (100/7,000 = 1/70). To create the systematic sample, you first draw a random number between 1 and 70 and then sample the person on the list with that number. You create the rest of the sample by taking every seventieth person on the list after the initial person. For instance, if the first person sampled was number 32, you would then sample number 102, 172, and so on. You can see that it is easier to use systematic sampling than simple random sampling because only one initial number has to be chosen at random.
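
The same example drawn as a systematic random sample might look like the sketch below: choose one random start between 1 and 70, then take every 70th person on the list.

```python
# Sketch of systematic random sampling: one random start between 1 and 70,
# then every 70th person on the list; always yields 100 IDs.
import random

interval = 7000 // 100          # sample 1 out of every 70 students
start = random.randint(1, interval)
sample = list(range(start, 7001, interval))
print(len(sample), sample[:5])  # 100 IDs, e.g. 32, 102, 172, ...
```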

Stratified Sampling. Because in most cases sampling frames include such information about the population as sex, age, ethnicity, and region of residence, and because the variables being measured are frequently expected to differ across these subgroups, it is often useful to draw separate samples from each of these subgroups rather than to sample from the population as a whole. The subgroups are called strata, and the sampling procedure is known as stratified sampling.
          To collect a proportionate stratified sample, frames of all of the people within each stratum are first located, and random samples are drawn from within each of the strata. For example, if you expected that volunteering rates would be different for students from different majors, you could first make separate lists of the students in each of the majors at your school and then randomly sample from each list. One outcome of this procedure is that the different majors are guaranteed to be represented in the sample in the same proportion that they are represented in the population, a result that might not occur if you had used random sampling. Furthermore, it can be shown mathematically that if volunteering behavior does indeed differ among the strata, a stratified sample will provide a more precise estimate of the population characteristics than will a simple random sample (Kish, 1965).
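
A proportionate stratified sample can be drawn by sampling from each stratum in proportion to its share of the population, as in the sketch below; the majors and strata sizes are hypothetical.

```python
# Sketch of a proportionate stratified sample: draw from each stratum
# (major) in proportion to its share of the population. The strata and
# their sizes are hypothetical.
import random

strata = {                                   # major -> list of student IDs
    "Psychology": list(range(1, 2801)),      # 2,800 students
    "Biology": list(range(2801, 5601)),      # 2,800 students
    "English": list(range(5601, 7001)),      # 1,400 students
}
population_size = sum(len(frame) for frame in strata.values())
total_sample = 100

sample = []
for major, frame in strata.items():
    k = round(total_sample * len(frame) / population_size)
    sample.extend(random.sample(frame, k))   # proportionate allocation

print(len(sample))  # approximately 100, allocated across the strata
```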
        
          Disproportionate stratified sampling is frequently used when the strata differ in size and the researcher is interested in comparing the characteristics of the strata. For instance, in a class of 7,000 students, only 10 or so might be French majors. If a random sample of 100 students were drawn, there might not be any French majors in the sample, or at least there would be too few to allow a researcher to draw meaningful conclusions about them. In this case, the researcher draws a sample in which some strata are represented in a larger proportion than they are in the population. This procedure is called oversampling and is used to provide large enough samples of the strata of interest to allow analysis. Mathematical formulas are used to determine the optimum size for each of the strata.

Cluster Sampling. Although simple and stratified sampling can be used to create representative samples when there is a complete sampling frame for the population, in some cases there is no such list. For instance, there is no single list of all of the currently matriculated college students in the United States. In these cases an alternative approach known as cluster sampling can be used. The technique is to break the population into a set of smaller groups (called clusters) for which there are sampling frames and then to randomly choose some of the clusters for inclusion in the sample. At this point, every person in the cluster may be sampled, or a random sample of the cluster may be drawn.
         Often the clustering is done in stages. For instance, we might first divide the United States into regions (for instance, East, Midwest, South, Southwest, and West). Then we would randomly select states from each region, counties from each state, and colleges or universities from each county. Because there is a sampling frame of the matriculated students at each of the selected colleges, we could draw a random sample from these lists. In addition to allowing a representative sample to be drawn when there is no sampling frame, cluster sampling is convenient. Once we have selected the clusters, we need only contact the students at the selected colleges rather than having to sample from all of the colleges and universities in the United States. In cluster sampling, the selected clusters are used to draw inferences about the nonselected ones. Although this practice loses some precision, cluster sampling is frequently used because of convenience.
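
A two-stage cluster sample might be sketched as follows: randomly choose a few clusters (colleges), then randomly sample students within each chosen cluster. All names and sizes here are hypothetical.

```python
# Sketch of two-stage cluster sampling: randomly choose clusters (colleges),
# then randomly sample students within each chosen cluster. All names and
# sizes are hypothetical.
import random

clusters = {                                   # college -> list of students
    f"college_{i}": [f"college_{i}_student_{j}" for j in range(1, 501)]
    for i in range(1, 21)                      # 20 colleges, 500 students each
}

chosen_colleges = random.sample(list(clusters), k=4)       # stage 1: clusters
sample = []
for college in chosen_colleges:
    sample.extend(random.sample(clusters[college], k=25))  # stage 2: students

print(len(sample))  # 4 colleges x 25 students = 100
```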

Sampling Bias and Nonprobability Sampling 

The advantage of probability sampling methods is that their samples will be representative and thus can be used to draw inferences about the characteristics of the population. Although these procedures sound good in theory, in practice it is difficult to be certain that the sample is truly representative. Representativeness requires that two conditions be met. First, there must be one or more sampling frames that list the entire population of interest, and second, all of the selected individuals must actually be sampled. When either of these conditions is not met, there is the potential for sampling bias. This occurs when the sample is not actually representative of the population because the probability with which members of the population have been selected for participation is not known.
        Sampling bias can arise when an accurate sampling frame for the population of interest cannot be obtained. In some cases there is an available sampling frame, but there is no guarantee that it is accurate. The sampling frame may be inaccurate because some members of the population are missing or because it includes some names that are not actually in the population. College student directories, for instance, frequently do not include new students or those who requested that their name not be listed, and these directories may also include students who have transferred or dropped out. 
           In other cases there simply is no sampling frame. Imagine attempting to obtain a frame that included all of the homeless people in New York City or all of the women in the United States who are currently pregnant with their first child. In cases where probability sampling is impossible because there is no available sampling frame, nonprobability samples must be used. To obtain a sample of homeless individuals, for instance, the researcher will interview individuals on the street or at a homeless shelter. One type of nonprobability sample that can be used when the population of interest is rare or difficult to reach is called snowball sampling. In this procedure one or more individuals from the population are contacted, and these individuals are used to lead the researcher to other population members. Such a technique might be used to locate homeless individuals. Of course, in such cases the potential for sampling bias is high because the people in the sample may be different from the people in the population. Snowball sampling at homeless shelters, for instance, may include a greater proportion of people who stay in shelters and a smaller proportion of people who do not stay in shelters than are in the population. This is a limitation of nonprobability sampling, but one that the researcher must live with because there is no possible probability sampling method that can be used. 
            Even if a complete sampling frame is available, sampling bias can occur if all members of the random sample cannot be contacted or cannot be convinced to participate in the survey. For instance, people may be on vacation, they may have moved to a different address, or they may not be willing to complete the questionnaire or interview. When a questionnaire is mailed, the response rate may be low. In each of these cases the potential for sampling bias exists because the people who completed the survey may have responded differently than would those who could not be contacted. 
       Nonprobability samples are also frequently found when college students are used in experimental research. Such samples are called convenience samples because the researcher has sampled whatever individuals were  readily available without any attempt to make the sample representative of a population. Although such samples can be used to test research hypotheses, they may not be used to draw inferences about populations. We will discuss the use of convenience samples in experimental research designs more fully in Chapter 13. 
            Whenever you read a research report, make sure to determine what sampling procedures have been used to select the research participants. In some cases, researchers make statements about populations on the basis of nonprobability samples, which are not likely to be representative of the population they are interested in. For instance, polls in which people are asked to call a 900 number or log on to a website to express their opinions on a given topic may contain sampling bias because people who are in favor of (or opposed to) the issue may have more time or more motivation to do so. Whenever the respondents, rather than the researchers, choose whether to be part of the sample, sampling bias is possible. The important thing is to remain aware of what sampling techniques have been used and to draw your own conclusions accordingly.

Saturday, 29 June 2013

Surveys



           (Surveys and Sampling) Surveys

A survey is a series of self-report measures administered either through an interview or a written questionnaire. Surveys are the most widely used method of collecting descriptive information about a group of people. You may have received a phone call (it usually arrives in the middle of the dinner hour when most people are home) from a survey research group asking you about your taste in music, your shopping habits, or your political preferences. 
The goal of a survey, as with all descriptive research, is to produce a “snapshot” of the opinions, attitudes, or behaviors of a group of people at a given time. Because surveys can be used to gather a wide variety of information in a relatively short time, they are used extensively by businesspeople, advertisers, and politicians to help them learn what people think, feel, or do.

Interviews

       Surveys are usually administered in the form of an interview, in which questions are read to the respondent in person or over the telephone. One advantage of in-person interviews is that they may allow the researcher to develop a close rapport and sense of trust with the respondent. This may motivate the respondent to continue with the interview and may lead to more honest and open responding. However, face-to-face interviews are extremely expensive to conduct, and consequently telephone surveys are now more common. In a telephone interview all of the interviewers are located in one place, the telephone numbers are generated automatically, and the questions are read from computer terminals in front of the researchers. This procedure provides such efficiency and coordination among the interviewers that many surveys can be conducted in one day.

Unstructured Interviews. Interviews may use either free-format or fixed-format self-report measures. In an unstructured interview the interviewer talks freely with the person being interviewed about many topics. Although a general list of the topics of interest is prepared beforehand, the actual interview focuses on those topics that the respondent is most interested in or most knowledgeable about. Because the questions asked in an unstructured interview differ from respondent to respondent, the interviewer must be trained to ask questions in a way that gets the most information from the respondent and allows the respondent to express his or her true feelings. One type of face-to-face unstructured interview in which a number of people are interviewed at the same time and share ideas both with the interviewer and with each other is called a focus group.
        Unstructured interviews may provide in-depth information about the particular concerns of an individual or a group of people, and thus may produce ideas for future research projects or for policy decisions. It is, however, very difficult to adequately train interviewers to ask questions in an unbiased manner and to be sure that they have actually done so. And, as we have seen in Chapter 4, because the topics of conversation and the types of answers given in free-response formats vary across participants, the data are difficult to objectively quantify and analyze, and are therefore frequently treated qualitatively.

Structured Interviews. Because researchers usually want more objective data, the structured interview, which uses quantitative fixed-format items, is most common. The questions are prepared ahead of time, and the interviewer reads the questions to the respondent. The structured interview has the advantage over an unstructured interview of allowing better comparisons of the responses across different individuals because the questions, time frame, and response format are controlled to be the same for each respondent. 

Questionnaires

A questionnaire is a set of fixed-format, self-report items that is completed by respondents at their own pace, often without supervision. Questionnaires are generally cheaper than interviews because a researcher can mail the questionnaires to many people or have them complete the questionnaires in large groups. Questionnaires may also produce more honest responses than interviews, particularly when the questions involve sensitive issues such as sexual activity or annual income, because respondents are more likely to perceive their responses as being anonymous than they are in interviews. In comparison to interviews, questionnaires are also likely to be less influenced by the characteristics of the experimenter. For instance, if the topic concerns race-related attitudes, how the respondent answers might depend on the race of the interviewer and how the respondent thinks the interviewer wants him or her to respond. Because the experimenter is not present when a questionnaire is completed, or at least is not directly asking the questions, such problems are less likely.

The Response Rate. Questionnaires are free of some problems that may occur in interviews, but they do have their own set of difficulties. Although people may be likely to return surveys that have direct relevance to them (for instance, a survey of college students conducted by their own university), when mailings are sent to the general population, the response rate (that is, the percentage of people who actually complete the questionnaire and return it to the investigator) may not be very high. This may lead to incorrect conclusions because the people who return the questionnaire may respond differently than those who don’t return it would have. Investigators can sometimes increase response rates by providing gifts or monetary payments for completing the survey, by making the questionnaire appear brief and interesting, by ensuring the confidentiality of all of the data, and by emphasizing the importance of the individual in the research (Dillman, 1978). Follow-up mailings can also be used to remind people that they have not completed the questionnaire, with the hope that they will then do so. 


Question Order. Another potential problem with questionnaires that does not occur with interviews is that people may not answer the questions in the order they are written, and the researcher does not know whether or not they have. To take one example, consider these two questions:

1. “How satisfied are you with your relationships with your family?”
2. “How satisfied are you with your relationship with your spouse?”

If the questions are answered in the order that they are presented here, then most respondents interpret the word family in question 1 to include their spouse. If question 2 is answered before question 1, however, the term family in question 1 is interpreted to mean the rest of the family except the spouse. Such variability can create measurement error (Schuman & Presser, 1981; Schwarz & Strack, 1991). 


Use of Existing Survey Data

 Because it is very expensive to conduct surveys, scientists often work together on them. For instance, a researcher may have a small number of questions relevant to his or her research included within a larger survey. Or researchers can access public-domain data sets that contain data from previous surveys. The U.S. Census is probably the largest such data set, containing information on family size, fertility, occupation, and income for the entire U.S. population, as well as a more extensive interview data set of a smaller group of citizens. The General Social Survey is a collection of over 1,000 items given to a sample of U.S. citizens (Davis, Smith, & Marsden, 2000). Because the same questions are asked each year the survey is given, comparisons can be made over time. Sometimes these data sets are given in comparable forms to citizens of different countries, allowing cross-cultural comparisons. One such data set is the Human Relations Area Files. Indexes of some of the most important social science databases can be found in Clubb, Austin, Geda, and Traugott (1985).

Surveys and Sampling


Surveys and Sampling

Now that we have reviewed the basic types of measured variables and considered how to evaluate their effectiveness at assessing the conceptual variables of interest, it is time to more fully discuss the use of these measures in descriptive research. In this chapter, we will discuss the use of self-report measures, and in Chapter 7, we will discuss the use of behavioral measures. Although these measures are frequently used in a qualitative sense—to draw  a complete and complex picture in the form of a narrative—they can also be used quantitatively, as measured variables. As you read these chapters, keep in mind that the goal of descriptive research is to describe the current state of affairs but that it does not by itself provide direct methods for testing research hypotheses. However, both surveys (discussed in this chapter) and naturalistic methods (discussed in Chapter 7) are frequently used not only as descriptive data but also as the measured variables in correlational and experimental tests of research hypotheses. We will discuss these uses in later chapters.


Current Research in the Behavioral Sciences: The Hillyer-Joynes Kinematics Scale of Locomotion in Rats With Spinal Injuries

Jessica Hillyer and Robin L. Joynes conduct research on animals with injuries to their spinal cords, with the goal of learning how organisms, including humans, may be able to improve their physical movements (locomotion) after injury. One difficulty that they noted in their research with rats was that the existing measure of locomotion (the BBB Locomotor Rating Scale; Basso, Beattie, & Bresnahan, 1995) was not sophisticated enough to provide a clear measure of locomotion skills. They therefore decided to create their own, new measure, which they called the Hillyer-Joynes Kinematics Scale of Locomotion (HiJK). Their measure was designed to assess the locomotion abilities of rats walking on treadmills. 
        The researchers began by videotaping 137 rats with various degrees of spinal cord injuries as they walked on treadmills. Then three different coders viewed the videotapes of a subset of 20 of the rats. For each of these 20 rats, the coders rated the rats’ walking skills on eight different dimensions: Extension of the Hip, Knee, and Ankle joints; Fluidity of the joint movement; Alternation of the legs during movement; Placement of the feet; Weight support of the movement; and Consistency of walking. 
      Once the raters had completed their ratings, the researchers tested for interrater reliability, to see whether the three raters agreed on their coding of each of the dimensions that they had rated. Overall, they found high interrater reliability, generally with r’s over .9. For instance, for the ratings of foot placement, the correlations among the three coders were as follows:

                   Rater 1             Rater 2
Rater 2          .95
Rater 3          .95                 .99
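
To illustrate how interrater correlations of this kind can be computed, here is a minimal sketch in Python; the rating vectors and their values are hypothetical placeholders, not the authors’ data.

```python
# Minimal sketch: pairwise interrater (Pearson) correlations for one rated
# dimension. The rating vectors are hypothetical placeholders, not the
# authors' data.
import numpy as np

# Each array holds one rater's foot-placement ratings for the same ten rats.
rater1 = np.array([3, 5, 2, 4, 6, 1, 5, 3, 4, 2], dtype=float)
rater2 = np.array([3, 5, 3, 4, 6, 1, 5, 3, 4, 2], dtype=float)
rater3 = np.array([3, 5, 3, 4, 5, 1, 5, 3, 4, 2], dtype=float)

# np.corrcoef treats each row as a variable and returns the full 3 x 3
# correlation matrix; the off-diagonal entries are the pairwise interrater
# correlations reported in tables like the one above.
r_matrix = np.corrcoef(np.vstack([rater1, rater2, rater3]))
print(np.round(r_matrix, 2))
```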

       The researchers then had one of the three raters rate all 137 of the rats on the 8 subscales. On the basis of this rater’s judgments, they computed the overall reliability of the new measure, using each of the eight rated dimensions as an item in the scale. The Cronbach’s alpha for the composite scale, based on 8 items and 137 rats, was α = .86, denoting acceptable reliability.
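
The composite reliability described here can be computed with the standard Cronbach’s alpha formula. The sketch below is illustrative only: the score matrix is random placeholder data rather than the 137 × 8 ratings from the study, so the printed value will not match the reported α = .86.

```python
# Minimal sketch of Cronbach's alpha for a composite scale.
# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Compute Cronbach's alpha for a (subjects x items) score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Placeholder data: 137 "rats" rated on 8 "items". Because these values are
# random and uncorrelated, the resulting alpha will be near zero; real item
# data that hang together would produce a much higher value.
rng = np.random.default_rng(0)
scores = rng.integers(1, 8, size=(137, 8)).astype(float)
print(round(cronbach_alpha(scores), 2))
```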

         Having determined that their new measure was reliable, the researchers next turned to the validity of the scale. They found that the new measure correlated significantly with scores on the existing measure of locomotion, the BBB Locomotor Rating Scale, suggesting that the two instruments were assessing the rats’ locomotion in similar ways.

         Finally, the researchers tested for predictive validity by correlating both the BBB and the HiJK with a physiological assessment of the magnitude of each rat’s spinal cord injury. They found that the HiJK was better able to predict the nature of the rats’ injuries than was the BBB, suggesting that the new measure may be better than the old one.
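
The logic of this predictive-validity comparison can be pictured as correlating each scale with the same external criterion and comparing the coefficients. The sketch below uses simulated placeholder scores and a hypothetical injury-severity criterion; none of the numbers come from the study.

```python
# Minimal sketch of a predictive-validity comparison: correlate each scale
# with an external criterion and compare the coefficients. All values are
# simulated placeholders, not the authors' data.
import numpy as np

rng = np.random.default_rng(1)
injury_severity = rng.normal(size=137)                            # hypothetical criterion
hijk_scores = injury_severity + rng.normal(scale=0.5, size=137)   # hypothetical HiJK scores
bbb_scores = injury_severity + rng.normal(scale=1.0, size=137)    # hypothetical BBB scores

r_hijk = np.corrcoef(hijk_scores, injury_severity)[0, 1]
r_bbb = np.corrcoef(bbb_scores, injury_severity)[0, 1]
print(f"HiJK vs. criterion: r = {r_hijk:.2f}")
print(f"BBB  vs. criterion: r = {r_bbb:.2f}")
```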

Comparing Reliability and Validity

 (Reliability and Validity) Comparing Reliability and Validity

We have seen that reliability and construct validity are similar in that they are both assessed through examination of the correlations among measured variables. However, they are different in the sense that reliability refers to correlations among different variables that the researcher is planning to combine into the same measure of a single conceptual variable, whereas construct validity refers to correlations of a measure with different measures of other conceptual variables. In this sense, it is appropriate to say that reliability comes before validity because reliability is concerned with creating a measure that is then tested in relationship to other measures. If a measure is not reliable, then its construct validity cannot be determined. Tables 5.1 and 5.2 summarize the various types of reliability and validity that researchers must consider. 
            One important question that we have not yet considered is “How reliable and valid must a scale be in order to be useful?” Researchers do not always agree about the answer, except for the obvious fact that the higher the reliability and the construct validity, the better. One criterion that seems reasonable is that the reliability of a commonly used scale should be at least α = .70. However, many tests have reliabilities well above α = .80. 
           In general, it is easier to demonstrate the reliability of a measured variable than it is to demonstrate a variable’s construct validity. This is so in part because demonstrating reliability involves only showing that the measured variables correlate with each other, whereas validity involves showing both convergent and discriminant validity. Also, because the items on a scale are all answered using the same response format and are presented sequentially, and because items that do not correlate highly with the total scale score can be deleted, high reliabilities are usually not difficult to achieve. 
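
To make the item-deletion step mentioned above concrete, here is a minimal sketch of item-total screening; the response matrix and the .30 cutoff are illustrative assumptions, not values from the text.

```python
# Minimal sketch of item-total screening: flag items whose corrected
# item-total correlation (item vs. the sum of the remaining items) falls
# below a chosen cutoff. Data and the cutoff are illustrative only; with
# random placeholder responses most items will be flagged, which is expected.
import numpy as np

rng = np.random.default_rng(2)
responses = rng.integers(1, 8, size=(50, 10)).astype(float)  # hypothetical 10-item scale
CUTOFF = 0.30

for item in range(responses.shape[1]):
    rest_total = np.delete(responses, item, axis=1).sum(axis=1)  # sum of the other items
    r = np.corrcoef(responses[:, item], rest_total)[0, 1]
    if r < CUTOFF:
        print(f"Item {item + 1}: r = {r:.2f} -> candidate for deletion")
```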
           However, the relationships among different measures of the same conceptual variable that serve as the basis for demonstrating convergent validity are often relatively low. For instance, the correlations observed by Snyder were only in the range of .40, and such correlations are not unusual. Although correlations of this size may seem low, they are still taken as evidence for convergent validity. 
          One of the greatest difficulties in developing a new scale is to demonstrate its discriminant validity. Although almost any new scale you can imagine will be at least moderately correlated with some other existing scales, to be useful, the new scale must be demonstrably different from existing scales in at least some critical respects. Demonstrating this uniqueness is difficult and will generally require that a number of different studies be conducted. 
         Because there are many existing scales in common use within the behavioral sciences, carefully consider whether you really need to develop a new scale for your research project. Before you begin scale development, be sure to determine whether a scale assessing the conceptual variable you are interested in, or at least a similar conceptual variable, might already exist. A good source for information about existing scales, in addition to PsycINFO®, is Robinson, Shaver, and Wrightsman (1991). Remember that it is generally advantageous to use an existing measure rather than to develop your own, because the reliability and validity of such measures are already established, saving you a lot of work.

Improving the Reliability and Validity of Measured Variables

(Reliability and Validity) Improving the Reliability and Validity of Measured Variables

Now that we have considered some of the threats to the validity of measured variables, we can ask how our awareness of these potential threats can help us improve our measures. Most basically, the goal is to be aware of the potential difficulties and to keep them in mind as we design our measures. Because the research process is a social interaction between researcher and participant, we must carefully consider how the participant perceives the research and consider how she or he may react to it. The following are some useful tips for creating valid measures:

1.    Conduct a pilot test. Pilot testing involves trying out a questionnaire or other research on a small group of individuals to get an idea of how they react to it before the final version of the project is created. After collecting the data from the pilot test, you can modify the measures before actually using the scale in research. Pilot testing can help ensure that participants understand the questions as you expect them to and that they cannot guess the purpose of the questionnaire. You can also use pilot testing to help create self-report measures: ask participants in the pilot study to generate thoughts about the conceptual variables of interest, and then use those thoughts to generate ideas about the types of items to include on a fixed-format scale. 

2.      Use multiple measures. As we have seen, the more types of measures are used to assess a conceptual variable, the more information about the variable is gained. For instance, the more items a test has, the more reliable it will be. However, be careful not to make your scale so long that your participants lose interest in taking it! As a general guideline, twenty items are usually sufficient to produce a highly reliable measure. 

3.       Ensure variability within your measures. If 95 percent of your participants answer an item with the response 7 (strongly agree) or the response 1 (strongly disagree), the item won’t be worth including because it won’t differentiate the respondents. One way to encourage variability is to make sure that the average response of your respondents is near the middle of the scale, so that some people fall above and some below the average. Pilot testing enables you to create measures that have adequate variability (see the sketch after this list). 

4.         Write good items. Make sure that your questions are understandable and not ambiguous. This means the questions shouldn’t be too long or too short. Try to avoid ambiguous words. For instance, “Do you regularly feel stress?” is not as good as “How many times per week do you feel stress?” because the term regularly is ambiguous. Also watch for “double-barreled” questions such as “Are you happy most of the time, or do you find there to be no reason to be happy?” A person who is happy but does not find any real reason for it would not know how to answer this question. Keep your questions as simple as possible, and be specific. For instance, the question “Do you like your parents?” is vaguer than “Do you like your mother?” and “Do you like your father?” 

5.         Attempt to get your respondents to take your questions seriously. In the instructions you give to them, stress that the accuracy of their responses is important and that their responses are critical to the success of the research project. Otherwise carelessness may result. 

6.          Attempt to make your items nonreactive. For instance, asking people to indicate whether they agree with the item “I dislike all Japanese people” is unlikely to produce honest answers, whereas a statement such as “The Japanese are using their economic power to hurt the United States” may elicit a more honest answer because the item is more indirect. Of course, the latter item may not assess exactly what you are hoping to measure, but in some cases such tradeoffs are required. You may also wish to embed items that measure something entirely irrelevant (called distracter items) in your scale to disguise what you are really assessing. 

7.       Be certain to consider face and content validity by choosing items that seem “reasonable” and that represent a broad range of questions concerning the topic of interest. If the scale is not content valid, you may be evaluating only a small piece of the total picture you are interested in. 

8.       When possible, use existing measures, rather than creating your own, because the reliability and validity of these measures will already be established.
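
As a follow-up to tip 3, here is a minimal sketch of a pilot-test variability check; the pilot data, the 7-point response scale, and the 95 percent threshold are hypothetical assumptions used only for illustration.

```python
# Minimal sketch of a pilot-test variability check (see tip 3 above).
# Items where nearly all respondents give the same extreme answer show
# little variability and are candidates for revision. All data and the
# 95 percent threshold are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
pilot = rng.integers(1, 8, size=(30, 5)).astype(float)  # 30 respondents, 5 items, 1-7 scale

for item in range(pilot.shape[1]):
    col = pilot[:, item]
    prop_top = (col == 7).mean()     # proportion answering 7 (strongly agree)
    prop_bottom = (col == 1).mean()  # proportion answering 1 (strongly disagree)
    print(f"Item {item + 1}: mean = {col.mean():.2f}, SD = {col.std(ddof=1):.2f}")
    if prop_top >= 0.95 or prop_bottom >= 0.95:
        print("  -> almost no variability; consider revising or dropping this item")
```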