Self-Report Measures
In the next sections, we will consider some of the many types of measured variables used in behavioral research. We begin by considering how we might gain information by directly asking someone about his or her thoughts, feelings, or behavior. To do so involves using self-report measures, in which individuals are asked to respond to questions posed by an interviewer or a questionnaire. Then in the following sections we will consider the use of behavioral measures, designed to directly measure what people do.
Free-Format Self-Report Measures
Perhaps the most straightforward use of self-report measures involves asking people to freely list their thoughts or feelings as these come to mind. One of the major advantages of such free-format self-report measures is that they allow respondents to indicate whatever thoughts or feelings they have about the topic, without any constraints imposed on respondents except the effort it takes to write these thoughts or feelings down or speak them into a tape recorder.
Projective Measures. A projective measure is a measure of personality in which an unstructured image, such as an inkblot, is shown to participants, who are asked to freely list what comes to mind as they view the image. One common use of free-format self-report measures is the assessment of personality variables through the use of projective tests such as the Thematic Apperception Test, or TAT (Morgan & Murray, 1935) or the Rorschach inkblots. The TAT, for instance, consists of a number of sketches of people, either alone or with others, who are engaging in various behaviors, such as gazing out a window or pointing at each other. The sketches are shown to individuals, who are asked to tell a story about what is happening in the picture. The TAT assumes that people may be unwilling or unable to admit their true feelings when asked directly but that these feelings will show up in the stories about the pictures. Trained coders read the stories and use them to develop a personality profile of the respondent.
Associative Lists. Free-format response formats in the form of associative lists have also been used to study such variables as stereotyping. In one of these studies (Stangor, Sullivan, & Ford, 1991), college students were presented with the names of different social groups (African Americans, Hispanics, Russians) and asked to list whatever thoughts came to mind about the groups. The study was based on the assumption that the thoughts listed in this procedure would be those that the individual viewed as strongest or most central to the group as a whole and would thus provide a good idea of what the person really thought about the groups. One student listed the following thoughts to describe different social groups:
Whites: “Materialistic and prejudiced.”
Hispanics: “Poor, uneducated, and traditional. Willing to work hard.”
Russians: “Unable to leave their country, even though they want to.”
Think-Aloud Protocols. Another common type of free-format response format is the think-aloud protocol (Ericsson & Simon, 1980). In this procedure, individuals are asked to verbalize into a tape recorder the thoughts that they are having as they complete a task. For instance, the following protocol was generated by a college student in a social psychology experiment who was trying to form an impression of another person who was characterized by conflicting information (Fiske, Neuberg, Beattie, & Milberg, 1987): “Professor. Strong, close-minded, rowdy, red-necked, loud. Hmmmm. I’ve never met a professor like this. I tend to make a stereotype of a beer-guzzling bigot…. I can sort of picture him sitting in a smoky, white bar, somewhere in, off in the suburbs of Maryland.” The researchers used the think-aloud protocols, along with other data, to understand how people formed impressions about others.
The Difficulties of Coding Free-Format Data. Although free-format self-report measures produce a rich set of data regarding the thoughts and feelings of the people being studied, they also have some disadvantages. Most important, it is very difficult and time-consuming to turn the generated thoughts into a set of measured variables that can be used in data analysis. Because each individual is likely to have used a unique set of thoughts, it is hard to compare individuals. One solution is to simply describe the responses verbally (such as the description of the college professor given above) and to treat the measures as qualitative data. However, because correlational and experimental research designs require the use of quantitative data (measured variables that can be subjected to statistical analysis), it is frequently useful to convert the free responses into one or more measured variables. For instance, the coders can read the answers given on projective tests and tabulate the extent to which different themes are expressed, or the responses given on associative lists can be tallied into different categories. However, the process of fitting the free responses into a structured coding system tends to reduce the basic advantage of the approach: the freedom of the individual to give unique responses. The process of coding free-response data is known as content analysis, and we will discuss it in more detail in Chapter 7.
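As a rough illustration of how coded free responses become measured variables, the sketch below tallies category codes for each respondent. It assumes a trained coder has already assigned one category to each listed thought; the category names, respondent labels, and the function name are hypothetical and are not drawn from any of the studies cited above.

```python
from collections import Counter

# Hypothetical coded data: each listed thought has already been assigned
# a category by a trained coder. The category names are illustrative only.
coded_thoughts = {
    "respondent_1": ["trait", "trait", "economic", "trait"],
    "respondent_2": ["political", "economic"],
}

# The full coding scheme, so that unused categories are counted as zero.
CATEGORIES = ["trait", "economic", "political"]


def tally_categories(codes, categories=CATEGORIES):
    """Return how many thoughts fall into each category of the scheme."""
    counts = Counter(codes)
    return {category: counts.get(category, 0) for category in categories}


# One set of counts per respondent; each count is a measured variable.
measured = {
    respondent: tally_categories(codes)
    for respondent, codes in coded_thoughts.items()
}

for respondent, counts in measured.items():
    print(respondent, counts)
```

The counts can be compared across respondents, but the wording of each individual thought is lost, which is the trade-off described above.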
Fixed-Format Self-Report Measures
Partly because of the difficulty of coding free-format responses, most research using self-report measures relies on fixed-format self-report measures. On these measures, the individual is presented with a set of questions (the questions are called items), and the responses that can be given are more structured than in free-format measures.
In some cases, the information that we wish to obtain is unambiguous, and only one item is necessary to get it. For instance:
Enter your ethnic identification (please check one):
_____ American Indian or Alaska Native
_____ Asian
_____ Black or African American
_____ Native Hawaiian or Other Pacific Islander
_____ White
_____ Some Other Race
In other cases—for instance, the measurement of personality variables such as self-esteem, anxiety, intelligence, or mood—the conceptual variable is more difficult to assess. In these cases, fixed-format self-report measures containing a number of items may be used. Fixed-format self-report measures that contain more than one item (such as an intelligence test or a measure of self-esteem) are known as scales. The many items, each designed to measure the same conceptual variable, are combined together by summing or averaging, and the result becomes the person’s score on the measured variable.
One advantage of fixed-format scales is that there is a well-developed set of response formats already available for use, as well as a set of statistical procedures designed to evaluate the effectiveness of the scales as measures of underlying conceptual variables. As we will see in the next chapter, using more than one item is very advantageous because it provides a better measure of the conceptual variable than would any single item.
The Likert Scale. The most popular type of fixed-format scale is the Likert scale (Likert, 1932). A Likert scale consists of a series of items that indicate agreement or disagreement with the issue that is to be measured, each with a set of responses on which the respondents indicate their opinions. One example of a Likert scale, the Rosenberg self-esteem scale, is shown in Table 4.2. This scale contains ten items, each of which is responded to on a four-point response format ranging from “strongly disagree” to “strongly agree.” Each of the possible responses is assigned a number, and the measured variable is the sum or average of the responses across all of the items. You will notice that five of the ten items on the Rosenberg scale are written such that marking “strongly agree” means that the person has high self-esteem, whereas for the other half of the items marking “strongly agree” indicates that the individual does not have high self-esteem. This variation avoids a potential problem on fixed-format scales known as acquiescent responding (frequently called a yeah-saying bias). If all the items on a Likert scale are phrased in the same direction, it is not possible to tell if the respondent is simply a “yeah-sayer” (that is, a person who tends to agree with everything) or if he or she really agrees with the content of the item.
To reduce the impact of acquiescent responding on the measured variable, the wording of about one-half of the items is reversed such that agreement with these items means that the person does not have the characteristic being measured. Of course, the responses to the reversed items must themselves be reverse-scored, so that the direction is the same for every item, before the sum or average is taken. On the Rosenberg scale, the reversed items are changed so that 1 becomes 4, 2 becomes 3, 3 becomes 2, and 4 becomes 1. Although the Likert scale shown in Table 4.2 is a typical one, the format can vary to some degree. Although “strongly agree” and “strongly disagree” are probably the most common endpoints, others are also possible:
I am late for appointments:
Never 1 2 3 4 5 6 7 Always
It is also possible to label the midpoint of the scale (for instance, “neither agree nor disagree”) as well as the endpoints, or to provide a label for each of the choices:
I enjoy parties:
1 Strongly disagree
2 Moderately disagree
3 Slightly disagree
4 Slightly agree
5 Moderately agree
6 Strongly agree
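To make the reverse-scoring step for a Likert scale concrete, the following sketch averages responses on a four-point format after flipping the reversed items, so that 1 becomes 4, 2 becomes 3, and so on. The four-item scale, the item numbers, and the function names are assumptions made for illustration; they are not the actual Rosenberg items.

```python
def reverse_score(response, low=1, high=4):
    """Flip a response on a low..high format (1<->4 and 2<->3 on a 1-4 format)."""
    return low + high - response


def score_likert(responses, reversed_items, low=1, high=4):
    """Average the item responses after reverse-scoring the flagged items.

    responses      -- dict mapping item number to the chosen response
    reversed_items -- set of item numbers whose wording is reversed
    """
    adjusted = [
        reverse_score(value, low, high) if item in reversed_items else value
        for item, value in responses.items()
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical four-item scale; items 2 and 4 are assumed to be reversed.
answers = {1: 4, 2: 1, 3: 3, 4: 2}
score = score_likert(answers, reversed_items={2, 4})
print(score)

# A respondent who marks "strongly agree" (4) on every item.
yeah_sayer = {1: 4, 2: 4, 3: 4, 4: 4}
yeah_sayer_score = score_likert(yeah_sayer, reversed_items={2, 4})
print(yeah_sayer_score)
```

With half of the items reversed, a respondent who agrees with everything lands at the midpoint of the format rather than at the top, which is how reversed wording limits the effect of acquiescent responding on the measured variable.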
In still other cases, for instance, in the study of children, the response scale has to be simplified.
When an even number of response choices is used, the respondent cannot choose a neutral point, whereas the provision of an odd number of choices allows a neutral response. Depending on the purposes of the research and the type of question, this may or may not be appropriate or desirable. One response format that can be useful when a researcher does not want to restrict the range of input to a number of response options is to simply present a line of known length (for instance 100 mm) and ask the respondents to mark their opinion on the line. For instance:
I enjoy making decisions on my own:
Agree ____________ Disagree
The distance of the mark from the end of the line is then measured with a ruler, and this becomes the measured variable. This approach is particularly effective when data are collected on computers because individuals can use the mouse to indicate on the computer screen the exact point on the line that represents their opinion and the computer can precisely measure and record the response.
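A minimal sketch of the computer-based version of this line measure follows, assuming the program knows the pixel coordinates of the two ends of the line and of the respondent's click. The function name, parameter names, and pixel values are hypothetical.

```python
def line_mark_to_score(mark_x, left_x, right_x, line_length_mm=100.0):
    """Convert the horizontal position of a mark into a distance in mm
    from the left end of the line.

    mark_x, left_x, right_x are screen coordinates in pixels (assumed names).
    """
    if right_x <= left_x:
        raise ValueError("right_x must be greater than left_x")
    # Clamp marks that fall slightly outside the drawn line.
    clamped = min(max(mark_x, left_x), right_x)
    fraction = (clamped - left_x) / (right_x - left_x)
    return fraction * line_length_mm


# A line drawn from pixel 200 to pixel 600, with a click at pixel 500.
distance_mm = line_mark_to_score(500, 200, 600)
print(distance_mm)
```

In the example item above, “Agree” sits at the left end of the line, so under this convention a larger distance indicates stronger disagreement; the researcher would need to fix the direction before combining such items.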
The Semantic Differential. Although Likert scales are particularly useful for measuring opinions and beliefs, people’s feelings about topics under study can often be better assessed using a type of scale known as a semantic differential (Osgood, Suci, & Tannenbaum, 1957). Table 4.3 presents a semantic differential designed to assess feelings about a university. In a semantic differential, the topic being evaluated is presented once at the top of the page, and the items consist of pairs of adjectives located at the two endpoints of a standard response format. The respondent expresses his or her feelings toward the topic by marking one point on the dimension. To quantify the scale, a number is assigned to each possible response, for instance, from −3 (most negative) to +3 (most positive). Each respondent’s score is computed by averaging across his or her responses to each of the items after the items in which the negative response has the higher number have been reverse-scored. Although semantic differentials can sometimes be used to assess other dimensions, they are most often restricted to measuring people’s evaluations about a topic, that is, whether they feel positively or negatively about it.
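The scoring rule for a semantic differential can be sketched as follows, assuming a seven-point format numbered from −3 to +3. Because that format is centered on zero, reverse-scoring an item amounts to changing its sign. The adjective pairs and the function name are hypothetical and are not the items of Table 4.3.

```python
def score_semantic_differential(responses, negative_high_items):
    """Average responses coded -3..+3 after reverse-scoring flagged items.

    responses           -- dict mapping an adjective pair to a response in -3..+3
    negative_high_items -- pairs whose negative adjective sits at the +3 end
    """
    adjusted = []
    for pair, value in responses.items():
        if not -3 <= value <= 3:
            raise ValueError(f"response out of range for {pair}: {value}")
        # On a format centered at zero, reverse-scoring is a sign change.
        adjusted.append(-value if pair in negative_high_items else value)
    return sum(adjusted) / len(adjusted)


# Hypothetical adjective pairs; the second is printed with the negative
# adjective on the high-numbered side, so it is reverse-scored.
ratings = {"bad-good": 2, "friendly-unfriendly": -3, "ugly-beautiful": 1}
evaluation = score_semantic_differential(ratings, {"friendly-unfriendly"})
print(evaluation)
```

A positive average indicates an overall favorable evaluation of the topic and a negative average an unfavorable one.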
The Guttman Scale. There is one more type of fixed-format self-report scale, known as a Guttman scale (Guttman, 1944), that is sometimes used in behavioral research, although it is not as common as the Likert or semantic differential scale. The goal of a Guttman scale is to indicate the extent to which an individual possesses the conceptual variable of interest. But in contrast to Likert and semantic differential scales, which measure differences in the extent to which the participants agree with the items, the Guttman scale involves the creation of differences in the items themselves. The items are created ahead of time to be cumulative in the sense that they represent the degree of the conceptual variable of interest. The expectation is that an individual who endorses any given item will also endorse every item that is less extreme. Thus, the Guttman scale can be defined as a fixed-format self-report scale in which the items are arranged in a cumulative order such that it is assumed that if a respondent endorses or answers correctly any one item, he or she will also endorse or correctly answer all of the previous scale items.
Consider, for instance, the gender constancy scale shown in Table 4.4 (Slaby & Frey, 1975). This Guttman scale is designed to indicate the extent to which a young child has confidently learned that his or her sex will not change over time. A series of questions, which are ordered in terms of increasing difficulty, are posed to the child, who answers each one. The assumption is that if the child is able to answer a given question correctly, then he or she should also be able to answer all of the questions that come earlier on the scale correctly because those items are selected to be easier. Slaby and Frey (1975) found that although the pattern of responses was not perfect (some children did answer a later item correctly and an earlier item incorrectly), the gender constancy scale did, by and large, conform to the expected cumulative pattern. They also found that older children answered more items correctly than did younger children.
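One simple way to check whether a set of answers follows the expected cumulative pattern is sketched below. It assumes each child's answers are recorded as 1 (correct) or 0 (incorrect), ordered from the easiest to the hardest item. The data and function names are hypothetical, and this is not the analysis reported by Slaby and Frey (1975).

```python
def is_cumulative(pattern):
    """True if no item is passed after an easier item was failed.

    pattern -- list of 0/1 answers, ordered from easiest to hardest item
    """
    seen_failure = False
    for answer in pattern:
        if answer == 0:
            seen_failure = True
        elif seen_failure:
            return False
    return True


def guttman_score(pattern):
    """Number of items answered correctly (or endorsed)."""
    return sum(pattern)


# Hypothetical answers from four children to a five-item scale.
children = {
    "child_a": [1, 1, 1, 0, 0],
    "child_b": [1, 1, 1, 1, 1],
    "child_c": [1, 0, 1, 0, 0],  # passes item 3 after failing item 2
    "child_d": [0, 0, 0, 0, 0],
}

conforming = [name for name, p in children.items() if is_cumulative(p)]
proportion_conforming = len(conforming) / len(children)
print(conforming)
print(proportion_conforming)
```

For a child whose answers conform to the pattern, the score alone identifies exactly which items were passed: a score of 3 means the three easiest items and no others.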
Reactivity as a Limitation in Self-Report Measures
Taken together, self-report measures are the most commonly used type of measured variable within the behavioral sciences. They are relatively easy to construct and administer and allow the researcher to ask many questions in a short period of time. There is great flexibility, particularly with Likert scales, in the types of questions that can be posed to respondents. And, as we will see in Chapter 5, because a fixed-format scale has many items, each relating to the same thought or feeling, they can be combined together to produce a very useful measured variable.
However, there are also some potential disadvantages to the use of self-report. For one thing, with the exception of some indirect free-format measures such as the TAT, self-report measures assume that people are able and willing to accurately answer direct questions about their own thoughts, feelings, or behaviors. Yet, as we have seen in Chapter 1, people may not always be able to accurately self-report on the causes of their behaviors. And even if they are accurately aware, respondents may not answer questions on self-report measures as they would have if they thought their responses were not being recorded. Changes in responding that occur when individuals know they are being measured are known as reactivity. Reactivity can change responses in many different ways and must always be taken into consideration in the development of measured variables (Weber & Cook, 1972).
The most common type of reactivity is social desirability—the natural tendency for research participants to present themselves in a positive or socially acceptable way to the researcher. One common type of reactivity, known as self-promotion, occurs when research participants respond in ways that they think will make them look good. For instance, most people will overestimate their positive qualities and underestimate their negative qualities and are usually unwilling to express negative thoughts or feelings about others. These responses occur because people naturally prefer to answer questions in a way that makes them look intelligent, knowledgeable, caring, healthy, and nonprejudiced.
Research participants may respond not only to make themselves look good but also to make the experimenter happy, even though they would probably not respond this way if they were not being studied. For instance, in one well-known study, Orne (1962) found that participants would perform tedious math problems for hours on end to please the experimenter, even though they had also been told to tear up all of their work as soon as they completed it, which made it impossible for the experimenter to check what they had done in any way.
The desire to please the experimenter can cause problems on self-report measures; for instance, respondents may indicate a choice on a response scale even though they may not understand the question or feel strongly about their answer but want to appear knowledgeable or please the experimenter. In such cases, the researcher may interpret the response as meaning more than it really does. Cooperative responding is particularly problematic if the participants are able to guess the researcher’s hypothesis—for instance, if they can figure out what the self-report measure is designed to assess. Of course, not all participants have cooperative attitudes. Those who are required to participate in the research may not pay much attention or may even develop an uncooperative attitude and attempt to sabotage the study.
There are several methods of countering reactivity on self-report measures. One is to administer other self-report scales that measure the tendency to lie or to self-promote, which are then used to correct for reactivity (see, for instance, Crowne and Marlowe’s [1964] social-desirability scale). To lessen the possibility of respondents guessing the hypothesis, the researcher may disguise the items on the self-report scale or include unrelated filler or distracter items to throw the participants off the track. Another strategy is to use a cover story, telling the respondents that one thing is being measured when the scale is really designed to measure something else. And the researcher may also be able to elicit more honest responses from the participant by explaining that the research is not designed to evaluate him or her personally and that its success depends upon honest answers to the questions (all of which is usually true). However, given people’s potential to distort their responses on self-report measures, and given that there is usually no check on whether any corrections have been successful, it is useful to consider other ways to measure the conceptual variables of interest that are less likely to be influenced by reactivity.