
Friday 28 June 2013

Current Research in the Behavioral Sciences: Using Multiple Measured Variables to Assess the Conceptual Variable of Panic Symptoms



Bethany Teachman, Shannan Smith-Janik, and Jena Saporito are clinical psychologists who study psychological disorders. In one of their recent research projects (Teachman, Smith-Janik, & Saporito, 2007), they were interested in testing the extent to which a variety of direct and indirect measured variables could be used to help define the underlying conceptual variable of the panic symptoms that are frequently experienced by people with anxiety disorders. They operationalized six different measured variables to assess the single conceptual variable.
         Their research used a sample of 43 research participants who had been diagnosed with panic disorder. Each of the participants completed a variety of measures designed to assess their psychological states, both directly and indirectly. In terms of direct, self-report Likert-scale measures, the participants completed the Anxiety Sensitivity Index (Reiss, Peterson, Gursky, & McNally, 1986), which is a 16-item questionnaire assessing concern over the symptoms associated with anxiety; the Fear Questionnaire-Agoraphobia scale (Marks & Mathews, 1979), which measures level of phobic avoidance toward common situations; and the Panic Disorder Severity Scale (Shear et al., 1997), which yields a severity score based on the frequency, distress, and impairment associated with panic attacks.
         Another direct measure used a different response format. In the Brief Body Sensations Interpretation Questionnaire (Clark et al., 1997), participants are presented with ambiguous events and then asked to rank order three alternative explanations for why the event might have occurred. For instance, the participants are told, “You notice that your heart is beating quickly and pounding,” and are then asked to rank three explanations: “because you have been physically active,” “because there is something wrong with your heart,” and “because you are feeling excited.”
         The researchers also used two indirect measures of panic symptoms, the Implicit Association Test (Greenwald et al., 1998), and a version of the Stroop Color and Word Test. Participants took these tests on a computer. In the Implicit Association Test, the participants were asked to classify items as either “self” or “other” and as “panicked” or “calm.” The measured variable was the difference between the speed of classifying the self and panicked words together and the speed of classifying the self and calm words together. The idea is that if the individual has automatic associations between the self and panic symptoms, he or she will be able to classify the stimuli that pair them more quickly.
         The Stroop Color and Word Test is a reaction time task that measures how fast the participant can name the color in which a word is presented. It is based on the assumption that the colors of words related to panic will be named more slowly because of interference caused by the words' semantic content. The difference in response time for naming the ink color across panic-related and control words was used as the measured variable.
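As a rough illustration of how a difference score of this kind might be computed, the following Python sketch subtracts the mean color-naming time for control words from the mean for panic-related words. The function name and the response times are hypothetical and are not taken from Teachman et al. (2007), whose exact scoring procedure may have differed.

```python
from statistics import mean


def stroop_interference(panic_rts_ms, control_rts_ms):
    """Mean naming time for panic words minus mean naming time for control words.

    Both arguments are lists of response times in milliseconds.
    A positive result means ink colors were named more slowly for panic words.
    """
    if not panic_rts_ms or not control_rts_ms:
        raise ValueError("both lists of response times must be non-empty")
    return mean(panic_rts_ms) - mean(control_rts_ms)


# Hypothetical response times (in ms) for a single participant.
panic_trials = [720, 760, 740]
control_trials = [650, 670, 660]

print(stroop_interference(panic_trials, control_trials))
```

A larger positive score would be read as more interference from the panic-related words, under the assumption stated in the paragraph above.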
         As you can see in Figure 4.2, each of the six measured variables correlated positively with an overall measure of panic symptoms that was derived by statistically combining all of the measures together. You can see that, in this case, the direct measures correlated more highly with the composite than did the indirect measures.

Choosing a Measure



As we have seen in this chapter, most conceptual variables of interest to behavioral scientists can be operationalized in any number of ways. For instance, the conceptual variable of aggression has been operationalized using such diverse measures as shocking others, fighting on a playground, verbal abuse, violent crimes, horn-honking in traffic, and putting hot sauce on people’s food. The possibility of multiple operationalizations represents a great advantage to researchers because there are specific advantages and disadvantages to each type of measure. For instance, as we have seen, self-report measures have the advantage of allowing researchers to get a broad array of information in a short period of time, but the disadvantage of reactivity. On the other hand, behavioral measures may often reduce reactivity, but they may be difficult to operationalize and code, and the meaning of some behaviors may be difficult to interpret.
         When designing a research project, think carefully about which measures to use. Your decision will be based on traditional approaches in the area you are studying and on the availability of resources, such as equipment and expertise. In many cases, you will want to use more than one operationalization of a measure, such as self-report and behavioral measures, in the same research project. In every case, however, you must be absolutely certain that you do a complete literature review before you begin your project, to be sure that you have uncovered measures that have been used in prior research. There is so much research that has measured so many constructs, that it is almost certain that someone else has already measured the conceptual variable in which you are interested. Do not be afraid to make use of measures that have already been developed by others. It is entirely appropriate to do so, as long as you properly cite the source of the measure. As we will see in the next chapter, it takes a great amount of effort to develop a good measured variable. As a result, except when you are assessing a new variable or when existing measures are not appropriate for your research design, it is generally advisable to make use of the work that others have already done rather than try to develop your own measure.

Behavioral Measures



One alternative to self-report is to measure behavior. Although the measures shown in Table 4.1 are rather straightforward, social scientists have used a surprising variety of behavioral measures to help them assess the conceptual variables of interest. Table 4.5 presents some that you might find interesting that were sent to me by my social psychology colleagues. Indeed, the types of behaviors that can be measured are limited only by the creativity of the researchers. Some of the types of behavioral variables that form the basis of measured variables in behavioral science include those based on:

  Frequency (for instance, frequency of stuttering as a measure of anxiety in interpersonal relations)
  Duration (for instance, the number of minutes working at a task as a measure of task interest)
  Intensity (for instance, how hard a person claps his or her hands as a measure of effort)
  Latency (for instance, the number of days before a person begins to work on a project as a measure of procrastination)
  Speed (for instance, how long it takes a mouse to complete a maze as a measure of learning)

        
Although some behaviors, such as how close a person sits to another person, are relatively easy to measure, many behavioral measures are difficult to operationally define and effectively code. For instance, you can imagine that it would be no easy task to develop a behavioral measure of “aggressive play” in children. In terms of the operational definition, decisions would have to be made about whether to include verbal aggression, whether some types of physical aggression (throwing stones) should be weighted more heavily than other types of physical aggression (pushing), and so forth. Then the behaviors would have to be coded. In most cases, complete coding systems are worked out in advance, and more than one experimenter makes ratings of the behaviors, thereby allowing agreement between the raters to be assessed. In some cases, videotapes may be made so that the behaviors can be coded at a later time. We will discuss techniques of coding behavioral measures more fully in Chapter 7.
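One simple way to assess agreement between two raters is the proportion of behaviors that they coded identically. Cohen's kappa, a standard statistic that is not mentioned in the passage above, adjusts that proportion for the agreement expected by chance. The Python sketch below assumes a hypothetical coding system with three categories ("push," "verbal," and "none") and made-up codes from two raters; it is meant only to illustrate the idea.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Proportion of observations that two raters coded identically."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("raters must code the same non-empty set of observations")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for chance: (observed - expected) / (1 - expected)."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single, identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six observed behaviors.
rater_a = ["push", "push", "none", "verbal", "none", "push"]
rater_b = ["push", "none", "none", "verbal", "none", "push"]

print(percent_agreement(rater_a, rater_b))
print(cohens_kappa(rater_a, rater_b))
```

Values near 1 on either index suggest that the coding system is being applied consistently; low values suggest that the operational definition or the coder training needs more work.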

Nonreactive Measures

Behavioral measures have a potential advantage over self-report measures: because they do not involve direct questioning of people, they are frequently less reactive. This is particularly true when the research participant (1) is not aware that the measurement is occurring, (2) does not realize what the measure is designed to assess, or (3) cannot change his or her responses, even if he or she desires to.

Nonreactive Behavioral Measures. Nonreactive behavioral measures are frequently used to assess attitudes that are unlikely to be directly expressed on self-report measures, such as racial prejudice. For instance, Word, Zanna, and Cooper (1974) coded the nonverbal behavior of White male participants as they conducted an interview with another person, who was either Black or White. The researchers found that the interviewers sat farther away from the Black interviewees than from the White interviewees, made more speech errors when talking to the Black interviewees, and terminated the interviews with the Black interviewees sooner than with the White interviewees. This experiment provided insights into the operation of prejudice that could not have been obtained directly because, until the participants were debriefed, they did not know that their behavior was being measured or what the experiment was about.
         Some behavioral measures reduce reactivity because they are so indirect that the participants do not know what the measure is designed to assess. For instance, some researchers studying the development of impressions of others will provide participants with a list of behaviors describing another person and then later ask them to remember this information or to make decisions about it. Although the participants think that they are engaging in a memory test, what they remember about the behaviors and the speed with which they make decisions about the person can be used to draw inferences about whether the participants like or dislike the other person and whether they use stereotypes in processing the information. The use of nonreactive behavioral measures is discussed in more detail in a book by Webb, Campbell, Schwartz, Sechrest, and Grove (1981).

Psychophysiological Measures

         In still other cases, behavioral measures reduce reactivity because the individual cannot directly control his or her response. One example is the use of psychophysiological measures, which are designed to assess the physiological functioning of the body’s nervous and endocrine systems (Cacioppo, Tassinary, & Berntson, 2000). 
          Some psychophysiological measures are designed to assess brain activity, with the goal of determining which parts of the brain are involved in which types of information processing and motor activities. These brain measures include the electroencephalogram (EEG), magnetic resonance imaging (MRI), positron-emission tomography (PET), and computerized axial tomography (CAT). In one study using these techniques, Harmon-Jones and Sigelman (2001) used an EEG measure to assess brain activity after research participants had been insulted by another person. Supporting their hypotheses, they found that electrical brain responses to the insult were stronger on the left side of the brain than on the right side of the brain, indicating that anger involves not only negative feelings about the other person but also a motivational desire to address the insult. 
           Other psychophysiological measures, including heart rate, blood pressure, respiration speed, skin temperature, and skin conductance, assess the activity of the sympathetic and parasympathetic nervous systems. The electromyograph (EMG) assesses muscle responses in the face. For instance, Bartholow and his colleagues (2001) found that EMG responses were stronger when people read information that was unexpected or unusual than when they read more expected material, and that the responses were particularly strong in response to negative events. Still other physiological measures, such as amount of cortisol, involve determining what chemicals are in the bloodstream—for instance, to evaluate biochemical reactions to stress.
         Collecting psychophysiological measures can be difficult because doing so often requires sophisticated equipment and expertise, and the interpretation of these measures may yield ambiguous results (for instance, does an increase in heart rate mean that the person is angry or afraid?). Nevertheless, these measures do reduce reactivity to a large extent and are increasingly being used in behavioral research.

Self-Report Measures



In the next sections, we will consider some of the many types of measured variables used in behavioral research. We begin by considering how we might gain information by directly asking someone about his or her thoughts, feelings, or behavior. To do so involves using self-report measures, in which individuals are asked to respond to questions posed by an interviewer or a questionnaire. Then in the following sections we will consider the use of behavioral measures, designed to directly measure what people do.

Free-Format Self-Report Measures

          Perhaps the most straightforward use of self-report measures involves asking people to freely list their thoughts or feelings as these come to mind. One of the major advantages of such free-format self-report measures is that they allow respondents to indicate whatever thoughts or feelings they have about the topic, without any constraints imposed on respondents except the effort it takes to write these thoughts or feelings down or speak them into a tape recorder.

Projective Measures. A projective measure is a measure of personality in which an unstructured image, such as an inkblot, is shown to participants, who are asked to freely list what comes to mind as they view the image. One common use of free-format self-report measures is the assessment of personality variables through the use of projective tests such as the Thematic Apperception Test, or TAT (Morgan & Murray, 1935) or the Rorschach inkblots. The TAT, for instance, consists of a number of sketches of people, either alone or with others, who are engaging in various behaviors, such as gazing out a window or pointing at each other. The sketches are shown to individuals, who are asked to tell a story about what is happening in the picture. The TAT assumes that people may be unwilling or unable to admit their true feelings when asked directly but that these feelings will show up in the stories about the pictures. Trained coders read the stories and use them to develop a personality profile of the respondent. 

Associative Lists. Free-format response formats in the form of associative lists have also been used to study such variables as stereotyping. In one of these studies (Stangor, Sullivan, & Ford, 1991), college students were presented with the names of different social groups (African Americans, Hispanics, Russians) and asked to list whatever thoughts came to mind about the groups. The study was based on the assumption that the thoughts listed in this procedure would be those that the individual viewed as strongest or most central to the group as a whole and would thus provide a good idea of what the person really thought about the groups. One student listed the following thoughts to describe different social groups: 
           
           Whites: “Materialistic and prejudiced.” 
           Hispanics: “Poor, uneducated, and traditional. Willing to work hard.” 
           Russians: “Unable to leave their country, even though they want to.”  

Think-Aloud Protocols. Another common type of free-format response format is a think-aloud protocol (Ericsson & Simon, 1980). In this procedure, individuals are asked to verbalize into a tape recorder the thoughts that they are having as they complete a task. For instance, the following protocol was generated by a college student in a social psychology experiment who was trying to form an impression of another person who was characterized by conflicting information (Fiske, Neuberg, Beattie, & Milberg, 1987): “Professor. Strong, close-minded, rowdy, red-necked, loud. Hmmmm. I’ve never met a professor like this. I tend to make a stereotype of a beer-guzzling bigot…. I can sort of picture him sitting in a smoky, white bar, somewhere in, off in the suburbs of Maryland.” The researchers used the think-aloud protocols, along with other data, to understand how people formed impressions about others.

The Difficulties of Coding Free-Format Data. Despite the fact that free-format self-report measures produce a rich set of data regarding the thoughts and feelings of the people being studied, they also have some disadvantages. Most important, it is very difficult and time-consuming to turn the generated thoughts into a set of measured variables that can be used in data analysis. Because each individual is likely to have used a unique set of thoughts, it is hard to compare individuals. One solution is to simply describe the responses verbally (such as the description of the college professor on this page) and to treat the measures as qualitative data. However, because correlational and experimental research designs require the use of quantitative data (measured variables that can be subjected to statistical analysis), it is frequently useful to convert the free responses into one or more measured variables. For instance, the coders can read the answers given on projective tests and tabulate the extent to which different themes are expressed, or the responses given on associative lists can be tallied into different categories. However, the process of fitting the free responses into a structured coding system tends to reduce the basic advantage of the approach: the freedom of the individual to give unique responses. The process of coding free-response data is known as content analysis, and we will discuss it in more detail in Chapter 7.
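To make the tallying step concrete, the short Python sketch below counts how many of one respondent's listed thoughts fall into each category of a coding system. The category names ("positive," "negative," "neutral") and the coded responses are hypothetical; a real content analysis would use categories defined by the researchers and codes assigned by trained coders.

```python
from collections import Counter


def tally_codes(coded_thoughts, categories):
    """Count the thoughts assigned to each category, including categories never used."""
    unknown = set(coded_thoughts) - set(categories)
    if unknown:
        raise ValueError(f"codes outside the coding system: {sorted(unknown)}")
    counts = Counter(coded_thoughts)
    return {category: counts[category] for category in categories}


# Hypothetical coding system and one respondent's coded associative list.
coding_system = ["positive", "negative", "neutral"]
respondent_codes = ["negative", "negative", "positive"]

print(tally_codes(respondent_codes, coding_system))
```

Each count can then serve as a measured variable (for instance, the number of negative thoughts listed about a group), at the cost of discarding the unique wording of the original responses.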

Fixed-Format Self-Report Measures

         Partly because of the difficulty of coding free-format responses, most research using self-report measures relies on fixed-format self-report measures. On these measures, the individual is presented with a set of questions (the questions are called items), and the responses that can be given are more structured than in free-format measures.
         In some cases, the information that we wish to obtain is unambiguous, and only one item is necessary to get it. For instance:

Enter your ethnic identification (please check one):

_____ American Indian or Alaska Native 
_____ Asian
_____ Black or African American
_____ Native Hawaiian or Other Pacific Islander
_____ White 
_____ Some Other Race 

         In other cases—for instance, the measurement of personality variables such as self-esteem, anxiety, intelligence, or mood—the conceptual variable is more difficult to assess. In these cases, fixed-format self-report measures containing a number of items may be used. Fixed-format self-report measures that contain more than one item (such as an intelligence test or a measure of self-esteem) are known as scales. The many items, each designed to measure the same conceptual variable, are combined together by summing or averaging, and the result becomes the person’s score on the measured variable. 
         One advantage of fixed-format scales is that there is a well-developed set of response formats already available for use, as well as a set of statistical procedures designed to evaluate the effectiveness of the scales as measures of underlying conceptual variables. As we will see in the next chapter, using more than one item is very advantageous because it provides a better measure of the conceptual variable than would any single item.

The Likert Scale. The most popular type of fixed-format scale is the Likert scale (Likert, 1932). A Likert scale consists of a series of items that indicate agreement or disagreement with the issue that is to be measured, each with a set of responses on which the respondents indicate their opinions. One example of a Likert scale, the Rosenberg self-esteem scale, is shown in Table 4.2. This scale contains ten items, each of which is responded to on a four-point response format ranging from “strongly disagree” to “strongly agree.” Each of the possible responses is assigned a number, and the measured variable is the sum or average of the responses across all of the items. You will notice that five of the ten items on the Rosenberg scale are written such that marking “strongly agree” means that the person has high self-esteem, whereas for the other half of the items marking “strongly agree” indicates that the individual does not have high self-esteem. This variation avoids a potential problem on fixed-format scales known as acquiescent responding (frequently called a yeah-saying bias). If all the items on a Likert scale are phrased in the same direction, it is not possible to tell if the respondent is simply a “yeah-sayer” (that is, a person who tends to agree with everything) or if he or she really agrees with the content of the item.

To reduce the impact of acquiescent responding on the measured variable, the wording of about one-half of the items is reversed such that agreement with these items means that the person does not have the characteristic being measured. Of course, the responses to the reversed items must themselves be reverse-scored, so that the direction is the same for every item, before the sum or average is taken. On the Rosenberg scale, the reversed items are changed so that 1 becomes 4, 2 becomes 3, 3 becomes 2, and 4 becomes 1. Although the Likert scale shown in Table 4.2 is a typical one, the format can vary to some degree. Although “strongly agree” and “strongly disagree” are probably the most common endpoints, others are also possible:

I am late for appointments:

Never 1 2 3 4 5 6 7 Always
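The reverse-scoring step described above for a four-point format (1 becomes 4, 2 becomes 3, and so on) can be sketched in Python as follows. The function names, the ten responses, and the positions treated as reversed items are all hypothetical; in particular, the positions are not the actual scoring key of the Rosenberg scale.

```python
def reverse_score(response, low=1, high=4):
    """Flip a response on a low..high format (on 1..4: 1<->4 and 2<->3)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} is outside {low}..{high}")
    return low + high - response


def likert_total(responses, reversed_items, low=1, high=4):
    """Sum the responses after reverse-scoring the items listed in reversed_items.

    reversed_items holds zero-based positions of the reverse-worded items.
    """
    return sum(
        reverse_score(r, low, high) if i in reversed_items else r
        for i, r in enumerate(responses)
    )


# Hypothetical answers to a ten-item, four-point scale.
answers = [4, 3, 1, 4, 2, 1, 4, 3, 2, 1]
reversed_positions = {2, 4, 5, 8, 9}  # assumed positions, for illustration only

total = likert_total(answers, reversed_positions)
print(total, total / len(answers))  # sum and average across the ten items
```

Because the flip is computed as low + high - response, the same function also works for other formats, such as the seven-point "Never" to "Always" item above (with low=1 and high=7).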

         It is also possible to label the midpoint of the scale (for instance, “neither agree nor disagree”) as well as the endpoints, or to provide a label for each of the choices:

I enjoy parties:
1 Strongly disagree
2 Moderately disagree
3 Slightly disagree
4 Slightly agree
5 Moderately agree
6 Strongly agree

In still other cases, for instance, in the study of children, the response scale has to be simplified.


         When an even number of response choices is used, the respondent cannot choose a neutral point, whereas the provision of an odd number of choices allows a neutral response. Depending on the purposes of the research and the type of question, this may or may not be appropriate or desirable. One response format that can be useful when a researcher does not want to restrict the range of input to a number of response options is to simply present a line of known length (for instance 100 mm) and ask the respondents to mark their opinion on the line. For instance: 

I enjoy making decisions on my own: 
                         
                       Agree ____________ Disagree

The distance of the mark from the end of the line is then measured with a ruler, and this becomes the measured variable. This approach is particularly effective when data are collected on computers because individuals can use the mouse to indicate on the computer screen the exact point on the line that represents their opinion and the computer can precisely measure and record the response.

The Semantic Differential. Although Likert scales are particularly useful for measuring opinions and beliefs, people’s feelings about topics under study can often be better assessed using a type of scale known as a semantic differential (Osgood, Suci, & Tannenbaum, 1957). Table 4.3 presents a semantic differential designed to assess feelings about a university. In a semantic differential, the topic being evaluated is presented once at the top of the page, and the items consist of pairs of adjectives located at the two endpoints of a standard response format. The respondent expresses his or her feelings toward the topic by marking one point on the dimension. To quantify the scale, a number is assigned to each possible response, for instance, from -3 (most negative) to +3 (most positive). Each respondent’s score is computed by averaging across his or her responses to each of the items after the items in which the negative response has the higher number have been reverse-scored. Although semantic differentials can sometimes be used to assess other dimensions, they are most often restricted to measuring people’s evaluations of a topic (that is, whether they feel positively or negatively about it).
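A minimal Python sketch of this scoring procedure follows, assuming a seven-point response format coded from -3 to +3, so that reverse-scoring an item amounts to flipping its sign. The function name, the responses, and the positions of the reversed items are hypothetical and do not correspond to the items in Table 4.3.

```python
def semantic_differential_score(responses, reversed_items):
    """Average the item responses after reverse-scoring the listed items.

    responses: integers from -3 to +3, one per adjective pair.
    reversed_items: zero-based positions of items on which the negative
    adjective sits at the high end of the response format.
    """
    if not responses:
        raise ValueError("at least one response is required")
    if any(not -3 <= r <= 3 for r in responses):
        raise ValueError("responses must lie between -3 and +3")
    # On a format centered at zero, reverse-scoring is a sign flip.
    aligned = [-r if i in reversed_items else r for i, r in enumerate(responses)]
    return sum(aligned) / len(aligned)


# Hypothetical responses to four adjective pairs; items 1 and 3 are reversed.
print(semantic_differential_score([3, -2, 1, -3], {1, 3}))
```

After the sign flips, every item runs in the same direction, so a higher average indicates a more positive evaluation of the topic.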



The Guttman Scale. There is one more type of fixed-format self-report scale, known as a Guttman scale (Guttman, 1944), that is sometimes used in behavioral research, although it is not as common as the Likert or semantic differential scale. The goal of a Guttman scale is to indicate the extent to which an individual possesses the conceptual variable of interest. But in contrast to Likert and semantic differential scales, which measure differences in the extent to which the participants agree with the items, the Guttman scale involves the creation of differences in the items themselves. The items are created ahead of time to be cumulative in the sense that they represent the degree of the conceptual variable of interest. The expectation is that an individual who endorses any given item will also endorse every item that is less extreme. Thus, the Guttman scale can be defined as a fixed-format self-report scale in which the items are arranged in a cumulative order such that it is assumed that if a respondent endorses or answers correctly any one item, he or she will also endorse or correctly answer all of the previous scale items.
         Consider, for instance, the gender constancy scale shown in Table 4.4 (Slaby & Frey, 1975). This Guttman scale is designed to indicate the extent to which a young child has confidently learned that his or her sex will not change over time. A series of questions, which are ordered in terms of increasing difficulty, are posed to the child, who answers each one. The assumption is that if the child is able to answer a given question correctly, then he or she should also be able to answer all of the questions that come earlier on the scale correctly because those items are selected to be easier. Slaby and Frey (1975) found that although the pattern of responses was not perfect (some children did answer a later item correctly and an earlier item incorrectly), the gender constancy scale did, by and large, conform to the expected cumulative pattern. They also found that older children answered more items correctly than did younger children.
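The cumulative pattern expected on a Guttman scale can be checked mechanically. The Python sketch below assumes that each child's answers are stored as a list of True/False values ordered from the easiest item to the hardest; the function names and the example answer patterns are hypothetical and are not data from Slaby and Frey (1975).

```python
def guttman_score(answers):
    """Number of items answered correctly (or endorsed)."""
    return sum(1 for correct in answers if correct)


def fits_cumulative_pattern(answers):
    """True if no correct answer follows an incorrect one.

    answers must be ordered from the easiest item to the hardest.
    """
    seen_miss = False
    for correct in answers:
        if not correct:
            seen_miss = True
        elif seen_miss:
            return False  # a harder item was passed after an easier one was failed
    return True


# Hypothetical answer patterns for two children on a four-item scale.
consistent_child = [True, True, False, False]
inconsistent_child = [True, False, True, False]

print(guttman_score(consistent_child), fits_cumulative_pattern(consistent_child))
print(guttman_score(inconsistent_child), fits_cumulative_pattern(inconsistent_child))
```

Both children answer two items correctly, but only the first pattern is cumulative. The proportion of respondents whose answers fit the pattern gives a rough sense of how well the items behave as a Guttman scale.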


 Reactivity as a Limitation in Self-Report Measures 

         Taken together, self-report measures are the most commonly used type of measured variable within the behavioral sciences. They are relatively easy to construct and administer and allow the researcher to ask many questions in a short period of time. There is great flexibility, particularly with Likert scales, in the types of questions that can be posed to respondents. And, as we will see in Chapter 5, because a fixed-format scale has many items, each relating to the same thought or feeling, they can be combined together to produce a very useful measured variable.
         However, there are also some potential disadvantages to the use of self-report. For one thing, with the exception of some indirect free-format measures such as the TAT, self-report measures assume that people are able and willing to accurately answer direct questions about their own thoughts, feelings, or behaviors. Yet, as we have seen in Chapter 1, people may not always be able to accurately self-report on the causes of their behaviors. And even if they are accurately aware, respondents may not answer questions on self-report measures as they would have if they thought their responses were not being recorded. Changes in responding that occur when individuals know they are being measured are known as reactivity. Reactivity can change responses in many different ways and must always be taken into consideration in the development of measured variables (Weber & Cook, 1972).
         The most common type of reactivity is social desirability, the natural tendency for research participants to present themselves in a positive or socially acceptable way to the researcher. One common form of social desirability, known as self-promotion, occurs when research participants respond in ways that they think will make them look good. For instance, most people will overestimate their positive qualities and underestimate their negative qualities and are usually unwilling to express negative thoughts or feelings about others. These responses occur because people naturally prefer to answer questions in a way that makes them look intelligent, knowledgeable, caring, healthy, and nonprejudiced.
          Research participants may respond not only to make themselves look good but also to make the experimenter happy, even though they would probably not respond this way if they were not being studied. For instance, in one well-known study, Orne (1962) found that participants would perform tedious math problems for hours on end to please the experimenter, even though they had also been told to tear up all of their work as soon as they completed it, which made it impossible for the experimenter to check what they had done in any way. 
        The desire to please the experimenter can cause problems on self-report measures; for instance, respondents may indicate a choice on a response scale even though they may not understand the question or feel strongly about their answer but want to appear knowledgeable or please the experimenter. In such cases, the researcher may interpret the response as meaning more than it really does. Cooperative responding is particularly problematic if the participants are able to guess the researcher’s hypothesis—for instance, if they can figure out what the self-report measure is designed to assess. Of course, not all participants have cooperative attitudes. Those who are required to participate in the research may not pay much attention or may even develop an uncooperative attitude and attempt to sabotage the study.
         There are several methods of countering reactivity on self-report measures. One is to administer other self-report scales that measure the tendency to lie or to self-promote, which are then used to correct for reactivity (see, for instance, Crowne and Marlowe’s [1964] social-desirability scale). To lessen the possibility of respondents guessing the hypothesis, the researcher may disguise the items on the self-report scale or include unrelated filler or distracter items to throw the participants off the track. Another strategy is to use a cover story, telling the respondents that one thing is being measured when the scale is really designed to measure something else. And the researcher may also be able to elicit more honest responses from the participant by explaining that the research is not designed to evaluate him or her personally and that its success depends upon honest answers to the questions (all of which is usually true). However, given people’s potential to distort their responses on self-report measures, and given that there is usually no check on whether any corrections have been successful, it is useful to consider other ways to measure the conceptual variables of interest that are less likely to be influenced by reactivity.




Operational Definition



The term operational definition refers to a precise statement of how a conceptual variable is turned into a measured variable. Research can only proceed once an adequate operational definition has been developed. In some cases the conceptual variable may be too vague to be operationalized, and in other cases the variable cannot be operationalized because the appropriate technology has not been developed. For instance, recent advances in brain imaging have allowed new operationalizations of some variables that could not have been measured even a few years ago. Table 4.1 lists some potential operational definitions of conceptual variables that have been used in behavioral research. As you read through this list, note that in contrast to the abstract conceptual variables (employee satisfaction, frustration, depression), the measured variables are very specific. This specificity is important for two reasons. First, more specific definitions mean that there is less danger that the collected data will be misunderstood by others. Second, specific definitions will enable future researchers to replicate the research.

Converging Operations

         That there are many possible measures for a single conceptual variable might seem a scientific problem. But it is not. In fact, multiple possible measures represent a great advantage to researchers. For one thing, no single operational definition of a given conceptual variable can be considered the best. Different types of measures may be more appropriate in different research contexts. For instance, how close a person sits to another person might serve as a measure of liking in an observational research design, whereas heart rate might be more appropriate in a laboratory study. Furthermore, the ability to use different operationalizations of the same conceptual variable allows the researcher to home in, or to “triangulate,” on the conceptual variable of interest. When the same conceptual variable is measured using different measures, we can get a fuller and better measure of it. This is an example of the use of converging operations, as discussed in Chapter 1. Because this principle is so important, we will discuss it more fully in subsequent chapters.
         The researcher must choose which operational definition to use in trying to assess the conceptual variables of interest. In general, there is no guarantee that the chosen measured variable will prove to be an adequate measure of the conceptual variable. As we will see in Chapter 5, however, there are ways to assess the effectiveness of the measures once they have been collected.

Conceptual and Measured Variables

     The relationship between conceptual and measured variables in a correlational research design is diagrammed in Figure 4.1. The conceptual variables are represented within circles at the top of the figure, and the measured variables are represented within squares at the bottom. The two vertical arrows, which lead from the conceptual variables to the measured variables, represent the operational definitions of the two variables. The arrows indicate the expectation that changes in the conceptual variables (job satisfaction and job performance in this example) will cause changes in the corresponding measured variables. The measured variables are then used to draw inferences about the conceptual variables.
You can see that there are also two curved arrows in Figure 4.1. The top arrow diagrams the research hypothesis—namely, that changes in job satisfaction are related to changes in job performance. The basic assumption involved in testing the research hypothesis is as follows:

• if the research hypothesis (that the two conceptual variables are correlated) is correct, and

• if the measured variables are adequate (that is, if each conceptual variable is related to its measured variable, as shown by the two vertical arrows in the figure), then

• a relationship between the two measured variables (the bottom arrow in the figure) will be observed (cf. Nunnally, 1978).

       The ultimate goal of the research is to learn about the relationship between the conceptual variables. But the ability to learn about this relationship depends on the operational definitions. If the measures do not really measure the conceptual variables, then they cannot be used to draw inferences about the relationship between the conceptual variables. Thus, the adequacy of a test of any research hypothesis is limited by the adequacy of the measurement of the conceptual variables.
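The dependence of the hypothesis test on measurement quality can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the data are simulated, the two conceptual variables are assumed to be normally distributed and linearly related, and each measured variable is assumed to equal its conceptual variable plus random noise. The names `pearson_r` and `measure` and the noise levels are hypothetical choices, not part of the original example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Simulated conceptual variables (assumption: linearly related, normal).
satisfaction = [rng.gauss(0, 1) for _ in range(n)]
performance = [0.6 * s + rng.gauss(0, 0.8) for s in satisfaction]

def measure(values, noise_sd):
    """Hypothetical operational definition: true value plus random error."""
    return [v + rng.gauss(0, noise_sd) for v in values]

# Correlation between the conceptual variables themselves (top arrow).
r_conceptual = pearson_r(satisfaction, performance)

# Correlation between measured variables (bottom arrow), with adequate
# measures (little noise) and with poor measures (a lot of noise).
r_good = pearson_r(measure(satisfaction, 0.2), measure(performance, 0.2))
r_poor = pearson_r(measure(satisfaction, 2.0), measure(performance, 2.0))

print(round(r_conceptual, 2), round(r_good, 2), round(r_poor, 2))
```

Under these assumptions, the correlation observed with the poor measures is much weaker than the one between the conceptual variables, even though the underlying relationship is identical in both cases.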

Nominal and Quantitative Variables

        Measured variables can be divided into two major types: nominal variables and quantitative variables. A nominal variable is used to name or identify a particular characteristic. For instance, sex is a nominal variable that identifies whether a person is male or female, and religion is a nominal variable that identifies whether a person is Catholic, Buddhist, Jewish, or some other religion. Nominal variables are also frequently used in behavioral research to indicate the condition that a person has been assigned to in an experimental research design (for instance, whether she or he is in the “experimental condition” or the “control condition”).
         Nominal variables indicate the fact that people who share a value on the variable (for instance, all men or all the people in the control condition of an experiment) are equivalent in some way, whereas those that do not share the value are different from each other. Numbers are generally used to indicate the values of a nominal variable, such as when we represent the experimental condition of an experiment with the number 1 and the control condition of the experiment with the number 2. However, the numbers used to represent the categories of a nominal variable are arbitrary, and thus we could change which numbers represent which categories, or even label the categories with letters or names instead of numbers, without losing any information.
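The claim that the codes of a nominal variable are arbitrary can be checked directly. The following is a minimal sketch using made-up data for six participants; the helper name `group_members` is hypothetical. It shows that swapping the numbers, or replacing them with names, leaves the grouping of participants unchanged.

```python
from collections import defaultdict

# Made-up condition codes for six participants
# (1 = experimental condition, 2 = control condition).
conditions = [1, 2, 1, 1, 2, 2]

def group_members(codes):
    """Return the groups implied by the codes, ignoring the labels themselves."""
    groups = defaultdict(set)
    for participant, code in enumerate(codes):
        groups[code].add(participant)
    return {frozenset(members) for members in groups.values()}

# Swap which number stands for which condition.
swapped = [{1: 2, 2: 1}[c] for c in conditions]

# Replace the numbers with names.
named = [{1: "experimental", 2: "control"}[c] for c in conditions]

# All three codings sort the participants into the same two groups.
same_grouping = (
    group_members(conditions) == group_members(swapped) == group_members(named)
)
print(same_grouping)
```

Because only group membership carries information, arithmetic on the codes themselves (such as averaging the 1s and 2s) would not be meaningful.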
         In contrast to a nominal variable, which names or identifies, a quantitative variable uses numbers to indicate the extent to which a person possesses a characteristic of interest. Quantitative variables indicate such things as how attractive a person is, how quickly she or he can complete a task, or how many siblings she or he has. For instance, on a rating of perceived attractiveness, the number 10 might indicate greater attractiveness than the number 5.

Measurement Scales

          Specifying the relationship between the numbers on a quantitative measured variable and the values of the conceptual variable is known as scaling. In some cases in the natural sciences, the mapping between the measure and the conceptual variable is quite precise. As an example, we are all familiar with the use of the Fahrenheit scale to measure temperature. In the Fahrenheit scale, the relationship between the measured variable (degrees Fahrenheit)  and the conceptual variable (temperature) is so precise that we can be certain that changes in the measured variable correspond exactly to changes in the conceptual variable.
           In this case, we can be certain that the difference between any two points  on the scale (the degrees) refers to equal changes in the conceptual variable across the entire scale. For instance, we can state that the difference in temperature between 10 and 20 degrees Fahrenheit is exactly the same as the difference in temperature between 70 and 80 degrees Fahrenheit. When equal distances between scores on a measure are known to correspond to equal changes in the conceptual variable (such as on the Fahrenheit scale), we call the measure an interval scale.
           Now consider measures of length, such as feet and inches or the metric scale, which uses millimeters, centimeters, and meters. Such scales have all of the properties of an interval scale because equal changes between the points on the scale (centimeters for instance) correspond to equal changes in the conceptual variable (length). But, measures of length also have a true zero point that represents the complete absence of the conceptual variable—zero length. Interval scales that also have a true zero point are known as ratio scales (the Kelvin temperature scale, where zero degrees represents absolute zero, is another example of a ratio scale). In addition to being able to compare intervals, the presence of a zero point on a ratio scale also allows us to multiply and divide scale values. When measuring length, for instance, we can say that a person who is 6 feet tall is twice as tall as a child who is 3 feet tall.
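The difference between interval and ratio scales can be made concrete with the temperature and length examples above. This is a small sketch, not a formal treatment; the function name `fahrenheit_to_kelvin` is our own, and the conversion uses the standard Fahrenheit-to-Kelvin formula.

```python
import math

def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero point)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

# Interval property: a 10-degree difference is the same amount of
# temperature anywhere on the Fahrenheit scale.
diff_low = fahrenheit_to_kelvin(20) - fahrenheit_to_kelvin(10)
diff_high = fahrenheit_to_kelvin(80) - fahrenheit_to_kelvin(70)
equal_intervals = math.isclose(diff_low, diff_high)

# Ratios are not meaningful on an interval scale: 80 is twice 40 as a
# number, but 80 degrees F is not twice the temperature of 40 degrees F.
ratio_of_fahrenheit_numbers = 80 / 40
ratio_of_temperatures = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)

# Length has a true zero, so ratios of the scale values are meaningful.
ratio_of_heights = 6 / 3  # a 6-foot person versus a 3-foot child

print(equal_intervals, ratio_of_fahrenheit_numbers, ratio_of_heights)
print(round(ratio_of_temperatures, 2))
```

On the Kelvin scale the ratio of the two temperatures is close to 1, not 2, which is why multiplication and division are reserved for ratio scales.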
           In most behavioral science research, the scaling of the measured variable is not as straightforward as it is in the measurement of temperature or length. Measures in the behavioral sciences normally constitute only ordinal scales. In an ordinal scale, the numbers indicate whether there is more or less of the conceptual variable, but they do not indicate the exact interval between the individuals on the conceptual variable. For instance, if you rated the friendliness of five of your friends from 1 (least friendly) to 9 (most friendly), the scores would constitute an ordinal scale. The scores tell us the ordering of the people (that you believe Malik, whom you rated as a 7, is friendlier than Guillermo, whom you rated as a 2), but the measure does not tell us how big the difference between Malik and Guillermo is. Similarly, a hotel that receives a four-star rating is probably not exactly twice as comfortable as a hotel that receives a two-star rating.
           Selltiz, Jahoda, Deutsch, and Cook (1966) have suggested that using ordinal scales is a bit like using an elastic tape measure to measure length. Because the tape measure can be stretched, the difference between 1 centimeter and 2 centimeters may be greater or less than the difference between 7 centimeters and 8 centimeters. As a result, a change of 1 centimeter on the measured variable will not exactly correspond to a change of 1 unit of the conceptual variable (length), and the measure is not interval. However, although the stretching may change the length of the intervals, it does not change their order. Because 2 is always greater than 1 and 8 is always greater than 7, the relationship between actual length and measured length on the elastic tape measure is ordinal.
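The elastic tape measure analogy can be sketched numerically. In the example below, the stretching is modeled by an arbitrary increasing function (raising the true length to the power 1.5). That particular function is purely an assumption chosen for illustration; any strictly increasing distortion would make the same point.

```python
# True lengths of four objects, in centimeters.
true_lengths = [1, 2, 7, 8]

def elastic_tape(length_cm):
    """Hypothetical stretched tape: readings rise with length, but unevenly."""
    return length_cm ** 1.5

readings = [elastic_tape(x) for x in true_lengths]

# Order is preserved: a longer object always gets a larger reading.
order_preserved = all(a < b for a, b in zip(readings, readings[1:]))

# Intervals are not preserved: both pairs differ by exactly 1 cm in true
# length, yet the differences between their readings are unequal.
gap_low = readings[1] - readings[0]    # objects of 1 cm and 2 cm
gap_high = readings[3] - readings[2]   # objects of 7 cm and 8 cm

print(order_preserved, round(gap_low, 2), round(gap_high, 2))
```

The readings can therefore be used to rank the objects, but not to compare the sizes of the differences between them.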
          There is some disagreement about whether measured variables in the behavioral sciences can be considered ratio or interval scales or whether they should be considered only ordinal scales. In most cases, it is safest to assume that the scales are ordinal. For instance, we do not normally know whether the difference between people who score 8 versus 10 on a measure of self-esteem is exactly the same as that between two people who score 4 versus 6 on the same measure. And because there is no true zero point, we cannot say that a person with a self-esteem score of 10 has twice the esteem of a person with a score of 5. Although some measures can, in some cases, be considered interval or even ratio scales, most measured variables in the behavioral sciences are ordinal.

Fundamentals of Measurement




You will recall from Chapter 2 that the research hypothesis involves a prediction about the relationship between or among two or more variables—for instance, the relationship between self-esteem and college performance or between study time and memory. When stated in an abstract manner, the ideas that form the basis of a research hypothesis are known as conceptual variables. Behavioral scientists have been interested in such conceptual variables as self-esteem, parenting style, depression, and cognitive development.  
           Measurement involves turning conceptual variables into measured variables, which consist of numbers that represent the conceptual variables. The measured variables are frequently referred to as measures of the conceptual variables. In some cases, the transformation from conceptual to measured variable is direct. For instance, the conceptual variable “study time” is straightforwardly represented as the measured variable “seconds of study.” But other conceptual variables can be assessed by many different measures. For instance, the conceptual variable “liking” could be assessed by a person rating, from one to ten, how much he or she likes another person. Alternatively, liking could be measured in terms of how often a person looks at or touches another person or the number of love letters that he or she writes. And liking could also be measured using physiological indicators such as an increase in heart rate when two people are in the vicinity of each other.

Measures



We have seen in Chapters 1 and 2 that the basis of science is empirical measurement of relevant variables. Formally, measurement refers to the assignment of numbers to objects or events according to specific rules (Coombs, 1964). We assign numbers to events in everyday life, for instance, when we rate a movie as a “nine out of ten” or when a hotel is rated “three star.” As in everyday life, measurement is possible in science because we can use numbers to represent the variables we are interested in studying. In this chapter and the next, we will discuss how behavioral scientists decide what to measure, the techniques they use to measure, and how they determine whether these measures are effective.