Friday, 19 March 2010

Chapter 4 - Concepts and the Measurement of Concepts

Chapter 4
Concepts and their measurement

Concepts form a linchpin in the process of social research. Hypotheses contain concepts which are the products of our reflections on the world. Concepts express common elements in the world to which we give a name. We may notice that some people have an orientation in which they dislike people of a different race from their own, often attributing to other races derogatory characteristics. Still others are highly supportive of racial groups, perhaps seeing them as enhancing the ‘host’ culture through instilling new elements into it and hence enriching it. Yet others are merely tolerant, having no strong views one way or the other about people of other racial groups. In other words, we get a sense that people exhibit a variety of positions in regard to racial groups. We may want to suggest that there is a common theme to these attitudes, even though the attitudes themselves may be mutually antagonistic. What seems to bind these dispositions together is that they reflect different positions in regard to ‘racial prejudice’. In giving a name to the various dispositions that may be held regarding persons of another race, we are treating it as a concept, an entity over and above the observations about racial hostility and supportiveness that prompted the formulation of a name for those observations. Racial prejudice has acquired a certain abstractness, so that it transcends the reflections that prompted its formulation. Accordingly, the concept of racial prejudice becomes something that others can use to inform their own reflections about the social world. In this way, hypotheses can be formulated which postulate connections between racial prejudice and other concepts, such as that it will be related to social class or to authoritarianism.

Once formulated, a concept and the concepts with which it is purportedly associated, such as social class and authoritarianism, will need to be operationally defined, in order for systematic research to be conducted in relation to it. An operational definition specifies the procedures (operations) that will permit differences between individuals in respect of the concept(s) concerned to be precisely specified. What we are in reality talking about here is measurement, that is, the assignment of numbers to the units of analysis – be they people, organisations, or nations – to which a concept refers. Measurement allows small differences between units to be specified. We can say that someone who actively speaks out against members of other races is racially prejudiced, while someone who actively supports them is the obverse of this, but it is difficult to specify precisely the different positions that people may hold in between these extremes. Measurement assists in the specification of such differences by allowing systematic differences between people to be stipulated.

In order to provide operational definitions of concepts, indicators are required which will stand for those concepts. It may be that a single indicator will suffice in the measurement of a concept, but in many instances it will not. For example, would it be sufficient to measure ‘religious commitment’ by conducting a survey in which people are asked how often they attend church services? Clearly it would not, since church attendance is but one way in which an individual’s commitment to his or her religion may be expressed. It does not cover personal devotions, behaving as a religious person should in secular activities, being knowledgeable about one’s faith, or how far they adhere to central tenets of faith (Glock and Stark, 1965). These reflections strongly imply that more than one indicator is likely to be required to measure many concepts; otherwise our findings may be open to the argument that we have tapped only one facet of the concept in question.

If more than one indicator of a concept can be envisaged, it may be necessary to test hypotheses with each of the indicators. Imagine a hypothesis in which ‘organisational size’ was a concept. We might measure (that is, operationally define) this concept by the number of employees in a firm, its turnover or its net assets. While these three prospective indicators are likely to be interconnected, they will not be perfectly related (Child, 1973), so that hypotheses about organisational size may need to be tested for each of the indicators. Similarly, if religious commitment is to be measured, it may be necessary to employ indicators which reflect all the facets of such commitment in addition to church attendance. For example, individuals may be asked how far they endorse central aspects of their faith in order to establish how far they adhere to the beliefs associated with their faith.

When questionnaires are employed to measure concepts, as in the case of religious commitment, researchers often favour multiple-item measures. In the Job Survey data, satis is an example of a multiple-item measure. It entails asking individuals their positions in relation to a number of indicators which stand for one concept. Similarly, there are four indicators of both autonom and routine. One could test a hypothesis with each of the indicators. However, if one wanted to use the Job Survey data to examine a hypothesis relating to satis and autonom, each of which contains four questions, sixteen separate tests would be required. The procedure for analysing such multiple-item measures is to aggregate each individual’s response in relation to each question and to treat the overall measure as a scale in relation to which each unit of analysis has a score. In the case of satis, autonom and routine, the scaling procedure is Likert scaling, which is a popular approach to the creation of multiple-item measures. With Likert scaling, individuals are presented with a number of statements which appear to relate to a common theme; they then indicate their degree of agreement or disagreement on a five- or seven-point range. The answer to each constituent question (often called an item) is scored, for example from 1 for strongly disagree to 5 for strongly agree if the range of answers is in terms of five points. The individual scores are added up to form an overall score for each respondent. Multiple-item scales can be very long; the four satis questions are taken from an often-used scale developed by Brayfield and Rothe (1951) which comprised eighteen questions.

These multiple-item scales are popular for various reasons. First, a number of items are more likely to capture the totality of a broad concept like job satisfaction than a single question. Second, we can draw finer distinctions between people. The satis measure comprises four questions which are scored from 1 to 5, so that respondents’ overall scores can vary between 4 and 20. If only one question was asked, the variation would be between 1 and 5 – a considerably narrower range of potential variation. Third, if a question is misunderstood by a respondent, when only one question is asked that respondent will not be appropriately classified; if a few questions are asked, a misunderstood question can be offset by those which are properly understood.

It is common to speak of measures as variables, to denote the fact that units of analysis differ in respect to the concept in question. If there is no variation in a measure, it is a constant. It is fairly unusual to find concepts whose measures are constants. On the whole, the social sciences are concerned with variables and with expressing and analysing the variation that variables exhibit. When univariate analysis is carried out, we want to know how individuals are distributed in relation to a single variable. For example, we may want to know how many cases can be found in each of the categories or levels of the measure in question, or we may be interested in what the average response is, and so on. With bivariate analysis we are interested in the connections between two variables at a time. For example, we may want to know whether the variation in satis is associated with variation in another variable like autonom or whether men and women differ in regard to satis. In each case, it is variation that is of interest.
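To make the distinction between univariate and bivariate analysis concrete, the following Python sketch uses only the standard library. The eight satis scores and gender codes are invented for illustration; the univariate part reports a frequency distribution and a mean, and the bivariate part compares mean satis for men and women.

```python
from collections import Counter
from statistics import mean

# Hypothetical data for eight respondents
satis = [12, 15, 15, 18, 9, 15, 12, 20]
gender = ["male", "female", "female", "male", "male", "female", "male", "female"]

# Univariate analysis: how cases are distributed on a single variable
frequencies = Counter(satis)
for value, count in sorted(frequencies.items()):
    print(f"score {value}: {count} case(s)")
average = mean(satis)
print("mean satis:", average)


# Bivariate analysis: does satis vary with a second variable, gender?
def group_mean(scores, groups, label):
    """Mean score for the cases that fall in one category of the second variable."""
    return mean(score for score, group in zip(scores, groups) if group == label)


male_mean = group_mean(satis, gender, "male")
female_mean = group_mean(satis, gender, "female")
print("men:", male_mean, "women:", female_mean)
```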

One of the most important features of an understanding of statistical operations is an appreciation of when it is permissible to employ particular tests. Central to this appreciation is an ability to recognise the different forms that variables take, because statistical tests presume certain kinds of variable, a point that will be returned to again and again in later chapters.

The majority of writers on statistics draw upon a distinction developed by Stevens (1946) between nominal, ordinal and interval/ratio scales or levels of measurement. First, nominal (sometimes called categorical) scales entail the classification of individuals in terms of a concept. In the Job Survey data, the variable ethnicgp, which classifies respondents in terms of five categories – white, Asian, West Indian, African and Other – is an example of a nominal variable. Individuals can be allocated to each category, but the measure does no more than this and there is not a great deal more that we can say about it as a measure. We cannot order the categories in any way, for example. This inability contrasts with ordinal variables, in which individuals are categorised but the categories can be ordered in terms of ‘more’ and ‘less’ of the concept in question. In the Job Survey data, skill, prody and qual are all ordinal variables. If we take the first of these, skill, we can see that people are not merely categorised into each of four categories – highly skilled, fairly skilled, semi-skilled and unskilled – since we can see that someone who is fairly skilled is at a higher point on the scale than someone who is semi-skilled. We cannot make the same inference with ethnicgp since we cannot order the categories that it comprises. Although we can order the categories comprising skill, we are still limited in the things that we can say about it. For example, we cannot say that the skill difference between being highly skilled and fairly skilled is the same as the skill difference between being fairly skilled and semi-skilled. All we can say is that those rated as highly skilled have more skill than those rated as fairly skilled, who in turn have greater skill than the semi-skilled, and so on. Moreover, in coding semi-skilled as 2 and highly skilled as 4, we cannot say that people rated as highly skilled are twice as skilled as those rated as semi-skilled. 
In other words, care should be taken in attributing to the categories of an ordinal scale an arithmetic quality that the scoring seems to imply.

With interval/ratio variables, we can say quite a lot more about the arithmetic qualities. In fact, this category subsumes two types of variable – interval and ratio. Both types exhibit the quality that differences between categories are identical. For example, someone aged 20 is one year older than someone aged 19, and someone aged 50 is one year older than someone aged 49. In each case, the difference between the categories is identical – one year. A scale is called an interval scale because the intervals between categories are identical. Ratio measures additionally have a fixed zero point. Age, absence and income, for example, have logical zero points. This quality means that one can say that somebody who is aged 40 is twice as old as someone aged 20. Similarly, someone who has been absent from work six times in a year has been absent three times as often as someone who has been absent twice. However, the distinction between interval and ratio scales is often not examined by writers because, in the social sciences, true interval variables frequently are also ratio variables (for example, income, age). In this book, the term interval variable will sometimes be employed to embrace ratio variables as well.

Interval/ratio variables are recognised to be the highest level of measurement because there is more that can be said about them than about the other two types. Moreover, a wider variety of statistical tests and procedures is available for interval/ratio variables. It should be noted that if an interval/ratio variable such as age is grouped into categories – such as 20–29, 30–39, 40–49, 50–59 and so on – it becomes an ordinal variable. We cannot really say that the difference between someone in the 40–49 group and someone in the 50–59 group is the same as the difference between someone in the 20–29 group and someone in the 30–39 group, since we no longer know the points within the groupings at which people are located. On the other hand, such groupings of individuals are sometimes useful for the presentation and easy assimilation of information. It should be noted too that the position of dichotomous variables within the three-fold classification of types of variable is somewhat ambiguous. With such variables, there are only two categories, such as male and female for the variable gender. A dichotomy is usually thought of as a nominal variable, but sometimes it can be considered an ordinal variable. For example, when there is an inherent ordering to the dichotomy, such as passing and failing, the characteristics of an ordinal variable seem to be present.
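The loss of information that follows from grouping can be shown with a short Python sketch. The ten-year band width and the ages are hypothetical choices for illustration.

```python
def age_band(age, width=10):
    """Collapse an exact age into a band such as '40-49'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


ages = [41, 49, 50, 58]
bands = [age_band(age) for age in ages]
print(bands)

# 41 and 49 are eight years apart but land in the same band, while 49 and 50
# are one year apart but land in different bands. Once only the bands are kept,
# we no longer know where within a band each person is located, so equal
# differences between bands cannot be assumed and the grouped variable is ordinal.
```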

Strictly speaking, measures like satis, autonom and routine, which derive from multiple-item scales, are ordinal variables. For example, we do not know whether the difference between a score of 20 on the satis scale and a score of 18 is the same as the difference between 10 and 8. This poses a problem for researchers: if such variables cannot be treated as interval, then methods of analysis like correlation and regression (see Chapter 8), which are both powerful and popular, could not be used with them, since these techniques presume interval variables. On the other hand, most researchers treat the multiple-item measures they create as though they were interval variables, because these measures permit a large number of categories to be stipulated. When a variable allows only a small number of ordered categories, as in the case of commit, prody, skill and qual in the Job Survey data, each of which comprises only four or five categories, it would be unreasonable in most analysts’ eyes to treat them as interval variables. When the number of categories is considerably greater, as in the case of satis, autonom and routine, each of which can take seventeen values from 4 to 20, the case for treating them as interval variables is more compelling.

Table 4.1 Types of variable

Type of variable | Description | Example in Job Survey data
Nominal (categorical) | A classification of objects (people, firms, nations, etc.) into discrete categories that cannot be rank ordered. | ethnicgp
Ordinal | The categories associated with an ordinal variable can be rank ordered. Objects can be ordered in terms of a criterion from highest to lowest. | commit, skill, prody, qual
Interval (a) | With ‘true’ interval variables, categories associated with a variable can be rank ordered, as with an ordinal variable, but the distances between the categories are equal. | income, age, years, absence
Interval (b) | Variables which strictly speaking are ordinal, but which have a large number of categories, such as multiple-item questionnaire measures. These variables are assumed to have similar properties to ‘true’ interval variables. | satis, routine, autonom
Dichotomous | A variable that comprises only two categories. | gender, attend

Figure 4.1 Deciding the nature of a variable

Certainly, there seems to be a trend in the direction of this more liberal treatment of multiple-item scales as having the qualities of an interval variable. On the other hand, many purists would demur from this position. Moreover, there does not appear to be a rule of thumb which allows the analyst to specify when a variable is definitely ordinal and when interval. None the less, in this book it is proposed to reflect much of current practice and to treat multiple-item measures such as satis, autonom and routine as though they were interval scales. Labovitz (1970) goes further in suggesting that almost all ordinal variables can and should be treated as interval variables. He argues that the amount of error that can occur is minimal, especially in relation to the considerable advantages that can accrue to the analyst as a result of using techniques of analysis like correlation and regression which are both powerful and relatively easy to interpret. However, this view is controversial (Labovitz, 1971) and whereas many researchers would accept the treatment of variables like satis as interval, they would cavil about variables like commit, skill, prody and qual. Table 4.1 summarises the main characteristics of the types of scale discussed in this section, along with examples from the Job Survey data.

In order to help with the identification of whether variables should be classified as nominal, ordinal, dichotomous, or interval/ratio, the steps articulated in Figure 4.1 can be followed. We can take some of the Job Survey variables to illustrate how this figure can be used. First, take skill. This variable has more than two categories; the distances between the categories are not equal; the categories can be rank ordered; therefore the variable is ordinal. Next, take income. This variable has more than two categories; the distances between them are equal; therefore the variable is interval/ratio. Now take gender. This variable does not have more than two categories; therefore it is dichotomous. Finally, take ethnicgp. This variable has more than two categories; the distances between the categories are not equal; the categories cannot be rank ordered; therefore the variable is nominal.
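The steps just described can be written as a small decision function. This Python sketch follows the order of questions used in the worked examples above rather than reproducing Figure 4.1 itself, and the function and parameter names are hypothetical.

```python
def classify_variable(more_than_two_categories, equal_distances=False, rank_ordered=False):
    """Return the level of measurement by asking the questions in turn."""
    if not more_than_two_categories:
        return "dichotomous"
    if equal_distances:
        return "interval/ratio"
    if rank_ordered:
        return "ordinal"
    return "nominal"


examples = {
    "skill": classify_variable(True, equal_distances=False, rank_ordered=True),
    "income": classify_variable(True, equal_distances=True),
    "gender": classify_variable(False),
    "ethnicgp": classify_variable(True, equal_distances=False, rank_ordered=False),
}
for name, level in examples.items():
    print(f"{name}: {level}")
```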

When a concept is very broad, serious consideration needs to be given to the possibility that it comprises underlying dimensions which reflect different aspects of the concept in question. Very often it is possible to specify those dimensions on a priori grounds, so that possible dimensions are established in advance of the formation of indicators of the concept. There is much to recommend deliberation about the possibility of such underlying dimensions, since it encourages systematic reflection on the nature of the concept that is to be measured.

Lazarsfeld’s (1958) approach to the measurement of concepts viewed the search for underlying dimensions as an important ingredient. Figure 4.2 illustrates the steps that he envisaged. Initially, the researcher forms an image from a theoretical domain. This image reflects a number of common characteristics, as in the previous example of job satisfaction which denotes the tendency for people to have a distinctive range of experiences in relation to their jobs. Similarly, Hall (1968) developed the idea of ‘professionalism’ as a consequence of his view that members of professions have a distinctive constellation of attitudes to the nature of their work. In each case, out of this imagery stage, we see a concept starting to form. At the next stage, concept specification takes place, whereby the concept is developed to show whether it comprises different aspects or dimensions. This stage allows the complexity of the concept to be recognised. In Hall’s case, five dimensions of professionalism were proposed:


1. The use of the professional organisation as a major reference. This means that the professional organisation and other members of the profession are the chief source of ideas and judgements for the professional in the context of his or her work.
2. A belief in service to the public. According to this aspect, the profession is regarded as indispensable to society.
3. Belief in self-regulation. This notion implies that the work of a professional can and should only be judged by other members of the profession, because only they are qualified to make appropriate judgements.
4. A sense of calling to the field. The professional is someone who is dedicated to his or her work and would probably want to be a member of the profession even if material rewards were less.
5. Autonomy. This final dimension suggests that professionals ought to be able to make decisions and judgements without pressure from either clients, the organisations in which they work, or any other non-members of the profession.

Figure 4.2 Concepts, dimensions and measurements

Sources: Lazarsfeld (1958); Hall (1968); Snizek (1972)

Not only is the concept specification stage useful in order to reflect and to capture the full complexity of concepts, but it also serves as a means of bridging the general formulation of concepts and their measurement, since the establishment of dimensions reduces the abstractness of concepts.

The next stage is the selection of indicators, in which the researcher searches for indicators of each of the dimensions. In Hall’s case, ten indicators of each dimension were selected. Each indicator entailed a statement in relation to which respondents had to answer whether they believed that it agreed very well, well, poorly, or very poorly in the light of how they felt and behaved as members of their profession. A neutral category was also provided. Figure 4.2 provides both the five dimensions of professionalism and one of the ten indicators for each dimension. Finally, Lazarsfeld proposed that the indicators need to be brought together through the formation of indices or scales. This stage can entail either of two possibilities: an overall scale could be formed comprising all indicators relating to all dimensions; or, more frequently, separate scales can be formulated for each dimension. Thus, in Hall’s research, the indicators relating to each dimension were combined to form scales, so that we end up with five separate scales of professionalism. As Hall shows, different professions exhibit different ‘profiles’ in respect of these dimensions: one may emerge as having high scores for dimensions 2, 3, and 5, moderate for 1, and low for 4, whereas other professions will emerge with different combinations.
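The two ways of forming scales can be sketched in Python. The answers are hypothetical, each dimension has three indicators rather than Hall's ten to keep the example short, and the coding (1 for very poorly up to 5 for very well, with the neutral category as 3) is an assumption rather than Hall's published scoring.

```python
# Hall's five dimensions of professionalism, in the order listed in the text
DIMENSIONS = [
    "professional organisation as reference",
    "service to the public",
    "self-regulation",
    "sense of calling",
    "autonomy",
]

# Hypothetical coded answers for one respondent
# (1 = very poorly, 2 = poorly, 3 = neutral, 4 = well, 5 = very well);
# three indicators per dimension instead of Hall's ten, for brevity.
answers = {
    "professional organisation as reference": [3, 3, 2],
    "service to the public": [5, 4, 5],
    "self-regulation": [4, 5, 4],
    "sense of calling": [2, 1, 2],
    "autonomy": [5, 5, 4],
}


def dimension_scales(answers):
    """The more frequent option: a separate scale for each dimension."""
    return {dimension: sum(answers[dimension]) for dimension in DIMENSIONS}


def overall_scale(answers):
    """The alternative: one scale comprising all indicators of all dimensions."""
    return sum(sum(scores) for scores in answers.values())


profile = dimension_scales(answers)
for number, (dimension, score) in enumerate(profile.items(), start=1):
    print(f"{number}. {dimension}: {score} (possible range 3-15)")
print("overall:", overall_scale(answers))
```

This invented respondent scores high on dimensions 2, 3 and 5, moderately on 1 and low on 4, the kind of profile shape described above; the single overall score of 54 would hide that pattern.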

In order to check whether the indicators bunch in the ways proposed by an a priori specification of dimensions, factor analysis, a technique that will be examined in Chapter 11, is often employed. Factor analysis allows the researcher to check whether, for example, all of the ten indicators developed to measure ‘autonomy’ are really related to each other and not to indicators that are supposed to measure other dimensions. We might find that an indicator which is supposed to measure autonomy seems to be associated with many of the various indicators of ‘belief in service to the public’, while one or two of the latter might be related to indicators which are supposed to denote ‘belief in self-regulation’, and so on. In fact, when such factor analysis has been conducted in relation to Hall’s professionalism scale, the correspondence between the five dimensions and their putative indicators has been shown to be poor (Snizek, 1972; Bryman, 1985). However, the chief point that should be recognised in the foregoing discussion is that the specification of dimensions for concepts is often an important step in the development of an operational definition.

Some measurement is carried out in psychology and sociology with little (if any) attention to the quest for dimensions of concepts. For example, the eighteen-item measure of job satisfaction developed by Brayfield and Rothe (1951), which was mentioned above, does not specify dimensions, though it is possible to employ factor analysis to search for de facto ones. The chief point that can be gleaned from this section is that the search for dimensions can provide an important aid to understanding the nature of concepts and that when established on the basis of a priori reasoning can be an important step in moving from the complexity and abstractness of many concepts to possible measures of them.

It is generally accepted that when a concept has been operationally defined, in that a measure of it has been proposed, the ensuing measurement device should be both reliable and valid.

The reliability of a measure refers to its consistency. This notion is often taken to entail two separate aspects: external and internal reliability. External reliability is the more common of the two meanings and refers to the degree of consistency of a measure over time. If you have kitchen scales which register a different weight every time the same bag of sugar is weighed, you would have an externally unreliable measure of weight, since the amount fluctuates over time in spite of the fact that there should be no differences between the occasions that the item is weighed. Similarly, if you administered a personality test to a group of people, readministered it shortly afterwards and found a poor correspondence between the two waves of measurement, the personality test would probably be regarded as externally unreliable because it seems to fluctuate. When assessing external reliability in this manner, that is by administering a test on two occasions to the same group of subjects, test–retest reliability is being examined. We would anticipate that people who scored high on the test initially will also do so when retested; in other words, we would expect the relative position of each person’s score to remain comparatively constant. The problem with such a procedure is that intervening events between the test and the retest may account for any discrepancy between the two sets of results. For example, if the job satisfaction of a group of workers is gauged and three months later is reassessed, it might be found that in general respondents exhibit higher levels of satisfaction than previously. It may be that in the intervening period they have received a pay increase or a change to their working practices or some grievance that had been simmering before has been resolved by the time job satisfaction is retested. 
Also, if the test and retest are too close in time, subjects may recollect earlier answers, so that an artificial consistency between the two tests is created. However, test–retest reliability is one of the main ways of checking external reliability.

Internal reliability is particularly important in connection with multiple-item scales. It raises the question of whether each scale is measuring a single idea and hence whether the items that make up the scale are internally consistent. A number of procedures for estimating internal reliability exist, two of which can be readily computed in SPSS. First, with split-half reliability the items in a scale are divided into two groups (either randomly or on an odd–even basis) and the relationship between respondents’ scores for the two halves is computed. Thus, the Brayfield–Rothe job satisfaction measure, which contains eighteen items, would be divided into two groups of nine, and the relationship between respondents’ scores for the two halves would be estimated. A correlation coefficient is then generated (see Chapter 8). Although correlation coefficients can in principle range from −1 to +1, for two halves of a scale measuring the same thing the result should be positive, and the nearer it is to 1 – and preferably at or over 0.8 – the more internally reliable the scale is. Second, the currently widely used Cronbach’s alpha essentially calculates the average of all possible split-half reliability coefficients. Again, the rule of thumb is that the result should be 0.8 or above. This rule of thumb is also generally used in relation to test–retest reliability. When a concept and its associated measure are deemed to comprise underlying dimensions, it is normal to calculate reliability estimates for each of the constituent dimensions rather than for the measure as a whole. Indeed, if a factor analysis confirms that a measure comprises a number of dimensions, the overall scale will probably exhibit a low level of internal reliability, because items tapping different dimensions are not closely related to one another, which lowers the split-half and alpha estimates for the scale as a whole.

Box 4.1 Reliability Analysis dialog box

Both split-half and alpha estimates of reliability can be easily calculated with SPSS. It is necessary to ensure that all items are coded in the same direction. Thus, in the case of satis it is necessary to ensure that the reverse items (satis2 and satis4) have been recoded (using Recode) so that agreement is indicative of job satisfaction. These two items have been recoded in the following illustration as rsatis2 and rsatis4. In order to generate a reliability test of the four items that make up satis, the following sequence would be used:

➔Analyze ➔Scale ➔Reliability Analysis... [opens Reliability Analysis dialog box shown in Box 4.1]

➔satis1, rsatis2, satis3 and rsatis4 while holding down the Ctrl button [all four of the variables should be highlighted] ➔►button [puts satis1, rsatis2, satis3 and rsatis4 in the Items: box] ➔Model: ➔Alpha in the drop-down menu that appears


If split-half reliability testing is preferred, click on Split-half in the Model: pull-down menu rather than Alpha. The output for alpha (Table 4.2) gives a coefficient of 0.76 for satis. This is only just short of the 0.8 criterion, so the scale would be regarded as internally reliable for most purposes.

Table 4.2 Reliability Analysis output for satis (Job Survey data)

If a scale turns out to have low internal reliability, one strategy for dealing with this eventuality is to drop one or more items from the scale in order to establish whether reliability can be boosted. To do this, select the ➔Statistics... button in the Reliability Analysis dialog box. This brings up the Reliability Analysis: Statistics subdialog box (shown in Box 4.2). Then ➔Scale if item deleted. The output shows the alpha reliability level obtained when each constituent item is deleted in turn. Of course, in the case of satis, this exercise would not be necessary.
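The recoding step and the 'Scale if item deleted' check can also be sketched in Python. Everything here is hypothetical: the scores are invented (so the 0.76 reported for the Job Survey data is not reproduced), reverse-worded items are recoded on a 1 to 5 range by subtracting each score from 6, mirroring the recoding described for rsatis2 and rsatis4, and the fourth item has deliberately been made inconsistent with the others.

```python
from statistics import pvariance

# Hypothetical raw scores (1-5) for five respondents; satis2 and satis4 are
# treated as reverse-worded items, as in the Job Survey data.
raw = {
    "satis1": [1, 2, 3, 4, 5],
    "satis2": [4, 4, 3, 2, 2],
    "satis3": [1, 3, 3, 5, 5],
    "satis4": [3, 4, 2, 4, 3],
}


def reverse_code(scores, low=1, high=5):
    """Recode so that agreement indicates satisfaction: 1 <-> 5, 2 <-> 4."""
    return [low + high - score for score in scores]


def cronbach_alpha(item_lists):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the totals)."""
    k = len(item_lists)
    totals = [sum(scores) for scores in zip(*item_lists)]
    item_variance = sum(pvariance(scores) for scores in item_lists)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


items = {
    "satis1": raw["satis1"],
    "rsatis2": reverse_code(raw["satis2"]),
    "satis3": raw["satis3"],
    "rsatis4": reverse_code(raw["satis4"]),
}

full_alpha = cronbach_alpha(list(items.values()))
print(f"alpha with all items = {full_alpha:.2f}")

# Alpha recomputed with each item left out in turn
if_deleted = {
    name: cronbach_alpha([scores for other, scores in items.items() if other != name])
    for name in items
}
for name, value in if_deleted.items():
    print(f"alpha if {name} deleted = {value:.2f}")
```

In this invented data, leaving out rsatis4 raises alpha, which is the kind of pattern the 'Scale if item deleted' output is meant to reveal.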

Box 4.2 Reliability Analysis: Statistics subdialog box

Two other aspects of reliability, in addition to internal and external reliability, ought to be mentioned. First, when material is being coded for themes, the reliability of the coding scheme should be tested. This issue arises when a researcher needs to code people’s answers to interview questions that have not been pre-coded in order to search for general underlying themes to answers, or when a content analysis of newspaper articles is conducted to elucidate ways in which news topics tend to be handled. When such exercises are carried out, more than one coder should be used and an estimate of inter-coder reliability should be provided to ensure that the coding scheme is being consistently interpreted by coders. This exercise would entail gauging the degree to which coders agree on the coding of themes deriving from the material being examined. Second, when the researcher is classifying behaviour, an estimate of inter-observer reliability should be provided. For example, if aggressive behaviour is being observed, an estimate of inter-observer reliability should be presented to ensure that the criteria of aggressiveness are being consistently interpreted. Methods of bivariate analysis (see Chapter 8) can be used to measure inter-coder and inter-observer reliability. A discussion of some methods which have been devised specifically for the assessment of inter-coder or inter-observer reliability can be found in Cramer (1998).
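As an example of a measure devised for this purpose, the following Python sketch computes simple percentage agreement and Cohen's kappa, which corrects agreement for the level expected by chance. Kappa is offered here as one widely used option, not necessarily among the methods Cramer (1998) discusses, and the themes and codings are hypothetical.

```python
from collections import Counter

# Hypothetical themes assigned by two coders to the same ten interview answers
coder_a = ["pay", "pay", "security", "autonomy", "pay",
           "security", "autonomy", "pay", "security", "pay"]
coder_b = ["pay", "security", "security", "autonomy", "pay",
           "security", "pay", "pay", "security", "pay"]


def percent_agreement(a, b):
    """Share of units that both coders assigned to the same theme."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """(observed - chance agreement) / (1 - chance agreement)."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of each coder's marginal proportions, summed
    expected = sum(counts_a[theme] * counts_b[theme] for theme in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```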

The question of validity draws attention to how far a measure really measures the concept that it purports to measure. How do we know that our measure of job satisfaction is really getting at job satisfaction and not at something else? At the very minimum, a researcher who develops a new measure should establish that it has face validity, that is, that the measure apparently reflects the content of the concept in question.

The researcher might also seek to gauge the concurrent validity of the measure. Here the researcher employs a criterion on which people are known to differ and which is relevant to the concept in question. For example, some people are more often absent from work (other than through illness) than others. In order to establish the concurrent validity of our job satisfaction measure, we may see how far people who are satisfied with their jobs are less likely than those who are not satisfied to be absent from work. If a lack of correspondence is found, such as frequent absentees being just as likely to be satisfied as not satisfied, we might be tempted to question whether our measure is really addressing job satisfaction. Another possible test for the validity of a new measure is predictive validity, whereby the researcher uses a future criterion measure, rather than a contemporaneous one as in the case of concurrent validity. With predictive validity, the researcher would take later levels of absenteeism as the criterion against which the validity of the job satisfaction measure would be examined.

Some writers advocate that the researcher should also estimate the construct validity of a measure (Cronbach and Meehl, 1955). Here, the researcher is encouraged to deduce hypotheses from a theory that is relevant to the concept. For example, drawing upon ideas about the impact of technology on the experience of work (for example, Blauner, 1964), the researcher might anticipate that people who are satisfied with their jobs are less likely to work on routine jobs, and that those who are not satisfied are more likely to do so. Accordingly, we could investigate this theoretical deduction by examining the relationship between job satisfaction and job routine. However, some caution is required in interpreting the absence of a relationship between job satisfaction and job routine in this example. First, the theory, or the deduction that is made from it, may be faulty. Second, the measure of job routine may itself be invalid.
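The theoretical deduction above can be examined, in its simplest form, by comparing mean job satisfaction for people in routine and non-routine jobs. The sketch below assumes a hypothetical data set in which each employee has a satisfaction score and a yes/no coding of whether the job is routine; the records and the helper name `mean_satisfaction_by_routine` are invented for illustration. A fuller analysis would also ask whether any difference is statistically significant, which this sketch does not do.

```python
from statistics import mean

# Hypothetical records: (job satisfaction score, is the job routine?).
employees = [
    (21, False), (17, False), (24, False), (19, False),
    (10, True), (14, True), (8, True), (16, True),
]


def mean_satisfaction_by_routine(records):
    """Return (mean satisfaction in non-routine jobs, mean satisfaction in routine jobs)."""
    non_routine = [score for score, routine in records if not routine]
    routine = [score for score, routine in records if routine]
    if not non_routine or not routine:
        raise ValueError("both routine and non-routine jobs must be represented")
    return mean(non_routine), mean(routine)


non_routine_mean, routine_mean = mean_satisfaction_by_routine(employees)
# The theory-derived hypothesis predicts lower satisfaction among those in routine jobs.
supports_hypothesis = routine_mean < non_routine_mean
print(f"non-routine: {non_routine_mean}, routine: {routine_mean}, "
      f"consistent with hypothesis: {supports_hypothesis}")
```

If `supports_hypothesis` came out false, the text's caution applies. The theory, the deduction, or the job routine measure could be at fault, rather than the job satisfaction measure.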

All the approaches to the investigation of validity that have been discussed up to now are designed to establish what Campbell and Fiske (1959) refer to as convergent validity. In each case, the researcher is concerned to demonstrate that the measure harmonises with another measure. Campbell and Fiske argue that this process usually does not go far enough, in that the researcher should really be using different methods of measuring the same concept to see how far there is convergence. For example, in addition to devising a questionnaire-based measure of job routine, a researcher could use observers to rate the characteristics of jobs in order to distinguish between degrees of routineness in jobs in the firm (for example, Jenkins et al., 1975). Convergent validity would entail demonstrating a convergence between the two measures, although it is difficult to interpret a lack of convergence since either of the two measures could be faulty. Many of the examples of convergent validation that have appeared since Campbell and Fiske's (1959) article have not involved different methods, but have employed different questionnaire research instruments (Bryman, 1989). For example, two questionnaire-based measures of job routine might be used, rather than two different methods. Campbell and Fiske went even further in suggesting that a measure should also exhibit discriminant validity. The investigation of discriminant validity implies that one should also search for low levels of correspondence between a measure and other measures which are supposed to represent other concepts. Although discriminant validity is an important facet of the validity of a measure, it is probably more important for the student to focus upon the various aspects of convergent validation that have been discussed.
The techniques covered in Chapter 8, which are concerned with relationships between pairs of variables, can be used to investigate both the various types of convergent validity and discriminant validity.
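To make the Campbell and Fiske logic concrete, the sketch below correlates a questionnaire measure of job routine with observers' ratings of the same jobs (convergent validity, which should be high) and correlates each of them with a measure of some other, theoretically unrelated concept (discriminant validity, which should be low). All scores are hypothetical, `other_concept` is a placeholder for whatever unrelated measure a researcher might choose, and no fixed cut-off for "high" or "low" is assumed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for eight jobs (illustration only).
routine_questionnaire = [4, 2, 5, 1, 3, 5, 2, 4]  # job routine: questionnaire method
routine_observer = [5, 2, 4, 1, 3, 5, 1, 4]       # job routine: observers' ratings
other_concept = [3, 3, 2, 2, 5, 4, 4, 1]          # measure of an unrelated concept

# Convergent validity: two different methods measuring the same concept should agree.
convergent = pearson_r(routine_questionnaire, routine_observer)
# Discriminant validity: measures of different concepts should show little correspondence.
discriminant_q = pearson_r(routine_questionnaire, other_concept)
discriminant_obs = pearson_r(routine_observer, other_concept)

print(f"convergent (questionnaire vs observers): r = {convergent:.2f}")
print(f"discriminant (questionnaire vs other):  r = {discriminant_q:.2f}")
print(f"discriminant (observers vs other):      r = {discriminant_obs:.2f}")
```

Using two different methods (a questionnaire and observation) follows Campbell and Fiske's stronger recommendation; replacing `routine_observer` with a second questionnaire measure would give the weaker, same-method form of convergent validation described in the text.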

Exercises

1. Which of the following answers is true? A Likert scale is (a) a test for validity; (b) an approach to generating multiple-item measures; (c) a test for reliability; or (d) a method for generating dimensions of concepts?

2. When operationalising a concept, why might it be useful to consider the possibility that it comprises a number of dimensions?

3. Consider the following questions, which might be used in a social survey about people's drinking habits, and decide whether each variable is nominal, ordinal, interval/ratio or dichotomous:

(a) Do you ever consume alcoholic drinks?

Yes ____
No ____ (go to question 5)

(b) If you have ticked Yes to the previous question, which of the following alcoholic drinks do you consume most frequently? (Tick one category only.)

Beer ____
Spirits ____
Wine ____
Liqueurs ____
Other ____

(c) How frequently do you consume alcoholic drinks? Tick the answer that comes closest to your current practice.

Daily ____
Most days ____
Once or twice a week ____
Once or twice a month ____
A few times a year ____
Once or twice a year ____

(d) How many units of alcohol did you consume last week? (We can assume that the interviewer would help respondents to translate their drinking into units of alcohol.)

Number of units _____

4. In the Job Survey data, is absence a nominal, an ordinal, an interval/ratio, or a dichotomous variable?

5. Is test–retest reliability a test of internal or external reliability?

6. What would be the SPSS procedure for computing Cronbach's alpha for autonom?

7. Following on from question 6, would this be a test of internal or external reliability?

8. A researcher develops a new multiple-item measure of ‘political conservatism’. He/she administers the measure to a sample of individuals and also asks them how they voted at the last general election in order to validate the new measure. The researcher relates respondents' scores to how they voted. Which of the following is the researcher assessing: (a) the measure's concurrent validity; (b) the measure's predictive validity; or (c) the measure's discriminant validity?
