IS&T Review of Psychometric Scaling

Psychometric Scaling: A Toolkit for Imaging Systems Development, by Peter Engeldrum. Published in 2000 by Imcotek Press, P.O. Box 17, Winchester, MA, 01890-0017, USA.

This book collects, reviews, and organizes conventional psychological scaling techniques. Since the context is imaging systems, the discussion is tailored to measuring the psychological dimensions underlying the perception of images. The book is a timely contribution to our field, since these scaling techniques were originally published in dozens of books and journals over the last 100 years, and the original discussions were almost never in the context of images. The book also updates the computational notation and presentation. Instead of tables of sample computations, the book includes a CD-ROM with implementations of the various procedures as Mathcad worksheets, along with sample data sets that serve as examples. Some of the techniques required simplifying assumptions when originally implemented because of the computational limitations of that era; modern computing power makes these simplifications unnecessary.

As the title of the book suggests, Engeldrum's intended audience consists of image scientists and engineers trying to optimize the performance of imaging systems. Without metrics related to how customers will perceive the results accruing to various design changes, system optimization is haphazard. Engeldrum calls the framework for this optimization the "Image Quality Circle." This framework builds from the "technology variables" of the system design, through the "physical image parameters" resulting from the design, to customer perceptions of image attributes, and ultimately to customer quality preferences. The steps dealing with customer perceptions require quantification through psychometric scaling procedures.

Engeldrum walks the reader through performing a scaling test one step at a time. He starts by giving good, general advice on selecting and preparing samples for scaling. He goes on to considerations of the observers (how many? expert or man-on-the-street type?) and careful formulation of the instructions given to each observer. Finally, he covers the testing environment and the actual administration of the test. This detail serves to de-mystify the scaling process and ensures that even early tests are successful.

The next section of the book discusses scale type: nominal (unique up to an arbitrary 1-1 transformation), ordinal (unique up to an arbitrary monotonic transformation), interval (unique up to an arbitrary affine--slope and intercept--transformation), and ratio (unique up to an arbitrary scale factor). Note that the scale types are nested, in the sense that each succeeding scale type adds constraints to the one before it. The type needed determines many of the features of both the experiment and the subsequent data analysis, and the appropriate type depends on how the results of the study will be used. [As a color scientist, I feel compelled to dispute Engeldrum's claim on p. 48 that CIE XYZ is merely a nominal scale. I think he intends to say that XYZ is not perceptually "uniform"--that vector distance does not represent perceptual distance--which is true. However, uniformity is not a necessary property of psychometric scales. The scale type of XYZ is stronger even than the ratio type. The set of all colored lights is a Grassmann structure, which means it is equivalent to a vector space with an addition operator (superposition of light) and a multiplication operator (attenuation by a neutral density filter). See Suppes, Krantz, Luce, & Tversky, Foundations of Measurement, Vol. II, Academic Press, 1989.]
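The nesting of scale types can be made concrete with a small numerical check. The following is only an illustrative sketch in Python; the helper names and the three sample values are invented for this example and do not come from the book or its worksheets. It shows which properties of a set of scale values survive each class of admissible transformation.

```python
def rank_order(values):
    """Indices of the stimuli sorted from lowest to highest scale value."""
    return sorted(range(len(values)), key=lambda i: values[i])

def difference_ratio(values):
    """Ratio of two adjacent scale differences; meaningful on interval scales."""
    a, b, c = values[:3]
    return (c - b) / (b - a)

def value_ratio(values):
    """Ratio of two scale values; meaningful only on ratio scales."""
    return values[1] / values[0]

# Hypothetical scale values for three stimuli.
scale = [1.0, 2.0, 4.0]

monotonic = [v ** 3 for v in scale]       # admissible for ordinal scales
affine = [2.0 * v + 3.0 for v in scale]   # admissible for interval scales
scaled = [2.0 * v for v in scale]         # admissible for ratio scales

# A monotonic transformation keeps the order but not the difference ratios.
print(rank_order(monotonic) == rank_order(scale))
print(difference_ratio(monotonic) == difference_ratio(scale))
# An affine transformation keeps difference ratios but not value ratios.
print(difference_ratio(affine) == difference_ratio(scale))
print(value_ratio(affine) == value_ratio(scale))
# Multiplying by a constant keeps value ratios as well.
print(value_ratio(scaled) == value_ratio(scale))
```

Each stronger scale type permits fewer transformations, and therefore supports more kinds of statements about the stimuli.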

Having established the fundamentals, Engeldrum begins to explain the various scaling procedures in subsequent chapters. He starts with thresholds and just-noticeable differences ("j.n.d.'s"). These topics allow him to develop the classic concept of the psychometric function and the idea that the observer's internal response to a fixed stimulus is not constant, but should be represented by a random variable with a probability distribution having a specific mean and variance. Psychometric scaling has to do with estimation of the mean response to various stimuli. Sometimes the variance in that response is also of interest, since that variance is important in predicting when stimuli with mean responses separated by some amount can be reliably distinguished (i.e., when they are separated by more than a j.n.d.). This statistical approach is fundamental to all of the scaling methods he discusses, as they all depend on statistical models, and actual scale values are nothing but parameter estimates for those models.
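To illustrate the idea of a psychometric function, here is a minimal Python sketch. It assumes a cumulative-normal form for the function and defines the j.n.d. as the stimulus increment between the 50% and 75% points of the curve. Both are common conventions, adopted here as assumptions rather than taken from the book, and the function names are hypothetical.

```python
from statistics import NormalDist

def psychometric_function(stimulus, threshold, sigma):
    """Probability of a 'detected' response, modeled as a cumulative normal.

    threshold is the stimulus level giving 50% detection; sigma is the
    standard deviation of the observer's internal response.
    """
    return NormalDist(mu=threshold, sigma=sigma).cdf(stimulus)

def jnd(sigma, p_upper=0.75):
    """Stimulus increment from the 50% point to the p_upper point (assumed criterion)."""
    return sigma * NormalDist().inv_cdf(p_upper)

# Hypothetical observer: threshold at 10 units, response spread of 2 units.
for level in (6.0, 8.0, 10.0, 12.0, 14.0):
    print(level, round(psychometric_function(level, 10.0, 2.0), 3))
print("j.n.d. (75% criterion):", round(jnd(2.0), 3))
```

Under these assumptions the j.n.d. is proportional to the spread of the internal response, which is why the variance matters for predicting when two stimuli can be reliably distinguished.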

Engeldrum emphasizes the difference between the data collection methodology and the data analysis methodology. The standard data collection methods he covers are paired comparison, rank ordering, category methods, and direct production methods like graphical rating scales and ratio scaling methods. He points out that the same data might produce an ordinal scale or an interval scale depending on the analysis performed. For example, observers might rank order samples, and those rank orders might be averaged to produce an ordinal scale (p. 80). Alternatively, those rank orders might be broken into the equivalent set of paired comparisons (p. 103), and those paired comparisons then analyzed with an interval scale-producing technique like Thurstone's case V (p. 97).
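The two treatments of rank-order data described above can be sketched in a few lines of Python. This is an illustration only, not the book's Mathcad worksheet; the function names and the three example rankings are invented.

```python
from itertools import combinations

def mean_ranks(rankings, n_stimuli):
    """Average rank position of each stimulus (1 = ranked first)."""
    totals = [0] * n_stimuli
    for ranking in rankings:
        for position, stimulus in enumerate(ranking, start=1):
            totals[stimulus] += position
    return [total / len(rankings) for total in totals]

def rankings_to_pair_counts(rankings, n_stimuli):
    """counts[i][j] = number of observers who ranked stimulus i ahead of stimulus j."""
    counts = [[0] * n_stimuli for _ in range(n_stimuli)]
    for ranking in rankings:
        # combinations() keeps list order, so `ahead` always precedes `behind`.
        for ahead, behind in combinations(ranking, 2):
            counts[ahead][behind] += 1
    return counts

# Three hypothetical observers ranking stimuli 0, 1, 2 from best to worst.
rankings = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
ordinal_scale = mean_ranks(rankings, 3)
pair_counts = rankings_to_pair_counts(rankings, 3)
print(ordinal_scale)
print(pair_counts)
```

The count matrix has the same form as the data from a true paired-comparison experiment, so it can be passed to a paired-comparison analysis.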

Engeldrum's treatment of indirect interval scaling is very good. These methods are called indirect because the distances between scale values for stimuli are based on the probability of confusions within an observer or disagreement among observers in the relative ranking of the stimuli. (In direct methods, by contrast, the observer directly produces a quantity--like placing a mark on a ruler--representing the scale value of a stimulus.) The indirect methods are good for measuring small differences and confusable stimuli. The direct methods are better for measuring large differences. The classic indirect methods are Thurstone's analysis of paired-comparison data and Torgerson's analysis of category data. These methods are closely related. Both represent the stimulus with a random variable. Both divide into several different "cases" based on the simplifying statistical assumptions placed on those random variables (e.g., Thurstone case V, the most frequently used case, postulates that the random variables are all independent and have the same variance).
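As a rough illustration of how confusion probabilities become interval scale values, the following Python sketch applies the usual textbook form of Thurstone's case V: each preference proportion is converted to a standard normal deviate, and the deviates are averaged for each stimulus. It is not Engeldrum's worksheet. The function name, the example counts, and the clipping of proportions (used here so that unanimous judgments do not produce infinite deviates) are all assumptions made for this example.

```python
from statistics import NormalDist

def thurstone_case_v(counts, clip=0.01):
    """Interval scale values from a paired-comparison count matrix.

    counts[i][j] is the number of times stimulus i was preferred to j.
    Proportions are clipped to [clip, 1 - clip], an assumption of this
    sketch, so that the inverse normal stays finite.
    """
    n = len(counts)
    inverse_normal = NormalDist().inv_cdf
    scale = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue  # the self-comparison contributes a deviate of zero
            proportion = counts[i][j] / (counts[i][j] + counts[j][i])
            proportion = min(max(proportion, clip), 1.0 - clip)
            z_sum += inverse_normal(proportion)
        scale.append(z_sum / n)
    return scale

# Hypothetical data: 50 observers comparing three stimuli pairwise.
counts = [[0, 30, 45],
          [20, 0, 35],
          [5, 15, 0]]
scale_values = thurstone_case_v(counts)
print([round(value, 3) for value in scale_values])
```

The resulting values sum to approximately zero, and only their differences are meaningful, as expected of an interval scale with an arbitrary origin.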

Torgerson's method also represents the category boundaries by random variables. The model gets a little complicated, but Engeldrum provides a canned worksheet, so that should not be a barrier. The benefit from the complication is that, while simpler analysis methods require that the categories divide the scale into equal-appearing intervals, Torgerson's analysis will work on any categories (and even give scale values to the category boundaries so you know how equal-appearing they might be!). In practice, the equal-appearing requirement is difficult to meet, so the simpler analysis can probably produce only an ordinal scale.

The last procedure Engeldrum discusses is direct ratio scaling. Although ratio scales have the most stringent uniqueness property (only allowing scaling by an arbitrary constant), in practice that uniqueness results from the fact that they require (or assume) the most abstract sensory information processing on the part of the observer. For instance, the observer might be asked to "assign successive numbers so they reflect your subjective impression…" (p. 145). Given their reliance on observers interpreting and performing these instructions consistently, direct ratio scales may not be as strong as their uniqueness property implies.

The book ends with a discussion of how to select an appropriate scaling methodology for any specific project. Engeldrum presents a tree chart with successive decisions on three questions: 1) how confusable are the stimuli? 2) how many samples are there? and 3) how much observer effort can be required? In each of the eight terminal leaves of the tree, Engeldrum suggests scaling methods that are well matched to the situation. This chart is part of his general step-by-step approach and will guide the beginner to exactly those procedures he or she needs.

Overall, Psychometric Scaling will serve as an excellent introduction and reference book for those involved in imaging system development. It is brief and to the point in its explanations, but supplies a wealth of references for those wanting to dig deeper. The working code will save practitioners many hours and represents a step toward better standardization within the industry. In imaging system optimization, how to measure (psychometric scaling) ought to be automatic and routine so that the developer can concentrate on the bigger issues of what to measure. This book does not address that bigger issue, but does give the reader the information necessary to generate scaling results that can be trusted.

Jay Thornton received a Ph.D. in Mathematical Psychology from the University of Michigan. Since 1983 he has worked at Polaroid Corporation in the areas of color reproduction, halftoning, image processing, and image science.