“When Correcting for Unreliability of Job Performance Ratings, the Best Estimate Is Still 0.52”, Winny Shen, Jeffrey M. Cucina, Philip T. Walmsley, Benjamin K. Seltzer, 2014-12-01:

In this commentary we answer 3 questions that are often posed when debating the usefulness and accuracy of correcting criterion-related validity coefficients for unreliability: (a) Is 0.52 an inaccurate estimate? (b) Do corrections for criterion unreliability lead us to choose different selection tools? (c) Is too much variance explained?

[1. No, 0.52 remains an appropriate estimate; 2. No, because the rank-order of tools’ utility is preserved by the corrections; 3. No, because while everything is correlated r = 0.30 on average, most of those variables are unknowable at hiring time, and simply adding up predictors ignores the diminishing returns caused by intercorrelations between them, so one will never predict perfectly.]
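[The diminishing-returns point can be made concrete with the standard two-predictor multiple-correlation formula. A minimal sketch; the r = 0.30 validities and the 0.5 intercorrelation are illustrative numbers, not figures from the paper:

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Multiple R^2 for two standardized predictors of one criterion,
    given each predictor's validity (r_y1, r_y2) and their
    intercorrelation (r_12)."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

# Two predictors, each correlating 0.30 with job performance:
independent = r_squared_two_predictors(0.30, 0.30, 0.0)  # -> 0.18
correlated = r_squared_two_predictors(0.30, 0.30, 0.5)   # -> 0.12

print(f"R^2, uncorrelated predictors:          {independent:.2f}")
print(f"R^2, predictors intercorrelated at 0.5: {correlated:.2f}")
```

Two uncorrelated predictors would explain 0.09 + 0.09 = 18% of criterion variance, but a modest intercorrelation of 0.5 between them cuts the total to 12%: each additional correlated predictor buys less than its zero-order validity suggests.]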

Conclusion: Based on our review of the evidence, the 0.52 estimate of the interrater reliability of supervisor ratings of job performance is an appropriate estimate; corrections for unreliability do not appear to change our decisions regarding the choice of one selection tool over another; and most variables may be more strongly correlated than people expect, making it difficult to demonstrate continued incremental validity in predicting job performance when adding additional predictors. We agree with LeBreton et al that psychologists need to be careful when applying and interpreting corrections, and we are thankful that they sponsored a discussion on the topic.

Corrections are critical for both basic science (ie. estimating population parameters) and practice (ie. recognizing artifacts attenuating estimates on which our work may be evaluated by stakeholders, courts, and other third parties). Ultimately, the appropriate use of corrections depends on the purpose of the project. If the goal is to explain variation among a sample of incumbents on observed criterion scores, then no corrections need to be made. If the goal is to explain variation among incumbents on true-score job performance, then a correction for unreliability is not only desirable but necessary. Finally, if the goal is to estimate how much of the variation in true-score job performance among applicants is explained by a predictor, then corrections for both range restriction and unreliability are indispensable. This goal represents the target validity inference that was included in Binning & Barrett 1989’s figure, but (rather interestingly) is omitted from LeBreton et al’s reproduction of that figure. We believe that the target validity inference is the most important inference in personnel selection; it provides the critical link from the observed predictor to the criterion construct (see also Putka & Sackett 2010).
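[As a sketch of the corrections being debated: Spearman’s correction for attenuation divides the observed validity by the square root of the criterion reliability (the 0.52 interrater reliability at issue here), and Thorndike’s Case II formula adjusts for direct range restriction. The observed validity of 0.30 and the restriction ratio u = 1.5 below are hypothetical values for illustration, not estimates from the paper; in practice, the order and combination of corrections depend on the study design:

```python
from math import sqrt

def correct_for_criterion_unreliability(r_xy, r_yy):
    """Spearman's correction for attenuation, applied to the
    criterion side only: observed validity / sqrt(criterion reliability)."""
    return r_xy / sqrt(r_yy)

def correct_for_range_restriction(r, u):
    """Thorndike Case II correction for direct range restriction,
    where u = SD(applicant pool) / SD(restricted incumbent sample), u >= 1."""
    return r * u / sqrt(1 + r**2 * (u**2 - 1))

observed = 0.30  # hypothetical observed validity in an incumbent sample
print(correct_for_criterion_unreliability(observed, 0.52))  # ~0.416
print(correct_for_range_restriction(observed, 1.5))
```

With a criterion reliability of 0.52, an observed validity of 0.30 rises to roughly 0.42 after disattenuation, showing why the choice of reliability estimate materially changes reported validities even though, per the authors, it preserves the rank-order of competing selection tools.]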