The myth that criterion-related validity is a simple correlation between test score and work outcome

This myth can be stated with relative simplicity: criterion validity is far more than the simple correlations reported in technical manuals. In this sense, validity is better described as whether an assessment can deliver a proposed outcome in a given setting with a given group. Criterion validity thus asks, ‘does this test predict some real-world outcome in a real-world setting?’

Assessments can add value, as discussed last month, but we need to think more deeply about criterion-related validity if that value is to be demonstrated more effectively. Criterion validity is too often determined by correlating a scale on a test (e.g. extroversion) with an outcome (e.g. training success). The problem is that neither the scale score nor the outcome exists in a vacuum. Both are sub-parts of larger systems (i.e. both consist of multiple variables). In the case of the test, the scale score is not independent of the others; rather, it is one scale among many used to build a better understanding of a person’s psychological space (e.g. one of the Big Five scales). Likewise, any work outcome is the product of a whole system working together. Outcomes are likely to be affected by variables such as the team a person works in, the environmental context (both micro and macro), what they are reinforced for, and so on. In a typical research design these factors are controlled for, but in the criterion validity correlations reported by test publishers this is unlikely to be the case.

When it comes to criterion validity, we are very much in the dark as to how psychological variables affect work outcomes in the real world, despite claims to the contrary. As an example, consider conscientiousness. Test publisher research tells us that the higher a person’s conscientiousness, the better they are likely to perform on the job. Common sense, however, would tell us that excessively conscientious people may not perform well, because their need to achieve a level of perfection detracts from timely delivery. Not surprisingly, recent research does not support a purely linear relationship: for many traits, too much of the trait is detrimental (Le, H., Oh, I.-S., Robbins, S. B., Ilies, R., Holland, E., & Westrick, P. (2011). Too much of a good thing: Curvilinear relationships between personality traits and job performance. Journal of Applied Psychology, 96(1), 113-133).

This is supported by work I was involved in with Dr. Paul Wood, which showed that intelligence and conscientiousness may be negatively correlated in certain circumstances, indicating that there are multiple ways of completing a task to the required level of proficiency (Intelligence Compensation Theory: A Critical Examination of the Negative Relationship Between Conscientiousness and Fluid and Crystallised Intelligence. The Australian and New Zealand Journal of Organisational Psychology, Volume 2, August 2009, pp. 19-29). The problem both studies highlight is that we are looking at the concept of criterion validity in an overly reductionist manner. These simple one-to-one correlations do not represent validity as the practitioner thinks of the term (“is this going to help me select better?”). Such correlations cannot answer that question, because the question itself requires thinking about the interaction between psychological variables and the unique context in which the test will be applied.

To understand how this problem with validity has become an accepted norm, one must look to the various players in the field. As is often the case, a reductionist view of validity stems from associations such as the British Psychological Society (BPS), which have simplified the concept of validity to suit their requirements. This in turn forces test publishers to conform and to clamour over one another to produce tables of validity data. Practitioners then understand validity within this paradigm. To add injury to insult, the criterion of quality becomes having as many of these seemingly meaningless validity studies as possible, further entrenching this definition of validity. The fact that a closer look at these studies shows validity coefficients going off in all sorts of directions is seemingly lost, or deemed irrelevant!

The solution to this nonsense is to change the way we think about criterion validity. We need a more holistic, thorough and system-based approach that answers the real questions practitioners have. Such an approach would incorporate both qualitative and quantitative methods, and is perhaps best captured in the practice of evaluation, a field that takes this approach seriously: http://www.hfrp.org/evaluation/the-evaluation-exchange/issue-archive/reflecting-on-the-past-and-future-of-evaluation/michael-scriven-on-the-differences-between-evaluation-and-social-science-research.

Finally, for this to happen, the criteria that bodies such as the BPS use to evaluate tests need to change. Without this change, test publishers cannot adopt alternative practices, as their tests will not be deemed “up to standard”. So, alas, I think we may be stuck with this myth for a while longer yet.