Why do People Recommend Ipsative Tools for Selection if They Are Not Designed for that Purpose?

Last week we discussed the basics of ipsative testing. This week the blog looks at why ipsative testing has been used for selection, and at the real driver behind it: marketing.

Why do people recommend ipsative tools for selection if they are not designed for that purpose?

There are two reasons that people recommend ipsative measures for selection. The first is a mistaken belief that they are more resistant to faking and therefore produce more valid results. The second is that marketing is fundamentally about having a point of difference; this ultimately means that companies selling assessments will present any number of different assessment solutions so long as they can create a story around them.

Faking

As is often the case, I quote from work by Paul Barrett in response to issues of faking: “The widespread use of ipsative measurement came about as a response to rater bias exhibited in questionnaire personnel ratings. Travers (1951) credits Paul Horst with first proposing the idea of forced-choice format to counter ‘leniency’ and other errors in rating. The forced-choice method involves the presentation of items that have been matched for preference value (e.g. social desirability) yet discriminate differentially on a set criterion, such as leadership quality for a specific task (Gordon, 1951). However, two important assumptions underlie this approach. Firstly, all the choices must be as high in apparent validity as each other, and secondly, the ratees will, on average, ascribe equal status to the irrelevant qualities (Guilford, 1954)”.

Hammond and Barrett then go on to show that ipsative tools, rather than getting around social desirability, have in-built and systematic problems of socially desirable responding. People will still attempt to respond in the ways they are motivated to. However, with ipsative testing, because the scales are interdependent it is much harder to detect social desirability and differentiate it from situational difference. Thus, ipsative tests are not so much more difficult to fake as they are more difficult to detect faking in, making their use in selection more, not less, problematic.

Marketing

The key reason that ipsative testing has been promoted by a few assessment companies is marketing. Personality testing has far less validity than other forms of testing, such as cognitive ability, despite its common use. With the plethora of assessment companies, the need for market differentiation increases. Couple this with issues around faking in personality testing and you have a clear market driver for ipsative testing.

The key is understanding that this market driver runs counter to the supporting evidence for ipsative testing. Thus, companies who tout ipsative testing for selection show a commitment not to psychometric rigour but to gaining market share. One cannot have it both ways: to promote the use of ipsative testing for selection requires a deliberate ignorance of the independent theory and science related to ipsative assessment.

A more pressing question is how the marketplace ever got conned into the belief that ipsative testing is valid for selection. The answer lies in who controlled the message and who set the paradigm. As competition has increased, counter views and ‘independent best practice’ have been brought to bear in countries like New Zealand, where ipsative testing for selection is now rare. The question still remains as to why even the New Zealand market was not told about the issues related to ipsative measurement earlier, and why (in my experience) so many were convinced that ‘best practice’ was the other way around (i.e. ipsative for selection and normative for development).

The resolution is of course professional ethics: if we as psychologists know there is no issue with normative tools for between-person comparisons, then our use of ipsative testing will be minimised. As professionals we must therefore not fall prey to marketing diatribe but, as always, base our decisions on independent science and theory.

15 thoughts on “Why do People Recommend Ipsative Tools for Selection if They Are Not Designed for that Purpose?”

  1. wahyupsy

    Dear Dr. Paul.
    I have read some articles saying that companies now seek employees who possess a certain type of personality suited to the job requirements. As I understand it, personality testing that produces a typology of personality is ipsative testing. For example, the MBTI typology: extrovert-introvert; sensing-intuition; thinking-feeling. Some jobs require an employee who is extroverted, intuitive and feeling; other jobs require other types. Is a test like this called ipsative testing? Thank you.

    Reply
    1. Dr Paul Post author

      Thank you for your question. Ipsative tests are not defined by their output (i.e. bipolar scales such as extrovert-introvert). Rather, they are defined by the way the questions are asked. Ipsative tests use forced-choice questions: a respondent is presented with a block of items (usually four) and asked to choose between them (usually picking the item that is most like them and the item that is least like them). This stands in contrast to normative questionnaires, which ask a respondent to rate themselves on each item (say from strongly agree to strongly disagree).

      Both tests may measure the same constructs (say introversion and extroversion) but they will do this in different ways. With an ipsative test your score on one trait is not independent of your score on another, and therein lies one of the main problems (see the sketch below). For other issues please read the previous posts on ipsative tests in this blog, as they provide more detail and references.
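      To make this concrete, here is a minimal illustrative sketch in Python. The items, scale names, and scoring values are made up for illustration, not taken from any actual instrument:

      # A hypothetical forced-choice (ipsative) block: four made-up items,
      # each keyed to a different made-up scale. The respondent marks one
      # item "most like me" (+1) and one "least like me" (-1), so every
      # block adds a fixed net total of 0 across the scales: pushing one
      # scale up necessarily pushes another down.
      block = {
          "I enjoy meeting new people": "extraversion",
          "I double-check my work": "conscientiousness",
          "I stay calm under pressure": "stability",
          "I like trying out new ideas": "openness",
      }

      scores = {scale: 0 for scale in block.values()}
      most_like_me = "I enjoy meeting new people"
      least_like_me = "I double-check my work"
      scores[block[most_like_me]] += 1
      scores[block[least_like_me]] -= 1
      print(scores)  # block total is always 0: the scales are interdependent

      # A normative (rating-scale) item, by contrast, is scored on its own:
      # a 1-5 "strongly disagree ... strongly agree" rating feeds one scale
      # without constraining any other scale.
      normative_scores = {"extraversion": 4}
      print(normative_scores)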

      Reply
  2. Joey Wang

    Dear Dr Paul,
    I have just read this article, and I think it is really helpful for understanding why some companies sell ipsative tests.
    I am a student in Asia studying psychology.
    Because I do not know much about how ipsative scores are calculated,
    would you please give me a brief introduction, or some links to articles introducing it?

    Thank you very much and best regards,

    Reply
    1. Dr Paul Post author

      Hi Joey

      Great to see you reading the blogs. All the key references for ipsative tests are in the blogs so that should steer you in the right direction.

      Reply
  3. Tom

    Dr Paul,
    Just finding your blog and enjoyed your posts on ipsative assessments.

    I was first the subject of a hiring assessment in the late 1990s, carried that tool with me to other companies, and now utilize it as part of my consulting practice.

    I do not see that you addressed the available options for people interested in conducting hiring or other comparative assessments… i.e. normative assessments such as The Prevue.

    Reply
    1. Dr Paul Englert Post author

      Hi Tom

      Thanks for the comment. There are a range of normative tools that can be used for selection, and I have not gone into these in the posts. The key is that a normative tool is designed to make between-person judgements, so it is suitable for selection (so long as it meets other psychometric standards and is related to the job).

      So in short you are quite right: there are a range of normative tools that can be used in place of ipsative measures.

      All the best
      Paul

      Reply
  4. Bob Parker

    Ipsative tests ARE approved for selection. It’s a complete myth that they are not. Well-constructed (NOT MBTI) ipsative tests are fantastic for making comparisons between people. I do it every day in my practice with a Job Benchmark. Ipsative tests are very difficult to game. Normative tests are quite easy to game. Every claim you make against ipsative testing has been proven false by TTI Success Insights, the over 100,000 companies that use their ipsative tests, and the EEOC-compliant process they have in place.

    Amazing that there’s so much ignorance.

    Reply
    1. Dr Paul Englert Post author

      Dear Mr Parker

      Thank you for your response to the blog. We appreciate all commentary, even commentary that disagrees with us.

      In response to your comment it is important to note that the critique is not test specific. The critique is one of fundamental design issues based on the properties of ipsative tests in contrast to normative tools. The argument is as much based on the logic of the design as the measurement properties.

      Secondly, I agree that there have been advancements in this space since these blogs were written. The work on the OPQr is a sound and committed attempt to tackle this problem. Likewise, CAT personality tools like TAPAS are clearly extremely well developed and have the potential to further develop the industry (Conference Presentation). The central issue is one of between- versus within-person measurement, which these tests, I agree, are tackling.

      This noted these are but a subset of the ipsative tools on the market. Most do not have this development background and therefore the critiques noted remain applicable and relevant to most.

      Thirdly, the critique is one based on research, not the marketing hype that is so prevalent in this industry. The research continues to point to the problems of ipsative tools. While much of this research is noted in this series of blogs (Ipsative Tests: Psychometric Properties), you may also wish to review Closs, S.J. (1996). On the factoring and interpretation of ipsative data. Journal of Occupational and Organizational Psychology, 69(1), 41–47.

      In contrast to this research-based approach to discussion, I note your own reply, which appears to offer as evidence that you “do it everyday” and that “100,000 companies use their ipsative tools”. In fairness, I went to your site in response to your critique. I clicked on the topic ‘validity’ http://www.ttisuccessinsights.com/how/validity which then appeared to focus on an interesting, albeit misplaced, run-down of the reliability of your tests. While I admit to having taken only a cursory overview, I could not see how you, or the company you mention, have tackled the ipsative issue. The marketing was nice though.

      I would be more than happy to review any peer-reviewed literature you have on the topic of ipsative measurement for between-person judgement, and any peer-reviewed research that outlines the process you have taken to solve this conundrum. Case studies, marketing hype, research on other topics, numbers of people tested, pretty graphs, years in the industry, ‘trust me’, and any other spurious evidence I unfortunately don’t have time for.

      Thanks again for participating and if you wish to personally email the peer reviewed evidence base for your ipsative measure I will happily look at it.

      All the best
      Paul

      P.S. On the issue of faking and ipsative tools I suggest a good read of Hammond, S., & Barrett, P. (1995). The psychometric and practical implications of the use of ipsative, forced-choice format questionnaires.

      Reply
  5. Ryan

    You seem rather dogmatic in the assertion that ipsative formats are fundamentally broken, and provide no empirical evidence for this. Meanwhile, there is good research indicating that ipsative formats are more resilient to faking in high-stakes contexts and more useful for predicting job performance, e.g. Hirsh & Peterson (2008).

    Ipsative formats may have their problems, but normative formats may well be far worse.

    Reply
    1. Dr Paul Englert Post author

      Hi Ryan

      Thank you for participating in the blog; this is appreciated. You note that there are no references provided. This is not the case, and you may have missed the list in this blog: https://oprablog.wordpress.com/2010/10/12/great-blog-but-what-are-ipsative-tests/. For your benefit I note it below:

      Cattell, R.B. (1944). Psychological measurement: ipsative, normative, and interactive. Psychological Review, 51, 292-303.

      Meade, A.W. (2004). Psychometric problems and issues involved with creating and using ipsative measures for selection. Journal of Occupational and Organizational Psychology, 77, 531–552.

      Hammond, S., & Barrett, P. (1995). The psychometric and practical implications of the use of ipsative, forced-choice format questionnaires.

      Hough, L.M., & Ones, D.S. (2001). The structure, measurement, validity and use of personality variables in industrial, work and organisational psychology, p. 233.

      Since this time there have been further papers that could be cited. Two that you might find interesting are Goffin, R.D., Jang, I., & Skinner, E. (2011). Forced-choice and conventional personality assessment: Each may have unique value in pre-employment testing. Personality and Individual Differences, 51(7), 840-844; and Salgado, J.F., Anderson, N., & Tauriz, G. (2015). The validity of ipsative and quasi-ipsative forced-choice personality inventories for different occupational groups: A comprehensive meta-analysis. Journal of Occupational and Organizational Psychology, 88, 797-834.

      Having addressed the assertion that there are no references, let me move on to your second claim that I’m dogmatic on ipsative formats. On the contrary, my issue is the difference between within- and between-person measurement and the application of the two. I state clearly that ipsative measurement has its place in the former and is problematic, by design, for the latter: “In summary, ipsative testing is very applicable when working with an individual. Ipsative testing is inappropriate when used for selection”. I note that these are the same guidelines that you will find in any elementary textbook on testing, and they have recently been endorsed by the BPS (Assessment Matters, 2015, Vol. 7, No. 4).

      Next, we come to the paper by Hirsh and Peterson (2008). The fact that they call their test a ‘fake-proof’ measure is perhaps all that anyone needs to know about how far they may be bending the truth in the promotion of the tool. These are the exact claims that I’m fighting; they are obviously excessive and have no place in science. The paper in no way convinces me that ipsative tests are useful for between-person comparisons, and it does not address the fundamental issues in this respect. What the paper assesses is how predictive validity is affected under fake-good conditions. The natural corollary, shown by later papers, is that this predictive validity under faking conditions is simply accounted for by ‘g’, especially with student samples.

      This brings me to your final point, “Ipsative formats may have their problems, but normative formats may well be far worse”. I disagree, for all the reasons that I have outlined in the blogs. Ipsative tests, by design, are not meant for between-person comparison. The problems with their interpretation, factor structure, and reliability, when used in this setting, make their problems far greater. The international guidelines mean that those who use them in this manner have a hurdle to clear if challenged. The issue of faking is a moot point, as both formats are susceptible to faking (any applicant who wants to fake can, as shown in the paper you quote). However, under ipsative conditions, it is harder to track. The push for ipsative formats for selection invariably comes from those who have a test to sell. The claim that anything is ‘fake-proof’ is nonsense, and the industry must be wise to this.

      Finally, to reiterate my earlier point and points made in other responses: I see great value in ipsative measures for counselling. Moreover, much of the work that is going on with the likes of the OPQi I see as valuable, and I would concur that the team behind that instrument are addressing many of the issues for selection. But this same science is not behind most ipsative tools, and the claims made for them are made for commercial reasons. Thanks again.

      Reply
  6. Saville

    You seem blindly oblivious to the superior validity of quasi-ipsative designs. Please look up the literature.

    Reply
    1. Dr Paul Englert Post author

      Hello Peter

      Firstly, thank you for taking the time to provide a brief response to the blog.

      While my reading is now less in areas of measurement, I do keep abreast of issues around ipsative testing, insomuch as I can discuss them with clients if asked. I have not written on the topic in a while; your short comment has spurred me on to do an update – thank you.

      I believe the critical paper about quasi-ipsative testing that you may be referring to is the 2014 paper by Salgado and Táuriz. If I’m correct, then the first point I would note, which I believe may be in line with your position, is that the paper makes the distinction, in test scoring, between ipsative and quasi-ipsative tests. I agree this is an important distinction to make, in so much as it makes the difference clear. The arguments I have raised previously, such as challenging the supposed increase in the criterion validity of ipsative testing, are not contradicted by the paper; the issue there is quasi-ipsative tests.

      Hence, let me provide my thoughts on the paper, insofar as it relates to the critical argument I have about quasi-ipsativity in personality testing. I want to note from the outset that the paper’s analysis does not appear to include the Wave. While I appreciate that you are not making this claim, I think it is an essential point for clarity, given potential assumptions by readers.

      The number of studies in the paper related to quasi-ipsative tests is small, which is in no way a critique of the researchers, who have done a thorough review of the area. Rather, the proliferation of quasi-ipsative scoring methods in widely used commercial tests is relatively new, with growth only in recent years, in part to address known weaknesses with earlier versions of ipsative tests, such as previous versions of the OPQ.

      The research looks at whether quasi-ipsative tests are more predictive than traditional ipsative measures, and the findings do support this conjecture. The researchers identify six types of quasi-ipsativity in the paper, and it is unclear, at least on my reading, which change leads to the improved validity. Suffice it to say that the closer one is to pure, non-quasi, ipsative testing, the lower the criterion validity.

      Moreover, the criterion validities cited are so-called ‘true’ coefficients of the relationship between personality and outcome variables. While the techniques used are accepted practice in our industry, this type of analysis is increasingly coming under criticism, as it inflates the effect and draws attention away from the inherently large variance between studies. Criterion validity is a necessary but not sufficient marker for the application of psychometric tests. Given the statistical manipulations, readers must take care in their understanding of claims made on the basis of the established validities, especially when those claims are that assessment approach X produces a higher relationship with job performance than assessment approach Y. My issue is not with a test per se but with a methodology, and I try to steer clear of debates that I believe are more grounded in marketing than science.
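      For readers unfamiliar with what a ‘true’ coefficient involves, here is a minimal sketch of the standard correction for attenuation; the numbers are purely illustrative, not taken from the paper:

      import math

      def disattenuate(r_observed: float, r_xx: float, r_yy: float) -> float:
          """Spearman's classic correction: the estimated 'true' correlation
          between constructs, given the predictor reliability (r_xx) and the
          criterion reliability (r_yy)."""
          return r_observed / math.sqrt(r_xx * r_yy)

      # Illustrative numbers only: a modest observed validity of .20, with
      # assumed reliabilities of .80 (test) and .52 (performance ratings),
      # rises to about .31 once 'corrected'.
      print(round(disattenuate(0.20, 0.80, 0.52), 2))  # 0.31

      The point is not that the correction is illegitimate, but that a reader comparing a corrected coefficient against an uncorrected one is not comparing like with like.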

      Most importantly, the paper does not adequately address the fundamental measurement issues that are at the base of my critique of the use of ipsative and quasi-ipsative testing for selection, as its focus is criterion-related validity. The paper is a good read, and the work is important to our field, but arguments on adjusted criterion validity alone fail to address my concerns. Indeed, recent literature does nothing but confirm the problems of using ipsative testing for the measurement of personality in screening applications (see below). I have made my views clear in previous blogs (1, 2, 3) and will not repeat the arguments here, other than to give the central gist.

      1. Ipsative testing for the measurement of personality is problematic on both logical and measurement grounds (a minimal numeric sketch follows this list):
      a. Personality is not a within-person construct. We understand it as a measurement between people: to say one is extraverted is to say one is more extraverted than others.
      b. Ipsative tests often have problems with reliability that challenge the consistency of the underlying constructs. While I recognise claims to the contrary, such as in this paper, the bulk of the evidence, not to mention recent studies (such as those below), indicates reliability issues. In the age of the replication crisis, I will put my confidence in the bulk of the work.
      c. A traditional selling point for producers of ipsative tests is the capacity to get around the problem of faking. The problem, however, is that the likelihood that candidates will put their best foot forward in an assessment is, I would argue, a given. Hence, faking is not overcome by ipsative testing; the format merely makes it harder to trace. The issue appears more related to a person’s ability to identify the criteria being selected for.
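      To make the measurement point concrete, here is a minimal numeric sketch in Python (simulated data, no particular instrument) of the well-known sum-constraint problem: because each person’s scores across a fully ipsative set of scales sum to the same constant, the covariance matrix is singular and the inter-scale correlations are forced to be negative on average, whatever the ‘true’ traits look like.

      import numpy as np

      rng = np.random.default_rng(0)
      raw = rng.normal(size=(500, 5))  # 500 simulated people, 5 hypothetical scales

      # Row-centre the scores so each person's five scores sum to a constant (0),
      # mimicking the arithmetic constraint a fully ipsative format imposes.
      ipsative = raw - raw.mean(axis=1, keepdims=True)

      cov = np.cov(ipsative, rowvar=False)
      print(np.round(cov.sum(axis=1), 10))  # every row of the covariance matrix sums to 0
      print(round(np.linalg.det(cov), 12))  # determinant is 0: the matrix is singular

      corr = np.corrcoef(ipsative, rowvar=False)
      off_diagonal = corr[~np.eye(5, dtype=bool)]
      print(round(off_diagonal.mean(), 2))  # about -0.25 here, i.e. -1/(k-1) for k=5 scales

      This is why factor analysis, norm tables and between-person comparisons behave so oddly on ipsative scores: the constraint is baked into the numbers before any psychology enters the picture.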

      In the spirit of dialogue, I note recent research in the area which supports my claims (while also challenging the claimed criterion superiority):
      1. Not even high-dimensionality can address the problems with ipsative measurement:
      Schulte, N., Holling, H., & Bürkner, P. (2020). Can high-dimensional questionnaires resolve the ipsativity issue of forced-choice response formats? Educational and Psychological Measurement (https://doi.org/10.1177/0013164420934861), in press, 1-28.

      2. The criterion-related validity of ipsative tests is likely to be lower, not higher, than with classical test theory (CTT) scoring:
      Fisher, P.A., Robie, C., Christiansen, N.D., Speer, A.B., & Schneider, L. (2019). Criterion-related validity of forced-choice personality measures: A cautionary note regarding Thurstonian IRT versus Classical Test Theory scoring. Personnel Assessment and Decisions (https://scholarworks.bgsu.edu/pad/vol5/iss1/3/), 5(1), 1-14.

      Moving on from ipsative tests, at a practical level, I think that the commercialisation of the industry often leads to variations designed around creating unique selling points, and to marketing those differences to the sky, rather than to a focus on the science of personality testing and an ethical approach to selection. The testing industry should instead promote the reality of occupational testing and its link with the science of personality, including:

      a) Admission of the underlying limitations of testing, given the science of personality. There is a small set of robust traits that make sense to assess; no test publisher has discovered, let alone owns, the occupational model of personality, as if there were such a thing.

      b) That as the number of factors in a model increases, the separation between the scales decreases. Essentially you end up with several rulers measuring the same thing and calling it something different.

      c) That the principles of measurement (see anything by Michell) are important and relate to what we can or cannot achieve by assessing psychological attributes and human behaviour within a selection environment. Human behaviour is variable, and therefore we must be clear that there are significant constraints on our level of accuracy.

      d) That testing is not there to catch candidates out. Instead, it is a straightforward methodology for getting someone to describe themselves, using responses to behaviours or an agreed taxonomy, which is then combined with other information to make an informed selection decision. Some people may over-represent themselves, but this is natural, and claims that we have ways to stop this are exaggerated, or may deliberately allow people to assume that a test is somehow cheat-proof.

      For these reasons, not to mention potential issues of adverse impact, I’m not swayed by the arguments for the introduction of quasi-ipsative testing, let alone ipsative testing, for personality assessment, especially in the area of selection. That said, I agree that the distinction between quasi- and pure ipsative testing needs to be made, and I appreciate you engaging in a manner that gives me the opportunity to catch up on the latest literature and engage in the discussion.

      References
      Salgado, J.F., & Táuriz, G. (2014). The Five-Factor Model, forced-choice personality inventories and performance: A comprehensive meta-analysis of academic and occupational validity studies. European Journal of Work and Organizational Psychology, 23(1), 3-30.

      Kleinmann, M., Ingold, P.V., Lievens, F., Jansen, A., Melchers, K.G., & König, C.J. (2011). A different look at why selection procedures work: The role of candidates’ ability to identify criteria. Organizational Psychology Review, 1(2), 128-146.

      Klehe, U., Kleinmann, M., Hartstein, T., Melchers, K.G., König, C.J., Heslin, P.A., & Lievens, F. (2012). Responding to personality tests in a selection context: The role of the ability to identify criteria and the ideal-employee factor. Human Performance, 25(4), 273-302.

      Anderson, N., & Sleap, S. (2004). An evaluation of gender differences on the Belbin team role self‐perception inventory. Journal of Occupational and Organizational Psychology, 77(3), 429-437.

      Reply
  7. Paul Barrett

    One might also carefully read:
    Wetzel, E., & Frick, S. (2020). Comparing the validity of trait estimates from the multidimensional forced-choice format and the rating scale format. Psychological Assessment (http://dx.doi.org/10.1037/pas0000781), in press, 2-16.
    =Abstract=
    The multidimensional forced-choice (MFC) format has been proposed as an alternative to rating scales (RS) that may be less susceptible to response biases. The goal of this study was to compare the validity of trait estimates from the MFC and the RS format when using normative scoring for both formats. We focused on construct validity and criterion-related validity. In addition, we investigated test–retest reliability over a period of six months. Participants were randomly assigned the MFC (N = 593) or the RS (N = 622) version of the Big Five Triplets. In addition to self-ratings on the Big Five Triplets and other personality questionnaires and criteria, we also obtained other-ratings (N = 770) for the Big Five Triplets. The Big Five in the Big Five Triplets corresponded well with the Big Five in the Big Five Inventory except for agreeableness in the MFC version. The majority of the construct validity coefficients differed between the MFC and the RS version, whereas criterion-related validities were very similar. The self- and other-rated Big Five Triplets showed higher correlations in the MFC format than in the RS format. The reliability of trait estimates on the Big Five and test-retest reliabilities were lower for MFC compared to RS. For the MFC format to be able to replace the RS format, more research on how to obtain ideal constellations of items that are matched in their desirability is needed.

    Public Significance Statement
    The response format used in a questionnaire can influence inferences on respondents’ trait levels and therefore diagnostic decisions. This study finds that the multidimensional forced-choice version of a Big Five questionnaire, in which participants rank items presented in triplets, overall measures the same constructs and is related to criteria (e.g., number of Facebook friends) in the same way as a rating scale version of the questionnaire, although relations with other traits tend to differ in magnitude.

    For me, the reality is that the new empirical evidence has rendered the whizz-bang Thurstonian IRT forced-choice ipsative-to-normative assessment as no more than a bit of clever psychometric flim-flammery.

    The ultimate problem for anyone producing a fully or quasi-ipsative forced-choice assessment is that it forces people to make choices between responses (or have to rank them non-equally) that they otherwise might find truly impossible to make.

    I loathe these blasted things. They simply reflect a desire in self-report test developers to inhibit response distortion – but do so by ignoring the complexity of human cognition – which is strange for those developers who like to call themselves psychologists.

    Confounding all this is ability – especially ATIC (which Paul Englert has already referenced), i.e. candidates’ Ability To Identify Criteria, which was also highlighted in a recent 2018 article:
    Geiger, M., Olderbak, S., Sauter, R., & Wilhelm, O. (2018). The “g” in faking: Doublethink the validity of personality self-report measures for applicant selection. Frontiers in Psychology: Personality and Social Psychology (https://doi.org/10.3389/fpsyg.2018.02153), 9, 2153, 1-15. [open access]

    And let us not forget socioanalytic theory – and the very reasonable notion that people aren’t “faking” at all, but are just normal human beings putting their ‘best foot forward’. A very fine article from Bob Hogan and Jeff Foster provides evidence and argument for this ‘psychologically-oriented’ view:
    Hogan, R., & Foster, J. (2016). Rethinking personality. International Journal of Personality Psychology (http://ijpp.rug.nl/article/view/25245/22691), 2(1), 37-43. [open access]

    The main problem as ever is self-report – and its many weaknesses as a methodology to acquire ‘true-score quantitative measurements’ of human psychological attributes. But that’s another issue altogether!

    But perhaps the biggest issue of all now is whether the data in this preprint paper will ever be published:
    Schmidt, F.L., Oh, I-S., & Shaffer, J.A. (2016). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 100 years of research findings. Working paper, posted at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2853669, 1-74.
    Now downloadable at:

    Click to access 2016-100-Yrs-Working-Paper-on-Selection-Methods-Schmit-Mar-17.pdf

    If those results in Table 1 stick, it’s the end of self-report personality testing as we know it – let alone all that hot-air about structured interviews being superior to unstructured interviews.

    A quick summary of Table 1 is available at Medium.com.

    And this is given even more legs in a beautifully imaged and well-referenced series of informative slides by Andrew Munro:
    Personality testing in employee selection: Challenges, controversies and future directions

    Click to access Personality-Testing-Employee-Selection.pdf

    Reply
    1. Dr Paul Englert Post author

      Thank you for such a thoughtful response, Paul. I agree that Andrew’s thought pieces are a must-read. He is truly artful in his ability to cut through the BS.

      The final word on this topic for me is captured in the point you make: ” The ultimate problem for anyone producing a fully or quasi-ipsative forced-choice assessment is that it forces people to make choices between responses (or have to rank them non-equally) that they otherwise might find truly impossible to make”.

      The ethics, or lack thereof, of forcing people to make incomparable choices between constructs that are only understandable relative to other people is at best questionable. No amount of statistical jiggery-pokery will get around this problem.

      Reply

Please Comment!