The Myth That Criterion-Related Validity Is a Simple Correlation Between Test Score and Work Outcome

This myth can be dispelled with relative ease: criterion-related validity is far more than the simple correlations found in technical manuals. Validity in this sense is better described as whether an assessment can deliver a proposed outcome in a given setting with a given group. Criterion validity thus asks: 'does this test predict a real-world outcome in a real-world setting?'

Assessments can add value, as discussed last month, but we need to think more deeply about criterion-related validity if this value is to be demonstrated more effectively. Criterion validity is too often determined by correlating a scale on a test (e.g. extraversion) with an outcome (e.g. training performance). The problem is that neither the scale score nor the outcome exists in a vacuum. Both are sub-parts of greater systems (i.e. both sit among multiple variables). In the case of the test, the scale score does not stand alone: it is one scale among many used to better understand a person's psychological space (e.g. one of the Big Five scales). Any work outcome is the sum total of a system working together, shaped by variables such as the team a person is working in, the environmental context (both micro and macro), what they are reinforced for, and so on. In a normal research design these aspects are controlled for, but in the criterion validity correlations reported by test publishers this is unlikely to be the case.

When it comes to criterion validity, we are very much in the dark about how psychological variables impact work outcomes in the real world, despite claims to know otherwise. As an example, consider conscientiousness. Test publisher research tells us that the higher a person's conscientiousness, the better they are likely to perform on the job. Common sense, however, suggests that excessively conscientious people may not perform well, because their need for perfection detracts from timely delivery. Not surprisingly, recent research does not support a simple linear relationship: for many traits, too much of the trait is detrimental (Le, H., Oh, I.-S., Robbins, S. B., Ilies, R., Holland, E., & Westrick, P. (2011). Too much of a good thing: Curvilinear relationships between personality traits and job performance. Journal of Applied Psychology, 96(1), 113-133).
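The curvilinear point is easy to demonstrate numerically. The sketch below uses simulated data (invented for illustration, not drawn from the cited study) to build an inverted-U relationship between conscientiousness and performance, and shows that a single linear correlation, the kind quoted in technical manuals, all but misses it:

```python
import random
import statistics

random.seed(42)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Simulated inverted-U: performance peaks at moderate conscientiousness
# and falls away at both extremes.
n = 500
conscientiousness = [random.gauss(0, 1) for _ in range(n)]
performance = [1 - c ** 2 + random.gauss(0, 0.5) for c in conscientiousness]

r_linear = pearson(conscientiousness, performance)
# Correlating against the squared trait captures the curvature that a
# plain linear coefficient misses entirely.
r_curved = pearson([c ** 2 for c in conscientiousness], performance)

print(f"linear r:   {r_linear:+.2f}")  # close to zero
print(f"with c^2:   {r_curved:+.2f}")  # strongly negative
```

If the true relationship is curvilinear, a publisher's single linear coefficient can understate, or entirely hide, the trait's real bearing on performance.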

This is supported by work I was involved in with Dr. Paul Wood, which showed that intelligence and conscientiousness may be negatively correlated in certain circumstances, indicating that there are multiple ways of completing a task to the required level of proficiency (Intelligence compensation theory: A critical examination of the negative relationship between conscientiousness and fluid and crystallised intelligence. The Australian and New Zealand Journal of Organisational Psychology, 2, 19-29, 2009). The problem both studies highlight is that we are looking at criterion validity in too reductionist a manner. Simple one-to-one correlations do not represent validity as the practitioner thinks of the term ('is this going to help me select better?'). That question cannot be answered by a coefficient alone, because it requires thinking about the interaction between psychological variables and the unique context in which the test will be applied.

To understand how this view of validity became an accepted norm, one must look to the various players in the field. As is often the case, a reductionist view of validity stems from associations such as the BPS, which have simplified the concept of validity to suit their requirements. Test publishers are then forced to adhere to this, clambering over each other to produce tables of validity data, and practitioners come to understand validity within this paradigm. To add insult to injury, the criterion of quality becomes having as many of these seemingly meaningless validity studies as possible, further entrenching this definition of validity. The fact that a closer look at these studies shows validity coefficients going off in all sorts of directions is seemingly lost, or deemed irrelevant!

The solution to this nonsense is to change the way we think about criterion validity. We need a more holistic, more thorough, systems-based approach that answers the real questions practitioners have. This would incorporate both qualitative and quantitative methods, and is perhaps best captured in the practice of evaluation, which is taking this approach seriously: http://www.hfrp.org/evaluation/the-evaluation-exchange/issue-archive/reflecting-on-the-past-and-future-of-evaluation/michael-scriven-on-the-differences-between-evaluation-and-social-science-research.

Finally, for alternative practices to survive, the criteria that the likes of the BPS use to evaluate tests need to change. Without that change, test publishers cannot adopt alternative practices, as their tests will not be deemed 'up to standard'. So, alas, I think we may be stuck with this myth for a while yet.


Attacking the Myth That Personality Tests Don’t Add Value

For this month’s myth, the last on the topic of psychometrics, I have chosen a slightly different approach: I’m coming out in defence of personality tools, when they are used correctly and understood in the right context. Rather than reinvent the wheel, I have chosen to highlight what I believe to be a very reasoned article on the topic in Forbes by Tomas Chamorro-Premuzic.

Before posting a direct link to the article, I want to set out the context for the value of psychometrics. For me, it rests on five key points:

 

  1. Predicting human behaviour is difficult: The psychometric industry often oversteps the mark with the levels of prediction it claims. Assessments are not crystal balls, and the search for the greatest predictive tool, easily generalisable across multiple contexts, is futile. It does not follow, however, that because humans are complex, assessments have no application, or that understanding a little more about a person’s behavioural preferences through a framework of personality has no value. On the contrary, such information is valuable precisely because human beings are complex: more information on individual differences, and frameworks to help us conceptualise behavioural patterns, add value to the people decisions we need to make. Psychometric tools provide a framework for understanding personality and a simple, relative measurement model to assist decision-making.

 

  2. Human beings have free will: It never ceases to amaze me when I meet people who are slavish in their devotion to a particular assessment tool, as if they choose to ignore the concept of free will. Behaviour will inevitably change across situations and with different reinforcers; this is so evident that it needs no further explanation. What psychometric tools can do, however, is estimate the likelihood of behavioural change and the preference for a behaviour. The assessment does not supersede free will, but rather helps us understand a little better how free will is likely to be displayed.

 

  3. Lying, or distortion, is a problem for any assessment method: Lying is something humans often do! A common argument against personality tools is that people may present themselves in an overly positive light. The same criticism, however, applies to any assessment methodology, from interviews to CVs. It affects many dimensions of life, from employment to those hoping to meet Mr or Ms Right via an online dating site. Quality personality tools attempt to mitigate the issue with response-style indicators, such as measures of social desirability, central tendency, and infrequency.

 

  4. Behaviour is an interaction between the situation and preference: Much like the point on free will, the situation should never be ignored when attempting to understand behaviour. Personality tests provide us with part of the puzzle, and in doing so they help us understand how someone is likely to behave. The key word in that sentence is ‘likely’, and how likely depends on the strength of the behavioural preference and the situation.

 

  5. Personality assessments are a simple, coherent, and quick method for shedding light on human complexity: The bulk of personality tools are used for recruitment. When recruiting, we must make an expensive decision on limited information within a short timeframe, which obliges us to consider all feasible ways of making an informed judgement. At its most basic, the instrument is a collection of items clustered along psychometric principles, giving a degree of reliability over time and internal consistency, and thereby giving meaning to a wider trait. A person’s responses are then compared with those of others who have taken the test. Assuming the norm is relevant, up to date, and has adequate spread, this gives an indication of the person’s relative behavioural preference against a comparison group of interest. That information is then used, together with the other information collected, to make inferences about likely behaviour. That is the sum total of the process. For argument’s sake, the alternative would be to say that human behaviour is all too complex and we should operate without asking any questions at all. That is equally untenable.
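The comparison step just described is mechanically simple. Below is a minimal sketch, with invented scores and an invented norm group (real norms would be far larger), of how a raw scale score becomes a relative standing, assuming the trait is roughly normally distributed:

```python
import statistics
from math import erf, sqrt

# Invented raw scale scores for a comparison (norm) group.
norm_group = [22, 25, 28, 30, 31, 33, 34, 36, 38, 41]
candidate_raw = 36  # the candidate's raw score on the same scale

mean = statistics.mean(norm_group)
sd = statistics.stdev(norm_group)   # spread of the comparison group
z = (candidate_raw - mean) / sd     # candidate's relative standing

# Convert the z-score to a percentile under a normality assumption.
percentile = 0.5 * (1 + erf(z / sqrt(2))) * 100

print(f"z = {z:+.2f}, i.e. roughly the {percentile:.0f}th percentile")
```

If the norm group is small, unrepresentative, or lacks spread, the percentile still comes out looking precise, which is exactly why the relevance of the norm matters as much as the score itself.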

 

The problem is not that personality tests have no value, but that practitioners overestimate their value and predictive power. Psychometric test providers may also confuse the issue by over-promoting their assessments, marketing their uniqueness, and extolling the magical powers of their special item sets. Understood in the right context, personality assessment can add value. Used as part of a complete system, interlinking recruitment with training and performance management, it can yield a deeper understanding of how personality impacts company performance. I agree that some tools do not meet minimum psychometric standards, and their usefulness is accordingly limited; but for those assessments that simply ‘do as they say on the tin’, the problem lies not with the assessment but with the practice of the users and their unrealistic expectations.

 

I strongly encourage you to read this short piece on the seven common, but irrational, reasons for hating personality tests: http://www.forbes.com/sites/tomaspremuzic/2014/05/12/seven-common-but-irrational-reasons-for-hating-personality-tests/


The Myth of Impartiality (Part 2)

Last month, I discussed the issue of impartiality with reference to universities and research. This month, I want to look at the myth of impartiality from the perspective of the users and suppliers of psychometrics. With respect to users, my focus is HR professionals and recruiters. The suppliers I refer to are the plethora of assessment suppliers from around the world.

Practitioners

Much of the credibility of psychometric tests is assumed through their application. The general public’s interaction with psychometric assessment comes primarily through the job application process, and the implicit assumption is that those responsible for these processes are skilled practitioners in their field, with a highly justifiable reason both for using psychometrics at all and for applying a given assessment. This gives rise to the myth of impartiality with reference to practitioners.

The practitioner is often reliant on test providers as their source of information on psychometrics. This, however, is akin to asking a financial advisor who is selling a particular investment to explain the principles of investment to you! And it is important to recall that even the psychologically trained are subject to issues of impartiality (as discussed in last month’s blog post).

Research indicates that practitioners’ beliefs about predictive power do not marry with reality (see https://www2.bc.edu/~jonescq/articles/rynes_AME_2002.pdf). While this may change over time, practitioners who lack the skills needed to read the statistics and understand how the tools are applied remain unaware of their own blind spots when it comes to testing.

Examples I have witnessed include:

  • Assuming that a correlation can be read as a percentage. For example, a common misconception is that a scale correlating 0.3 with job performance accounts for 30% of the variability, when it in fact accounts for only 9% (0.3 squared).
  • Talking about the validity of a test when it is not so much the test that is ‘valid’ as the scales inside the test that correlate with a given outcome.
  • Not recognising that the correlation they cite as evidence for the value of the test is linear. Taken at face value, a linear correlation implies that the extreme ends of the scale are best for predictive purposes, yet most practitioners warn of the problems with extremes. The contradiction between application and research is clear.
  • Assuming a quoted validity is applicable to their organisation. Validity varies greatly between jobs, organisations, and time, and those are only three of the variables. To treat a given validity as ‘applicable to your organisation’ is often a big leap in logic.
  • Forgetting that validity is ultimately more than a number on a page. It is a system of interacting parts producing an outcome, and reducing it to a single number renders this commonly relied-upon concept near redundant.
  • While many practitioners ask about the size of a norm group, very few ask about its makeup.
  • Those who do ask about the makeup of the norm group fail to ask about the spread of the data.
  • A classic example is the request for industry-based norms. People fail to understand that such requests carry inherent problems, such as the restriction of range that comes from taking a more homogenous sample. This is highly apparent when looking at industry norms for cognitive ability.
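Two of these misreadings can be checked in a few lines of code. The sketch below, with all data simulated purely for illustration, shows the r versus r-squared point and the effect of range restriction on an observed correlation:

```python
import random
import statistics

random.seed(1)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# 1) A correlation is not a percentage: r = 0.30 explains r^2 = 9% of
#    the variance, not 30%.
r = 0.30
print(f"variance explained by r = 0.30: {r ** 2:.0%}")

# 2) Restriction of range: correlating within a homogenous (e.g. single-
#    industry) sample shrinks the observed coefficient even though the
#    underlying relationship is unchanged.
n = 2000
ability = [random.gauss(0, 1) for _ in range(n)]
performance = [0.5 * a + random.gauss(0, 0.87) for a in ability]

r_full = pearson(ability, performance)

# Keep only the upper part of the ability range, as a homogenous
# industry norm implicitly does.
pairs = [(a, p) for a, p in zip(ability, performance) if a > 0.5]
r_restricted = pearson([a for a, _ in pairs], [p for _, p in pairs])

print(f"full-range r:  {r_full:.2f}")
print(f"restricted r:  {r_restricted:.2f}")  # visibly smaller
```

The restricted coefficient is not evidence that the test ‘works less well’; it is a sampling artefact, which is why quoting an industry-norm validity without noting the range restriction misleads.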

If critical evaluation tools are not used to assess an instrument more fully, a practitioner may be swayed by a product’s branding rather than its substance. If a tool is branded a ‘leadership tool’, then it is presumed to measure what is needed for leadership. If an assessment claims to ‘predict psychopathic behaviour at work’, then it is assumed that it must do so. The practitioner is convinced that the right tool has been found for the job, and the brand may even justify its high cost.

Rather than being impartial, practitioners tend to use what they are comfortable with and endorse it accordingly. Often they do not have full knowledge of the options available to them (see http://wolfweb.unr.edu/homepage/ystedham/fulltext2%20467.pdf), and testing can become a tick-box service that is transactional rather than strategic in nature. Many HR professionals are so busy with a multitude of HR concerns that they have no time to turn psychometrics into a strategic solution, nor to investigate validity in a more sophisticated way. Ironically, this elevates the aura of the psychometric tool, and the myth of impartiality continues.

The solution to this problem is relatively simple. Firstly, HR professionals who use assessments need to attend some basic training that covers the myths and realities of psychometric testing. I’m proud to say that OPRA has been running such courses, together with thought pieces like this, since the late 1990s; but the point is not to attend an OPRA course specifically, rather any course that takes a critical look at the application of psychometrics. Secondly, understand the limitations of testing and opt for a simple, broad-brush measure of personality and cognitive ability that is cost-effective for the organisation, without giving the test more credibility than it is worth. Finally, adopt a more critical outlook on testing that enables one to be truly impartial.

 

Psychometric Test Providers

The final area of impartiality I want to look at is the test providers themselves; it is only fitting that I close with a critical review of the industry I’m entrenched in. The reality is that any claim to impartiality by someone selling a solution should be regarded with caution. Many people do not realise how lucrative the testing industry has become, as demonstrated by recent acquisitions: the $660 million acquisition of SHL by CEB (http://ir.executiveboard.com/phoenix.zhtml?c=113226&p=irol-newsArticle&ID=1711430&highlight=), Wiley’s purchase of Inscape (http://www.disclearningsolutions.com/wiley-acquires-inscape-a-leading-provider-of-disc-based-learning-solutions/), and, more recently, of Profiles International (http://www.themiddlemarket.com/news/john-wiley-pays-51-million-for-profiles-international-248848-1.html).

It would be naïve to think that such businesses could be truly impartial. The fact is that testing companies build and hold market positions much as companies in other industries, such as soft drinks or food, do. The result is that innovation ceases and marketing takes over.

No technology of which we are aware- computers, telecommunications, televisions, and so on- has shown the kind of ideational stagnation that has characterized the testing industry. Why? Because in other industries, those who do not innovate do not survive. In the testing industry, the opposite appears to be the case. Like Rocky I, Rocky II, Rocky III, and so on, the testing industry provides minor cosmetic successive variants of the same product where only the numbers after the names substantially change. These variants survive because psychologists buy the tests and then loyally defend them (see preceding nine commentaries, this issue).

Sternberg, R. J., & Williams, W. M. (1997). Does the Graduate Record Examination predict meaningful success in the graduate training of psychologists? A case study. American Psychologist, 52.

The solution to this problem is not innovation for innovation’s sake. That tends to happen when we chase ever greater measurement accuracy and lose sight of what we are trying to achieve (such as predicting outcomes). As an example, the belief that IRT-based tests will provide greater validity does not appear to be supported by recent studies (see http://www.nmd.umu.se/digitalAssets/59/59524_em-no-42.pdf and http://heraldjournals.org/hjegs/pdf/2013/august/adedoyin%20and%20adedoyin.pdf).

Moreover, when we contrast increased measurement sophistication with moves toward the likes of single-item scales, the results are surprisingly equivalent (cf. Samuel, D. B., Mullins-Sweatt, S. N., & Widiger, T. A. (2013). An investigation of the factor structure and convergent and discriminant validity of the Five-Factor Model Rating Form. Assessment, 20(1), 24-35).

There is simply a limit to how much an assessment can capture of the complexity of human behaviour, which is itself subject to free will. It is no more complex than that. Rather than trumpeting the magical uniqueness of their tests, psychometric test providers need to be upfront about their limitations. No one has access to a crystal ball, and claims that one exists are fundamentally wrong.

The future for testing companies lies in acknowledging the limitations of their tests and recognising that they are simply part of an HR ecosystem. It is within that system that innovation can reside. The focus then moves away from pretending that a given test is significantly better than the others, and towards how the test will add value through such things as:

  • Integration with an applicant tracking system to aid screening
  • Integration with learning and development modules to aid learning
  • Integration with on-boarding systems to ensure quick transition into work.

There is a range of solid, respectable tests available, and their similarities far outweigh their differences. Tests should meet minimum standards; but once those standards are met, the myth of impartiality is only addressed by accepting that there is a collection of quality tools of equivalent predictive power, and that the ecosystem, not the assessment, should be the focal point.

I realise I’m still a myth behind in the series, and will follow up with a short piece that provides more support for the use of psychometrics in industry, addressing the myth that psychometric tests have little value for employment selection.


Effective Talent Management

There is no doubt that more and more organisations are implementing talent management strategies and frameworks. However, whilst talent management is fast becoming a strategic priority for many organisations, Collings and Mellahi (2009) suggest that the topic lacks a consistent definition and remains largely undefined. Literature reviews reveal that one reason for this is that the empirical question of ‘what is talent?’ has been left unanswered.

The term talent has undergone considerable change over the years. It was originally used in the ancient world to denote a unit of money, before taking on a meaning of inclination or desire in the 13th century, and of natural ability or aptitude in the 14th century (Tansley 2011, as cited in Meyers, van Woerkom, & Dries, 2013). Today’s dictionary definition of talent is “someone who has a natural ability to be good at something, especially without being taught” (Cambridge Dictionaries Online, 2014). This definition implies that talent is innate rather than acquired, which holds important implications for the application of talent management in practice. For example, it influences whether we should focus more on the identification and selection of talent or on its development.

Talent management is defined as “an integrated, dynamic process, which enables organisations to define, acquire, develop, and retain the talent that it needs to meet its strategic objectives” (Bersin, 2008).

Integrated talent management implies a more holistic approach, starting with the identification of the key positions and capabilities that contribute to an organisation’s sustainable competitive advantage (Collings & Mellahi, 2009). Equipped with this information, we are better able to gather talent intelligence to help determine capability gaps, identify individual potential, and pinpoint areas for development. Talent intelligence and performance tools capable of gathering this type of information include well-validated psychometric assessments, 360° surveys, engagement surveys, and post-appointment and exit interviews. Strategic and integrated talent management is not only essential in the current market but also provides an opportunity to be proactive rather than reactive in addressing your critical talent needs.

We suggest that key components of an effective talent management process would include:

  1. A clear understanding of the organisation’s current and future strategies.
  2. Knowledge of key positions and the associated knowledge, skills, and abilities required (job analysis and test validation projects can assist here).
  3. Objective metrics that identify gaps between the current and required talent to drive business success.
  4. A plan designed to close these gaps with targeted actions such as talent acquisition and talent development.
  5. Integration with HR systems and processes across the employee lifecycle.

What is clear is that talent management is becoming more and more important as organisations fight for the top talent in a tight job market. Key to success will be identifying what ‘talent’ looks like for your organisation and working to ensure they are fostered through the entire employment lifecycle.

 

Meyers, M. C., van Woerkom, M., & Dries, N. (2013). Talent—Innate or acquired? Theoretical considerations and their implications for talent management. Human Resource Management Review, 23(4), 305-321.

Collings, D. G., & Mellahi, K. (2009). Strategic talent management: A review and research agenda. Human Resource Management Review, 19(4), 304-313.

Bersin Associates. (2008). Talent Management Factbook.


Outplacement: What are ‘Employers of Choice’ doing in the Face of Job Cuts?

With the current downturn in the mining industry, management are making tough decisions regarding asset optimisation, cost management, risk management and profitability. Naturally, head count is being scrutinised more closely than ever. What isn’t hitting the headlines is what genuine ‘employers of choice’ are doing to support their exiting workforce and their remaining staff.

A leading global engineering consultancy recently made a corporate decision to discontinue a once profitable consulting arm of its Australian operation. With increased competition, a reduction in mining demand, and eroding profit margins, a very difficult restructure resulted in the redundancy of 40 national engineering roles. As an employee-owned organisation that lives its company values of Teamwork, Caring, Integrity, and Excellence, this decision was not made easily. Throughout the decision-making process, management was naturally mindful to uphold these values, and BeilbyOPRA Consulting was engaged to provide outplacement and career transition services to individuals for a period of up to 3 months.

The objectives of the project were to ensure that individual staff were adequately supported through the period of transition and, ultimately, to help them gain alternative employment as quickly as possible.

BeilbyOPRA’s Solution:

BeilbyOPRA Consulting’s solution was led by a team of organisational psychologists, with consultants on site in seven locations throughout Australia on the day the restructure was communicated to employees. Consultants provided immediate support to displaced individuals through an initial face-to-face meeting, where the Career Transition program was introduced. From there, individuals chose whether or not to participate in the program, the key topics of which included:

  • Taking Stock – Understanding and effectively managing the emotional reactions to job change.
  • Assessment – Identifying skills and achievements through psychometric assessment and feedback sessions.
  • Preparation – Learning about time management skills; developing effective marketing tools; resume writing and cover letter preparation; telephone techniques.
  • Avenues to Job Hunting – Tapping into the hidden job market; responding to advertisements; connecting with recruitment consultants.
  • Interviews – Formats; preparation; how to achieve a successful interview.
  • Financial Advice – BeilbyOPRA partnered with a national financial services firm to offer participants complimentary financial advice.

The Outcome:

Of the 40 individuals whose positions were made redundant:

  • 78% engaged in the first day of the program.
  • Of this group, 48% participated in the full program; the remainder utilised only one or two of the services before securing employment.
  • 83% of those who participated in the full program gained employment within 3 months.

Some of the learning outcomes from this project for organisations include:

  • Conduct thorough due diligence before committing to the restructure.
  • Create a steering committee to oversee the redundancy process.
  • Ensure accurate, relevant and timely communication is provided to all those involved.
  • Have a trial run of the entire process.
  • Have a dedicated internal project manager to facilitate the outplacement project.
  • Ensure that the staff who remain employed with your organisation, ‘the survivors’, are informed and supported.

In summary, the value of outplacement support was best captured by the National HR Manager who stated:

“It is about supporting staff and upholding our values through good and difficult times. From a legal, cultural and branding perspective outplacement support is critical. As the market changes we will hope to re-employ some of the affected staff and some will become clients in the future.”


The Myth of Impartiality: Part 1

In last month’s post I signed off by noting that impartiality is a pervasive myth in the industry. The corollary is that assuming impartiality allows many of the industry’s myths not only to continue but to flourish. Very few in the industry can lay claim to being completely impartial, yours truly included. The industry at all levels has inherent biases that any critical psychologist must be mindful of. The bias starts with university and research, and the myth is then passed on, often by practitioners, to the consumer (be that a person or an organisation).

A colleague recently sent me a short paper that I think is compulsory reading for anyone with a critical mind in the industry. The article uses the metaphor of Dante’s Inferno to discuss the demise of science. Keeping with the theme, I would like to use the biblical metaphor of the Four Horsemen of the Apocalypse in reference to the myth of impartiality. These Horsemen represent the four areas where impartiality is professed but often not practised, resulting in a discipline that fails to deliver to its followers the Promised Land being touted. The Four Horsemen in this instance are: University, Research, Practitioners, and Human Resources.

Unlike the biblical version, destiny is in our hands, and I want to continue to present solutions rather than simply highlight problems. Thus each of the Four Horsemen of impartiality can be defeated (or at least inflicted with a flesh wound) by some simple virtuous steps that attack the myth of impartiality. Sometimes these steps require nothing more than acknowledging that the science and practice of psychology is not impartial; at other times we are called to address the partiality directly. Because of the length of the topic, I will break it into two blog posts for our readers.

 

Universities

Many universities are best thought of as corporations whose consumers are students. Like any other corporation, they must market to attract consumers (students) and give students what they want (degrees). To achieve this end, a factory-type process is often adopted, which in the world of education often means teaching by having students repeat and apply rules. Moreover, students want to at least feel that they are becoming educated, and numbers and rules provide this veil. Finally, the sheer complexity of human behaviour means that restrictive paradigms for psychology are adopted in place of a deep critical analysis of the human condition. This in turn gives the much-needed scale required to maximise the consumer base (i.e. an easy-to-digest product, respectability, and the capacity to scale the production of education for mass consumption).

 

For this reason, psychology is often positioned purely as a science, which it is not. This thinking is reinforced by an emphasis on quantitative methodologies, which in turn reinforces the myth of measurement. Papers are presented without recognising the inherent weaknesses and limitations of what is being discussed, and quality theoretical thinking is subordinated to statistics. The end result is that while university is presented as an impartial place of learning, this ignores the drivers of partiality inherent in the system. Often the rules of learning created to drive the learning process serve the needs of the consumer and increase marketability at the expense of an impartial education. Those who come out of the system may fail to fully appreciate the limitations of their knowledge, and as the saying goes, ‘a little knowledge is dangerous’.


University is the most important of the Four Horsemen of impartiality because it is within university that many of the other myths are generated. By training young minds in a particular way of thinking while appearing impartial, universities create ‘truths’ in the discipline that are simply one limited way of viewing the topic. The result is myths like the myth of measurement (and the various conclusions drawn from research built on it), which become accepted as truth, so that students graduate with faulty information or overconfidence in research findings. Those who do not attend university, but hold graduates in a degree of esteem, likewise fail to see that they too are victims of the myth of impartiality.


The virtuous steps

This blog is too short to address all the shortcomings of universities in the modern environment. However, if we do not address them, we will lose more and more quality researchers and teachers from our ranks [see: http://indecisionblog.com/2014/04/07/viewpoint-why-im-leaving-academia/]. What I suggest is that psychology re-embrace its theoretical roots by becoming more multidisciplinary in its approach, integrating science and statistics with the likes of philosophy and sociology.


The second step is to make a course in ‘Critical Psychology’ compulsory. Such a course would go beyond the sociopolitical definition of critical psychology often given and focus on issues of critique as discussed in these blogs: issues of measurement, the role of theory, the problems of publish-or-perish, and so on. In short, a course that covers the problems inherent in the discipline, acknowledging that these are things every psychologist, applied or researching, must be mindful of. To the universities already taking these steps in a meaningful way: I commend you.

Research

The idea that research is impartial was dismissed some time ago by all but the most naïve. The problem is not so much one of deliberate distortion, although, as we will see later on, this too can be a problem. Rather, it is the very system of research that is not impartial.

Firstly, there is the ‘publish or perish’ mentality that pervades all those who conduct research, whether academics or applied psychologists. Researchers are forced by market drivers or university standards to publish as much as possible as ‘evidence’ that they are doing their jobs. The opportunity cost is that quality research is often in short supply. For one of the best summaries of this problem I draw your attention to Trimble, S.W., Grody, W.W., McKelvey, B., & Gad-el-Hak, M. (2010). The glut of academic publishing: A call for a new culture. Academic Questions, 23, 3, 276-286. The paper makes many powerful points, key among them that quality research takes time and runs counter to the ‘publish or perish’ mentality, and that a real contribution often goes against conventional wisdom and therefore puts one in the direct firing line of many contemporaries.

Why does this glut occur? I can think of three key reasons.

The first is that researchers are often graded by the quantity, not quality, of the work they produce. The general public tends not to distinguish between grades of journals, and academic institutions have key performance indicators that require a certain number of publications per year.

The second reason is that journals create parameters by which research will be accepted. I have discussed this topic to death in the past, but the evidence of bias includes favouring novel findings over replications, favouring papers that reject the null hypothesis, and treating numbers, rather than logic and theory, as the criterion of supporting evidence. This in turn creates a body of research that projects itself as the body of knowledge in our discipline when in reality it is a fraction, and a distorted fraction at that, of how we understand human complexity (c.f. Francis, G. (2014). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin and Review (in press; http://www1.psych.purdue.edu/~gfrancis/pubs.htm), 1-26).

Abstract: Recent controversies have questioned the quality of scientific practice in the field of psychology, but these concerns are often based on anecdotes and seemingly isolated cases. To gain a broader perspective, this article applies an objective test for excess success to a large set of articles published in the journal Psychological Science between 2009-2012. When empirical studies succeed at a rate higher than is appropriate for the estimated effects and sample sizes, readers should suspect that unsuccessful findings were suppressed, the experiments or analyses were improper, or that the theory does not properly account for the data. The analyses conclude problems for 82% (36 out of 44) of the articles in Psychological Science that have four or more experiments and could be analyzed.

The third reason is funding. Where money is involved there is always a perverse incentive to distort. This occurs in universities, where funding is an issue, and in industry, where a psychologist may be brought in to evaluate an intervention. The mechanisms are often more subtle than outright distortion. For example, universities that require funding from certain beneficiaries may be inclined to undertake research that, by design, returns positive findings in a certain area, and is thus viewed positively by grants committees. The same may be true in industry, where an organisational psychology company is asked to evaluate a social programme but the terms of the evaluation are such that the real negative findings (such as opportunity cost) are hidden. This has led to calls for transparency in the discipline, such as in Miguel, E., Camerer, C., Casey, K., Cohen, J., Esterling, K., Gerber, A., Glennerster, R., Green, D., Humphreys, M., Imbens, G., Laitin, D., Madon, T., Nelson, L., Nosek, B.A., …, Simonsohn, U., & Van der Laan, M. (2014). Promoting transparency in social science research. Science, 343, 6166, 30-31. While the paper makes a strong argument for quality design, it also notes the trouble with perverse incentives:

Accompanying these changes, however, is a growing sense that the incentives, norms, and institutions under which social science operates undermine gains from improved research design. Commentators point to a dysfunctional reward structure in which statistically significant, novel, and theoretically tidy results are published more easily than null, replication, or perplexing results ( 3, 4). Social science journals do not mandate adherence to reporting standards or study registration, and few require data-sharing. In this context, researchers have incentives to analyze and present data to make them more “publishable,” even at the expense of accuracy. Researchers may select a subset of positive results from a larger study that overall shows mixed or null results (5) or present exploratory results as if they were tests of pre-specified analysis plans (6).

Then there are the outright frauds (see: http://en.wikipedia.org/wiki/Diederik_Stapel). For those who have not read about this in other blogs, I urge you to look at the New York Times interview. My favourite quote:

“People think of scientists as monks in a monastery looking out for the truth,” he said. “People have lost faith in the church, but they haven’t lost faith in science. My behavior shows that science is not holy.”… What the public didn’t realize, he said, was that academic science, too, was becoming a business. “There are scarce resources, you need grants, you need money, there is competition,” he said. “Normal people go to the edge to get that money. Science is of course about discovery, about digging to discover the truth. But it is also communication, persuasion, marketing. I am a salesman…


The virtuous steps

To address the impartiality of research we need a collective approach. Universities that have a commitment to research must aim for quality over quantity and allow researchers the time to develop quality research designs that can be tested and examined over longer periods. Research committees must be multidisciplinary to ensure that a holistic approach to research prevails.

We must put arm’s length between funding and research. I don’t have an answer for how this would occur, but until it does, universities will be disincentivised from conducting fully impartial work. Journals need to be established that provide an outlet for comprehensive research. This would mean removing word limits in favour of comprehensive research designs that allow more alternative hypotheses to be tested and dismissed. Systems thinking needs to become the norm, not the exception.

Finally, and most importantly, our personal and professional ethics must be paramount. We must contribute to the body of knowledge that critiques the discipline for the improvement of psychology. We must make sure that we are aware of any myth of impartiality in our own work and make it explicit, while trying to limit its effects, whether we work as researchers or practitioners. We must challenge the institutions (corporations and universities) we work for to raise their game, in incremental steps.

In Part Two, I will take a critical look at my industry, psychometric testing and applied psychology, and how the myth of impartiality is prevalent. I will also discuss how this is then furthered by those who apply our findings within Human Resource departments.

Posted in I/O Psychology, Scientist - Practitioner, Uncategorized

Myth 3: The Myth of Measurement

I would like to begin by apologising for not getting a myth out last month. I was working in the Philippines. Having just arrived back in Singapore I will make sure to get out two myths this month.

The first myth for April that I wish to highlight is one that some may see as bordering on sacrilege in the industry. The idea I wish to challenge is that I/O psychology can truly be classed as a strong measurement science. To be clear, I’m not saying that I/O is not a science, or that it does not attempt to measure aspects of human behaviour related to work. Rather, I’m suggesting that what it does is not measurement as the word is commonly used. The corollary is that to talk of measurement in our field as if it were similar to the common use of the term is to give the discipline more predictability and rigour than it deserves.

The classic paper that challenged my thinking in regard to measurement was ‘Is Psychometrics Pathological Science?’ by Joel Michell.

Abstract

Pathology of science occurs when the normal processes of scientific investigation break down and a hypothesis is accepted as true within the mainstream of a discipline without a serious attempt being made to test it and without any recognition that this is happening. It is argued that this has happened in psychometrics: The hypothesis, upon which it is premised, that psychological attributes are quantitative, is accepted within the mainstream, and not only do psychometricians fail to acknowledge this, but they hardly recognize the existence of this hypothesis at all.

In regards to measurement, Michell presents very clear and concise arguments about what constitutes measurable phenomena and why psychological attributes fail this test. While in parts these axioms are relatively technical, the upshot is that the mere fact that an attribute can be ordered does not in itself constitute measurement. Rather, ‘measurement’ requires further hurdles to be cleared. A broad example is additivity and the many associated operations that must hold when quantities are combined to produce a third, or to support an alternative equation. Psychological attributes fail on this and many other properties of measurement. As such, the basis for claims of measurement is, in my opinion, limited (or at least should come with caution and disclaimers), and therefore much of the claim to being part of the ‘measurement-based-science’ school is not substantiated.
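To give a flavour of what additivity demands, here is a simplified sketch of the kind of conditions the magnitudes of an attribute must satisfy before adding them is meaningful. This compresses the much more careful measurement-theoretic treatment Michell draws on; it is illustrative only, not his formulation:

```latex
% For magnitudes a, b, c of an attribute Q, additivity requires (inter alia):
\begin{align*}
  a + b &= b + a                                  && \text{(commutativity)} \\
  a + (b + c) &= (a + b) + c                      && \text{(associativity)} \\
  a + b &> a                                      && \text{(positivity)} \\
  a > b &\implies \exists\, c:\; a = b + c        && \text{(solvability)}
\end{align*}
```

Length and mass demonstrably satisfy such conditions (two 1 kg weights on a pan balance behave like one 2 kg weight); no comparable demonstration exists for, say, two increments of extroversion.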

The limitations of the discipline as a measurement science are so fundamental that they should challenge the discipline far more than they currently do. The outcome should be both a downplaying of measurement practices and a greater focus on areas such as theory building, tested using a range of alternative methodologies. Similar calls have been made over the past few years, and the disquiet in the discipline is growing:

Klein, S.B. (2014). What can recent replication failures tell us about the theoretical commitments of psychology? Theory and Psychology, 1-14.

Abstract

I suggest that the recent, highly visible, and often heated debate over failures to replicate results in the social sciences reveals more than the need for greater attention to the pragmatics and value of empirical falsification. It is also a symptom of a serious issue—the under-developed state of theory in many areas of psychology.

Krause, M.S. (2012). Measurement validity is fundamentally a matter of definition, not correlation. Review of General Psychology, 16, 4, 391-400.

Abstract

….However, scientific theories can only be known to be true insofar as they have already been demonstrated to be true by valid measurements. Therefore, only the nature of a measure that produces the measurements for representing a dimension can justify claims that these measurements are valid for that dimension, and this is ultimately exclusively a matter of the normative definition of that dimension in the science that involves that dimension. Thus, contrary to the presently prevailing theory of construct validity, a measure’s measurements themselves logically cannot at all indicate their own validity or invalidity by how they relate to other measures’ measurements unless these latter are already known to be valid and the theories represented by all these several measures’ measurements are already known to be true….This makes it essential for each basic science to achieve normative conceptual analyses and definitions for each of the dimensions in terms of which it describes and causally explains its phenomena.

Krause, M.S. (2013). The data analytic implications of human psychology’s dimensions being ordinally scaled. Review of General Psychology, 17, 3, 318-325.

Abstract

Scientific findings involve description, and description requires measurements on the dimensions descriptive of the phenomena described. …Many of the dimensions of human psychological phenomena, including those of psychotherapy, are naturally gradated only ordinally. So descriptions of these phenomena locate them in merely ordinal hyperspaces, which impose severe constraints on data analysis for inducing or testing explanatory theory involving them. Therefore, it is important to be clear about what these constraints are and so what properly can be concluded on the basis of ordinal-scale multivariate data, which also provides a test for methods that are proposed to transform ordinal-scale data into ratio-scale data (e.g., classical test theory, item response theory, additive conjoint measurement), because such transformations must not violate these constraints and so distort descriptions of studied phenomena.

What these papers identify is that:

  1. We must start with good theory building, and the theory must be deep and wide enough to be tested and falsified.
  2. That construct validity is indeed important but correlations between tests are not enough. We need agreement on the meaning of attributes (such as the Big Five).
  3. That treating comparative data (such as scores on a normal curve) as if it were rigorous measurement is at best misleading and at worst fraud.
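The third point can be illustrated with a small sketch (the numbers are invented for illustration). With ordinal data, only the ordering of scores is meaningful, so any order-preserving rescoring of the scale is equally legitimate; yet such a rescoring can reverse a mean-based comparison:

```python
# Two groups rated on a 1-5 ordinal scale (e.g. performance ratings).
group_a = [1, 4, 4]
group_b = [3, 4, 3]

mean = lambda xs: sum(xs) / len(xs)

# Treated as interval data, group B appears to outperform group A.
print(mean(group_a), mean(group_b))  # 3.0 vs ~3.33

# An order-preserving rescoring (1->1, 3->3, 4->10) keeps every
# ordinal comparison intact, yet reverses the mean-based conclusion.
rescore = {1: 1, 2: 2, 3: 3, 4: 10, 5: 11}
a2 = [rescore[x] for x in group_a]
b2 = [rescore[x] for x in group_b]
print(mean(a2), mean(b2))  # 7.0 vs ~5.33
```

A conclusion that holds under one legitimate rescoring but not another is an artefact of the numbers assigned, not a fact about the attribute, which is exactly why presenting such comparisons as rigorous measurement misleads.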

So where does this leave the discipline? Again, as is the theme threading through all these myths, we must embrace the true scientist-practitioner model and recognise that our discipline is a craft. To rely overly on quantitative techniques is extremely limiting for the discipline, and we need alternative ways of conceptualising ‘measurement’. In this regard I’m a big fan of the evaluation literature (e.g. Reflecting on the past and future of evaluation: Michael Scriven on the differences between evaluation and social science research) as a source of alternative paradigms for solving I/O problems.

We must at the same time embrace the call for better theory building. If I/O psychology, and psychology in general, is going to make valuable contributions to the development of human thought, it will start with good, sound theory. Just putting numbers to things does not constitute theory building.

When using numbers we must also look for alternative statistical techniques to support our work. An example is Grice’s (2011) Observation Oriented Modelling: Analysis of Cause in the Behavioural Sciences. I looked at this work when thinking about how we assess reliability (and then statistically demonstrate it) and think it has huge implications.

Finally, when using numbers to substantiate an argument, support a theory, or find evidence for an intervention, we need to be clear on what they are really saying. Statistics can mislead, and at worst lie, and we must be clear about what we are and are not saying, as well as the limitations of any conclusions we draw from a reliance on data. To present numbers as if they had measurement robustness is simply wrong.

In the next blog I want to discuss the myth of impartiality and why these myths continue to pervade the discipline.

Acknowledgement: I would like to acknowledge Professor Paul Barrett for his thought leadership in this space and opening my eyes to the depth of measurement issues we face. Paul brought to my attention the articles cited and I’m deeply grateful for his impact on my thinking and continued professional growth.

Posted in I/O Psychology