Category Archives: Uncategorized

Why HR Doesn’t Count – The Touchy Feely Side of Human Resources

A hallmark of high-performing businesses is the commercial value that the Human Resources department contributes to the organisation. In such organisations, HR has a seat at the boardroom table and even junior HR personnel are highly attuned to the commercial drivers of the business.

For example, they can quote staff attrition rates and the return on investment (ROI) for the most recent leadership development programme. In short, they know how their role contributes to saving or making money for their organisation. In most organisations, however, HR is perceived as a cost centre that serves little more than an administrative function, and its personnel struggle to communicate how they make a difference to the organisation. Which type are you?

Tips to spot a myth

Well there it is: another year down and another year to look forward to. This brings to an end this series on some of the myths of our industry and I wanted to finish by summarising some guidelines on how to become more critical about i/o research and the conclusions drawn from our discipline.

Our discipline is not all mythology, as shown in some of my recent posts, such as those on the effectiveness of training and the value of personality testing. On the contrary, there is a growing body of findings that show what works, what doesn’t and why. However, claims move from fact to fiction when commercialisation and academic reputation take over.

With this in mind, those attempting to apply research need a simple way to test the soundness of what they are reading. Here are my top 7 tips for spotting myths:

  1. Who has done the research? There are many vested interests in psychology. These range from commercial firms touting the next big thing to academics defending a position they have built for themselves. When you understand a person’s starting position you will read what they write with open eyes. When evaluating any claim ask yourself: ‘What is their angle, and do they have anything to gain from such a claim? Are they presenting a balanced argument and reporting commercial findings in a fair manner?’
  2. Are the claims too good to be true? Dealing with human behaviour is a messy business. Single variables, on a good day with the wind blowing in the right direction, account for roughly 10% of the variability (e.g. a correlation of r = 0.3) in a given outcome (e.g. a personality trait predicting job performance); the sketch after this list shows the arithmetic. Unfortunately, the public are unaware of this and have expectations around prediction that are simply unrealistic. These expectations are then played on by marketing companies that make claims such as ‘90% accuracy’. Such claims are outrageous and a sure sign that you are once again in the clutches of a myth.
  3. When looking at applied studies, does the research design account for moderator variables? Psychological research often loses its usefulness by failing to account for moderators. Too often we get simple correlations between variables without recognising that the entire finding erodes unless certain conditions are met, or when another variable enters the scene.
  4. Is the research discussed as part of a system? Building on the previous point, research that does not discuss its findings as part of a wider eco-system is invariably limited. As scientist-practitioners, our work does not exist in a vacuum. It is part of a complex set of ever-changing, intertwined variables that together produce an outcome. Selection leads to on-boarding, which leads to training, which leads to performance management, and so on. Research needs to identify this system and report findings accordingly.
  5. Are the results supported by logic as well as numbers? Nothing can blind the reader of i/o science like numbers. As the sophistication of mathematical justification in our discipline has grown, the usefulness of many studies has dropped. Psychology is as much a philosophy as a science, and logic is just as important as numbers in demonstrating an evidence base. Look for studies that follow the laws of logic, where hypotheses are not only supported but alternative theories dismissed. Look for studies that are parsimonious in their explanation yet not so simplistic that they fail to account for the underlying complexity of human behaviour.
  6. Are the results practically meaningful? Don’t be confused by statistical significance. It simply means we have a certain level of confidence that a finding was not due to chance, and that if the study were repeated we would likely get a similar result. It tells us nothing about the practical significance of the finding (i.e. how useful is it, and how do I use it?). Too often I see tiny but statistically significant findings touted as a ‘breakthrough’ when the effect is so small as to be meaningless except, perhaps, in huge samples (again, see the sketch after this list).
  7. Be critical first, acquiesce second! If I have one piece of advice it is to be critical first and accept nothing until convinced. Don’t accept anything because of the speaker, the company, or the numbers. Instead, make anyone and everyone convince you. How is this done? Ask why. Ask what. Ask how. If you do nothing besides taking this stance as part of a critical review, it will make you a far more effective user of research and a far better i/o psychologist or HR professional.
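
To make tips 2 and 6 concrete, here is a minimal sketch in Python (numpy and scipy, with made-up numbers chosen purely for illustration) of how little variance a typical correlation explains, and how a practically meaningless effect can still come out ‘statistically significant’ in a large sample.

```python
import numpy as np
from scipy import stats

# Tip 2: a correlation of r = 0.3 explains roughly 10% of the variance (r squared).
r = 0.3
print(f"r = {r}, variance explained = {r**2:.0%}")   # ~9%

# Tip 6: with a large enough sample, even a trivial effect is 'significant'.
rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(size=n)
y = 0.02 * x + rng.normal(size=n)                    # the true relationship is tiny
r_obs, p_value = stats.pearsonr(x, y)
print(f"n = {n}, r = {r_obs:.3f}, p = {p_value:.4f}")
# p will typically fall well below 0.05, yet r squared (~0.0004) means the finding
# explains almost none of the variance: statistically significant, practically meaningless.
```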

To all those who have read and enjoyed this blog over the year, we at OPRA thank you. As a company we are passionate about i/o, warts and all, and it is a great privilege to contribute to the dialogue that challenges our discipline to be all that it can be. Have a great 2015 and we look forward to catching up with you offline and online over the year.

The Myth that Training is an Art not a Science

For many, training is seen as an art, and a black art at that, rather than a science. The idea that there is actually a science to training, and a methodology to be followed to ensure its effectiveness, is anathema to those who view their own training as some special gift that they alone possess. Much like the claim in the psychometric industry that a single test is the holy grail of testing, these outrageous training claims are myths that simply distract from the truth. On the contrary, training is an area that is now well researched, and there is indeed a science to making training work.

Building on their seminal work on training for team effectiveness, Salas and his team have produced an excellent paper outlining the science of training (Salas, E., Tannenbaum, S.I., Kraiger, K., & Smith-Jentsch, K.A. (2012). The science of training and development in organizations: What matters in practice. Psychological Science in the Public Interest, 13, 2, 74-101).

The paper is a free download and is one of those must-haves for all practitioners. Firstly, it covers the various meta-analyses that have been conducted on training and notes that training has been found to be effective for everything from managerial training and leadership development through to behavioural modelling training.

Moreover, the paper provides clear guidelines on how to enhance training effectiveness. Building on the research, the guidelines for practitioners include:

Pre-training

  1. Training needs analysis
    1. Analysis of the job
    2. Analysis of the organisation
    3. Analysis of the person
  2. Communication strategy
    1. Notify attendees
    2. Notify supervisors

During training

  1. Creating the learner mind-set
  2. Following appropriate instructional principles
  3. Using technology wisely

Post-training

  1. Ensure training transfer
  2. Evaluation methodology

The paper in many ways is what our discipline is all about: there is a strong research base, drawing together research from multiple sources, with useful guidance provided for the practitioner. This is applied psychology, and this is the scientist-practitioner model in practice.

As noted by Paul Thayer in his editorial to the paper:

“… There is a system and a science to guide organizations of all types in developing and/or adopting training to help achieve organizational goals. Salas et al. do an excellent job of summarizing what is known and providing concrete steps to ensure that valuable dollars will be spent on training that will improve performance and aid in the achievement of those goals. In addition, they provide a rich bibliography that will assist anyone needing more information as to how to implement any or all the steps to provide effective training. Further, they raise important questions that organizational leaders and policymakers should ask before investing in any training program or technology”.

There are many myths that pervade business psychology. Unfortunately, these often result in the baby being thrown out with the bathwater and people dismissing the discipline as a whole. The key for any discerning HR professional or i/o psychologist is to be able to tell myth from reality and to have a simple framework, or set of check points, for being a discerning reader of research. More on this tomorrow in the last blog for the year.

The myth that training to improve team functioning doesn’t work

Yesterday we noted that there was little support for the Belbin team model. The idea that there is a prescribed model for a team is simply not supported and the Belbin model does not improve organisational effectiveness. Taking this into consideration, does training to improve team functionality actually make a difference?

I’m pleased to note that training to improve team performance is an area that is well researched, and the research is generally positive. Not only do interventions appear to improve team effectiveness; we also have an idea, through research, of what moderates the success of team interventions.

In terms of the research around team training, the seminal work in the area was a meta-analysis conducted in 2008. For those not from a research background, a meta-analysis can be thought of as an analysis of analyses. The researchers bring together various studies and re-analyse the data to gain greater confidence in the results by establishing a larger effective sample size. While the technique has its critics and may lead to statistical overestimates, it is one of the better methods we have for establishing an evidence base for generalisable trends in applied research.
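
For readers curious about the mechanics, below is a bare-bones, fixed-effect meta-analysis sketch in Python. The effect sizes and sample sizes are invented for illustration; they are not taken from the 2008 team-training meta-analysis discussed here.

```python
import numpy as np

# Hypothetical per-study effect sizes (Cohen's d) and total sample sizes.
d = np.array([0.55, 0.30, 0.70, 0.42, 0.61])
n = np.array([40, 120, 35, 80, 60])

# Approximate sampling variance of d for a balanced two-group design of total size n.
var_d = 4.0 / n + d**2 / (2.0 * n)

# Fixed-effect, inverse-variance weighting: larger, more precise studies count more.
w = 1.0 / var_d
d_pooled = np.sum(w * d) / np.sum(w)
se_pooled = np.sqrt(1.0 / np.sum(w))

print(f"Pooled d = {d_pooled:.2f} "
      f"(95% CI {d_pooled - 1.96 * se_pooled:.2f} to {d_pooled + 1.96 * se_pooled:.2f})")
```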

The team training effectiveness meta-analysis was extremely thorough in examining both outcomes and moderators. A range of outcomes were assessed, including:

  1. Cognitive outcomes predominantly consisted of declarative knowledge gains.
  2. Team member affective outcomes included socialisation, trust and confidence in team members’ ability and attitudes concerning the perceived effectiveness of team communication and coordination processes.
  3. Team processes included behavioural measures of communication, coordination, strategy development, self-correction, assertiveness, decision making and situation assessment.
  4. Team performance integrated quantity, quality, accuracy, efficiency and effectiveness outcomes.

Moderator variables included:

  1. Training content (taskwork, teamwork, mixed)
  2. Team stability (intact, ad hoc)
  3. Team size (large, medium, small)

While a blog post is not sufficient to explore the research in depth, suffice it to say that moderate to strong positive effects were found for all four outcome categories. Team process appears to be the most malleable. Training teams to communicate better, avoid group think, make effective decisions and think strategically is likely to be an investment that delivers returns for organisations. Training to improve affective outcomes, such as trust and confidence in team members, appears less effective. This was especially the case when applied to large teams.

Aside from team size, the results were moderated by team stability, with well-established teams responding better to training than ad hoc teams. Training content had limited effect on outcomes, with both taskwork- and teamwork-oriented interventions producing positive results.

The results of this meta-analysis are encouraging for i/o psychology. Team effectiveness is an area where there is a strong research basis for intervention and where intervention is likely to have a positive impact. This is an area where the scientist-practitioner model that is central to our discipline appears to be alive and well. We have interventions that are well researched, and we have some understanding of their levels of effectiveness once other variables are taken into account. Does this lead to a science of training? Are there principles we can take from the literature that can be applied to make training effective? Or is training an art and not a science? This is the question for tomorrow.

The myth of team models (Belbin)

In yesterday’s blog we discussed the power of two and the myth of the single star innovator. The natural follow-on from this discussion is: ‘If two is better than one, surely a team is better than two’. Unfortunately, the literature is far less supportive of this idea.

The most pervasive model of teamwork, especially in the UK, is the idea of the Belbin team. For those not aware, the Belbin model proposes 9 team roles, defined in part by orientation towards the people side of a task or the thing/doing side of a task. The idea is that teams operate better when these various roles are filled.

The assumptions behind the Belbin team roles don’t stack up against the hype. Firstly, the psychometric properties of the model have been found wanting (Furnham, A., Steele, H., & Pendleton, D. (1993). “A psychometric assessment of the Belbin Team-Role Self-Perception Inventory”, Journal of Occupational and Organizational Psychology, 66, 245-257; Fisher, S.G., Macrosson, W.D.K., & Sharp, G. (1996). “Further evidence concerning the Belbin Team Role Self-Perception Inventory”, Personnel Review, Vol. 25, pp. 61-67). Research indicates that the model lacks the proposed factor structure and offers little above what a standard personality tool would prescribe in terms of how people would like to work.

In essence, we get the same preferences by simply looking at personality, with the added advantage of a replicable psychometric model. While the Belbin model may be useful as a descriptive framework, that is different from what one usually wants when approaching such things psychometrically.

Perhaps more importantly, the relationship between the model and actual job performance is weak, to say the least (Wouter van Walbeek, R.B., & Maur, W. (2013). “Belbin role diversity and team performance: is there a relationship?”, Journal of Management Development, Vol. 32, pp. 901-913). There is no evidence that this supposed role diversity aids team performance. Even leaders under the model failed to demonstrate improved performance.

So what is the ultimate number for a team and what are the team roles that need to be fulfilled? The most accurate answer to this question is ‘it depends’ (the details are covered well in Wikipedia’s description of ‘team’).

As with much of i/o psychology, there are no simple answers, and the only people who ever prescribe simple answers are those who have something to sell. Solving real-world problems, such as finding the optimal team size for a given organisation, requires an analysis of the tasks, the time frames for completion, the competing demands on individuals, and the competence, willingness and trainability of the team, to name but a few variables. Ours is an applied discipline, and what is required is the application of knowledge inside a given system to find individual solutions that work. Not surprisingly, this applies equally to our work around teams.

I want to make the point that a team is distinct from a ‘group’, and this simple point is often overlooked by practitioners. More often than not, when I’m asked to do a ‘team workshop’ it is to help a group of employees who know their jobs well but need to learn how to get along. To describe them as a ‘team’ is to miss the forest for the trees. These groups tend to comprise people with individual differences who need techniques and models to understand each other better, get along, and harness each other’s strengths and weaknesses. Ironically, this type of intervention is what many ‘team’ interventions consist of. Do these interventions work? This is the topic for tomorrow.

Myths about Teams and Stars – The Myth of the Single Star

I’m a couple of blogs behind for the year. While this is indicative of a busy and successful year at OPRA, it is no excuse for not completing the 12-part series on myths for 2014. So with a week’s holiday, and 5 myths to go, what better time to finish this year’s topic for the OPRA blog? In good scientific fashion, this also provides a royal opportunity to test whether a series of blogs over a week is more effective than one a month.

A topic that has many permutations in respect to myths is that of teams and stars. People love the idea of teams, but the research and literature in this space are less complimentary. In this series of posts I want to look at the workings of teams from both a practice and a literature perspective, and try to separate myth from reality.

To begin, I want to look at the anti-hero of the team: the notion of the star or sole genius. This is pervasive in modern business culture with the likes of Branson, Jobs and Trump, people perceived as the sole sources of the creativity that defined the businesses they are associated with. This is not to say that these people necessarily endorsed the idea that they themselves were the be-all and end-all. Rather, the common myth perpetuated by society is that a company’s success can mainly be attributed to a single individual.

The idea that success can be attributed to one person is not borne out in either popular or academic research. A recent book highlighted this issue through what it terms the ‘Power of Two’:

The book examines the creative process, noting why two is the magic number for creativity to realise returns. In doing so it covers the role of serendipity that lies behind much success (a point so often glossed over in the literature): the pair have to meet! They need to have differences that combine to form a single powerful entity. They must work as a pair yet enjoy enough distance and role separation to cultivate distinct ideas. In short, it is not the individual who creates success but the individual and their sidekick who together achieve optimal results.

Evidence for the power of two is also borne out in academic literature. Business decisions are invariably preceded by a decision to act, and when it comes to decision making the power of two is again apparent.

In a 2012 article published in Science (Koriat, A. (2012). ‘When two heads are better than one’. Science, 20 April 2012: Vol. 336, no. 6079, pp. 360-362), evidence was found that decision making by two people is superior to that of an individual. While I will not go into the study in depth, the key is the ability of each individual in the dyad to communicate their confidence in judgements freely (i.e. a truly level playing field). Thus, the dyad falls down when one person’s confidence overpowers the pair.

This study builds on earlier work that likewise states that the benefit of the pair comes from the ability to express confidence in decision making freely. The key outcome is that, to enhance the power of two in decision making, the pair should have similar levels of competence and the ability to express confidence freely. Once again this shows the inherently multi-faceted nature of psychological research, as this will invariably involve having people of similar levels of self-esteem, emotional intelligence and so on for the effect to be optimised.

So if one is not the answer and two is clearly better, what happens when team size increases? More on this tomorrow.

Attacking the Myth That Personality Tests Don’t Add Value

For this month’s myth, the last on the topic of psychometrics, I have chosen a slightly different approach. I’m coming out in defence of personality tools, when they are used correctly and understood in the right context. Rather than reinvent the wheel in this regard, I have chosen to highlight what I believe to be a very reasoned article on the topic in Forbes by Tomas Chamorro-Premuzic.

Before posting a direct link to the article, I want to set the context for the value in psychometrics. For me, it is based on 5 key points:

 

  1. Predicting human behaviour is difficult: The psychometric industry often oversteps the mark with the levels of prediction it claims. Assessments are not crystal balls, and the search for the ultimate predictive tool that generalises easily across multiple contexts is futile. It does not follow, however, that because humans are complex, assessments have no application, or that understanding a little more about a person’s behavioural preferences, and a framework of personality, has no value. On the contrary, it is precisely because human beings are complex that more information on individual differences, and frameworks to help us conceptualise behavioural patterns, add value to the people decisions we need to make. Psychometric tools provide a framework for understanding personality and a simple, relative measurement model to assist decision-making.

 

  2. Human beings have free will: It never ceases to amaze me when I meet people who are sycophantic in their devotion to a particular assessment tool. It is as if they choose to ignore the concept of free will. Behaviour will inevitably change across situations and with different reinforcers; this is so inherent that it needs no further explanation. What psychometric tools can do, however, is estimate the likelihood of behavioural change and the preference for certain behaviours. The assessment does not supersede free will but rather helps us understand a little better how free will is likely to be displayed.

 

  3. Lying, or distortion, is a problem for any assessment method: Lying is something humans often do! A common argument against personality tools is that people may present themselves in an overly positive light. The same criticism applies to any assessment methodology, from interviews to CVs, and it affects many dimensions of life, from employment to those hoping to meet Mr or Ms Right via an online dating site. Quality personality tools attempt to mitigate this issue with response-style indicators such as measures of social desirability, central tendency and infrequency.

 

  4. Behaviour is an interaction between the situation and preference: Much like the comment on free will, the situation should never be ignored when attempting to understand behaviour. Personality tests provide part of the puzzle, and in doing so they help us understand how someone is likely to behave. The key word in that sentence is ‘likely’, and how likely depends on the strength of the behavioural preference and on the situation.

 

  5. Personality assessments are a simple, coherent and quick method for shedding light on human complexity: The bulk of personality tools are used for recruitment. When recruiting, we need to make an expensive decision on limited information and in a short timeframe, which forces us to consider every feasible way of making an informed judgement. At its most basic, the instrument is a collection of items clustered along psychometric principles, with a degree of reliability over time and internal consistency, thus giving meaning to a wider trait. A person’s responses are then compared with those of others who have taken the test. Assuming the norm group is relevant, up-to-date and has adequate spread, this gives an indication of the person’s relative behavioural preference against a comparison group of interest (a bare-bones sketch of this norm-referencing step follows below). The information is then used, together with other information collected, to make inferences about likely behaviour. That is the sum total of the process. For argument’s sake, the alternative would be to say that human behaviour is all too complex and we should operate without asking any questions at all. That is equally untenable.
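
As a rough illustration of that norm-referencing step, here is a minimal Python sketch that converts a raw scale score into a sten against a norm group. The scale name, norm mean and SD are hypothetical, not drawn from any real instrument.

```python
def sten(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw scale score to a sten (standard ten: mean 5.5, SD 2)."""
    z = (raw_score - norm_mean) / norm_sd          # position relative to the norm group
    return max(1.0, min(10.0, 5.5 + 2.0 * z))      # stens are bounded at 1 and 10

# Example: a hypothetical conscientiousness raw score of 34, compared against a
# hypothetical norm group with mean 28 and SD 6.
print(sten(34, norm_mean=28.0, norm_sd=6.0))       # 7.5 -> an above-average preference
```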

 

The problem is not that personality tests have no value, but that practitioners overestimate their value and predictive power. Psychometric test providers may also confuse the issue by over-promoting their assessments, marketing their uniqueness and extolling the magical powers of their special item sets. When understood in the right context, personality assessment can add value. When used as part of a complete system, interlinking recruitment to training to performance management, a deeper understanding of how personality impacts company performance can result. I agree that some tools do not meet minimum psychometric standards and their usefulness is therefore limited, but for those assessments that simply attempt to ‘do what they say on the tin’, the problem lies not with the assessment but with the practice of the users and their unrealistic expectations.

 

I strongly encourage you to read this short piece on the seven common, but irrational, reasons for hating personality tests: http://www.forbes.com/sites/tomaspremuzic/2014/05/12/seven-common-but-irrational-reasons-for-hating-personality-tests/

The Myth of Impartiality (Part 2)

Last month, I discussed the issue of impartiality with reference to universities and research. This month, I want to look at the myth of impartiality from the perspective of the users and suppliers of psychometrics. With respect to users, my focus is HR professionals and recruiters. The suppliers I refer to are the plethora of assessment suppliers from around the world.

Practitioners

Much of the credibility of psychometric tests is assumed through their application. The general public’s interaction with psychometric assessment comes primarily through the job application process. The corollary is the assumption that those responsible for those processes are skilled practitioners in their field and have a highly justifiable reason for applying psychometrics, and for applying a given assessment. This gives rise to the myth of impartiality in reference to practitioners.

The practitioner is often reliant on test providers as their source of information on psychometrics. However, this is akin to asking a financial advisor, who is selling a particular investment, to describe the principles of investment to you! It is important to recall that those who are psychologically trained are subject to issues of impartiality (as discussed in last month’s blog post).

Research has indicated that practitioners’ beliefs about predictive power do not marry with reality (https://www2.bc.edu/~jonescq/articles/rynes_AME_2002.pdf). While this may change over time, practitioners who lack the skills needed to read the statistics and understand how the tools are applied are unaware of their own blind spots when it comes to testing.

Examples I have witnessed include:

  • Assuming that a correlation can be read as a percentage. For example, a common misconception is that a scale with a correlation of 0.3 between conscientiousness and job performance accounts for 30% of the variability rather than 9%.
  •  Talking about the validity of a test when it is not so much a test that is ‘valid’ as the scales inside the test that correlate with a given outcome.
  • Not understanding that the relationship underlying the correlation they cite as evidence for the value of the test may not be linear. According to the research, extreme ends of the scale are best for predictive purposes, yet most practitioners warn of the problems with extremes. The contradiction between application and research is clear.
  • Assuming a quoted validity is applicable to their organisation. Validity varies greatly between jobs, organisations and time points, and these are only three of the relevant variables. To treat a given validity as ‘applicable to your organisation’ is often a big leap in logic.
  • Forgetting that validity is ultimately more than a number on a page: it is a system of interacting parts producing an outcome. Reducing it to a single number makes the commonly relied-upon concept near redundant.
  • While many practitioners ask about the size of a norm group, very few ask about its makeup.
  • Those who do ask about the makeup of the norm group fail to ask about the spread of the data.
  • A classic example is the request for industry-based norms. People fail to understand that such norms have inherent problems, such as the restriction of range that comes from taking a more homogeneous sample; this is highly apparent when looking at industry norms for cognitive ability (the sketch below simulates the effect).
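
To see the restriction-of-range problem in action, here is a small simulation sketch in Python (numpy, with invented parameters): a correlation of about 0.5 in the full population shrinks noticeably once we keep only a homogeneous slice of the range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate ability and performance that correlate ~0.5 across the full population.
n = 50_000
ability = rng.normal(size=n)
performance = 0.5 * ability + rng.normal(scale=np.sqrt(1 - 0.5**2), size=n)
full_r = np.corrcoef(ability, performance)[0, 1]

# Keep only a homogeneous 'industry' slice: the top 30% on ability.
mask = ability > np.quantile(ability, 0.70)
restricted_r = np.corrcoef(ability[mask], performance[mask])[0, 1]

print(f"full-range r = {full_r:.2f}")              # ~0.50
print(f"restricted-range r = {restricted_r:.2f}")  # noticeably smaller, roughly 0.25-0.30
# The underlying relationship has not changed; the homogeneous sample has simply
# squeezed the variance in ability, shrinking the observed correlation.
```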

A practitioner may be swayed by a product’s branding rather than its substance if critical evaluation is not used to assess the instrument more fully. If a tool is marketed as a ‘leadership tool’, then it is presumed to measure what’s needed for leadership. If the assessment claims to ‘predict psychopathic behaviour at work’, then it is assumed that it must do so. The practitioner is convinced that the right tool has been found for the job, and the brand may even seem to justify its high cost.

Rather than being impartial, practitioners tend to use what they are comfortable with and endorse it accordingly. Often they don’t have full knowledge of the options available to them (http://wolfweb.unr.edu/homepage/ystedham/fulltext2%20467.pdf), and testing may become a tick-box service that is transactional rather than strategic in nature. Many HR professionals are so busy with a multitude of HR concerns that they do not have the time to spend on turning psychometrics into a strategic solution, nor do they investigate validity in a more sophisticated way. Ironically, this only elevates the aura of the psychometric tool, and the myth of impartiality continues.

The solution to this problem is relatively simple. Firstly, HR professionals who use assessments need to attend some basic training that covers the myths and realities of psychometric testing. I’m proud to say that OPRA has been running these courses, together with thought pieces like this, since the late 1990s. The point, however, is not to attend an OPRA course but to attend any course that takes a critical look at the application of psychometrics. The second step is to understand the limitations of testing and opt for a simple, broad-brush measure of personality and cognitive ability that is cost-effective for the organisation, without giving the test more credibility than it is worth. Finally, adopt a more critical outlook on testing that enables one to be truly impartial.

 

Psychometric Test Providers

The final area of impartiality I want to look at is the test providers themselves; it is only fitting that I close with a critical review of the industry I’m entrenched in. The reality is that any claim to impartiality by someone who is selling a solution should be regarded with caution. Many people do not realise that the testing industry is increasingly lucrative, as demonstrated by recent acquisitions. For example, in recent times we have seen the $660 million acquisition of SHL by CEB http://ir.executiveboard.com/phoenix.zhtml?c=113226&p=irol-newsArticle&ID=1711430&highlight= and Wiley’s purchase of Inscape http://www.disclearningsolutions.com/wiley-acquires-inscape-a-leading-provider-of-disc-based-learning-solutions/, and more recently of Profiles International http://www.themiddlemarket.com/news/john-wiley-pays-51-million-for-profiles-international-248848-1.html

It would be naïve to think that such businesses could be truly impartial. The fact is that testing companies build and hold a position much like companies in other industries, such as soft drinks or food. The result is that innovation ceases and marketing takes over.

“No technology of which we are aware - computers, telecommunications, televisions, and so on - has shown the kind of ideational stagnation that has characterized the testing industry. Why? Because in other industries, those who do not innovate do not survive. In the testing industry, the opposite appears to be the case. Like Rocky I, Rocky II, Rocky III, and so on, the testing industry provides minor cosmetic successive variants of the same product where only the numbers after the names substantially change. These variants survive because psychologists buy the tests and then loyally defend them (see preceding nine commentaries, this issue).”

Sternberg, R. J., & Williams, W. M. (1997). Does the Graduate Record Examination predict meaningful success in the graduate training of psychologists? A case study. American Psychologist, 52.

The solution to this problem is not innovation for innovation’s sake. That tends to happen when we try to achieve greater levels of measurement accuracy and lose sight of what we are trying to achieve (such as predicting outcomes). As an example, the belief that IRT-based tests will provide us with greater validity does not appear to be supported by recent studies (http://www.nmd.umu.se/digitalAssets/59/59524_em-no-42.pdf and
http://heraldjournals.org/hjegs/pdf/2013/august/adedoyin%20and%20adedoyin.pdf).

Moreover, we can contrast increased measurement sophistication with moves toward the likes of single-item scales, and the results are surprisingly equivalent (cf. Samuel, D.B., Mullins-Sweatt, S.N., & Widiger, T.A. (2013). An investigation of the factor structure and convergent and discriminant validity of the Five-Factor Model Rating Form. Assessment, 20, 1, 24-35).

There is simply a limit to how much an assessment can ultimately capture of the complexity of human behaviour, which is itself subject to free will. It is no more complicated than that. Rather than highlighting the magical uniqueness of their test, psychometric test providers need to be upfront about the limitations of their assessments. No one has access to a crystal ball, and claims that one exists are fundamentally wrong.

The future for testing companies lies in acknowledging the limitations of their tests and recognising that they are simply part of an HR ecosystem. It is within that system that innovation can reside. The focus then moves away from trying to pretend that a given test is significantly better than others, and instead turns to how the test will add value through such things as:

  • Integration with an applicant tracking system to aid screening
  • Integration with learning and development modules to aid learning
  • Integration with on-boarding systems to ensure quick transition into work.

There is a range of solid, respectable tests available, and their similarities are far greater than their differences. Tests should meet minimum standards, but once these standards are met, the myth of impartiality is only addressed by accepting that there is a collection of quality tools of equivalent predictive power, and that the eco-system, not the assessment, should be the focus.

I realise I’m still a myth behind in the series and will follow up with a short piece that provides more support for the use of psychometrics in industry, addressing the myth that psychometric tests have little value for employment selection.

The Myth of Impartiality: Part 1

In last month’s post I signed off by noting that impartiality was a pervasive myth in the industry. The corollary is that assuming impartiality allows many of the myths in the industry not only to continue but to flourish. Very few in the industry can lay claim to being completely impartial, yours truly included. The industry at all levels has inherent biases that any critical psychologist must be mindful of. The bias starts with university and research, and the myth is then passed on, often by practitioners, to the consumer (be that a person or an organisation).

A colleague recently sent me a short paper that I think is compulsory reading for anyone with a critical mind in the industry. The article uses the metaphor of Dante’s Inferno to discuss the demise of science. Keeping with the theme, I would like to use another apocalyptic metaphor, the Four Horsemen of the Apocalypse, in reference to the myth of impartiality. These Horsemen represent the four areas where impartiality is professed but often not practised, resulting in a discipline that fails to deliver to its followers the Promised Land being touted. The Four Horsemen in this instance are: University, Research, Practitioners, and Human Resources.

Unlike the biblical version, destiny is in our hands, and I want to continue to present solutions rather than simply highlight problems. Thus, each of the Four Horsemen of impartiality can be defended against (or at least inflicted with a flesh wound) with some simple virtuous steps that attack the myth of impartiality. Sometimes these steps require nothing more than acknowledging that the science and practice of psychology is not impartial. Other times we are called to address the lack of impartiality directly. Because of the length of the topic, I will break this into two blogs for our readers.

 

Universities

Many universities are best thought of as corporations. Their consumers are students. Like any other corporation, they must market to attract consumers (students) and give students what they want (degrees). To achieve this end, a factory-type process is often adopted, which in the world of education often means teaching rules and having students repeat and apply them. Moreover, students want to at least feel that they are becoming educated, and numbers and rules provide this veil. Finally, the sheer complexity of human behaviour means that restrictive paradigms for psychology are adopted instead of a deep, critical analysis of the human condition. This in turn gives the scale required to maximise the consumer base (i.e. an easy-to-digest product, respectability, and the capacity to scale the production of education for mass consumption).

 

For this reason, psychology is often positioned purely as a science, which it is not. This thinking is reinforced by an emphasis on quantitative methodologies, which in turn reinforces the myth of measurement. Papers are presented without recognition of the inherent weaknesses and limitations of what is being discussed, and quality theoretical thinking is subordinated to statistics. The end result is that while university is presented as an impartial place of learning, this ignores the drivers against impartiality that are inherent in the system. Often the rules created to drive the learning process do so to meet the needs of the consumer and increase marketability, at the expense of impartial education. Those who come out of the system may fail to fully appreciate the limitations of their knowledge, and as the saying goes, ‘a little knowledge is a dangerous thing’.

 

University is the most important of the Four Horsemen of impartiality, as it is within university that many of the other myths are generated. By training young minds in a way of thinking while appearing impartial, universities create ‘truths’ in the discipline that are simply a limited way of viewing the topic. This results in myths, like the myth of measurement (and various conclusions drawn from research), becoming accepted as truth, and students graduating with faulty information or over-confidence in research findings. Those who do not attend university, but hold graduates in a degree of esteem, likewise fail to understand that they too become victims of the myth of impartiality.

 

The virtuous steps

This blog is too short to address all the shortcomings of universities in the modern environment. However, if we don’t address them, we will lose more and more quality researchers and teachers from our ranks [see: http://indecisionblog.com/2014/04/07/viewpoint-why-im-leaving-academia/]. What I suggest is that psychology re-embrace its theoretical roots by being more multi-disciplinary in its approach, integrating science and statistics with the likes of philosophy and sociology.

 

The second step is to make a course in ‘Critical Psychology’ compulsory. This would go beyond the sociopolitical definition of critical psychology often given and focus on issues of critique as discussed in these blogs: issues of measurement, the role of theory, the problems of publish-or-perish, and so on. In short, a course that covers the problems inherent in the discipline, acknowledging that these are things every psychologist, applied or academic, must be mindful of. To the universities already taking these steps in a meaningful way, I commend you.

Research

The idea that research is impartial was dismissed some time ago by all but the most naïve. The problem is not so much one of deliberate distortion, although that can also be a problem, as we will see later on. Rather, it is the very system of research that is not impartial.

Firstly, there is the ‘publish or perish’ mentality that pervades all who conduct research, whether academics or applied psychologists. Researchers are forced by market drivers or university standards to publish as much as possible as ‘evidence’ that they are doing their job. The opportunity cost is simply that quality research is often in short supply. For one of the best summaries of this problem I draw your attention to Trimble, S.W., Grody, W.W., McKelvey, B., & Gad-el-Hak, M. (2010). The glut of academic publishing: A call for a new culture. Academic Questions, 23, 3, 276-286. Among the many powerful points made in this paper are that quality research takes time and runs counter to the ‘publish or perish’ mentality, and that a real contribution often goes against conventional wisdom and therefore puts one in the direct firing line of many current contemporaries.

Why does this glut occur? I can think of three key reasons.

The first is that researchers are often graded by the quantity, not quality, of the work they produce. The general public tends not to distinguish between grades of journals, and academic institutions have key performance indicators that require a certain number of publications per year.

The second reason is that journals create parameters by which research will be accepted. I have discussed this topic to death in the past, but evidence of bias includes favouring novel findings over replication, favouritism towards papers that reject the null hypothesis, and treating numbers rather than logic and theory as the criterion of supporting evidence. This in turn creates a body of research that projects itself as the body of knowledge of our discipline when in reality it is simply a fraction, and a distorted fraction at that, of how we understand human complexity (cf. Francis, G. (2014). The frequency of excess success for articles in Psychological Science. Psychonomic Bulletin and Review (in press; http://www1.psych.purdue.edu/~gfrancis/pubs.htm), 1-26).

Abstract: Recent controversies have questioned the quality of scientific practice in the field of psychology, but these concerns are often based on anecdotes and seemingly isolated cases. To gain a broader perspective, this article applies an objective test for excess success to a large set of articles published in the journal Psychological Science between 2009-2012. When empirical studies succeed at a rate higher than is appropriate for the estimated effects and sample sizes, readers should suspect that unsuccessful findings were suppressed, the experiments or analyses were improper, or that the theory does not properly account for the data. The analyses conclude problems for 82% (36 out of 44) of the articles in Psychological Science that have four or more experiments and could be analyzed.

The third reason is funding. Where money is involved there is always a perverse incentive to distort. This occurs in universities, where funding is an issue, and in industry, where a psychologist may be brought in to evaluate an intervention. The mechanisms are often more subtle than outright distortion. For example, universities that require funding from certain beneficiaries may be inclined to undertake research that, by design, returns positive findings in a certain area, thus being viewed positively by grants committees. The same may be true in industry, where an organisational psychology company is asked to evaluate a social programme but the terms of the evaluation are such that the real negative findings (such as opportunity cost) are hidden. This has led to calls for transparency in the discipline, such as in Miguel, E., Camerer, C., Casey, K., Cohen, J., Esterling, K., Gerber, A., Glennerster, R., Green, D., Humphreys, M., Imbens, G., Laitin, D., Madon, T., Nelson, L., Nosek, B.A., …, Simonsohn, U., & Van der Laan, M. (2014). Promoting transparency in social science research. Science, 343, 6166, 30-31. While the paper makes a strong argument for quality design, it also notes the trouble with perverse incentives:

Accompanying these changes, however, is a growing sense that the incentives, norms, and institutions under which social science operates undermine gains from improved research design. Commentators point to a dysfunctional reward structure in which statistically significant, novel, and theoretically tidy results are published more easily than null, replication, or perplexing results ( 3, 4). Social science journals do not mandate adherence to reporting standards or study registration, and few require data-sharing. In this context, researchers have incentives to analyze and present data to make them more “publishable,” even at the expense of accuracy. Researchers may select a subset of positive results from a larger study that overall shows mixed or null results (5) or present exploratory results as if they were tests of pre-specified analysis plans (6).

Then there are the outright frauds (see: http://en.wikipedia.org/wiki/Diederik_Stapel). For those who have not read it in other blogs, I urge you to look at the New York Times interview. My favourite quote:

“People think of scientists as monks in a monastery looking out for the truth,” he said. “People have lost faith in the church, but they haven’t lost faith in science. My behavior shows that science is not holy.”… What the public didn’t realize, he said, was that academic science, too, was becoming a business. “There are scarce resources, you need grants, you need money, there is competition,” he said. “Normal people go to the edge to get that money. Science is of course about discovery, about digging to discover the truth. But it is also communication, persuasion, marketing. I am a salesman…

 

The virtuous steps

To address this issue of impartiality in research we need a collective approach. Universities with a commitment to research must aim for quality over quantity and allow researchers the time to develop quality research designs that can be tested and examined over longer periods. Research committees must be multi-disciplinary to ensure that a holistic approach to research prevails.

We must have an arm’s-length relationship between funding and research. I don’t have an answer for how this would occur, but until it does, universities will be dis-incentivised from conducting fully impartial work. Journals need to be established that provide an outlet for comprehensive research. This will mean removing word limits in favour of comprehensive research designs that allow more alternative hypotheses to be tested and dismissed. Systems thinking needs to become the norm, not the exception.

Finally, and most importantly, our personal and professional ethics must be paramount. We must contribute to the body of knowledge that critiques the discipline for the improvement of psychology. We must make sure that we are aware of any myth of impartiality in our own work, make it explicit, and try to limit its effect, whether as researchers or practitioners. We must challenge the institutions (corporate and academic) we work for to raise their game, in incremental steps.

In Part Two, I will take a critical look at my industry, psychometric testing and applied psychology, and how the myth of impartiality is prevalent. I will also discuss how this is then furthered by those who apply our findings within Human Resource departments.

The myth of significance testing

When I decided to leave work and go to university to study psychology, I did so because of a genuine fascination with the study of human behaviour, thought, and emotion. Like many, I was drawn to the discipline not by the allure of science but by the writings of Freud, Jung, Maslow, and Fromm. I believed at the time that the discipline was as much philosophy as it was science and had the romantic notion of sitting in the quad talking theory with my classmates.

Unfortunately, from day one I was introduced not to the theory of psychology but to the maths of psychology. This, I was told, was the heart of the discipline, and supporting evidence came not from the strength of the theory but from the numbers. It did not matter that, as an 18-year-old male, I was supremely conscious of the power of the libido: unless it could be demonstrated on a Likert scale, it did not exist. The gold standard of supporting evidence was significance testing.

I always struggled with the notion that the significance test (ST) was indeed as significant as my professors would have me believe. However, it was not until I completed my postgraduate diploma in applied statistics that the folly of the ST truly came home to me. Here, for the first time, I was introduced to the concept of fishing for results and to techniques such as the Bonferroni correction (http://en.wikipedia.org/wiki/Bonferroni_correction). Moreover, I came to understand just how paltry the findings in psychology were, and how oxymoronic it was to establish the robustness of such findings through a significance test alone. The sketch below shows why the correction matters.
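
As a rough illustration of why fishing for results is dangerous and what the Bonferroni correction does about it, here is a small Python sketch (numpy and scipy) that runs many correlation tests on pure noise. The sample sizes and number of tests are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# 'Fishing for results': run 20 correlation tests on pure noise and see what bites.
n, n_tests, alpha = 100, 20, 0.05
p_values = np.array([
    stats.pearsonr(rng.normal(size=n), rng.normal(size=n))[1]   # no true relationship
    for _ in range(n_tests)
])

print("naive 'hits' at p < .05:", int(np.sum(p_values < alpha)))             # often 1 or 2
print("hits after Bonferroni  :", int(np.sum(p_values < alpha / n_tests)))   # usually 0
# With 20 independent tests, the chance of at least one false positive is about
# 1 - 0.95**20, i.e. roughly 64%. Dividing alpha by the number of tests (Bonferroni)
# brings the family-wise error rate back to about 5%.
```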

In 2012 a seminal paper on this topic came out and I would encourage everyone who works in our field to be aware of it. This is indeed the myth for this month: the myth of significance testing:

Lambdin, C. (2012) Significance tests as sorcery: Science is empirical – significance tests are not. Theory and Psychology, 22, 1, 67-90.

Abstract

Since the 1930s, many of our top methodologists have argued that significance tests are not conducive to science. Bakan (1966) believed that “everyone knows this” and that we slavishly lean on the crutch of significance testing because, if we didn’t, much of psychology would simply fall apart. If he was right, then significance testing is tantamount to psychology’s “dirty little secret.” This paper will revisit and summarize the arguments of those who have been trying to tell us— for more than 70 years—that p values are not empirical. If these arguments are sound, then the continuing popularity of significance tests in our peer-reviewed journals is at best embarrassing and at worst intellectually dishonest.

The paper is a relatively easy read and the arguments are simple to understand:

“… Lykken (1968), who argues that many correlations in psychology have effect sizes so small that it is questionable whether they constitute actual relationships above the “ambient correlation noise” that is always present in the real world. Blinkhorn and Johnson (1990) persuasively argue, for instance, that a shift away from “culling tabular asterisks” in psychology would likely cause the entire field of personality testing to disappear altogether. Looking at a table of results and highlighting which ones are significant is, after all, akin to throwing quarters in the air and noting which ones land heads.”  (ala fishing for results)

The importance of this paper for so much of the discipline cannot be overstated. In an attempt to claim a level of credibility beyond its station, the psychological literature has bordered on the downright fraudulent in making sweeping claims from weak but significant results. The risk is that our discipline becomes the laughing stock of future generations, who will see through the emperor’s clothes currently parading as science.

“ … The most unfortunate consequence of psychology’s obsession with NHST is nothing less than the sad state of our entire body of literature. Our morbid overreliance on significance testing has left in its wake a body of literature so rife with contradictions that peer-reviewed “findings” can quite easily be culled to back almost any position, no matter how absurd or fantastic. Such positions, which all taken together are contradictory, typically yield embarrassingly little predictive power, and fail to gel into any sort of cohesive picture of reality, are nevertheless separately propped up by their own individual lists of supportive references. All this is foolhardily done while blissfully ignoring the fact that the tallying of supportive references—a practice which Taleb (2007) calls “naïve empiricism”—is not actually scientific. It is the quality of the evidence and the validity and soundness of the arguments that matters, not how many authors are in agreement. Science is not a democracy.

It would be difficult to overstress this point. Card sharps can stack decks so that arranged sequences of cards appear randomly shuffled. Researchers can stack data so that random numbers seem to be convincing patterns of evidence, and often end up doing just that wholly without intention. The bitter irony of it all is that our peer-reviewed journals, our hallmark of what counts as scientific writing, are partly to blame. They do, after all, help keep the tyranny of NHST alive, and “[t]he end result is that our literature is comprised mainly of uncorroborated, one-shot studies whose value is questionable for academics and practitioners alike” (Hubbard & Armstrong, 2006, p. 115).” P. 82

Is there a solution to this madness? Using the psychometric testing industry as a case in point, I believe the solution is multi-pronged. STs will continue to be part of our supporting literature, as they are a requirement of the marketplace and without them test publishers will not be viewed as credible. However, through education such as training for test users, this can be balanced so that the reality of STs is better understood. This includes understanding the true variance accounted for in tests of correlation, and therefore the true significance of the significance test! It must be matched with an understanding of the importance of theory building when testing a hypothesis, and of required adjustments such as the Bonferroni correction when conducting multiple tests on one set of data. Finally, in keeping with the theme of this series of blogs, the key is to treat the discipline as a craft, not a science. Building theory, applying results in unique and meaningful ways, and focusing on practical outcomes is more important, and more reflective of sound practice, than militant adherence to a significance test.

P.S. For those interested in understanding how to use statistics as a craft to formulate applied solutions I strongly recommend this book http://www.goodreads.com/book/show/226575.Statistics_As_Principled_Argument

P.P.S. This article just out http://www.theguardian.com/science/head-quarters/2014/jan/24/the-changing-face-of-psychology . Seems that there may be hope for the discipline yet.