I am going to see my lawyer
John Golden has been very good at starting interesting discussions/debates on LinkedIn. One of his recent topics is: 360 degree assessments – are you aware of any lawsuits in the past against an organization for using them for personnel decisions (including using them for promotions, compensation, etc.)? I have been searching for an answer to that question for a long time as well, and, so far, no one has been able to come up with a citation in the LinkedIn exchange.
Since no one has been able to adequately answer John’s question (including me), the discussion quickly evolved into other topics. The first of which is the seemingly eternal debate over the proper use of 360 data, usually described as “development only vs. appraisal (or decision making).” I am going to put that topic “on hold” for my next blog, “There is no such thing as ‘development only.’”
Another major theme that emerged was that of the “validity” of 360 feedback, and that is what I would like to focus on here. Let me note here that I have considerable experience with the “I” side of I/O Psychology, with responsibility for selection systems at two major companies. I also have considerable experience in the design and delivery of assessment centers. I consider 360 feedback as being a form of assessment, and the notion of “validity” (and associated measurement issues) is always on my mind. (As always, that doesn’t mean I am always right.)
So what is “validity?” If we go back a ways, we see definitions like this:
“… a measuring instrument is valid if it does what it is intended to do.” (Nunnally, 1978)
This traditional emphasis on the “instrument” has been a central premise in I/O Psychology and part of the “I” tradition in assessment. It is only natural that practitioners with that background would be drawn to the instrument as the means to define and determine the validity of data used to make decisions.
When we consider the universe of assessments that are used in the realm of talent management, 360 Feedback is much more similar to assessment centers than other assessments that use instruments that are commonly validated (e.g., skill inventories, personality tests, biodata inventories). One of the major common elements of assessment centers and 360 processes is that the data is not generated by the focal person but by others. This creates a totally different requirement for establishing validity.
There are also significant differences between 360’s and assessment centers. The core methodology of assessment centers is the simulation, where an “instrument” is not used in the same sense as in a 360. (Assessment centers may use instruments to supplement the simulation data.) 360’s, by contrast, require the symbiotic effectiveness of an instrument (questionnaire) and the user (rater).
As for raters, assessment centers have the advantage of trained, committed raters (often psychologists) working in a controlled setting. 360’s, unfortunately, rarely have many, if any, of those features. This is a major opportunity area for 360’s to improve the capabilities of raters.
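One way to see why the number and quality of raters matters so much is the classical Spearman-Brown prophecy formula, which projects how the reliability of an averaged rating grows as raters are added. This is a general psychometric illustration, not something from the original discussion, and the single-rater reliability used below is a purely hypothetical value:

```python
# Spearman-Brown prophecy formula: projected reliability of the average
# of n raters, given the reliability of a single rater's scores.
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Reliability of the mean of n_raters parallel ratings."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)

# If one untrained rater's scores have reliability 0.30 (hypothetical),
# averaging across more raters improves the aggregate considerably:
for k in (1, 3, 5, 10):
    print(k, round(spearman_brown(0.30, k), 2))
    # 1 -> 0.30, 3 -> 0.56, 5 -> 0.68, 10 -> 0.81
```

The same logic runs in reverse: cut the rater pool too thin and even a well-built questionnaire produces unreliable aggregate scores.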
Some of us would, therefore, contend that a definition of “validity” in the context of 360’s has to go far beyond the focus on the instrument and consider a systems view. In Bracken, Timmreck, Fleenor and Summers (2001), we propose this definition:
Sustained, observed improvement in behaviors valued by the organization.
I also plan to have a blog on, “Is your 360 valid?” at a later date.
Given that background, I have taken the liberty to excise a few quotes (in italics) from Golden’s LinkedIn discussion, and will offer some observations. (I will leave them unattributed, but they are not anonymous in LinkedIn if you are that interested in the source.)
“I have seen very few organizations that have actually spent the time and money to do a thorough job analysis and then derive the competencies that are aligned with performance in their organization.”
It is difficult to say how prevalent this type of alignment (or lack thereof) is, but I see many organizations building leadership competency models that serve as the basis for their 360 content.
When I see “job analysis” referenced, it suggests a mental model of what a competency model might be and/or what a 360 can measure. My first experience with 360 feedback was back in 1986 with BellSouth, which was trying to establish its own identity (read “culture”) distinct from AT&T (Ma Bell). We developed a 360 instrument built around the Values of this new company, and it was a fascinating process to watch managers try to assign behaviors to these new Values, which were a break from the past. The top 1500 managers of the company were exposed to the 360 as part of a Leadership Institute (including me, who was something like 1499).
Are behaviors tied to values the same as “competencies?” Probably not in traditional thinking. But they can be an extremely effective way to define the “how” (vs. “what”) of performance.
“… having an instrument that gives you the same results over and over is not the same as measuring what you are supposed to measure — that is, skills shown to have a positive impact on performance in that particular organization and culture. The best that you might be able to do is content validation. It seems getting the sample sizes needed to do criterion related validity and then cross validating might be a really daunting task — not to mention, expensive.”
If a content validation is “the best you can do,” then that may indeed be the best! This comment implies that criterion-related validity is necessary. I do not believe it is, and I am not alone.
Part of the issue is the word “skills.” I would rather use the term “behaviors” for a number of reasons. One is that many skills are better assessed through means other than 360’s. Take the extreme example (to hopefully make the point) of programming skills. There are many ways to measure those skills that are more accurate than asking coworkers. We should also be asking 360 raters to report on what they see (i.e., behaviors), not what they think a person knows, believes, understands, should do, might do, or anything else that is out of “line of sight” or projective.
I believe that a more important point is that often content validity is the proper strategy and criterion-related validity is not. This is particularly true when a culture change is desired, and the 360 is used to define that new culture. By definition, “change” in this context probably means that too few (if any) of the current leaders are demonstrating the desired behaviors (or skills, if you must), so there is an insufficient number of high performers to do a criterion study.
The leadership of an organization has the right to define the kind of employees/leaders they need to have to accomplish their strategy. They may say, for example, that teamwork will be required to be successful (across boundaries, perhaps), and, in order to be successful, an employee will need to behave that way. And it will be measured (at least in part) by behaviors that are defined as demonstrating teamwork.
I sometimes use the example of ethics or integrity as values that are so inherently important to many organizations (maybe even more so now) that they find their way into 360’s with little question. Now tell me that we have to do a criterion study to prove that ethical leaders are more successful? Of course not. It is a content validation that says that leaders that are ethical ARE successful, and those who are not ethical ARE NOT successful. That same rationale can be applied to most any behavior, i.e., if you behave that way, you are successful; if you do not, you should move on.
“Our 3-year research of managers shows absolutely no correlation between scores provided by multi-rater feedback instruments and that same individual’s performance evaluation. Does this mean that there is a validity issue with 360-degree feedback? No 360 is inherently valid or invalid, although statistically and psychometrically we can certainly prove both validity and reliability of these instruments.”
The main points this person is making (I think) are that 1) performance appraisals include factors that are not included in a 360 (e.g., achieving objectives, quantitative performance), and 2) as I argued earlier, a reliable instrument is only part of the validity equation since we rely on raters to generate the data. I will add that, in this type of research, the other (criterion?) variable (in this case performance evaluations) is often less reliable than the 360 data, which attenuates the validity coefficient.
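The attenuating effect of an unreliable criterion can be illustrated with the classical correction-for-attenuation relationship from psychometrics: the correlation you can observe between two measures is the “true” correlation shrunk by the square roots of each measure’s reliability. The numbers below are hypothetical, chosen only to show the mechanism:

```python
# Classical attenuation relationship (Spearman): the observable correlation
# between two measures equals the true correlation multiplied by the square
# root of the product of their reliabilities. All values are hypothetical.
def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation implied by a true correlation and two reliabilities."""
    return true_r * (rel_x * rel_y) ** 0.5

# Suppose 360 scores and performance truly correlate at 0.50, the 360 data
# are fairly reliable (0.80), but the performance evaluations are not (0.40):
observed = attenuated_r(0.50, 0.80, 0.40)
print(round(observed, 2))  # 0.28
```

So even a genuinely strong relationship can show up as a weak (or, with sampling error, near-zero) observed correlation when the criterion measure is unreliable, which is exactly the caution that applies to research correlating 360 scores with performance appraisals.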
“… the primary issue is ensuring that the competencies and key actions being evaluated are based on a job relatedness analysis. If the 360 is being used for anything other than development, the job relatedness analysis should be robust and the competencies longitudinally validated.”
I think I have covered much of this. My additional comment/question is, why wouldn’t we want valid data for development? Why would we develop irrelevant skills on misguided data?
“If one wishes to use 360s for anything other than developmental purposes e.g., promotions, bonuses, they must be related to job performance. This is easier said than done. Defining “job performance” properly is tricky especially as one moves higher up an organization. While most job analyses are sufficient for those jobs lower in the organizational hierarchy (where tasks can be clearly defined), this gets fuzzier as you move up the hierarchy… However, if one wishes to use a 360 assessment for these purposes, it does need to be validated from multiple angles – content + criterion.”
This comment just came in overnight, and it is a good synopsis of the traditional view of validity, which you already know I disagree with. Of course measurement needs to be related to job performance. If we use behaviors based on organizational values as an indication of performance, then they apply no matter what level the leader is at. If we use a more traditional competency model, there are many examples of models that do differentiate levels.
One thing I haven’t said up to this point is that I do not believe that 360’s should be scored and used as the sole criterion for any decision. Another LinkedIn commenter made that point: a 360 needs to be one piece of information among several. Nor are 360’s designed to cover all aspects of performance; they include only factors that are behavioral, observable, and developable, with the underlying need, of course, to be relevant.
As I noted earlier, I do not believe that criterion validity is necessary. One factor to consider if/when you conduct a criterion-related study is the need to hold constant ALL design elements for all users and for all longitudinal studies, and that list of design elements is very long (e.g., number of raters, rater selection, manager approval, rater training, communications, etc.). Assessment centers have long been supported primarily on content validity, and writings and personal communications from Bill Byham to me have acknowledged the right of organizations to define the leadership requirements used in assessment centers and 360’s as sufficient to be legally defensible.
IF I were ever asked to advise a complainant against a 360 process, I would first look at inconsistencies in practice and usage, not the content of the instrument. It is MUCH more likely that unfairness will have been inflicted by the administration and usage phases, and/or by too few raters to yield reliable data, than by any inherent shortcomings in the instrument (though we would get around to those as well).
In closing: I recently had the occasion to talk about the systems view of 360 feedback and many of the points relating to validity that I have shared here. I made the point in wrapping up that even the most reliable, well designed instrument would be rendered invalid if the raters were not reliable. One participant raised her hand and said, “I don’t care what you say. I would send the questionnaire to my lawyer.” Oh well.
Bracken, D.W., Timmreck, C.W., Fleenor, J.W., & Summers, L. (2001). 360 degree feedback from another angle. Human Resource Management, 40(1), 3-20.
Nunnally, J. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
©2010 David W. Bracken