This is not a test
There is a 360 Feedback LinkedIn group that I only recently discovered (thank you, Dale Rose), which means I have missed a number of good discussions. I had been planning to write on the topic of treating 360’s as “tests” anyway, so one particular thread caught my eye, having to do with rater training. The person who started the discussion offered this point of clarification (in italics):
I am suggesting that people who attend training sessions as a precursor to participating in a 360 process are often trained in the very topics they are about to be asked to self report – i.e., they are “primed” with new knowledge of psychological phenomena before being tested. This would not be allowed in an experimental setting – the data would be obviously tainted (biased) so I am wondering why this practice persists.
Treating 360’s as “tests” is a mentality within the I/O Psychology community that may serve some purpose by allowing I/O’s to use their “expert” status, but it is totally inappropriate. Let’s start with the notion of what “tests” are and are not. A test is most commonly viewed as a means to discover something about the test taker, whether that is knowledge, skills, personality, background, and so on. In 360 feedback processes, raters generate the data, but it is data about another person, not about themselves (except for self ratings, which are a phenomenon unto themselves that we will set aside for another day). In fact, we go to great lengths (but often not far enough) to try to take the rater’s characteristics out of the equation, other than to know their relationship to the ratee (e.g., direct report, manager, peer, customer, etc.). We also go to great lengths to preserve anonymity in the hope of fostering an environment of trust and honesty, which leads to the use of questionnaires as the primary method of input by the raters. (Of course, there are other methods of collecting 360 feedback, such as interviews, but we will not talk about those today either.)
Tests are not typically considered a communication method, though some tests (such as simulations) may help communicate some of the requirements of the position to the test taker. In contrast, 360 instruments are primarily a communication device. They are designed to allow the rater to anonymously communicate their perceptions of the ratee’s behavior. The instrument should also communicate to the rater (including the self rater) what the organization defines as effective leader behavior.
360 Feedback is most akin to assessment methods such as assessment centers and performance appraisals where the data are also generated by an observer, not by the target person (as in a traditional testing setting). In performance appraisals and especially assessment centers, we often expend considerable effort to standardize the process through instructions and training with the explicit goal of minimizing individual difference effects in the raters.
I have said before, and will say again, that rater training is perhaps the most neglected design feature shown to have a major benefit in creating reliable feedback. This can be, but does not have to be, done in workshops. More commonly, a set of slides is presented to the rater the first time he or she accesses the questionnaire, covering topics such as purpose, the rater’s role (accountability), how to complete the questionnaire, common rater errors (halo, leniency, etc.), and even sample rating exercises.
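For readers who like to see what those rater errors look like in the data, here is a minimal sketch, with entirely made-up ratings and hypothetical index names, of how leniency and halo might be screened for. The simple statistics here are illustrative only, not a substitute for proper rater-error research.

```python
# Hypothetical illustration: how leniency and halo can show up in rating data.
# "ratings" maps each rater to their scores (1-5 scale) across questionnaire items.
from statistics import mean, pvariance

ratings = {
    "rater_a": [5, 5, 5, 5, 5, 4],   # uniformly high: possible leniency/halo
    "rater_b": [3, 4, 2, 5, 3, 4],   # differentiated across behaviors
}

def leniency_index(scores):
    """Mean rating; values near the top of the scale suggest leniency."""
    return mean(scores)

def halo_index(scores):
    """Variance across items; near-zero spread suggests halo, i.e., the
    rater is not differentiating between distinct behaviors."""
    return pvariance(scores)

for rater, scores in ratings.items():
    print(f"{rater}: mean={leniency_index(scores):.2f}, "
          f"spread={halo_index(scores):.2f}")
```

In a training slide, even a simple side-by-side like this (one rater giving everyone the same high mark, another differentiating) can make the abstract labels “halo” and “leniency” concrete for raters.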
Since 360 is not a “test” in any sense (other than having multiple choice items), we have to free our minds and treat it as what it is, i.e., a communication device. That means we have to help users, not trick them. What do I mean by “trick them”? Carol Jenkins (who is doing the SIOP Pre-Conference Workshop on 360 with me) was telling me of a standard 360 instrument she had recently seen where the items are randomly ordered. That is a classic example of a professional using a testing mentality to design an instrument.
What we need to do is everything we can to help the rater understand our intent in writing an item and including it in the survey. I have this fantasy that some muse could whisper in the ear of a rater, something like this:
“OK, Mary, the next X questions are about Teamwork. Our senior leadership believes that behaving like a team is going to be essential in achieving our vision and strategy, and our leaders need to act that way and support others who do as well. That’s why these items are on this survey and labeled as Teamwork. Now, this first item is, ‘Shares information with other teams and groups that need it.’ It may sound like a communication item, but we don’t care at this point how this manager does it. What you should be thinking about when you answer it is how effectively this leader does it in the spirit of demonstrating teamwork by supporting other parts of the company.”
No matter how hard we try, it is just a fact that the nuances of written language make 360 items susceptible to different interpretations and meanings. This is often compounded by our need to keep the instrument to a reasonable length, which limits our ability to get very specific about behavior. Let me give an example.
I was consulting and coaching with a client that had an existing item, “Listens effectively.” When we think about that item, it contains at least two major constructs: 1) the physical act of listening, and 2) acknowledging the speaker’s viewpoint. We can help the rater decide between those two uses by putting the item in a dimension that describes the intended construct, perhaps 1) Effective Communication, or 2) something relating to openness to alternative positions (e.g., flexibility, innovation, diversity). The client in question put it in a communication dimension, but even so there are many behaviors involved in being an effective listener.
I had one coachee who received a surprisingly low score on this item, and the write-in comments didn’t do much to help. He and I could have come up with all sorts of possible solutions, books to read, behaviors to try, courses to take, and so on. Fortunately, the organization strongly encouraged the participants to discuss their results with their raters, and one of the tasks of the coach was to prepare the ratee for that experience. When he had his review session with the raters, he discovered that the real issue was that he was constantly using his PDA (Blackberry), and the team felt that he was perpetually distracted. That was an “easy” fix once his awareness was raised, and his scores on that item reflected it in subsequent administrations. Unfortunately, many organizations actively discourage those kinds of reviews, and in doing so encourage misaligned behavior change (or what we might call superstitious behavior).
Similar to performance appraisals and even employee surveys, the instrument (questionnaire) needs to be a reliable measure of relevant (i.e., valid) content. Perhaps the best way to ensure relevance (content validity) is to have a direct line of sight from organizational priorities to the behavioral items. This logic points to a custom instrument constructed around a competency (leadership) model that mirrors the requirements for successful execution of a strategy/vision. Another version of organizational priorities is a values statement.
If users are so worried about validity, the starting point should be the content. An off-the-shelf instrument would seem to fall woefully short on that criterion. Of course, I am not a fan of validity generalization either, and my dissertation was about the effect of organizational culture on the predictive validity of a selection test. More importantly, as a communication tool, a 360 (like an engagement survey) can be a great way to reinforce the existing culture of the organization, and/or to signal a shift in the culture.
The advantage of an off-the-shelf instrument is that the content has been tested for reliability. That is one of many reasons that a custom instrument needs to be pretested (as in a pilot, for example) and analyzed for “bad” items using early administrations.
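To make the pretesting idea concrete, here is a minimal sketch of one common form of item analysis, Cronbach’s alpha with an alpha-if-item-deleted check. The data and the choice of statistic are my illustrative assumptions, not something from a particular client engagement; a real pilot analysis would use a proper statistics package and larger samples.

```python
# A minimal sketch (hypothetical pilot data) of flagging "bad" items in a
# custom 360 instrument before full rollout.
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, aligned across the same raters.
    Cronbach's alpha estimates internal-consistency reliability."""
    k = len(items)
    item_var_sum = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-rater total score
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Three items that hang together, plus one that does not (made-up data).
items = [
    [5, 4, 2, 3, 1],
    [5, 4, 3, 3, 2],
    [4, 4, 2, 3, 1],
    [3, 1, 5, 2, 4],  # candidate "bad" item
]

print(f"alpha, all items: {cronbach_alpha(items):.2f}")
# Alpha-if-item-deleted: a large jump when an item is dropped flags it.
for i in range(len(items)):
    reduced = items[:i] + items[i + 1:]
    print(f"alpha without item {i + 1}: {cronbach_alpha(reduced):.2f}")
```

The point of the sketch is simply that a pilot administration gives you the data to run this kind of check; an item that drags reliability down this visibly is a candidate for rewriting before the instrument goes live.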
The second major threat to validity is the implementation process. As I noted in an earlier blog (“I am going to see my lawyer”), we would all be better off if as much (or more) attention were given to implementation decisions and execution as is given to the instrument. I also noted earlier that, in my opinion, legal challenges are much more likely to come from inconsistencies in administration and use of 360 results, which can easily result in real and perceived unfairness.
360 feedback processes are more similar to performance appraisals and employee surveys than they are to “tests” as we traditionally think of them. In that regard, a 360 is more of a performance measure than a predictor. Our concepts of reliability and validity need to reflect that distinction.
©2010 David W. Bracken