Strategic 360s

Making feedback matter

I Need an Exorcism

Being the 360 Feedback nerd I am, I love it when some new folks get active on the LinkedIn 360 discussion group. One discussion emerged recently that caught my eye, and I have been watching it with interest, mulling over the perspectives and knowing I had to get my two cents in at some point.

Here is the question:

How many raters are too many raters?

We normally recommend 20 as a soft limit. With too many, we find the feedback gets diluted and you have too many people that don’t work closely enough with you to provide good feedback. I’d be curious if there are any suggestions for exceptions.

This is an important decision amongst the dozens that need to be made in the course of designing and implementing 360 processes. The question motivated me to pull out The Handbook of Multisource Feedback and find the excellent chapter on this topic by James Farr and Daniel Newman (2001), which reminded me of the complexity of this decision. Let me also reiterate that this is another decision that has different implications for "N=1" 360 processes (i.e., feedback for a single leader on an ad hoc basis) versus "N>1" systems (i.e., feedback for a group of participants); this blog and that discussion are focused on the latter.

Usually people argue that too many surveys will cause disruption in the organization and unnecessary "soft costs" (i.e., time). The author of this question poses a different argument for limiting the rater population: "dilution" caused by inviting unknowledgeable raters. For me, one of the givens of any 360 system is that raters must have sufficient experience with the ratee to give reliable feedback. One operationalization of that concept is to require that an employee has worked with/for the ratee for some minimum amount of time (e.g., 6 months or even 1 year), even if he/she is a direct report. Having the ratee select the raters (with manager approval) is another practice designed to yield knowledgeable raters and to facilitate the ratee's acceptance of the feedback. So "dilution" due to unfamiliarity can be combated, at least to some extent, with those requirements.

One respondent to this question offers this perspective:

The number of raters depends on the number of people that deal with this individual through important business interactions and can pass valuable feedback based on real experience. There is no one set answer.

I agree with that statement, though while there is no one set answer, some answers are better than others (see below).

In contrast, someone else states:

We have found effective to use minimum 3 and maximum 5 for any one rater category.

The minimum of 3 is standard practice these days as a "necessary but not sufficient" answer to the number of raters. The maximum of 5 is also not uncommon, but it seems to ignore the science that supports larger numbers. When clients seek my advice on the number of raters, I am swayed by the research published by Greguras and Robie (1998), who examined the reliability of the various rater sources (i.e., subordinates, peers, and managers). They concluded that different rater groups provide differing levels of reliable feedback, probably because of the number of "agendas" lurking within the various types of raters. The least reliable are the subordinates, followed by the peers, and then the managers, the most reliable rater group.

One way to address rater unreliability is to increase the size of the group (another might be rater training, for example). Usually there is only one manager, and best practice is to invite all direct reports (who meet the tenure guidelines), so the main question is the number of peers. The research suggests that 7-9 is where we need to aim, noting that this is the number of returns needed, so inviting more is probably a good idea if you expect less than a 100% response rate.
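
To make that arithmetic concrete, here is a minimal Python sketch (my own illustration, not anything taken from the research) that converts a target number of returns into the number of invitations needed for a given expected response rate, and uses the Spearman-Brown formula to show why averaging across more raters yields a more reliable score. The 0.30 single-rater reliability below is a placeholder, not a figure from Greguras and Robie.

```python
import math

def invites_needed(target_returns: int, expected_response_rate: float) -> int:
    """Raters to invite so the expected number of completed surveys
    meets the target (e.g., the 7-9 peer returns suggested above)."""
    return math.ceil(target_returns / expected_response_rate)

def group_reliability(single_rater_reliability: float, n_raters: int) -> float:
    """Spearman-Brown projection of the reliability of a score averaged
    across n raters, given an estimate of single-rater reliability."""
    r = single_rater_reliability
    return (n_raters * r) / (1 + (n_raters - 1) * r)

if __name__ == "__main__":
    # Aim for 8 peer returns while assuming an 80% response rate.
    print(invites_needed(8, 0.80))                 # -> 10 invitations
    # Illustrative only: a single-rater reliability of 0.30, averaged
    # across 8 peers, projects to roughly 0.77.
    print(round(group_reliability(0.30, 8), 2))    # -> 0.77
```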

Another potential rater group is external customers. Recently I was invited to participate in a forum convened by the American Board of Internal Medicine (ABIM) to discuss the use of multisource feedback in physician recertification processes. ABIM is one of 24 member Boards of the American Board of Medical Specialties (ABMS), which has directed that some sort of multisource (or 360) feedback be integrated into recertification.

The participants in this forum included many knowledgeable, interesting researchers on the use of 360 in the context of medicine (a whole new world for me, which was very energizing). I was invited to represent the industry ("outside") perspective. One of the presenters spoke to the challenge of collecting input from their customers (i.e., patients), a requirement for them. She offered 25 as the number of patients needed to create a reliable result, using a rationale very similar to Greguras and Robie's regarding the many individual agendas of raters.

Back to LinkedIn, there was then this opinion:

I agree that having too many raters in any one rater group does dilute the feedback and make it much harder to see subtleties. There is also a risk that too many raters may ‘drown out’ key feedback.

This is when my head started spinning like Linda Blair in The Exorcist.  This perspective is SO contrary to my 25 years of experience in this field that I had to prevent myself from discounting it as my head continued to rotate.  I have often said that a good day for me includes times when I have said, “Gee, I have never thought of (insert topic) in that way.” I really do like hearing new and different views, but it’s difficult when they challenge some foundational belief.

For me, maybe THE most central tenet of 360 Feedback is the reliance on rater anonymity in the expectation (or hope) that it will promote honesty. This goes back to the first book on 360 Feedback by Edwards and Ewen (1996), where 360s were designed with the need for anonymity at the forefront. That is why we use the artificial form of communication that is the anonymous questionnaire, and why we usually don't report on groups of fewer than 3. We know that violations of the anonymity promise result in less honesty and reduced response rates, with the grapevine (and/or social media) spreading the violated trust throughout the organization.
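
As an illustration of how that reporting safeguard is often enforced (a simple sketch of a common suppression rule, not any particular vendor's implementation), the breakout for a rater category can simply be withheld whenever fewer than three responses are available:

```python
from statistics import mean
from typing import Optional

MIN_GROUP_SIZE = 3  # common anonymity threshold for non-manager rater groups

def report_group_score(ratings: list[float]) -> Optional[float]:
    """Return the average for a rater category only when enough responses
    exist to protect anonymity; otherwise suppress the breakout."""
    if len(ratings) < MIN_GROUP_SIZE:
        return None  # typically rolled into a larger category rather than discarded
    return round(mean(ratings), 2)

print(report_group_score([4, 5, 3, 4]))  # -> 4
print(report_group_score([2, 5]))        # -> None (suppressed)
```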

The notion that too many raters will “drown out key feedback” seems to me to be a total reversal of this philosophy of protecting anonymity. It also seems to place an incredible amount of emphasis on the report itself where the numbers become the sole source of insight. Other blog entries of mine have proposed that the report is just the conversation starter, and that true insight is achieved in the post-survey discussions with raters and manager.

I recall that in past articles (see Bracken, Timmreck, Fleenor, and Summers, 2001) we made the point that every decision requires what should be a conscious value judgment as to who the most important "customer" is for that decision, whether it be the rater, the ratee, or the organization. For example, limiting the number of raters to a small number (e.g., 5 per group, or not all direct reports) indicates that the raters and the organization are more important than the ratee; that is, that we believe it is more important to minimize the time required of raters than to provide reliable feedback for the ratee. In most cases, my values cause me to lobby on behalf of the ratee as the most important customer in design decisions. The time I will rally to the defense of the rater as the most important customer is when anonymity (again, real or perceived) is threatened. And I see these arguments for creating more "insight" by keeping rater groups small or subdivided as misguided IF these practitioners share the common belief that anonymity is critical.

Finally (yes, it's time to wrap this up), Larry Cipolla, an extremely experienced and respected practitioner in this field, offers some sage advice in his comments, including a warning against increasing rater group size by combining rater groups; as he says, that is pure folly. But I do take issue with one of his practices:

We recommend including all 10 raters (or whatever the n-count is) and have the participant create two groups–Direct Reports A and Direct Reports B.

This seems to me to be a variation on the theme of breaking out groups and reducing group size with the risk of creating suspicions and problems with perceived (or real) anonymity. Larry, you need to show that doing this kind of subdividing creates higher reliability in a statistical sense that can overcome the threats to reliability created by using smaller N’s.

Someone please stop my head from spinning. Do I just need to get over this fixation with anonymity in 360 processes?

References

Bracken, D.W., Timmreck, C.W., and Church, A.H. (2001). The Handbook of Multisource Feedback. San Francisco: Jossey-Bass.

Bracken, D.W., Timmreck, C.W., Fleenor, J.W., and Summers, L. (2001). 360 feedback from another angle. Human Resource Management, 1, 3-20.

Edwards, M. R., and Ewen, A.J.  (1996). 360° Feedback: The powerful new model for employee assessment and performance improvement. New York: AMACOM.

Farr, J.L., and Newman, D.A. (2001). Rater selection: Sources of feedback. In Bracken, D.W., Timmreck, C.W., and Church, A.H. (eds.), The Handbook of Multisource Feedback. San Francisco: Jossey-Bass.

Greguras, G.J., and Robie, C. (1998).  A new look at within-source interrater reliability of 360-degree feedback ratings. Journal of Applied Psychology, 83, 960-968.

©2012 David W. Bracken

The Death Card

A number of (pre-recession) years ago, I belonged to a firm that was operating in the black and held some very nice off-site meetings for its consultants. At one such event, we had an evening reception with some fun activities, one of which was a Tarot reader. I don't even read horoscopes, but there was no one waiting and I decided to give it a try (the first and last time). I obviously didn't know much about Tarot, but it seemed like the last card to be turned over was the most important. And, lo and behold, it was the Death card! I remember a pause from the Reader (perhaps an intake of breath?), and then a rapid clearing of the cards with some comment to the effect of, "That's not important." Session over.

Well, I guess the good news is that I am still here (most people would agree with that I think).  My purpose for bringing this up is not to discuss superstitions and the occult, but to reflect on how people react to and use 360 feedback.

In fact, I have been known to call some 360 processes "parlor games," which relates directly to my Tarot experience. That was a true "parlor game." What is a parlor game? My definition, for this context, is an activity that is fun and has no consequences, where a person can be the focus of attention with low risk of embarrassment and effort. Since I strongly believe in self-determination, I do my best not to let arbitrary events that I cannot control affect my life. That would include the turn of a card, for starters.

So how do we ensure that 360 Feedback isn’t a parlor game and does matter? I propose that two important factors are Acceptance and Accountability.

Some of the design factors that promote Acceptance would include:

  • Use a custom instrument (to create relevance)
  • Have the ratee select raters, with manager approval (to enhance credibility of feedback)
  • Enhance rater honesty and reliability (to help credibility of data)
  • Invite enough raters to enhance reliability and minimize effects of outliers
  • Be totally transparent about purpose, goals, and use (not mystical, magical, inconsistent, or arbitrary)

Factors that can help create Accountability (and increase the probability of behavior change) include:

  • Require leaders to discuss results and development plans with raters (like going public with a New Year’s Resolution)
  • Include results as a component of performance management, typically in the development planning section, to create consequences for follow through, or lack thereof
  • Ensure that the leader’s manager is also held accountable for properly using results in managing and coaching
  • Conduct follow-up measures such as mini-360s and/or annual readministrations.

Some 360 processes appear to define success as simply creating awareness in the participants, hoping that the leader will be self-motivated to change. That does happen; some leaders do change, at least for a while, and maybe even in the right way. (Some people probably change based on Tarot readings, too!) For those leaders who need to change the most, it usually doesn't happen without Acceptance and Accountability.

Simply giving a feedback report to a leader and stopping there seems like a parlor game to me. A very expensive one.

©2011 David W. Bracken

Maybe Purpose Doesn’t Matter?

While there are many discussions and debates within the 360 Feedback community (including one regarding randomizing items currently on LinkedIn that I will address in a later blog), probably none is more intense and enduring than the issue of the proper use of 360 results. In The Handbook of Multisource Feedback, a whole chapter (by Manny London) was dedicated to "The Great Debate" over using 360 for developmental versus decision-making purposes. In fact, in the late '90s an entire book was published by the Center for Creative Leadership based on a debate I organized at SIOP.

I have argued in earlier blogs and other forums that I believe this "either/or" choice is a false one for many reasons. For example, even "development only" uses require decisions that affect personal and organizational outcomes and resources. Also, even when 360 is used for decision making (including succession planning, staffing, promotions, and, yes, performance management), there is always a development component.

One of the aggravating blanket statements that is used by the “development only” crowd is that respondents will not be honest if they believe that the results will be used to make decisions that might be detrimental to the ratee, resulting in inflated scores with less variability. I would say that, in fact, that is by far the most common argument for the “development only” proponents, and one that is indeed supported by some research studies.

I have just become aware of an article published 3 years ago in the Journal of Business and Psychology (JBP) relating to multisource feedback, titled “Factors Influencing Employee Intentions to Provide Honest Upward Feedback Ratings” (Smith and Fortunato, 2008).  For those of you who are not familiar with JBP, it is a refereed journal of high quality that should be on your radar and, in full disclosure, a journal for which I am an occasional reviewer.

The study was conducted at a behavioral health center with a final sample of 203 respondents. The employees filled out a questionnaire about various aspects of an upward feedback process that was to be implemented in the future.

The article is fairly technical and targeted toward the industrial/organizational community. I have pulled out one figure for the geeks in the audience to consume if desired (click on “360 Figure”) . But let me summarize the findings of the study.

The outcome (dependent variable) of primary interest to the researchers is foreshadowed in the title, i.e., what factors lead to intentions to respond honestly in ratings of a supervisor (upward feedback). The most surprising result (as highlighted in the discussion by the authors) was that purpose (administrative versus developmental) had no predictive value at all! Of all the predictor variables measured, it was the least influential, with no practical (or statistical) significance.

What does predict intentions to provide honest feedback? One major predictor is the level of cynicism, with (as you might guess) cynical attitudes resulting in less honesty. The study suggests that cynical employees fear retaliation by supervisors and are less likely to believe that the stated purpose will be followed. The authors suggest that support and visible participation by senior leaders might help reduce these negative attitudes. We also need to continue to protect both real and perceived confidentiality, and to have processes to identify cases of retaliation and hold the offending parties accountable.

The other major factor is what I would label rater self-confidence in their ability as feedback providers. Raters need to feel that their input is appropriate and valued, and that they know how the process will work. They also need to feel that they have sufficient opportunity to observe. The authors appropriately point to the usefulness of rater training to help accomplish these outcomes. They do not mention the rater selection process as an important determinant of opportunity to observe, but that is obviously a major factor in ensuring that the best raters are chosen.

One suggestion the authors make (seemingly out of context) that is purported to help improve the honesty of the feedback is to use reverse-worded items to keep raters from choosing only socially desirable responses (e.g., Strongly Agree). I totally disagree with practices such as reverse wording and randomization, which may actually reduce the reliability of the instrument (unless the purpose is research only). For example, at our SIOP Workshop, Carol Jenkins and I will be showing an actual 360 report that uses both of those methods (reverse wording and randomization). In this report (which Carol had to try to interpret for a client), the manager ("boss") of the ratee had given the same response (Agree) to two versions of the same item, one of which was reverse scored. In other words, the manager was agreeing that the ratee was both doing and not doing the same thing.

Now what? The authors of this study seem to suggest that situations like this would invalidate the input of this manager, arguably the most important rater of all.  Now we could just contact the manager and try to clarify his/her input. But the only reason we know of this situation is that the manager is not anonymous (and they know that going into the rating process). If this same problem of rating inconsistency occurs with other rater groups, it is almost impossible to rectify since the raters are anonymous and confidential (hopefully).
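
For what it is worth, spotting this kind of inconsistency is mechanically easy when the rater is identifiable. Here is a hypothetical sketch (the item names and the 5-point agreement scale are my assumptions, not details from the report Carol reviewed) that flags cases where a rater agrees with both an item and its reverse-worded twin:

```python
# Hypothetical 5-point agreement scale: 1 = Strongly Disagree ... 5 = Strongly Agree
REVERSE_PAIRS = [("listens_to_input", "ignores_input_R")]  # illustrative item names

def inconsistent_pairs(responses: dict[str, int]) -> list[tuple[str, str]]:
    """Flag pairs where a rater agrees (rating >= 4) with both an item and
    its reverse-worded counterpart, which cannot both be true."""
    flags = []
    for item, reversed_item in REVERSE_PAIRS:
        if responses.get(item, 0) >= 4 and responses.get(reversed_item, 0) >= 4:
            flags.append((item, reversed_item))
    return flags

# The manager in the example above: "Agree" (4) to both versions of the same item.
print(inconsistent_pairs({"listens_to_input": 4, "ignores_input_R": 4}))
```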

This is only one study, though a well designed and analyzed study in a respected journal. I will not say that this study proves that purpose does not have an effect on honesty. Nor should anyone say that other studies prove that purpose does affect honesty. To be clear, I have always said that it may be appropriate to use 360 results in decision making under the right conditions, conditions that are admittedly often difficult to achieve. This is in contrast to some practitioners who contend that it is never appropriate to do so, under any conditions.

Someday, when I address the subject of organizational readiness, I will recall the survey used in this research, which was administered in anticipation of implementing an upward feedback process. The brief (31-item) survey used in this study would be a great tool for assessing readiness in all 360 systems.

One contribution of this research is to point out that the intention to be honest is as much a characteristic of the process as it is of the person. Honesty is a changeable behavior in this context, through training, communication, and practice. Making blanket statements about rater behavior and how a 360 program should or shouldn't be used is not productive.

360 Figure

©2011 David W. Bracken