I Need an Exorcism
Being the 360 Feedback nerd I am, I love it when some new folks get active on the LinkedIn 360 discussion group. One discussion emerged recently that caught my eye, and I have been watching it with interest, mulling over the perspectives and knowing I had to get my two cents in at some point.
Here is the question:
How many raters are too many raters?
We normally recommend 20 as a soft limit. With too many, we find the feedback gets diluted and you have too many people that don’t work closely enough with you to provide good feedback. I’d be curious if there are any suggestions for exceptions.
This is an important decision amongst the dozens that need to be made in the course of designing and implementing 360 processes. The question motivated me to pull out The Handbook of Multisource Feedback and find the excellent chapter on this topic by James Farr and Daniel Newman (2001), which reminded me of the complexity of this decision. Let me also reiterate that this is another decision that has different implications for “N=1” 360 processes (i.e., feedback for a single leader on an ad hoc basis) versus “N>1” systems (i.e., feedback for a group of participants); this blog entry and the discussion are focused on the latter.
Usually people argue that too many surveys will cause disruption in the organization and unnecessary “soft costs” (i.e., time). The author of this question poses a different argument for limiting the rater population, which he calls “dilution” due to inviting unknowledgeable raters. For me, one of the givens of any 360 system is that the raters must have sufficient experience with the ratee to give reliable feedback. One operationalization of that concept is to require that an employee must have worked with/for the ratee for some minimum amount of time (e.g., 6 months or even 1 year), even if he/she is a direct report. Having the ratee select the raters (with manager approval) is another practice that is designed to help get quality raters that then also facilitate the acceptance of the feedback by the ratee. So “dilution” due to unfamiliarity can be combated with that requirement, at least to some extent.
One respondent to this question offers this perspective:
The number of raters depends on the number of people that deal with this individual through important business interactions and can pass valuable feedback based on real experience. There is no one set answer.
I agree with that statement. Though, while there is no one set answer, some answers are better than others (see below).
In contrast, someone else states:
We have found effective to use minimum 3 and maximum 5 for any one rater category.
The minimum of 3 is standard practice these days as a “necessary but not sufficient” answer to the number of raters. As for the maximum of 5, this is also not uncommon but seems to ignore the science that supports larger numbers. When clients seek my advice on this question of number of raters, I am swayed by the research published by Greguras and Robie (1998), who examined the reliability of various rater sources (i.e., subordinates, peers and managers). They came to the conclusion that different rater groups provide differing levels of reliable feedback, probably because of the number of “agendas” lurking within the various types of raters. The least reliable are the subordinates, followed by the peers, and then the managers, the most reliable rater group.
One way to address rater unreliability is to increase the size of the group (another might be rater training, for example). Usually there is only one manager and best practice is to invite all direct reports (who meet the tenure guidelines), so the main question is the number of peers. This research suggests that 7-9 is where we need to aim, noting also that that is the number of returns needed, so inviting more is probably a good idea if you expect less than a 100% response rate.
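For readers who want to play with the arithmetic behind "more raters, more reliability," here is a minimal sketch using the standard Spearman-Brown formula, which estimates the reliability of an average across k raters from the reliability of a single rater. To be clear, this is my illustration of the general idea and not the analysis Greguras and Robie ran; the single-rater reliability (0.30), the target reliability (0.70), and the response rate (75%) are all assumed values chosen for the example, and the function names are hypothetical.

```python
import math


def group_reliability(single_rater_r: float, k: int) -> float:
    """Spearman-Brown estimate of the reliability of the mean of k raters."""
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)


def raters_needed(single_rater_r: float, target: float) -> int:
    """Smallest number of RETURNED surveys whose average reaches the target.

    Solves the Spearman-Brown formula for k and rounds up. The tiny epsilon
    guards against floating-point noise when k lands on a whole number.
    """
    k = target * (1 - single_rater_r) / (single_rater_r * (1 - target))
    return math.ceil(k - 1e-9)


def invitations_needed(returns_needed: int, response_rate: float) -> int:
    """Invitations to send so the expected number of returns meets the need."""
    return math.ceil(returns_needed / response_rate - 1e-9)


if __name__ == "__main__":
    assumed_r = 0.30       # ASSUMPTION: reliability of one peer's ratings
    assumed_target = 0.70  # ASSUMPTION: reliability we want for the peer group
    assumed_rate = 0.75    # ASSUMPTION: expected response rate

    returns = raters_needed(assumed_r, assumed_target)
    print("returns needed:", returns)
    print("invitations to send:", invitations_needed(returns, assumed_rate))
    print("reliability with 8 returns:", round(group_reliability(assumed_r, 8), 3))
```

The point of the sketch is only the shape of the relationship: the less reliable a single rater is assumed to be, the more returned surveys it takes to reach a given target, and the number of invitations has to be padded for anything less than a 100% response rate. Plug in your own assumptions rather than trusting mine.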
Another potential rater group is external customers. Recently I was invited to participate in a forum convened by the American Board of Internal Medicine (ABIM) to discuss the use of multisource feedback in physician recertification processes. ABIM is one of 24 member Boards of the American Board of Medical Specialties (ABMS), which has directed that some sort of multisource (or 360) feedback be integrated into recertification.
The participants in this forum included many knowledgeable, interesting researchers on the use of 360 in the context of medicine (a whole new world for me, which was very energizing). I was invited to represent the industry (“outside”) perspective. One of the presenters spoke to the challenge of collecting input from their customers (i.e., patients), a requirement for them. She offered up 25 as the number of patients needed to create a reliable result, using a rationale very similar to that of Greguras and Robie regarding the many individual agendas of raters.
Back to LinkedIn, there was then this opinion:
I agree that having too many raters in any one rater group does dilute the feedback and make it much harder to see subtleties. There is also a risk that too many raters may ‘drown out’ key feedback.
This is when my head started spinning like Linda Blair in The Exorcist. This perspective is SO contrary to my 25 years of experience in this field that I had to prevent myself from discounting it as my head continued to rotate. I have often said that a good day for me includes times when I have said, “Gee, I have never thought of (insert topic) in that way.” I really do like hearing new and different views, but it’s difficult when they challenge some foundational belief.
For me, maybe THE most central tenet of 360 Feedback is the reliance on rater anonymity in the expectation (or hope) that it will promote honesty. This goes back to the first book on 360 Feedback by Edwards and Ewen (1996), where 360s were designed with this need for anonymity at the forefront. That is why we use the artificial communication channel of anonymous questionnaires and usually don’t report on groups of fewer than 3. We know that violations of the anonymity promise result in less honesty and reduced response rates, with the grapevine (and/or social media) spreading word of the violated trust throughout the organization.
The notion that too many raters will “drown out key feedback” seems to me to be a total reversal of this philosophy of protecting anonymity. It also seems to place an incredible amount of emphasis on the report itself where the numbers become the sole source of insight. Other blog entries of mine have proposed that the report is just the conversation starter, and that true insight is achieved in the post-survey discussions with raters and manager.
I recall that in past articles (see Bracken, Timmreck, Fleenor and Summers, 2001) we made the point that every decision requires what should be a conscious value judgment as to who the most important “customer” is for that decision, whether it be the rater, ratee, or the organization. For example, limiting the number of raters to a small number (e.g., 5 per group or not all Direct Reports) indicates that the raters and organization are more important than the ratee, that is, that we believe it is more important to minimize the time required of raters than it is to provide reliable feedback for the ratee. In most cases, my values cause me to lobby on behalf of the ratee as the most important customer in design decisions. The one time that I will rally to the defense of the rater as the most important customer in a decision is when anonymity (again, real or perceived) is threatened. And I see these arguments for creating more “insight” by keeping rater groups small or subdivided as misguided IF these practitioners share the common belief that anonymity is critical.
Finally (yes, it’s time to wrap this up), Larry Cipolla, an extremely experienced and respected practitioner in this field, offers some sage advice in his comments, including a warning against increasing rater group size by combining rater groups. As he says, that is pure folly. But I do take issue with one of his practices:
We recommend including all 10 raters (or whatever the n-count is) and have the participant create two groups–Direct Reports A and Direct Reports B.
This seems to me to be a variation on the theme of breaking out groups and reducing group size with the risk of creating suspicions and problems with perceived (or real) anonymity. Larry, you need to show that doing this kind of subdividing creates higher reliability in a statistical sense that can overcome the threats to reliability created by using smaller N’s.
Someone please stop my head from spinning. Do I just need to get over this fixation with anonymity in 360 processes?
Bracken, D.W., Timmreck, C.W., and Church, A.H. (2001). The Handbook of Multisource Feedback. San Francisco: Jossey-Bass.
Bracken, D.W., Timmreck, C.W., Fleenor, J.W., and Summers, L. (2001). 360 feedback from another angle. Human Resource Management, 40(1), 3-20.
Edwards, M. R., and Ewen, A.J. (1996). 360° Feedback: The powerful new model for employee assessment and performance improvement. New York: AMACOM.
Farr, J.L., and Newman, D.A. (2001). Rater selection: Sources of feedback. In Bracken, D.W., Timmreck, C.W., and Church, A.H. (eds.), The Handbook of Multisource Feedback. San Francisco: Jossey-Bass.
Greguras, G.J., and Robie, C. (1998). A new look at within-source interrater reliability of 360-degree feedback ratings. Journal of Applied Psychology, 83, 960-968.
©2012 David W. Bracken