Strategic 360s

Making feedback matter


No Fighting in The War Room!



My apologies (or sympathies) to those of you who have not seen the black satire “Dr. Strangelove: or How I Learned to Stop Worrying and Love the Bomb,” which contains the line, “No fighting in the War Room!” I was reminded of this purposely humorous contradiction while reading an otherwise very insightful summary of the state of feedback tools by Josh Bersin that I hope you can access via LinkedIn here:

Mr. Bersin seems quite supportive of the “ditch the ratings” bandwagon that is rolling through the popular business literature, and his article is a relatively comprehensive survey of the emerging technologies that are supporting various versions of the largely qualitative feedback market. But right in the middle he made my head spin in Kubrick-like fashion when he started talking about the need for ways to “let employees rate their managers,” as if this is a) something new, and b) something that can be done without using ratings. Instead of “No fighting in the War Room!”, there is “No rating in the evaluation system!” I’m curious: Is an evaluation not a “rating” because it doesn’t have a number? Won’t someone attach a number to the evaluation, either explicitly or implicitly? And wouldn’t it be better if there were some agreement as to what number is attached to that evaluation?

What I think is most useful in Bersin’s article is his categorization and differentiation of the types of feedback processes and tools that seem to be evolving in our field, using his labels:

  • Next Generation Pulse Survey and Management Feedback Tools
  • “Open Suggestion Box” and Anonymous Social Network Tools
  • Culture Assessment and Management Tools
  • Social Recognition Tools

I want to focus on Culture Assessment and Management Tools in the context of this discussion of ratings and performance management, and, in doing so, reference some points I have made in the past. If you look at Mr. Bersin’s “Simply Irresistible Organization” (in the article), it contains quite a few classic HR terms like “trust,” “coaching,” “transparency,” “support,” “humanistic,” “inspiration,” “empowered,” and so on, that he probably defines somewhere but that nonetheless cry out for behavioral descriptors to tell us what we will see happening when they are being done well, if at all. Ultimately it is those behaviors, and the support for those behaviors, that define the culture. Furthermore, we can observe and measure those behaviors, and then hold employees accountable for acting in ways consistent with the organization’s needs.

To quote from Booz & Co in 2013:

“On the informal side, there must be tangible behaviors that demonstrate what the culture looks like, and they must be granular enough that all levels of the organization can exhibit the behaviors.”

“On the formal side — and where HR can help out — the performance management and rewards systems must reward people for displaying the right behaviors that exemplify the culture. Too often, changes to the culture are not reflected in the formal elements, such as the performance-management process. This results in a relapse to the old ways of working, and a culture that never truly evolves.”

Of course, all that requires measurement, which requires ratings. Which, in turn, calls for 360 Feedback, if we agree that supervisory ratings by themselves are inadequate. My experience is that management demands ratings. My prediction is that unchecked qualitative feedback will also run its course and be rejected as serving little purpose in supporting either evaluation or development.

There may be a place for the kind of open, basically uncontrolled feedback that social networks provide in offering spontaneous recognition. But I totally disagree with Mr. Bersin when he states that any feedback is better than no feedback. I have counseled, and still do counsel, against survey comment sections that are totally open and invite “please whine here” types of comments that are often neither constructive nor actionable.

Mr. Bersin brings up the concept of feedback as a “gift,” which I recently addressed as going against the notion that feedback providers need to have accountability for their feedback and see it as an investment, not a gift (especially not a thoughtless gift).

There is a very basic, important difference in how the field of feedback is trending: more quantity, less quality, too many white elephants. We need more 401(k)s.

©2015 David W. Bracken

What I Learned at SIOP



The annual conference of the Society for Industrial/Organizational Psychology (SIOP) was held in Chicago April 14-16 with record attendance. I had something of a “360 Feedback-intensive” experience by running two half-day continuing education workshops (with Carol Jenkins) on 360 feedback, participating on a panel discussion of the evolution of 360 in the last 10 years (with other contributors to The Handbook of Multisource Feedback), and being the discussant for a symposium regarding Implicit Leadership Theories that largely focused on cultural factors in 360 processes. Each forum gave me an opportunity to gauge some current perspectives on this field, and here are a few that I will share.

The “debate” continues but seems to be softening. The “debate” is, of course, how 360 feedback should be used: development only and/or for decision making. In our CE Workshop, we actually had participants stand up and move to corners of the room to indicate their stance on this issue, and, judging from that exercise, there are still many strong proponents of each side. That said, the panel seemed to agree that there is some blurring of the distinction between uses, some acknowledgement that 360’s are successfully being used for decision making, and that 360’s are far less likely to create sustainable behavior change without the accountability that comes with integration into HR systems.

We need to be sensitive to the demands we place on our leaders/participants. During our panel discussion, Janine Waclawski (who is currently an HR generalist at Pepsi) reminded us of how we typically inundate 360 participants with many data points, beginning with the number of items multiplied by the number of rater groups. (I don’t believe the solution to this problem is reducing the number of items, especially below some arbitrary number like 20 items.)  Later, I had the opportunity to offer commentary on four terrific research papers that had a major theme of how supervisors need to be aware of the perspectives of their raters that may well be caused by their cultural backgrounds.

As someone who is more on the practitioner end of the practitioner-scientist continuum, I tried to once again put myself in the seat of the feedback recipient (where I have been many times) and consider how this research might be put into practice. On one hand, organizations are using leadership competency models and values statements to create a unified message (and culture?) that spans all segments of the company. We can (and should) have debates about how useful and realistic this practice is, but I think most of us agree that the company has a right to define the behaviors that are expected of successful leaders. 360 processes can be a powerful way to define those expectations in behavioral terms, to help leaders become aware of their perceived performance of those behaviors, to help them get better, and to hold leaders accountable for change.

On the other hand, the symposium papers seem to suggest that leader behaviors should be molded from “the bottom up,” i.e., by responding to the expectations of followers (raters) that may be attributed to their cultural backgrounds and their views of what an effective leader should be (which may differ from the leader’s view and/or the organization’s view of effective leadership). By the way, this “bottom up” approach also applies to the use of importance ratings (which is not a cultural question).

My plea to the panel (perhaps to their dismay) was to at least consider the conundrum of the feedback recipient who is being given this potentially incredibly complex task of not only digesting the basic data that Janine was referring to, but then to fold in the huge amount of information created by having to consider the needs of all the feedback providers. Their research is very interesting and useful in raising our awareness of cultural differences that can affect the effectiveness of our 360 processes. But PLEASE acknowledge the implications for putting all of this to use.

The “test” mentality is being challenged.  I used the panel discussion to offer up one of my current pet peeves, namely to challenge the treatment of 360 Feedback as a “test.”  Both in the workshops and again at the panel, I suggested that applying practices such as randomizing items and using reverse wording to “trick” the raters is not constructive and most likely is contrary to our need to help the raters provide reliable data. I was gratified to receive a smattering of applause when I made that point during the panel.  I am looking forward to hopefully discussing (debating) this stance with the Personnel Testing Council of Metropolitan Washington in a workshop I am doing in June, where I suspect some of the traditional testing people will speak their mind on this topic.

This year’s SIOP was well done, once again. I was especially glad to see an ongoing interest in the evolution of the field of 360 feedback, judging from the attendance at these sessions, let alone the fact that the workshop committee identified 360 as a topic worthy of inclusion after more than 10 years since the last one. 360 Feedback is such a complex process, and we are still struggling with the most basic questions, including purpose and use.

©2011 David W. Bracken

Silly Survey Formats?



My recent webinar, “Make Your 360 Matter,” led to a blog entry called “Snakes in Suits” that was primarily about 360 processes being true to their objectives. Dale Rose, a highly experienced consultant and good friend (and collaborator), was motivated to submit a comment, part of which included this thought:

This also raises one of the problems with using that silly survey format where you can list all the ratees together while answering the survey. If raters are comparing across people while rating, then they are not thinking closely about what is going on specific to that person because a bunch of their attention is focused on comparing them to someone else. What happens when the context changes and I’m rating them compared to two different people? At best, if ratees have a professional helping to interpret the data they may actually think about the implications and draw reasonable conclusions. At worst, the shift in context messes with the data so much that no one knows what the differences mean.

In communicating with Dale, I learned that he had been unable to listen in to the webinar, which had included a brief discussion of the rating format that he references. To bring everyone up to speed, the multiratee format we are addressing is (or can be) a spreadsheet with the names of ratees on one axis and the competencies on the other. The cells are where ratings are entered; the version I shared had a drop-down list of response alternatives (e.g., strongly agree to strongly disagree). The instructions have the rater work across the ratees, which encourages comparisons. Some users do not like the idea of comparisons, and that is one of a number of reasons (besides Dale’s) that it might not make sense to implement.
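To make the format concrete, here is a minimal Python sketch of such a grid. The ratee names, competencies, and the `rate` helper are all invented for illustration; they are not taken from any actual tool.

```python
# Sketch of the multiratee "spreadsheet" format: ratees on one axis,
# competencies on the other, each cell holding one value chosen from a
# fixed drop-down list. All names and labels below are hypothetical.

LIKERT = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

ratees = ["Dr. A", "Dr. B", "Dr. C"]
competencies = ["Communicates clearly", "Collaborates with staff"]

# One rater's grid: grid[competency][ratee] = rating (None = not yet entered)
grid = {c: {r: None for r in ratees} for c in competencies}

def rate(grid, competency, ratee, rating):
    """Record one cell, enforcing the drop-down's allowed values."""
    if rating not in LIKERT:
        raise ValueError(f"{rating!r} is not on the response scale")
    grid[competency][ratee] = rating

# The rater works *across* a row, rating every ratee on one competency
# before moving on -- the side-by-side view that invites comparison.
rate(grid, "Communicates clearly", "Dr. A", "Agree")
rate(grid, "Communicates clearly", "Dr. B", "Strongly Agree")
rate(grid, "Communicates clearly", "Dr. C", "Neutral")
```

The row-wise entry pattern is the design choice at issue here: it puts all ratees side by side for a single competency, which is exactly what invites (or, on Dale’s view, contaminates the data with) comparison.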

I have successfully used this format on a few occasions. One was with a group of anesthesiologists who wanted to give feedback to each other, and also get feedback from nurses they regularly worked with. This format worked very well since the ratees were all of relatively equal status, and there was a large number of them (19).  I have used it with other groups where raters have had to give multiple ratings.

Part of my original motivation for trying this format came from conversations with raters who had to complete many forms. I remember one manager who told me he took his forms home, spread them out on his deck and tried to consider all of the ratees at the same time. Another manager told me how she had wanted to go back and redo some of her ratings when she got to the 8th or 10th one and realized that her own internal calibration had changed as she completed the ratings. In other words, she was saying that she was a different person (rater) when she did the first one compared to when she had more experience and perspective in doing later questionnaires.

Another way that raters become “different” as they fill out forms is simple fatigue, which undoubtedly affects both the quality and quantity (i.e., response rate) of feedback. This becomes an issue of fairness: by luck of the draw, the ratees later in the queue are penalized in terms of the feedback they receive.

If (and I emphasize “if”) your process supports comparisons, this multiratee format seems to solve many problems. Some users have commented on the potential problem of having the list of ratees not being comparable in position, level, etc., and indeed there should be care to include ratees that have similar levels of responsibility.

Now let’s consider Dale’s view that this whole notion is “silly.” Let me start by saying that Dale is very experienced, and his opinions carry a lot of weight with me and others. He and I have collaborated often and we agree more often than not, but not always. This topic is one where we don’t agree, and where there is no “right” answer but more a perspective on how to treat raters and what we can/should expect of them.

His main point seems to be that raters should be considering the context of the ratee when providing feedback (i.e., giving a rating). This suggests that the rater should muse over the ratee’s situation (however that is defined) before making each evaluation. I would assume and hope that raters are explicitly instructed to consider this context factor so that there is some semblance of consistency in communicating our expectations for the role of rater. But that instruction then promotes inconsistency by asking raters to consider a complex situational variable and probably apply it in unpredictable ways.

In contrast, I am an advocate of making the rater’s task as simple and straightforward as possible. In past blogs, I have positioned that thought as attempting to minimize the individual differences in raters that can create rater error (or inconsistency). Adding a “context” instruction can only make the ratings that much more complex to both give and interpret.

My position is that the “context” discussion should happen after the data is in, not during its collection. I absolutely believe (and it appears Dale agrees) that 360 results need to be couched in the ratee’s situation, whether that is by the ratee’s manager and/or coach, and especially by any other users (e.g., human resources).

In the final tabulation, I believe that this “silly” rating format has many more benefits than problems. It can be an effective solution to the rater overload issue that some consultants try to solve by making instruments shorter and shorter at the expense of providing quality information to the ratee. It also solves some of the problems that occur when raters are asked to complete multiple ratings that penalize the ratees at the end of the queue.

I am quite sure that we will be hearing from Dale.

©2010 David W. Bracken

The Importance of Importance: The Role of the Rater



In my last blog, I surfaced the critical issue in 360 feedback processes of how to identify the behaviors/competencies that should have the highest priority (importance) for development planning. If there is one thing on which there seems to be unanimity in this field, it is that focusing the leader’s efforts on 2-3 things is a best practice. So how do we determine what those 2-3 things are when leaders are so different across functions, levels, experience, ability (competence), potential, and so on?

If we start with the instrument content (items) as defining the universe of behaviors (putting aside write-in comments for another day), then topics for discussion include 1) the need for different versions for different functions, 2) different versions for different positions (e.g., executive, supervisory, individual contributor), 3) different versions for different rater groups (e.g., direct reports vs. peers), and, finally, the length of the instrument.

I would like to put aside for the moment the question of multiple versions and pick it up later. The question of versions is not so much about importance as it is opportunity to perform (relevance to job) and opportunity to observe by raters.

Last time, I started the discussion of the length of the instrument and raised some questions regarding guidelines in the One Page Talent Management book. I posted the blog link in the OPTM LinkedIn discussion, and Marc Effron responded with his rejoinders. I think we have agreed to disagree, though I am apprehensive about disagreeing since he feels that people who disagree with him should lose their jobs: “…the talent management leader should be the one who says that (how many items there should be) and they should be promptly fired if their answer is anything above 25.” (Note: Don’t bother to look for this thread since it has apparently been removed. I will leave it to others to surmise why.)

I would like to open up the Pandora’s Box of who is in the best position to indicate what the “most important” issues are for the participant: the raters or the manager (boss), or the coach?  It doesn’t necessarily have to be an “either/or” type of thing, but many 360 processes have a strong inclination one way or the other, including having the instrument designed to ask raters for the most important items. Some processes exclude the manager from seeing the feedback report, and I would take that as a strong signal about the perceived role of the boss. Some have reports go to the coach and not the manager.

For example, my impression from the OPTM chapter on 360 is that they definitely place heavy weight on the raters. For starters, their process has the raters identify the most important behaviors (and then offer comments and suggestions). Secondly, I don’t see any specific reference to the role of the manager and his/her potential role in using the feedback.

I attended a SIOP Workshop on Talent Management a few years ago that was led, in part, by Morgan McCall (if that name doesn’t ring a bell, start with the concept of “derailers”). In his commentary, he said that “the manager is the most important factor in an employee’s development.”  I totally agree with him. The manager should be in the best position to place the feedback in context (e.g., special circumstances affecting the ratings) and then apply it to the current and future needs of the ratee, the team and the organization.

As for raters providing importance ratings (or the equivalent), I think it is a bad idea in both 360’s and employee surveys. I have toyed with the metaphor of “putting the inmates in charge of the asylum,” with deserved trepidation (no, employees are not inmates and companies are not asylums), but I use it, of course, to question the wisdom of putting decision making into the hands of the less informed and questionably motivated. I would say it is quite clear that, in employee surveys, the issues that are labeled as “most important” are not the issues that drive engagement and behavior, such as turnover.

In the context of 360’s, we do acknowledge the views of raters through the ratings themselves.  (The OPTM suggestion of not providing Top/Bottom score lists is interesting. When there are only 15-25 items, maybe they are indeed less necessary than with longer instruments. Of course, they also have raters pick the most important items.)  I basically don’t think employees should dictate what the highest priorities are. Importance ratings are arguably more about the raters’ agendas than the ratees’. Plus they probably know less about the developmental plans of the ratee than the manager. Including importance ratings (or the equivalent) in a 360 sets explicit expectations that that is their role, i.e., to identify the highest priority development needs. Those expectations should not be encouraged.

As for the coach, I will address this in a blog called, “When coaches go too far.”  Let me just say that usually coaches are the least informed of all parties about the best development plans for the ratee, and usually are not around long enough to provide continuity.

Raters are not in as good a position to determine developmental priorities as the manager (boss). If the manager is not in the best position to interpret the feedback and to guide development priorities, then the system is broken. Managers need to be held accountable for the proper use of 360 feedback, for ensuring follow-through by the ratee, and for assisting in providing the developmental resources. When 360’s don’t matter, this may be one of the reasons why.

©2010 David W. Bracken

Put your scale where your money is (or isn’t)



I had the opportunity to listen in on a web broadcast recently where Marc Effron was the featured speaker (as part of his “One Page Talent Management” approach, which I have not read as of yet), and the topic was advertised as “Why 360’s Don’t Work (and What You Can Do About It).” I found it very interesting and thought provoking, as did others on the call, judging from the questions that were submitted. In fact, I listened to it twice! Perhaps his most controversial position is his support for using 360 for decision making, which I support under the right conditions. I plan to write a couple of blogs about a few of his positions on 360, and I hope Marc will comment if he feels I have misrepresented him and/or am incorrect in some way (which is definitely a possibility).

Marc proposed a number of approaches to 360 that he believes make the process more effective and efficient. One that caught my attention was his approach to the rating scale. In fact, it motivated me to submit a question during the session, and I will discuss his response as well.

Marc spoke of a “Do More/Less” scale that he uses, ranging from “Do Much More” at one end to “Do Much Less” at the other, with “Don’t Change” as the midpoint. I have seen a presentation where it is a 5-point scale, but I could easily see a 7-point scale as well.

During the web cast, as he was describing this scale, I believe he said, “We don’t care how good or bad you are.”  In other words, he is proposing an “ipsative” approach to measurement (if I remember my graduate training correctly) where the focus is within-person ability (versus “normative” which is between-person comparison).  In this context, the ipsative scale acknowledges that we all have a stack ranking of abilities from best to worst, regardless of how well we perform in comparison to others. In a development focused process, this has great appeal in communicating that we all are better at some things than others, and we all have a “worst at” list even if we think we are still pretty good at those things relative to others.

It seems arguable whether all raters use the “More/Less” scale as an ipsative scale. I am assuming that Marc intends it to be used that way based on his “don’t care how good or bad you are” comment. It would be nice if the instructions to the rater reinforced that point (i.e., don’t compare this person to others), and maybe they do. There are other ways to generate within-person rankings, such as paired comparisons and checklists, which seem more direct but probably have their own drawbacks (I have never used them, so I am no expert).
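To put numbers on the ipsative/normative distinction, here is a rough Python sketch. All scores, names, and functions are invented for illustration; this is the general measurement concept, not Marc’s actual instrument.

```python
# Ipsative scoring ranks competencies *within* one person;
# normative scoring compares one competency *across* people.
# All values below are hypothetical.

scores = {
    "Leader X": {"Coaching": 4.5, "Delegation": 3.1, "Vision": 3.8},
    "Leader Y": {"Coaching": 3.0, "Delegation": 2.6, "Vision": 2.9},
}

def ipsative_rank(person_scores):
    """Within-person: order one leader's competencies best-to-worst."""
    return sorted(person_scores, key=person_scores.get, reverse=True)

def normative_order(all_scores, competency):
    """Between-person: order leaders on a single competency."""
    return sorted(all_scores, key=lambda p: all_scores[p][competency],
                  reverse=True)
```

Note that Leader Y’s best competency (Coaching, 3.0) still scores below Leader X’s worst (Delegation, 3.1): an ipsative profile, by design, says nothing about standing relative to others, which is why it resists use for decision making.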

I see ipsative approaches to 360 rating scales as potentially being fantastic solutions for “development only” processes where users are forbidden from using the data for decision making (or so they say).  Many of us know of supposed “development only” programs where the data is being used for decision making in various ways, creating potential inconsistency, unfairness, and misuse. If these companies used an ipsative scale such as Marc’s, that would theoretically prevent them from using it for decision making since the data is totally within-person and inappropriate (or worse) to use for comparing performance results across employees.

The problem with Marc’s situation is that he IS using this scale for decision making. So that was my question to him, namely: how can you use a nonevaluative (ipsative) scale to make comparisons (i.e., decisions) among employees? His response was basically that a) the “Do More” list is generally indicative of areas in need of development, and b) the 360 results are supplemented by other data. Point A seems to fly in the face of the “don’t care how good or bad you are” position. It would also seem to be inconsistent with the “develop your strengths” movement where people are encouraged to leverage their strengths (in a nutshell). The second point is sound advice regarding not using 360 results in isolation, but it doesn’t give me much faith in the rigor of his 360 data.

If we are going to use 360 results to make decisions about employees, that means that someone is going to get more of something (e.g., pay, promotions, development opportunities, training, high potential designation) and someone is going to get less based, at least in part, on the 360 data. That is what “decision making” means.

Marc speaks of transparency in use of 360 as a central premise to his approach. If that is the case, we can start by being totally “transparent” with our raters by telling them how their ratings are being used. If it is a totally “development only” process, use a within-person (ipsative) scale with explicit directions to not compare the ratee to others when assigning ratings.

If the 360 is supporting decision making, ask the rater to help us make comparisons by using what I call a “normative” scale. I have successfully used normative scales, and they usually look something like this:

  • 5 = Role Model (in the top 5-10%)
  • 4 = Above Average (in the top 20-25%)
  • 3 = Comparable to Other Leaders
  • 2 = Below Average (in the bottom 20-25%)
  • 1 = Far Below Average (in the bottom 5-10%)

The directions to the raters can help define the comparison group, or “other leaders.” But clearly we are creating a frame of reference for the rater that encourages something closer to a normal distribution and a direct attack on leniency error. I believe that the traditional leniency problem with 360 processes is at least partially attributable to the ambiguity of common rating scales such as the Likert (Agree/Disagree) scale, where the user (rater) is left to attach their own meaning to the scale points. Rater training can help combat rating errors but, as I have noted before, is rarely implemented.

Want to be transparent and communicate the most important decision about your 360 process, i.e., its purpose? Put your scale where your money is (or isn’t): ipsative for development only, normative for decision making.

©2010 David W. Bracken

Written by David Bracken

September 2, 2010 at 1:49 pm