Strategic 360s

Making feedback matter

What is a “5” ?

The Sunday New York Times Business section typically has a feature called “Corner Office” in which a CEO is interviewed. These CEOs often seem to be from small businesses; the one today (October 16, 2011), for example, is the CEO of a 25-person, $6 million consulting firm. The questions are often the same, having to do with the process of becoming a leader, lessons learned, and hiring and promotion strategies. I have referenced these columns in a couple of earlier blogs since they touch on behavior change, leadership development, and culture change, all relevant to aspects of 360 processes.

In today’s column, the CEO was asked how he hires. He notes that part of the interview often includes asking applicants to rate themselves on a five-point scale in various areas of knowledge and experience from their resume. If an applicant rates themselves a “5,” he then asks whether there is nothing else they could learn about whatever it is. Of course, they say, “Oh, no, no. There is.” To which the CEO asks, “Then why did you rate yourself a five?” And he goes on to say he has never hired someone who rates themselves a five.

While this CEO’s decision not to hire these high self-raters may seem arbitrary, think of the culture he is trying to create through the people he selects and the message this communicates to new hires during the process. He says that he appreciates humility and an applicant’s understanding that they don’t know everything. (In an earlier column, the CEO asks applicants if they are “nice,” and then explains what “nice” means in their culture, i.e., a good team player.)

(Someone told me of a senior executive who greeted applicants in the lobby and asked them whether they should take the stairs or elevator. If they said elevator, he didn’t hire them. That seems less job related, but is our “5” CEO doing a similar thing? Food for thought.)

We don’t use 360’s to hire people (though I have heard of multisource reference checks from past coworkers being positioned as 360’s), but when 360’s involve many people, we do have an opportunity to create or support a culture. We also know, however, that 360’s are notorious for severe leniency, i.e., mostly 4’s and 5’s on a five-point scale.

Do all these 5’s that we collect mean that our leaders can’t get any better at what they do? Of course not. But that seems to be the message that we allow and even reward (even if not tangibly).

The vast majority of 360 processes use an Agree/Disagree (Likert) scale where “Strongly Agree” is scored as a “5” (scales that score it as a “1” seem confusing and counterintuitive to me). The VAST majority of processes also do not include rater training, which could help raters (and ratees, for that matter) stop attaching whatever meaning they wish to “Strongly Agree.” Which they currently do.

I have used a rating scale where “5” is defined as “role model, in the top 5-10%,” which attempts to create a frame of reference for raters (and ratees) and does help reduce leniency effects.

What if we defined “5” as “can’t get any better,” or something equivalent? I think “role model” implies that this person can be a teacher as well as an example to others, and perhaps doesn’t need to get better (i.e., can focus on other areas of improvement). Some raters will undoubtedly ignore those definitions, but rater training can help drill in the need for everyone to reconfigure their conception of what optimal behavior is and, by the way, foster the learning and development culture that our CEO seems to be nurturing.

A recalibration of rating scales is badly needed in this field. We need to stop raters from giving all 5’s and ratees from giving themselves all 5’s. With our current mentality about rating scales, there is really nothing to stop rating inflation. It should be no surprise that senior leaders find our 360 programs difficult to use and support.
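
For the more quantitatively inclined, here is a minimal sketch of one way to make the inflation visible: compute each rater’s share of top-box responses, which you could compare across rater groups or before and after rater training. The data and names are hypothetical, and this is only one of many possible leniency indices.

```python
from collections import defaultdict

# Hypothetical ratings: (rater_id, ratee_id, item, score on a 1-5 scale)
ratings = [
    ("r1", "leaderA", "listens_effectively", 5),
    ("r1", "leaderA", "delegates", 5),
    ("r2", "leaderA", "listens_effectively", 4),
    ("r2", "leaderA", "delegates", 3),
]

def top_box_share(ratings, top_scores=(4, 5)):
    """Share of each rater's responses in the top boxes -- a simple leniency signal."""
    counts = defaultdict(lambda: [0, 0])  # rater_id -> [top-box count, total count]
    for rater, _ratee, _item, score in ratings:
        counts[rater][1] += 1
        if score in top_scores:
            counts[rater][0] += 1
    return {rater: top / total for rater, (top, total) in counts.items()}

print(top_box_share(ratings))  # {'r1': 1.0, 'r2': 0.5}
```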

©2011 David W. Bracken

I Don’t Care

Last week I led a workshop for the Personnel Testing Council of Metropolitan Washington that was a modified reprise of the workshop Carol Jenkins and I did at the Society for Industrial and Organizational Psychology in April. I really enjoy these workshops and the opportunity to interact face-to-face with practitioners in the field of 360 degree feedback.

I do wish that participants in these workshops would engage me in a little more debate, and, to that end, I sometimes throw out comments in the hope of raising some hackles. For example, at the PTCMW session, I twice said “I don’t care” regarding two topics that I will explain below. Unfortunately, no one took the bait in the workshop, but maybe I can lure some of you into the discussion using this blog as a vehicle.

So here are the two areas where a ton of research is being done but where, as a practitioner, I don’t care:

1)      The personality of the participant. I don’t care. Everyone seems to want to know how the personality of the participant will affect his/her reaction to the feedback. In past blogs, I have fessed up to being a behaviorist, and in that respect all I really “care” about is getting the person to accept the feedback and to change, whether they want to or not. In my last blog, I used the examples of people’s apparent reluctance to do simple things like apologize for a mistake and/or say “thank you.” Behaviorally, those are pretty easy things to do, but evidently some internal force (e.g., personality) makes them difficult. In fact, those internal forces vary greatly across people, and I find chasing them down to be not a very fruitful use of time for the participant or for myself. If the organization and the feedback tell you that you need to modify your behavior, just do it!

Sometimes what is going on inside the person’s head is more an issue of awareness than of personality, and awareness is something we can change through 360’s. Occasionally the journey from awareness to acceptance is difficult due to personality factors. It is our job to design the 360 process to make it difficult to not accept the feedback, including ensuring that raters are knowledgeable, reliable, motivated and in sufficient quantity.

On a practical level, when many 360 processes involve dozens or hundreds of participants, it becomes very challenging to integrate personality assessment, for example, into the mix. Not to say it can’t be done. Carol Jenkins does some of that in her practice with groups of feedback recipients. But part of my “I don’t care” mentality has come from a need to get large numbers of people to use the feedback productively without being able to “get inside their head.”

2)      The gap between self-ratings and “other” ratings. I don’t care. As a psychologist, I do find it interesting to see how ratees approach self-ratings, especially the first time around. And they usually change their self-ratings once they see how they are perceived by others. But I am increasingly convinced that self-ratings are more a reflection of the ratee’s agenda than any real self-assessment. (All raters are susceptible to this kind of error, i.e., using their ratings to advance an agenda.) One memorable instance for me was working with a Chief Legal Officer who gave himself all 5’s and stated, “Do you think I would be crazy enough to actually document less than optimal performance?”
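
For those who, unlike me, do care about the self-other gap, the arithmetic is at least simple. Here is a minimal sketch of the usual calculation, with hypothetical ratings and dimension names: self-rating minus the average of the “other” ratings on each dimension, where a positive gap suggests a potential blind spot.

```python
from statistics import mean

# Hypothetical ratings for one ratee on a 1-5 scale, keyed by dimension
self_ratings = {"communication": 5, "delegation": 5}
other_ratings = {                       # ratings from peers, direct reports, etc.
    "communication": [3, 4, 3, 4],
    "delegation": [4, 4, 5, 3],
}

# Positive gap = self-rating higher than the others' average (potential blind spot)
gaps = {dim: self_ratings[dim] - mean(other_ratings[dim]) for dim in self_ratings}
print(gaps)  # {'communication': 1.5, 'delegation': 1.0}
```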

I DO think that participants should complete the self-rating process, but for other reasons. One is to ensure that they are familiar with the content and with how they are expected to behave as defined by the organization. Another is that it provides some evidence of at least minimal commitment to the process.

In general, I am not very interested in why a ratee behaves in a certain way if that behavior needs to change. It is highly unlikely that we can change the “why” part of behavior (i.e., personality) other than to affect the ratee’s awareness of how they are perceived and the importance of accepting that feedback on the way to behaving differently. What is going on in the person’s head is fun for psychologists to research, but it doesn’t necessarily help achieve sustainable behavior change.

©2011 David W. Bracken

What I Learned at SIOP

The annual conference of the Society for Industrial and Organizational Psychology (SIOP) was held in Chicago April 14-16 with record attendance. I had something of a “360 Feedback-intensive” experience, running two half-day continuing education workshops on 360 Feedback (with Carol Jenkins), participating in a panel discussion of the evolution of 360 over the last 10 years (with other contributors to The Handbook of Multisource Feedback), and serving as discussant for a symposium on Implicit Leadership Theories that largely focused on cultural factors in 360 processes. Each forum gave me an opportunity to gauge current perspectives on this field; here are a few that I will share.

The “debate” continues but seems to be softening. The “debate” is, of course, over how 360 Feedback should be used: development only and/or for decision making. In our CE workshop, we actually had participants stand in corners of the room to indicate their stance on this issue, and, judging from that exercise, there are still many strong proponents on each side. That said, the panel seemed to agree that the distinction between uses is blurring, that 360’s are successfully being used for decision making, and that 360’s are far less likely to create sustainable behavior change without the accountability that comes with integration into HR systems.

We need to be sensitive to the demands we place on our leaders/participants. During our panel discussion, Janine Waclawski (who is currently an HR generalist at Pepsi) reminded us of how we typically inundate 360 participants with data points, beginning with the number of items multiplied by the number of rater groups. (I don’t believe the solution to this problem is reducing the number of items, especially below some arbitrary number like 20.) Later, I had the opportunity to offer commentary on four terrific research papers whose major theme was that supervisors need to be aware of the perspectives of their raters, perspectives that may well be shaped by the raters’ cultural backgrounds.

As someone who is more on the practitioner end of the practitioner-scientist continuum, I tried to once again put myself in the seat of the feedback recipient (where I have been many times) and consider how this research might be put into practice. On one hand, organizations are using leadership competency models and values statements to create a unified message (and culture?) that spans all segments of the company. We can (and should) have debates about how useful and realistic this practice is, but I think most of us agree that the company has a right to define the behaviors that are expected of successful leaders. 360 processes can be a powerful way to define those expectations in behavioral terms, to help leaders become aware of their perceived performance of those behaviors, to help them get better, and to hold leaders accountable for change.

On the other hand, the symposium papers seem to suggest that leader behaviors should be molded from “the bottom up,” i.e., by responding to the expectations of followers (raters), expectations that may be attributed to their cultural backgrounds and their views of what an effective leader should be (which may differ from the leader’s view and/or the organization’s view of effective leadership). By the way, this “bottom up” approach also applies to the use of importance ratings (which is not a cultural question).

My plea to the panel (perhaps to their dismay) was to at least consider the conundrum of the feedback recipient who is being given this potentially incredibly complex task of not only digesting the basic data that Janine was referring to, but then to fold in the huge amount of information created by having to consider the needs of all the feedback providers. Their research is very interesting and useful in raising our awareness of cultural differences that can affect the effectiveness of our 360 processes. But PLEASE acknowledge the implications for putting all of this to use.

The “test” mentality is being challenged.  I used the panel discussion to offer up one of my current pet peeves, namely to challenge the treatment of 360 Feedback as a “test.”  Both in the workshops and again at the panel, I suggested that applying practices such as randomizing items and using reverse wording to “trick” the raters is not constructive and most likely is contrary to our need to help the raters provide reliable data. I was gratified to receive a smattering of applause when I made that point during the panel.  I am looking forward to hopefully discussing (debating) this stance with the Personnel Testing Council of Metropolitan Washington in a workshop I am doing in June, where I suspect some of the traditional testing people will speak their mind on this topic.

This year’s SIOP was well done, once again. I was especially glad to see ongoing interest in the evolution of the field of 360 Feedback, judging from the attendance at these sessions, let alone the fact that the workshop committee identified 360 as a topic worthy of inclusion after more than 10 years since the last one. 360 Feedback is such a complex process, and we are still struggling with the most basic questions, including purpose and use.

©2011 David W. Bracken

Maybe Purpose Doesn’t Matter?

While there are many discussions and debates within the 360 Feedback community (including one on LinkedIn about randomizing items that I will address in a later blog), probably none is more intense and enduring than the issue of the proper use of 360 results. In The Handbook of MultiSource Feedback, a whole chapter (by Manny London) was dedicated to “The Great Debate” over using 360 for developmental vs. decision-making purposes. In fact, in the late 90’s the Center for Creative Leadership published an entire book based on a debate I organized at SIOP.

I have argued in earlier blogs and other forums that this “either/or” choice is a false one for many reasons. For example, even “development only” uses require decisions that affect personal and organizational outcomes and resources. And even when 360’s are used for decisions (including succession planning, staffing, promotions, and, yes, performance management), there is always a development component.

One of the aggravating blanket statements that is used by the “development only” crowd is that respondents will not be honest if they believe that the results will be used to make decisions that might be detrimental to the ratee, resulting in inflated scores with less variability. I would say that, in fact, that is by far the most common argument for the “development only” proponents, and one that is indeed supported by some research studies.

I have just become aware of an article published 3 years ago in the Journal of Business and Psychology (JBP) relating to multisource feedback, titled “Factors Influencing Employee Intentions to Provide Honest Upward Feedback Ratings” (Smith and Fortunato, 2008).  For those of you who are not familiar with JBP, it is a refereed journal of high quality that should be on your radar and, in full disclosure, a journal for which I am an occasional reviewer.

The study was conducted at a behavioral health center with a final sample of 203 respondents. The employees filled out a questionnaire about various aspects of an upward feedback process that was to be implemented in the near future.

The article is fairly technical and targeted toward the industrial/organizational community. I have pulled out one figure for the geeks in the audience to consume if desired (click on “360 Figure”) . But let me summarize the findings of the study.

The outcome (dependent variable) of primary interest to the researchers is foreshadowed in the title, i.e., what factors lead to intentions to respond honestly when rating a supervisor (upward feedback). The most surprising result (as highlighted in the authors’ discussion) was that purpose (administrative versus developmental) had no predictive value at all! Of all the predictor variables measured, it was the least influential, with no practical or statistical significance.
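
For the geeks again: if you wanted to test a claim like this on your own data, one standard approach is a hierarchical regression that asks whether purpose adds any explained variance beyond the other predictors. The sketch below is not the authors’ analysis; the file name and variable names are hypothetical, and it assumes pandas and statsmodels are available.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical survey data: one row per respondent (file name assumed)
df = pd.read_csv("upward_feedback_survey.csv")

base_vars = ["cynicism", "self_efficacy", "opportunity_to_observe"]
y = df["intention_to_rate_honestly"]

# Step 1: baseline predictors only
m1 = sm.OLS(y, sm.add_constant(df[base_vars])).fit()

# Step 2: add stated purpose (0 = developmental, 1 = administrative)
m2 = sm.OLS(y, sm.add_constant(df[base_vars + ["purpose_administrative"]])).fit()

# If purpose matters, R-squared should rise meaningfully from step 1 to step 2
print(f"Incremental R^2 for purpose: {m2.rsquared - m1.rsquared:.4f}")
```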

What does predict intentions to provide honest feedback? One major predictor is the level of cynicism, with (as you might guess) cynical attitudes resulting in less honesty. The study suggests that cynical employees fear retaliation by supervisors and are less likely to believe that the stated purpose will be followed. The authors suggest that support and visible participation by senior leaders might help reduce these negative attitudes. We also need to continue to protect both real and perceived confidentiality, and to have processes to identify cases of retaliation and hold the offending parties accountable.

The other major factor is what I would label rater self-confidence: raters need to feel that their input is appropriate and valued, and that they know how the process will work. They also need to feel that they have sufficient opportunity to observe. The authors appropriately point to the usefulness of rater training in accomplishing these outcomes. They do not mention the rater selection process as an important determinant of opportunity to observe, but that is obviously a major factor in ensuring that the best raters are chosen.

One suggestion the authors make (seemingly out of context), purported to help improve the honesty of the feedback, is to use reverse-worded items to keep raters from choosing only socially desirable responses (e.g., Strongly Agree). I totally disagree with practices such as reverse wording and randomization, which may actually reduce the reliability of the instrument (unless the purpose is research only). For example, at our SIOP Workshop, Carol Jenkins and I will be showing an actual 360 report that uses both of those methods (reverse wording and randomization). In this report (which Carol had to try to interpret for a client), the manager (“boss”) of the ratee had given the same response (Agree) to two versions of the same item, one of which was reverse scored. In other words, the manager was agreeing that the ratee was both doing and not doing the same thing.

Now what? The authors of this study seem to suggest that situations like this would invalidate the input of this manager, arguably the most important rater of all. We could just contact the manager and try to clarify his/her input, but the only reason we know of this situation is that the manager is not anonymous (and knows that going into the rating process). If this same problem of rating inconsistency occurs in other rater groups, it is almost impossible to rectify, since those raters are anonymous and confidential (hopefully).
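
For what it’s worth, if you inherit an instrument that does use reverse-worded pairs, contradictions like this can at least be screened for automatically. Here is a minimal sketch; the item pairs, responses, and agreement threshold are all hypothetical.

```python
# Hypothetical pairs of (positively worded item, reverse-worded twin)
REVERSED_PAIRS = [("listens_to_team", "ignores_team_input")]

# One rater's responses on a 1-5 Agree/Disagree scale (hypothetical)
responses = {"listens_to_team": 4, "ignores_team_input": 4}

def flag_inconsistent(responses, pairs, agree_threshold=4):
    """Return pairs where the rater agrees (>= threshold) with both wordings."""
    flags = []
    for positive, reversed_item in pairs:
        if (responses.get(positive, 0) >= agree_threshold
                and responses.get(reversed_item, 0) >= agree_threshold):
            flags.append((positive, reversed_item))
    return flags

print(flag_inconsistent(responses, REVERSED_PAIRS))
# [('listens_to_team', 'ignores_team_input')] -> contradictory input worth following up
```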

This is only one study, though a well designed and analyzed study in a respected journal. I will not say that this study proves that purpose does not have an effect on honesty. Nor should anyone say that other studies prove that purpose does affect honesty. To be clear, I have always said that it may be appropriate to use 360 results in decision making under the right conditions, conditions that are admittedly often difficult to achieve. This is in contrast to some practitioners who contend that it is never appropriate to do so, under any conditions.

Someday when I address the subject of organizational readiness, I will recall the survey used in this research, which was administered in anticipation of implementing an upward feedback process. This brief (31-item) survey would be a great tool to assess readiness in all 360 systems.

One contribution of this research is to point out that the intention to be honest is as much a characteristic of the process as it is of the person. In this context, honesty is a behavior that can be changed through training, communication, and practice. Making blanket statements about rater behavior and about how a 360 program should or shouldn’t be used is not productive.

360 Figure

©2011 David W. Bracken

Not Funny

I seem to be in a bit of a rut with themes around humor and now commercials. Despite trying to bypass as many commercials as possible with my DVR, occasionally I do see one and sometimes even for the better.

One that caught my eye/ear is an IBM commercial that starts with a snippet of a Groucho Marx routine (I also like Groucho very much) where he states, “This morning I shot an elephant in my pajamas.” Of course, the fun part is when he follows with, “How he got in my pajamas, I will never know.” Ba bump.

The commercial goes on to talk about a computer called Watson that has been developed by IBM with capabilities that will be used to compete on Jeopardy (another favorite). The point is that language has subtle meanings, euphemisms, metaphors, nuances and unexpected twists that are difficult for machines to correctly comprehend.

In the context of 360 Feedback, the problem is that we humans are sometimes not so good at picking up the subtleties of language as well. We need to do everything we can to remove ambiguity in our survey content, acknowledging that we can never be 100% successful.

We have all learned, sometimes the hard way, how our attempts to communicate with others can go astray. How often have we had to come to grips with the fact that our seemingly clear directions have been misunderstood?

I became sensitized to this question of ambiguity in language during the quality movement of the 80’s and the work of Peter Senge as embodied in The Fifth Discipline and the accompanying Fifth Discipline Fieldbook. (Writing this blog has spurred me to pull out this book; if you youngsters are not aware of Senge’s writings, it is still worth digging out. There is a 2006 Edition which I confess I have not read yet.)

There are many lessons in these books regarding the need to raise awareness about our natural tendencies as humans to fall back on assumptions, beliefs, values, etc., often unconsciously, in making decisions, trying to influence, and taking actions. One lesson that has particularly stuck with me in the context of 360’s is the concept of mental models, which Senge defines as, “deeply ingrained assumptions, generalizations, or even pictures or images that influence how we understand the world and how we take action.”  In the Fieldbook, he uses an example of the word “chair” and how that simple word will conjure up vastly different mental images of what a “chair” is, from very austere, simple seats to very lush, padded recliners and beyond. (In fact, it might even create an image of someone running a meeting if we are to take it even farther.)

So Groucho created a “mental model” (or assumed one) of us visualizing him in his pajamas with a gun chasing an elephant. Then he smashes that “assumption” we made by telling us that the elephant was wearing the pajamas. That is funny in many ways.

Sometimes we are amused when we find we have made an incorrect assumption about what someone has told us. I have told the story before of the leader who made assumptions about his low score on “Listens Effectively.” He unexpectedly found that his assumptions were unfounded and the raters were simply telling him to put down his PDA. That could be amusing and also a relief since it is an easy thing to act on.

360 Feedback is a very artificial form of communication where we rely on questionnaires to allow raters to “tell” the ratee something while protecting their anonymity. This also has the potential benefit of allowing us to easily quantify the responses which, in turn, can be used to measure gaps (between rater groups, for example) and track progress over time.
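
Here is a minimal sketch of the kind of quantification this enables, with hypothetical data: average an item by rater group, compute the self-versus-others gap, and compare administrations over time.

```python
from statistics import mean

# Hypothetical responses to one item across two administrations, grouped by rater source
data = {
    2010: {"self": [5], "direct_reports": [3, 3, 4], "peers": [4, 3]},
    2011: {"self": [4], "direct_reports": [4, 4, 4], "peers": [4, 4]},
}

for year, groups in sorted(data.items()):
    means = {group: round(mean(scores), 2) for group, scores in groups.items()}
    gap = means["self"] - mean(groups["direct_reports"])  # self vs. direct reports
    print(year, means, f"self-vs-reports gap: {gap:+.2f}")
```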

Of course, this artificial communication creates many opportunities for raters to misunderstand or honestly misapply the intent of the items and, in turn, for ratees to misinterpret the intended message from the raters. We need to do our best to keep language simple and direct, though we can never prevent raters from applying different “mental models.”

Take an item like, “Ensures the team has adequate resources.” Not a bad question. But, like “chair,” “resources” can create all sorts of mental images such as people (staff), money (budget), equipment (e.g., computers), access to the leader, and who knows what else! We could create a different item for each type of resource if we had an unlimited item budget, which we don’t.

This potential problem is heightened if there will be multiple languages used, creating all sorts of issues with translations, cultural perspectives, language nuances, and so on.

In the spirit of “every problem has a solution,” I can think of at least four basic recommendations.

First, be diligent in item writing to keep confusion to a minimum (a simple screening sketch follows this list). For example:

  • Use simple words/language
  • Don’t use euphemisms (“does a good job”)
  • Don’t use metaphors (“thinks outside the box”)
  • Don’t use sports language (“creates benchstrength”)
  • Keep all wording positive (or cluster negatively phrased items such as derailers in one dimension with clear instructions)
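
As an illustration of that first recommendation, here is a minimal sketch of an item screener that checks draft wording against a short list of vague or figurative phrases. The phrase list is hypothetical and obviously incomplete; a real one would be built with the client’s review team.

```python
# Hypothetical phrase list mapping problem wording to the reason it is flagged
FLAGGED_PHRASES = {
    "does a good job": "vague/euphemism",
    "thinks outside the box": "metaphor",
    "benchstrength": "sports language",
    "does not": "negative wording",
}

def screen_item(item_text):
    """Return any flagged phrases found in a draft item."""
    lowered = item_text.lower()
    return [(phrase, reason) for phrase, reason in FLAGGED_PHRASES.items() if phrase in lowered]

for item in ["Thinks outside the box when solving problems",
             "Ensures the team has adequate resources"]:
    print(item, "->", screen_item(item) or "OK")
```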

Second, conduct pilot tests with live raters who can give the facilitator immediate feedback on wording in terms of clarity and inferred meaning.

Third, conduct rater training. Some companies tell me that certain language is “ingrained” in their culture, such as “think outside the box.” (I really wonder how many people know the origins of that metaphor. Look it up in Wikipedia if you don’t.) I usually have to defer to their wishes, but I still believe that such beliefs may be more aspirational than factual. Including a review of company-specific language (which does have some value in demonstrating the uniqueness of the 360 content) during rater training will have multiple benefits.

Fourth, acknowledge and communicate that it is impossible to prevent misinterpretations by the senders (raters) and the receivers (ratees). This will require that the ratee discuss results with the raters and ensure that they are all “on the same page” (metaphor intended, tongue in cheek).

I bet that some ratees do actually laugh (or at least chuckle) if/when they hear how some raters interpret the questions.  But more typically it is not funny. And it is REALLY not funny if the ratee invests time and effort (and organizational resources) taking action on false issues due to miscommunication.

(Note: For those interested, Carol Jenkins and I will be talking about these issues in our SIOP Pre-Conference workshop on 360 Feedback on April 13 in Chicago.)

©2011 David W. Bracken

Making Mistakes Faster

The primary purpose of this brief blog entry is to bring to your attention a new article by Dale Rose, Andrew English, and Christine Thomas in The Industrial/Organizational Psychologist (TIP). I assume that only a minority of readers of this blog receive TIP or, if they do, that they have not had a chance to read this article. (For starters, the title does not immediately signal that the majority of the content is about 360 Feedback.)

The article can be accessed at http://www.siop.org/tip/jan11/04rose.aspx.

As you will see, Dale and colleagues focus primarily on how technology has affected 360 Feedback processes, for good and bad. This should be required reading for practitioners in this field.

They reference a discussion Dale and I had on this blog about the “silly” rating format where raters can rate multiple ratees at the same time using a kind of spreadsheet layout. They are correct that there is no research we are aware of on the effects of rating formats like this on the quality of ratings or on user reactions (including response rates, for example). We won’t rehash the debate here; suffice it to say that it is one area where Dale and I disagree.

Other than that, I endorse his viewpoints about the pitfalls of technology. I recall when computers first became available to us to support our research. As we all struggled to use technology effectively, I remember saying that computers allow us to make mistakes even faster.

I will use my next blog to talk about, “When Computers Go Too Far,” which builds on some of Dale’s observations. Hope you will tune in!

©2011 David W. Bracken

It’s wonderful, Dave, but…

This is one of my favorite cartoons (I hope I haven’t broken too many laws by using it here; I’m certainly not using it for profit!).  I sometimes use it to ask whether people are more “every problem has a solution” or “every solution has a problem” types. Clearly, Tom’s assistant is the latter.

I thought of this cartoon again this past week during another fun (for me, at least) debate on LinkedIn about the purpose of 360’s, primarily about the old decision making vs. development only debate.

Now, I don’t believe that 360 is comparable to the invention of the light bulb (though there is a metaphor lurking in there somewhere), nor did I invent 360. But, as a leading proponent of using 360 for decision-making purposes (under the right conditions), by far the most common retort I hear is something along the lines of, “It’s (360) wonderful, Dave, but using it for decisions distorts the responses when raters know it might affect the ratee.”

Yes, there is some data suggesting that raters’ responses would be affected if they knew the ratings might penalize the ratee in some way. And it does make intuitive sense to some degree. But I offer up these counterpoints for your consideration:

  • I don’t believe I have ever read a study (including meta-analyses) that even considers, let alone examines, rater training effects, starting with whether training was included as part of the 360 system(s) in question. In my recent webinar (Make Your 360 Matter), I presented what I think is compelling data from a large sample of leaders on the effects of rater training and rating scale on 360 rating distributions. (We will discuss this data again at our SIOP Pre-Conference Workshop in April.) In the spirit of “every problem has a solution,” I propose that rater training has the potential to ameliorate leniency errors.
  • There is a flip side to believing that your ratings will affect the ratee in some way, which, of course, is believing that your feedback doesn’t matter. I am not aware of any studies that directly address that question, but there is anecdotal and indirect evidence that this also has negative outcomes. What would you do if you thought your efforts made no difference (including not being read)? Would you even bother to respond? Or take time to read the items? Or offer write in comments? Where is the evidence that “development only” data is more “valid” than that used for other purposes?  It may be different, but that does not always mean better.

The indirect data I have in mind are the studies published by Marshall Goldsmith and associates on the effect of follow-up on reported behavioral change. (One chapter is in The Handbook of MultiSource Feedback; another article is “Leadership is a Contact Sport,” which you can find at marshallgoldsmith.com.) The connection I am making is that a lack of follow-up by the ratee can signal that the feedback does not matter, with the replicated finding that reported behavior change is typically zero or even negative. Conversely, when the feedback does matter (i.e., the ratee follows up with raters), behavior change is almost universally positive (and increases with the amount of follow-up reported).

It’s all too easy to be an “every solution has a problem” person. We all do it. I do it too often. But maybe it would help if we became a little more aware of when we are falling into that mode.  It may sound naïve to propose that “every problem has a solution,” but it seems like a better place to start.

©2010 David W. Bracken