I remember, quite a few years ago, giving the same introductory logic course two years running, as far as I could tell doing as good a job each time. But my student evaluations plummeted between one year and the next. Why? I could only put it down to the fact that the first year I gave the course in relaxed casual dress; the next year (because a committee meeting was scheduled on the same afternoons) I wore a rather serious suit. So I supposedly came across as remote, unhelpful, and harder to understand.
I was reminded of that experience -- which made me permanently a tad sceptical about the worth of student evaluations -- when I read these two scepticism-reinforcing pieces*, by the philosophers Michael Huemer and Clark Glymour. I was particularly amused (in a world-weary sort of way) by this excerpt from the former:
[There was a] study, in which students were asked to rate instructors on a number of personality traits (e.g., "confident," "dominant," "optimistic," etc.), on the basis of 30-second video clips, without audio, of the instructors lecturing. These ratings were found to be very good predictors of end-of-semester evaluations given by the instructors' actual students. A composite of the personality trait ratings correlated .76 with end-of-term course evaluations; ratings of instructors' "optimism" showed an impressive .84 correlation with end-of-term course evaluations. Thus, in order to predict with fair accuracy the ratings an instructor would get, it was not necessary to know anything of what the instructor said in class, the material the course covered, the readings, the assignments, the tests, etc.

So now you know: bounce in optimistically, wave your hands around confidently, and you can sell the kids anything ...
Williams and Ceci conducted a related experiment. Professor Ceci, a veteran teacher of the Developmental Psychology course at Cornell, gave the course consecutively in both fall and spring semesters one year. In between the two semesters, he visited a media consultant for lessons on improving presentation style. Specifically, Professor Ceci was trained to modulate his tone of voice more and to use more hand gestures while speaking. He then proceeded, in the spring semester, to give almost the identical course (verified by checking recordings of his lectures from the fall), with the sole significant difference being the addition of hand gestures and variations in tone of voice (grading policy, textbook, office hours, tests, and even the basic demographic profile of the class remained the same). The result: student ratings for the spring semester were far higher, usually by more than one standard deviation, on all aspects of the course and the instructor. Even the textbook was rated higher by almost a full point on a scale from 1 to 5. Students in the spring semester believed they had learned far more (this rating increased from 2.93 to 4.05), even though, according to Ceci, they had not in fact learned any more, as measured by their test scores. Again, the conclusion seems to be that student ratings are heavily influenced by cosmetic factors that have no effect on student learning.
And I should say that these days I always wear a suit to lecture (so I've a cast-iron excuse for any poor evaluations, of course).
Added: For a bit of judicious balance, do read Richard Zach's second contribution (Comment 12 below), and the linked paper.
*Links from Twitter; thanks to John Basl and Allen Stairs.