I am always looking for ways of making my grading fairer and more balanced. One issue I've discussed with my peers is whether to grade blue books one book at a time or one question at a time. While I've been using the former strategy, I've also suspected that the latter may be less subject to bias; and I have now come across empirical evidence strongly suggesting that my suspicion was correct.
In his recent (2011) book, Thinking, Fast and Slow, psychologist and Nobel Prize winner Daniel Kahneman presents a famous study of ordering effects on people's judgments of character. The study, developed by Solomon Asch, asks subjects to evaluate the personalities of the following two people:
Alan: intelligent--industrious--impulsive--critical--stubborn--envious
Ben: envious--stubborn--critical--impulsive--industrious--intelligent
Most people judge Alan more favorably, even though Alan and Ben are described with exactly the same adjectives! The ordering of the list exerts a considerable influence on our overall judgment of Alan's and Ben's characters. As Kahneman explains, an immediate impression is formed after reading the first two or so adjectives, and this impression affects our interpretation of somewhat ambiguous adjectives such as 'critical'. If we have already started to form a positive impression based on the initial adjectives, we are more likely to interpret 'critical' in a positive way than if it is one of the first adjectives we are given. This is the unconscious brain's way of making coherent sense out of several distinct pieces of information that might otherwise be difficult to reconcile with one another. The process Kahneman describes here is part of a larger phenomenon referred to as the "halo effect": "The tendency to like (or dislike) everything about a person" (Kahneman 2011, pp. 82-83).
This same ordering effect can occur in grading. Kahneman noticed this as he graded his blue-book exams, which he had been grading one book at a time. Like many of our philosophy exams, each of Kahneman's exams consisted of two long essays. After reading and grading the initial essay, he noticed that he was more inclined to grade the second essay in a way consistent with his grading of the first; i.e., if he gave a good grade to the first essay, he was more likely to be lenient in grading the second, and if he gave a poor grade to the first essay, he was more likely to be uncharitable and harsh in grading the second. Once Kahneman started grading his blue books one essay at a time, this effect was reduced, and he was very surprised at how differently some of his students performed on the two essays. Indeed, when he went back afterward and found disparities between a student's two essay grades, he was strongly tempted to change one of the grades to reduce the disparity (Kahneman 2011, pp. 83-84).
This type of phenomenon seems to me to be pretty convincing evidence that we can be more objective in assessing our students' performance if we grade blue books one question at a time, rather than one book at a time. Ideally, the process should also be blind. It looks like I should change my approach.
What are some of the strategies you have adopted to minimize bias and maximize consistency in your grading of blue books and papers?
I always grade my blue book exams one essay at a time, because I suspected something like this was true. (I also just like focusing on one question at a time. I feel that helps fairness somewhat, too, since I'm seeing everyone's answer to the same question at the same time.)
For papers, I always have my students submit their papers with just their ID numbers on them. (And I always fold back the cover page of the blue book.) That way I don't know whose paper is whose until they're all graded.
Posted by: Jamin | 05/23/2012 at 10:34 PM
I think answer keys help a lot in this regard. If you have an answer key to refer to, you can compare students' answers to the key and limit the impact of these kinds of framing effects. Sadly, essay questions often allow for such varied answers that keys don't work extremely well. One way around that problem is to craft questions that are more specific, which should make answers less varied. Another strategy is to make certain parts of the question very specific while leaving other parts more open-ended. Here's an example: "Explain Peter Singer's argument against factory farming. Does the argument succeed? Why or why not?" The critical evaluation of the argument will vary, but it should be easy to determine whether a student's presentation of Singer's argument is accurate and sufficiently detailed.
Posted by: Trevor Hedberg | 05/24/2012 at 12:39 PM
I've followed Kahneman's procedure for a long time, although I'd done it both because I thought it was fairer and because I found it easier and faster. Like Jamin, I fold back the cover so that I don't know whose exam I'm grading. I also shuffle the exams between questions.
There was a long discussion of this topic on Crooked Timber at the end of March: http://crookedtimber.org/2012/03/28/evaluating-students-the-halo-effect/
I think the most important counterargument to Kahneman from that discussion is that, depending on the exam, students might implicitly assume that you've read their earlier answers. So, if they define a term carefully in Essay #1, they might not do so again in Essay #2. Kahneman's method would lead you to mark down Essay #2, which is arguably unfair.
Posted by: David Morrow | 05/24/2012 at 03:12 PM
As for the point David brings up, there seems to be an obvious and easy solution: just tell your students ahead of time how you'll be grading. I try to be as transparent as possible with my students as to how I evaluate them.
To that end, by the way, I also use a grading rubric for their papers, which they have access to ahead of time, while they're preparing their papers.
Posted by: Jamin | 05/24/2012 at 05:37 PM