I've written a bunch of times on why empirical research suggests hiring practices in academic philosophy need a dramatic overhaul, and why first-round interviews specifically appear to be "worse than useless." Allen Wood's new post on interviews at the APA Blog very nicely illustrates the many problems with interviews and subjective evaluations of dossier materials. Wood writes:
- When it comes to judging dossier materials, "Bigotry and philosophical sectarianism become major factors."
- "A search committee member who imagines that he or she has been smart enough to spot a philosophical error in an otherwise exemplary paper may become obsessed with this self-conceited crotchet, ignoring the paper’s excellence and voting in such a way that the best candidate gets the shaft."
- "Horse-trading between search committee members may also influence the process: two excellent candidates may each be unacceptable to one member of the committee, and a third inferior candidate may be selected as a compromise."
- "interviews are the most artificial kind of human encounter imaginable."
- "My grisly similes may give you the impression that most interviews do not go well for the candidate. In fact, most are utter calamities...The catastrophe is often the fault of the interviewers (or just of the artificiality of the situation). Interviewers are fallible human beings who may have good intentions but fall prey to their own intellectual limitations or unconscious prejudices."
- "Envy and fear of being shown up may make them not want to hire you precisely because of the same high qualifications that forced them to interview you. They are usually nervous about interviewing you because they probably haven’t read your dossier and would not be competent to judge your work even if they had."
- "A few manage to succeed, by not showing they realize what a gruesome disaster the whole thing is, and then somehow infecting the interviewers with this total misperception of what has just happened. How do they do it? I wish I could tell you."
- "Careful preparation, practice, and innate talent for interviewing are perhaps necessary conditions for an interview that impresses people, but it always seems to me to be mostly a matter of dumb luck."
Prof. Wood, in my view, has done a very good service in drawing more attention to all of these problems. As I recently argued in detail, these problems--and decades of empirical research--show that we should take candidate selection more out of the hands of subjective human judgers, and instead prioritize more objective, algorithmic narrowing processes. Fortune 500 companies and the US government are increasingly overhauling their selection and hiring methods in precisely this way. The science tells us, and Wood's post nicely illustrates, why we should follow suit. The "human element" does not improve the selection process. It turns it into an ungodly crapshoot.
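To give a rough sense of what an "algorithmic narrowing process" might look like in practice, here is a minimal sketch. To be clear, the features and weights below are hypothetical placeholders of my own invention; a real instrument would have to be constructed and validated empirically, which is precisely the sort of thing I-O psychologists do:

```python
# Illustrative sketch only: the features and weights are hypothetical
# placeholders, not an empirically validated selection instrument.

# Weights are fixed *before* any applications are read.
WEIGHTS = {
    "peer_reviewed_pubs": 0.4,  # count of peer-reviewed publications
    "teaching_evals": 0.3,      # mean teaching evaluation, rescaled to 0-1
    "years_teaching": 0.2,      # years of independent teaching experience
    "grants": 0.1,              # count of competitive grants/fellowships
}

def score(applicant: dict) -> float:
    """Weighted sum of an applicant's quantifiable dossier features."""
    return sum(w * applicant.get(feature, 0.0) for feature, w in WEIGHTS.items())

def shortlist(applicants: list, k: int = 30) -> list:
    """Narrow the applicant pool to the k highest-scoring applicants."""
    return sorted(applicants, key=score, reverse=True)[:k]

# Hypothetical example: every dossier is scored by the same fixed rule.
pool = [
    {"name": "A", "peer_reviewed_pubs": 4, "teaching_evals": 0.9,
     "years_teaching": 3, "grants": 1},
    {"name": "B", "peer_reviewed_pubs": 2, "teaching_evals": 0.8,
     "years_teaching": 5, "grants": 0},
]
semi_finalists = shortlist(pool, k=30)
```

The substance, of course, lies in choosing and validating the features and weights. The procedural point is simply that every dossier then gets evaluated by the same rule, rather than by the shifting whims Wood describes.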
What you say here sounds much stronger than the claim you made in the comments of your earlier post:
"My claim is not that the entire hiring process should be automated--that we should merely use algorithms to hire people. I don't know any I-O Psychologist who would advocate that. What I am claiming is that the empirical research indicates that we should use algorithmic means in early stages of selection to whittle down candidates to semi-finalists or finalists. That's all."
That would still leave the final selection in the hands of "subjective human judgers," particularly at the finalist stage where interviews are likely to play a bigger role. But - taken in context - you seem here to be advocating the removal of subjective judgment even at this late stage of decision making.
I'm left wondering both whether you really do think there is any sensible role for individual judgment and, if so, why, given your confidence in its unreliability as compared to more mechanical decision procedures.
Posted by: Derek Bowman | 01/26/2016 at 10:29 AM
Hi Derek: Thanks for your comment! I'm not sure why you think what I'm saying here is stronger than what I previously wrote, including in the passage you quote.
In the present post, I argue that the process of *narrowing down* applicants should be more algorithmic. Nothing I have said here indicates that the final stages of a hire shouldn't include subjective judgments. And indeed, or so my spouse tells me, the kinds of actual, directly work-related demonstrations that occur late in a hiring process (e.g. a teaching demo) have been found to have some predictive power (though here too, I think, candidates should be scored by judgers, and scores should probably be normalized to correct for whatever gender, race, etc., biases are known to influence such judgments).
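To illustrate the kind of normalization I have in mind, here is a simple sketch. This is my own illustration, under the assumption that group-level scoring offsets have already been estimated from prior audits of a judge's scores; it is not a procedure drawn from the literature:

```python
from statistics import mean, stdev

def zscore(ratings):
    """Put one judge's raw ratings on a common scale, correcting
    for the fact that some raters are harsher than others."""
    m, s = mean(ratings), stdev(ratings)
    return [(r - m) / s for r in ratings]

def debias(z, group, offsets):
    """Subtract a documented group-level scoring offset. `offsets`
    is assumed to come from a prior audit (e.g., a measured tendency
    to under-score candidates from one group)."""
    return z - offsets.get(group, 0.0)

# Hypothetical example: one judge's ratings of five teaching demos.
raw = [7.0, 5.5, 8.0, 6.0, 9.0]
groups = ["a", "b", "a", "b", "a"]
offsets = {"b": -0.3}  # audit found group "b" under-scored by 0.3 SD

adjusted = [debias(z, g, offsets) for z, g in zip(zscore(raw), groups)]
```

The arithmetic is trivial; the hard (and genuinely empirical) part is estimating the offsets in the first place.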
Posted by: Marcus Arvan | 01/26/2016 at 10:54 AM
The reason it seems stronger is that this is in the context of a discussion of interviews, which often play a larger role at the later stages of selection, where you previously claimed not to support substituting algorithms for individual decisions. How else could these features of late-stage selection bear on your commitment to the preferability of algorithms?
Posted by: Derek Bowman | 01/26/2016 at 11:53 AM
I've followed this discussion across posts and comments -- and I read Marcus in the same way Derek does.
I truly have no idea how an "algorithm" for philosophy job hiring is supposed to work. There is no widely agreed-upon standard for what counts as good or even interesting philosophy, or even sometimes what counts as philosophy at all (e.g., dismissals of work related to gender or race as "sociology"). Nor is there any widely agreed-upon standard for how to value and weight the various considerations that go into hiring, including quantity of publications.
For example, some philosophers attach value and weight to academic pedigree, for various reasons that seem legitimate to them -- which many younger philosophers active in the blogosphere seem to regard as a "bias." Other philosophers attach little value and weight, in effect, to gender and racial diversity, inclusion, or balance. Then there are area "bias," publication venue "bias," research vs. teaching "bias," etc.
In short, who would write the algorithm that is supposed to make philosophy job hiring not merely "more objective" but critically so? Even something as modest as countering gender or racial bias early in a search process (viz., through "blind" initial review, which requires no algorithm) is going to run into those same realities once the candidates are unmasked, so to speak. I'm not getting it.
Posted by: anon prime | 01/26/2016 at 02:37 PM
Hi Derek: In all honesty, I waver here. On the one hand, as I understand it, the empirical literature suggests that algorithmic approaches outperform human judgers on predictive reliability in *general*. Insofar as this is the case (to the best of my understanding), I would indeed advocate an algorithm-only approach. But, human beings being what they are, and difficulties setting up algorithms to measure some things (such as "collegiality") being what they are, I suspect that some amount of human judgment will always be with us.
My general contention, therefore, is that it seems especially important to take the algorithmic approach early on in the narrowing/selection process, and (at best) only let subjective judgments into the game at the very end, when judging differences between, say, two finalists. And indeed, as I mentioned before, my understanding (speaking to my spouse) is that actual *performances* (e.g. a teaching demo) have predictive power (power that early-stage interviews do not).
Posted by: Marcus Arvan | 01/26/2016 at 04:38 PM
Marcus, thanks for the reply.
I wonder if you understand just how strange your position sounds. It is one thing to say that we should take advantage of the best research in empirical psychology to help us create conditions under which the exercise of our own judgment can operate more reliably. But you seem to be saying that - in general - we would be better off doing away with human judgment in making human decisions entirely, if only we could. The only reason you waver - you seem to say - is that you think, sadly, we may be stuck relying on our own judgment in some areas (at least until we get better measures). Perhaps you mean the "in general" to be scope-limited to hiring decisions, but it's hard to see what plausible cognitive model would make that one kind of decision special in this way.
This is not an argument that your position is wrong, but it does make me wonder if I've misunderstood you. It is a very radical claim, but you act surprised when people respond to it as such.
Posted by: Derek Bowman | 01/26/2016 at 07:47 PM
Hi Marcus,
Like others, I am confused about how far you want to take this. What do you think of the following:
"Human biases and irrationality inevitably play a role in hiring decisions, at least, under our current system. If possible, we should take steps to minimize this bias and irrationality. One way to do this is with the help algorithms that would narrow down the applicant pool. For example, we might use an algorithm to select, say, 30 promising candidates from the original pool. This can work alongside subjective review. For instance, search committee members might argue that a candidate who was not chosen by the algorithm is exceptional in unique ways, and therefore merits further review. If algorithmic selections are obviously problematic to the subjective eye, those applicants can be quickly eliminated."
I think that what is described above would help eliminate current problematic practices, while also leaving ample room for needed human judgment. Of course, there are some worries about the slippery slope of search committees making too many exceptions. Nonetheless, I would rather live with this worry than ban the possibility of live humans overruling an algorithm.
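To make the proposal concrete, here is a minimal sketch of the kind of hybrid review described above (the candidate IDs are hypothetical, and this is an illustration of the idea, not anyone's actual procedure):

```python
def final_pool(algorithm_picks, committee_additions, committee_vetoes):
    """Hybrid review: start from the algorithm's shortlist, add
    candidates the committee argues are exceptional, and drop picks
    that are obviously problematic on human inspection."""
    return (set(algorithm_picks) | set(committee_additions)) - set(committee_vetoes)

# Hypothetical candidate IDs:
picks = {"cand_01", "cand_02", "cand_03"}
pool = final_pool(
    picks,
    committee_additions={"cand_47"},  # exceptional in ways the algorithm missed
    committee_vetoes={"cand_02"},     # obviously problematic to the subjective eye
)
```

The design point is that the algorithm proposes and the humans dispose: the committee retains the final say over additions and removals.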
I get the impression you want to go much further than what is suggested above. Is that correct? If so why? Why couldn't we just make use of an algorithm without giving it overriding power?
Posted by: Bob | 01/26/2016 at 08:56 PM
Hi Derek: Sorry for taking so long to respond (and apologies to others for taking so long as well). I am swamped with work this week, but I *will* respond to everyone's comments as quickly as I can!
To answer your query, yes, I am well aware of "how strange my position sounds." However, I do not think I act surprised when people respond to it as such. Far from it! This is a fight that *empirical psychologists* who work on this have been fighting for decades. The kinds of studies I reference (e.g. http://psycnet.apa.org/journals/law/2/2/293/) have explicitly, and repeatedly, addressed "widespread resistance" to these conclusions, rebutting the very kinds of arguments people are raising in these threads!
The conclusions I am defending may sound strange--but they are (in my view) the conclusions empirical findings actually support. It's not the job of science to be "intuitive." The idea that space and time are relative once sounded outrageous, and was rejected wholesale by people (including leading physicists) who found it too "strange." We need to not dismiss empirical findings as counterintuitive, but instead learn from them.
In 2002, Billy Beane of the Oakland A's baseball team radically revamped their drafting process utilizing the actuarial methods I am describing. They fired their scouts, and selected players merely on the basis of on-base percentage and runs generated--calculating precisely how many runs they would need to make the playoffs. Their scouts called Beane's plan absurd, saying that (obviously!) only they--in their infinite scouting wisdom, with 25+ years (or whatever) of experience--could select the best ballplayers. The scouts then scoffed at some of the players Beane targeted, such as Kevin Youkilis, whom they considered too heavy and too slow to draft.
Well, what do you know: with one of the lowest payrolls in the entire major leagues, Beane's purely statistical strategy paid off. The A's made the playoffs *just* as their algorithms predicted, and Kevin Youkilis is now a three-time All-Star and two-time World Series champion. Since then, other major league baseball teams have adopted a similar approach.
Now, of course, this is only baseball--and "baseball achievements" (walks, runs, hits, etc.) are easily quantifiable. But...here's the thing: the empirical literature I'm pointing to shows that, generally speaking, whatever you want to predict, to the extent that you can carefully draw up an algorithm (which does take time and careful empirical work to do), that algorithm will tend to be as good or better at predicting *that thing* than human judgers.
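A toy illustration of the kind of actuarial model at issue: in the spirit of Dawes' well-known "improper linear models," even a rule that just standardizes each quantified predictor and sums the results with equal weights tends, in the literature, to match or beat holistic expert judgment. The data and variable names below are made up for illustration:

```python
from statistics import mean, stdev

def unit_weighted_score(candidates, predictors):
    """'Improper' linear model: standardize each predictor across
    candidates, then sum the standardized values with equal weights."""
    scores = [0.0] * len(candidates)
    for p in predictors:
        vals = [c[p] for c in candidates]
        m, s = mean(vals), stdev(vals)
        for i, v in enumerate(vals):
            scores[i] += (v - m) / s
    return scores

# Made-up numbers, echoing the baseball example:
candidates = [
    {"obp": 0.390, "runs_created": 95},
    {"obp": 0.310, "runs_created": 70},
    {"obp": 0.355, "runs_created": 88},
]
print(unit_weighted_score(candidates, ["obp", "runs_created"]))
```

Models this simple are obviously not the whole story. The robust empirical finding is just how hard they are for holistic human judgment to beat.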
This, in brief, is why Industrial-Organizational psychology is the single fastest-growing occupation in the US (projected to grow 53%: http://abcnews.go.com/Business/americas-20-fastest-growing-jobs-surprise/story?id=22364716). The field is increasingly transforming hiring/selection from a haphazard, empirically unsupported process into a genuine science based on measuring the predictive accuracy of different approaches. And, by and large, as I understand it, the evidence broadly supports the use of algorithms, as far as we can tell.
An increasing number of government agencies (NSA, CIA, etc.) and Fortune 500 companies (Kellogg, etc.) are hiring I-O psychologists because the methods demonstrably work, leading to better hiring outcomes, better productivity, etc. It is time, in my view, for academia to broadly follow suit, adopting hiring processes that actually have empirical support and predictive power.
Finally, though, I should add--as I have a few times now--that in my understanding actual work-performances (e.g. things like teaching demos, not interviews) have been found to have some predictive value, so this is one place (late in the hiring process) where human judgment probably has a good, legitimate role to play.
Posted by: Marcus Arvan | 01/27/2016 at 05:13 PM
"But...here's the thing: the empirical literature I'm pointing to shows that, generally speaking, whatever you want to predict, to the extent that you can carefully draw up an algorithm (which does take time and careful empirical work to do), that algorithm will tend to be as good or better at predicting *that thing* than human judgers."
So, if I'm now getting it, your view is largely about procedure. The aim is to better predict the best job candidates given whatever values and priorities an algorithm is designed to reflect. Of course, this would require that an algorithm's designers have in advance a clear idea about those values and priorities.
I'm not sure how well this might address what some of us would think of as deeper, substantive issues of fairness in hiring, especially when not driven by statistics, grant money, awards, or other undisputed measurables. But I do understand how such an algorithm could help a philosophy department, whatever its values and priorities, better satisfy the clear hiring goals it happens to have.
Posted by: anon prime | 01/27/2016 at 06:36 PM
Marcus, thanks for the reply (and no rush!).
"We need to not dismiss empirical findings as counterintuitive, but instead learn from them."
This is a very dangerous attitude to have when it comes to moral issues of individual and group choice. One need only consider the many pernicious scientific "findings" about the relative mental and moral powers of men over women, of white Europeans over other "races" or "civilizations", etc. (See for example Mill's discussion of the science of "brain size" in Chapter 3 of The Subjection of Women). To automatically surrender our judgment in the name of "empirical findings" is a grave abdication of our responsibility to critically incorporate such findings into our own best thinking about the world and our place in it.
"to the extent that you can carefully draw up an algorithm (which does take time and careful empirical work to do), that algorithm will tend to be as good or better at predicting *that thing* than human judgers."
I don't think anyone here has expressed doubts about this claim. What we doubt is that the thing measured by the algorithm will be identical with - or a reliable proxy for - what makes someone a good philosopher-scholar-teacher-colleague. I don't doubt that well-designed algorithms are very good at predicting the things that algorithms can measure and predict. What I doubt is the wisdom of forcing ourselves and our judgments about philosophical merit into such a Procrustean bed.
"in my understanding actual work-performances (e.g. things like teaching demos, not interviews) have been found to have some predictive value, so this is one place (late in the hiring process) where human judgment probably have a good, legitimate role to play."
But we have no plausible cognitive model for why this should be so. By hypothesis, wouldn't we expect to be better off using an algorithm to assess teaching demos, etc.? If we're only relying on empirical generalizations, we have no plausible model of human thinking into which to fit these results.
Posted by: Derek Bowman | 01/27/2016 at 08:42 PM
I share Derek's doubt that "the thing measured by the algorithm will be identical with - or a reliable proxy for - what makes someone a good philosopher-scholar-teacher-colleague"--at least given how things stand currently.
My impression is that most philosophers have a hard time articulating what they think makes for a good colleague, or operationalizing it, or coming to consensus or compromise about it with others in their department.
However, I share some of Marcus' support for trying to get to a place where we can have such algorithms. I think it's possible to get to a place where they are reliable--at least for some aspects of hiring in academia. I suspect that we may always want to have human judgment as at least a kind of oversight at the end of the process, but I don't think that what we're looking for is so mystical, or so nuanced, that we couldn't offload some of it onto an algorithm.
Again, I worry about us falling into the temptation to think the numbers are more reliable than they are. Or thinking that the only things that exist (or that matter) are the things that can be easily measured.
But even if only to push ourselves to articulate more clearly exactly what it is that we want, I think attempts to quantify aspects of hiring may be beneficial. And I think we can use such quantification without "automatically surrendering" our judgment or shirking our moral responsibility. (Though agreed--that is a danger that must be accounted for.)
Posted by: Stacey Goguen | 01/27/2016 at 10:32 PM
Anon prime: That's exactly right. The point is one of procedure. I care very much about the kinds of deeper, substantive questions about fairness (etc.) that you mention. By all means, we should examine these issues carefully. However, in the meantime, as Allen Wood's post illustrates, academic searches utilize poor *processes*--ones that fail to predict outcomes as reliably as algorithms do, and that fail to counteract biases at all (leaving the hiring process entirely up to the whims of individual search committee members and committees). As everyone points out, under prevailing conditions, the job market is little more than a "crapshoot." My point is: yes, of course there are deeper issues to discuss, debate, and improve--but one place we can start is with better processes.
Posted by: Marcus Arvan | 01/29/2016 at 01:32 PM
Hi Derek: I think I largely agree with you.
We should not just "defer to science", as science itself can be biased and put to bad social-political uses. We should also not just assume that "the thing measured by the algorithm will be identical with - or a reliable proxy for - what makes someone a good philosopher-scholar-teacher-colleague." These are things that we should--by all means--think about carefully.
My points are merely that we need to (A) stop *ignoring* the science, and (B) think carefully about whether prevailing selection/hiring procedures in academia withstand critical empirical scrutiny. Just like we shouldn't ignore climate science in favor of intuitions about the weather [viz. "I don't see global warming. We've had such a cold winter"], we shouldn't ignore the science of selection--especially when, as I'm told, there is such consensus in the field.
I think Wood's posts illustrate just how absurd prevailing hiring procedures are, as well as why algorithms are generally better. As Wood's posts illustrate, prevailing hiring procedures in academic philosophy contain *no* real controls for any form of bias whatsoever (prestige bias, gender bias, etc.). The process tends to be determined by the mere whims of search committee members (and, yes, committee politics, etc.). This, in my view, is why the market is such a "crapshoot", and why it seems to many people not even remotely meritocratic. My point is: if you wanted to come up with a reliable hiring process (in terms of predicting anything), the process we currently have is woefully poor--not only upon reflection, but given the empirical science. And we should at least look to the science to see if we can do better (which, I think, it suggests we can).
My point is that, however one defines "good scholar", "good colleague", or whatever, one should use the best, most validated methods for measuring those things. The empirical literature, as I understand it, strongly suggests that to whatever extent we *can* operationalize things with algorithms, we should--as algorithms are the most reliable way to control for and counteract pernicious biases (biases that don't track merit, and don't track truth for that matter--since a lot of the things one looks for in a colleague, such as "collegiality", can be faked in an interview).
In short, I'm mostly just trying to point out just how problematic prevailing academic hiring methods are, and how they might be improved. I don't pretend to know "The God's Honest Truth" about this stuff...but I do think it is important to discuss and debate it, paying attention to (rather than ignoring) what science there is.
Posted by: Marcus Arvan | 01/29/2016 at 02:08 PM
Marcus,
At this point I (and possibly others) would love to know how you conducted your search this year--by "you" I mean both your department and you personally.
Posted by: Lovely Colleague | 01/29/2016 at 05:10 PM