
Justin Caouette

Hi Marcus,

Thought provoking post (as is often the case with your posts but I digress). I did want to bring up a few worries though.

First, the analogy with sports and draft position isn't a good one. The players are being interviewed every time they are on the field. In philosophy, it's not like that. So, even though a combine workout can weigh in favor of or against a prospect, it is usually only an all-things-being-equal metric. The performance on the field is what does the most work for the prospects (even JaMarcus Russell had a 10-1 season and some excellent come-from-behind wins against Alabama, etc.).

Second, isn't looking over one's dossier and "counting their pubs" focusing too heavily on one aspect of our jobs as philosophers? Teaching is surely important, and an in-person interview is much better at gauging how the person will perform in the classroom. It's a lot better than simply looking at teaching evals (IMHO).

Lastly, doesn't the choice to not have interviews bias against those who do damn well at them? Or am I missing something?

Here are a few things to consider, in my case anyway.

I'm from Calgary, an under the radar program. I can help my chances with an interview given that I think one of my stronger traits is my ability to work a classroom and show my enthusiasm for the discipline. I think I can write just fine but given that my program is MUCH shorter than those in the states (4 years with much of that time spent on 3 intense examinations and a year of course work) my dossier will not be as impressive when compared to the 7-8 year PhD from the states, or someone like yourself who has been publishing for years since graduating. So, it seems that in not interviewing, folks like me are at a disadvantage. And, given that the name of my institution may already work against me this seems, well, shitty.

Now, this is not to say that the past shouldn't matter at all, only that moving away from the interview altogether works against folks like me. I did have a couple of questions for you, Marcus.

Isn't reviewing one's work (rather than an in person interview) just a different way of "spotting talent"?

Also, the suggested approach creates systematic obstacles for folks who have what it takes to be a good pro if only they were given a chance. I'm thinking here of folks who have to work while in grad school which makes publishing nearly impossible. Those folks would never get a job if we were to focus ONLY on past success. Am I off to worry about such cases under your suggestion of no interviews?

Kate Norlock

Like Justin, I worry about the extent to which the prestige of an institutional affiliation with a grad program, and the luck of excellent funding (vs loads of teaching and grading at the expense of publication) would then outweigh skills such as classroom behavior and effective oral presentation. But perhaps, Marcus, you mean this to be a post applicable to R1 institutions, and not to the rest of us?

Christopher Stephens

While I share some of Justin and Kate's concerns about valuing classroom behavior, is there really evidence that the person who comes across shy and nervous in an interview will be a worse teacher? Marcus doesn't tell us about their teaching backgrounds, but won't the same points about actual success apply to teaching as to research? Is one data point about a very high stakes oral presentation really a good guide to future success as a teacher? I'd be interested in empirical evidence on this.

Suppose both candidates have good teaching evaluations, good letters about teaching from people who've observed their classes, extensive teaching experience, well thought out syllabi, etc. In that case, should we really use A's better performance in the job talk as a good reason to think A will be a better teacher than B?

Marcus Arvan

Hi Justin and Kate: Thanks for your comments!

I'm not advocating merely counting pubs. The empirical studies I'm referring to show that algorithmic, hard-data approaches to *all* aspects of hiring (e.g. teaching reviews, etc.) are better than soft-measures (i.e. interviews, teaching demos, etc.).

There are several reasons why this is. Let me explain each of them.

First, studies of interviews, demos, etc., consistently show that raters tend to favor/disfavor candidates largely on the basis of factors irrelevant to the job in question. So, for example, in interviews and other types of demonstrations, people have been shown to consistently favor (1) taller candidates over shorter candidates, (2) people with deeper voices over people with higher voices (especially among men), (3) attractive people over less-attractive people, (4) extraverts over introverts, (5) men over women, etc.

In other words, although interviewing committees and people watching demos like to think they are evaluating the performance "where it counts", there is an overwhelming amount of evidence that people favor/disfavor candidates on almost entirely arbitrary grounds.

This is the first reason why algorithmic selection-processes have been consistently observed to result in better outcomes (e.g. higher performance reviews of hired candidates) than soft measures (such as interviews). Hard measures--i.e. someone's long-term research record, teaching record, etc.--tend to be far less based on arbitrary judgments and more on actual performance.

This brings me to the second problem with interviews/demos. Any statistician or data-collection expert worth their salt will tell you (as Chris Stephens points out) that (1) many data-points are better than one, and (2) to be good evidence, data-points must be *representative* of actual, normal performance. Interviews and demos are neither--and here's why.

Every sort of performance known to humankind is subject to outliers. So, for example, the Denver Broncos are a really good football team--yet they laid a "stinker" this past week. Similarly, Joe Montana was perhaps the greatest quarterback in NFL history--but even he had bad days. Single-case observations cannot possibly distinguish an outlier from a person's central tendency, i.e. their normal performance. The best indicator of a central tendency is given by *many* data-points. So, for instance, if you want to know whether the Broncos are actually a good team, the best way to do it is have them play lots of games against other teams and see how many games they win. The same goes for all other human endeavors. If you want to know whether someone is a good teacher, you shouldn't base your judgment on a one-off performance or interview. You should base your judgment on the person's body of work--as that provides far more data pertaining to the person's central tendency, or normal performance.
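The point about outliers and central tendency can be illustrated with a small simulation. This is only a sketch under stated assumptions, not anything from the studies mentioned above: it assumes a hypothetical candidate whose "true" teaching level is 4.0 on a five-point scale, and that each observed performance is that level plus normally distributed noise. The function name `compare_single_vs_many` and all parameter values are made up for illustration.

```python
import random
import statistics


def compare_single_vs_many(true_level=4.0, noise_sd=1.0, n_observations=30,
                           trials=2000, seed=0):
    """Compare how far off a single observation is, on average, versus
    the mean of many observations of the same (hypothetical) candidate.

    Returns (average error of one observation, average error of the mean).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    single_errors = []
    averaged_errors = []
    for _ in range(trials):
        # Each observation is the true level plus random day-to-day noise.
        observations = [rng.gauss(true_level, noise_sd)
                        for _ in range(n_observations)]
        # "Interview" case: judge from the first observation alone.
        single_errors.append(abs(observations[0] - true_level))
        # "Body of work" case: judge from the mean of all observations.
        averaged_errors.append(abs(statistics.mean(observations) - true_level))
    return statistics.mean(single_errors), statistics.mean(averaged_errors)


if __name__ == "__main__":
    single, averaged = compare_single_vs_many()
    print(f"average error from one observation:   {single:.2f}")
    print(f"average error from 30 observations:   {averaged:.2f}")
```

Under these assumptions the averaged estimate lands much closer to the true level than the single observation does, which is the statistical core of the argument. Note that the sketch treats the noise as unbiased; it says nothing about abnormal test conditions, which would make the single observation worse still.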

The problem here is even worse for interviews and demos, as outliers are far more likely to obtain in *abnormal* test-conditions--which interviews and demos certainly are.

Consider, to begin with, a slick extravert candidate who normally doesn't give a damn about teaching, but can put on one really good show if s/he actually prepares. This person may give a killer teaching demo...yet it is not at all indicative of their normal performance. On a day-to-day basis, they may be poorly prepared, care more about research than teaching, etc.

Now consider an introvert who has a long-term record of teaching success, but who--as introverts are often wont to be--gets unusually uncomfortable in unfamiliar situations. This person may normally be a killer teacher, with years of excellent performance, and yet in this novel, highly abnormal situation, at a university they are unfamiliar with, with students they are unfamiliar with, being watched by people they are unfamiliar with, they may put up a "dud." Is that at all indicative of their normal performance? No--but the interview or demo can make it *seem* as though it is.

In other words, interviews and demos are subject to "masking." Candidates, quite frankly, can misrepresent themselves. Someone may look like they have everything it takes to be an excellent researcher...except the hard-data indicates otherwise. Someone might look like they have everything it takes to be an excellent teacher...except on a day-to-day basis they don't really care that much about teaching. Etc.

Finally, and on a related note, interviews/demos by their very nature don't--and can't--track many other things directly relevant to job performance. Consider, if you will, all of the things it takes to be a stellar teacher day-in, day-out over the course of a semester or academic year. It takes (1) a great deal of consistent, daily preparation of multiple courses, while (2) juggling those demands with those of research, while (3) juggling those demands against committee service, advising, etc.; it takes (4) time and effort providing written and verbal feedback to students on term-papers, etc.; it takes (5) knowing how to respond to below-average students; etc. In other words, it takes a variety of very specific skills that are entirely removed from a single teaching demo.

These are just some of the reasons why hard-data are better predictors than observations. Hard data are (A) less subject to bias by irrelevant factors (e.g. height, attractiveness, speaking voice, etc.), (B) more reflective of actual central tendencies (i.e. normal performance) than one-off outliers, (C) more reflective of performance in *relevant* contexts (i.e. day-to-day teaching performance while juggling normal responsibilities); and (D) less subject to masking.

Potter Stewart

"These are just some of the reasons why hard-data are better predictors than observations. Hard data are (A) less subject to bias by irrelevant factors (e.g. height, attractiveness, speaking voice, etc.), (B) more reflective of actual central tendencies (i.e. normal performance) than one-off outliers, (C) more reflective of performance in *relevant* contexts (i.e. day-to-day teaching performance while juggling normal responsibilities); and (D) less subject to masking."

With the caveat that hard data _are_ observations (where do we think the data comes from?), this seems right, but only as far as it goes.

To simplify things, imagine that all I care about is successful teaching. What data should I look at? Student grades? Student feedback? Teaching-focused letters of recommendation? As it turns out, the data predictors of successful teaching are all suspect, because they are all indirect. Every last one of them. Not so much so that we should never look at them, but at least so much so that we should not fetishize the numbers. And we might think that, at least while our lab studies of aptitude in teaching are still in their immaturity, it is not crazy to think that a good teacher is a bit like pornography -- I know it when I see it.

Potter Stewart

Also, I want to agree with Justin that the sports analogy is pretty weak.

If you think Russell only had one good year, then you must think that going 10-1 at LSU, getting a top five ranking, and beating Alabama before getting injured is a "bad" year.

And while Manning over Leaf seems clear in retrospect, remember that Manning never could win the big game (you can't spell Citrus without UT and all), while Leaf set Pac-10 passing records, helped WSU to its first ever Pac-10 championship, and finished second in the nation in passing rating. Them's performance numbers, even if he did lose the Rose Bowl to the national champion Wolverines. (Manning was then, not surprisingly, in the Citrus Bowl.) And it should be noted that Leaf was clearly raising the bar for WSU, which has since faded, while Tennessee won the whole kit and caboodle as soon as Manning was off campus. Leaf did more than Manning with less.

As for Smith, it isn't like he didn't perform. And many, many think that his holdout really hurt him. (He might be the best case -- although even here, it was his numbers that made him attractive, not his interviews...)

Brady? I watched Brady play. He showed up on a team that was national championship stuff, and he barely maintained his starting job, and while he didn't have anything resembling a _bad_ collegiate career, surely you're not suggesting that anyone looking at his collegiate performance would have known that he'd have the professional career he's had? If we went based off of data, like you're suggesting, the 6th round might even be generous. His biggest win resulted from an opponent's missed extra point!

Maybe Joe Montana makes the case for you. I don't know. I'm not old enough to have watched him in college. But it strikes me that taking his nickname and reputation to be a good predictor of success is the exact opposite of the way you'd want to go...

Justin Caouette

I have so much to say, Marcus. I'll type out some of my main concerns in a future post (likely next week). However, I did want to make two small points up front.

(1) Interviews are equally important for the candidate. I (as a candidate) would like to know more about the people (AS PEOPLE and not as written descriptions of people) I am going to be working with. Same with department structure and collegiality, in visiting a department you can get a vibe re: how business is done and how active students and colleagues are. Sure, this is not infallible (maybe they put on a show for your visit) but it's information that should be considered when one has multiple offers on the table. Even if they put on a show for you, that tells you something.

(2) Re: the draft/combine I second Potter’s apt points re: Brady and Manning. And further, since the advent of the NFL combine (in roughly 1980) one could argue that a team's ability to draft players successfully has gotten better, not worse! This is an empirical claim so it's one we could look into. This is in part *because* of the combine. Players once thought of as fast because they played poor competition now get exposed with the combine, along with many other examples. Sure, you can point to an Akili Smith, but for every Smith there are 10 players that did pan out since the combine. Without the combine Smith would still go. And it seems unobjectionable that there were draft busts prior to the combine (which suggests that a non-interview format, assuming combines are similar to campus interviews, is not much better).

All in all I am very skeptical that we can quantify some of the attributes you are looking for (good teaching, research, collegiality, etc.). I mean teaching evals as indicative of how good a prof is? Really?! Unless we put every eval we have ever received in our teaching dossier they don’t seem very meaningful. Surely, some often ill-prepared prof will have some good evals (from students who got a good grade and feel good about saying nice things or what have you), and over time said prof will have accrued 50 or 60 of them. Are you suggesting that we include them all? Not to mention the practical issues that arise even if one granted that evals are indicative of good performance (which I question). How likely is it that the hiring committee would (or even COULD) read them all? Especially for those who have taught 15-20+ classes! When I was first putting my dossier together I included ALL of them until I was quickly told that that wasn't how things were done. Pick your best from each class, I was told. I thought this was a joke! But it’s not, and it is currently a data point that hiring committees look at.

Lastly, I agree with you that "Every sort of performance known to humankind is subject to outliers". But I also think there are outliers to the method you propose. I think that folks could be great for a job (and in some cases better than someone who does "quantify" well on all of the data points you find most important). Neither one of us is saying that our process would be perfect, right? Showing that there are some that fall through the cracks is not reason to throw the process out the window, not necessarily anyway. Unless you can show that your process would be better. Nothing I have read thus far leads me to believe that a data-driven process is unbiased or is better at selecting candidates than flying out the best (3-5) of your application pool and relying on the interview (at that point). Up until that point I am with you that the dossier matters, but all things being equal I'd rather have the opportunity to show my skills in person than let my dossier with UNIVERSITY OF CALGARY pinned to the front page do ALL the work on whether or not I get the job. To think that folks won't look at my application differently when it's compared to someone from NYU or Rutgers (all things being equal) is VERY optimistic.

That was longer than expected. I'll stop there. I will definitely write a post of my own so as to refrain from hijacking this thread.

Thanks again for the very thought-provoking post, Marcus. You have made me think long and hard about interviews, and given that this is the job season it hits very close to home for me. It also hits close to home because, as I mentioned on FB, I conducted interviews for both a Fortune 500 company AND a residential group home and found the process to be VERY helpful when selecting between two very good candidates. And, having had an excellent track record of hiring folks who have gone on to management positions and who have done very well in the jobs they were hired for, it seems that there is something to an interview if the folks doing the interview know what the hell they are looking for.

Lewis Powell

How safe and supportive do you think the person who got this offer would find this blog post?

Marcus Arvan

Hi Lewis: This wasn't a job in philosophy (I wouldn't have shared it if it were!). I also don't see how a person could self-identify on the basis of the post.

Lewis Powell

I was assuming it was a different philosophy department (rather than a non-philosophy department) in part because it is very hard to make assessments of what features are relevant for tenure across different disciplines. For instance, in book disciplines like English and History, the publication record in journals is substantially less important for assessing candidates.

Robert Gressis

Hi Marcus,

Fascinating post, and response to comments. As someone who has been on hiring committees, I'm trying to figure out what to do with this information, though. I'd be interested in hearing what you think we can know about prospective candidates, given the tools that we typically have available to us (cover letters, CVs, writing samples, teaching evaluations, letters of teaching observations, letters of recommendation, # of publications, interviews, and presentations). Or do you perhaps have ideas for new assessment tools we can use?

Marcus Arvan

Hi Rob: Thanks for your comment, and sorry for taking so long to reply! In addition to Thanksgiving, it was my birthday this weekend--so I've been a bit occupied. :)

The way I understand the psych literature on selection, the best thing to do is to score different facets of candidates. Here is how this might go in philosophy.

First, all candidates might receive a research score--which might be determined by (1) a weighted formula counting # of publications and quality of venue, (2) numerical scores for recommendation letters, and (3) search committee scores having read writing samples.

Second, all candidates might receive a teaching score--which might be determined on the basis of student evaluations and faculty peer-evaluations.

Third, all candidates might receive a university service score--which might be determined on the basis of how many university activities they are involved in, as well as the level of involvement (i.e. organizing on-campus activities might be weighted more than simply participating).

Fourth, all candidates might receive a collegiality score on the basis of ratings data collected from current and former colleagues (i.e. "rate X's collegiality on a scale of 1-5").

Then add up the scores, and rank the candidate with the best overall score 1st, the candidate with the second-best overall score 2nd, etc.
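The four-facet procedure just described can be sketched in a few lines. This is a minimal illustration, assuming every facet has already been reduced to a number on a common 1-5 scale (the comment only specifies that scale for collegiality) and that the facets are simply summed with equal weight. The `Candidate` class, its field names, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One applicant with a score per facet (assumed here to be on a 1-5 scale)."""
    name: str
    research: float
    teaching: float
    service: float
    collegiality: float

    def total(self) -> float:
        # Plain sum of the four facet scores, as in "add up the scores".
        return self.research + self.teaching + self.service + self.collegiality


def rank_candidates(candidates):
    """Return candidates ordered from highest to lowest total score."""
    return sorted(candidates, key=lambda c: c.total(), reverse=True)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    pool = [
        Candidate("A", research=4, teaching=3, service=2, collegiality=5),
        Candidate("B", research=5, teaching=4, service=3, collegiality=4),
        Candidate("C", research=3, teaching=3, service=3, collegiality=3),
    ]
    for place, candidate in enumerate(rank_candidates(pool), start=1):
        print(place, candidate.name, candidate.total())
```

An equal-weight sum is only one choice; a department that cares more about teaching than service could multiply each facet by a weight before adding, as long as the weights are fixed before any applications are read.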

Now, this algorithmic approach might seem absurd--to miss the "je ne sais quoi" of hiring--yet, again, contrary to intuition, decades of psych research indicate that it predicts success better than more subjective means.

Robert Gressis

Hi Marcus,

Thanks for your response. So, I have some more questions: it seems to me that one reason that number of publications and venue for publications are better indicators of a person's research ability than an interview is that they're less subject to bias (this is especially true of number of publications; arguably, it's not true of quality of publications, but I would resist that conclusion. I think we can be reasonably confident of the relative quality of journals, at least in some cases, but maybe I'm being naive here). But it also seems to me that a professor's letter of recommendation for her student is also quite subject to bias. Sure, the professor has had lots of interactions with the student, but these can be subject to their own kind of bias: a person can very quickly form a narrative of a person based on just a few interactions, and then only see the data that confirms that narrative and miss the data that disconfirms it. Is it your view that what I've just said is generally false, or that what I've said is true but letters of recommendation nonetheless have significant value, or that what I've said is true and letters of recommendation have only very little value?

So, imagine you're trying to weight the research value of a candidate. You can point to three things: (1) # of publications and quality of venue; (2) that candidate's advisors' assessments of the quality of her work; and (3) your own assessment of the quality of her work, based on the writing sample she provides for you. How much weight would you give to (1), (2), and (3)? 85%, 10%, and 5%, respectively? Or something else?

I have more questions, but I'll stop with that.

Marcus Arvan

Hi Rob: Thanks for your reply!

I think you're right to be skeptical about letters of reference. There are so many ways for bias to creep into letters (gender bias, personality bias, etc)--not to mention obvious conflicts of interest (letter-writers have self-interested reasons to provide inflated recommendations, so as to secure jobs for their department's candidates!).

So, I would say, letters of recommendation should be sharply discounted in a weighted measure of candidates. In fact, I'm one of those (and I'm not alone!) who think letters should be done away with altogether. Personally, I think letters are a pernicious anachronism--a harmful remnant of the medieval practices of patronage (where one had to satisfy one's patron in order to keep getting work). People should be judged on the basis of their work, not the opinions of a handful of people (who, let's face it, may or may not--for many reasons--have a suitably impartial view of the person's abilities).

In any case, when it comes to weighted averages, I think--for obvious reasons--that the most objective measure of research quality is the peer-review process. While imperfect (what isn't?), peer review has many procedures in place to prevent bias as much as possible. Second to that, I would say, are each person's judgments on the search committee of the writing sample. So, if it were me, I'd weight publication record something like 70-80% and the average committee member's judgment of the quality of writing sample something like 20-30%, and rank applicant research quality on those grounds alone.
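As a rough illustration of that weighting, here is a sketch that combines a publication-record score with the average of the committee members' writing-sample ratings. It assumes both inputs are already on the same numeric scale, and it uses 0.75 as a default weight simply because it sits inside the 70-80% range mentioned above. The function name `research_score` and its parameters are hypothetical.

```python
def research_score(publication_score, writing_sample_ratings,
                   publication_weight=0.75):
    """Weighted research score: publication record plus average committee
    rating of the writing sample. Letters of recommendation are left out,
    following the suggestion to rank on these two inputs alone.
    """
    if not 0.0 <= publication_weight <= 1.0:
        raise ValueError("publication_weight must be between 0 and 1")
    if not writing_sample_ratings:
        raise ValueError("need at least one writing-sample rating")
    # Average the individual committee members' ratings.
    sample_average = sum(writing_sample_ratings) / len(writing_sample_ratings)
    # The writing-sample weight is whatever remains after the publication weight.
    return (publication_weight * publication_score
            + (1.0 - publication_weight) * sample_average)


if __name__ == "__main__":
    # Made-up numbers: publication score 4.0, three committee ratings.
    print(research_score(4.0, [3.0, 5.0, 4.0]))
```

How the publication score itself is computed (counts, venue quality, and so on) is left open here, since the thread does not settle on a formula for it.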

Robert Gressis

Hi Marcus,

Do you think the same considerations that tell against LOR also tell against teaching observations? First, teaching observations examine the candidate on only one day; second, the candidate and the students might take the class more seriously than usual, given the occasion; third, there are at least some occasions where the writer of the letter knows that the candidate is going out on the market, and so feels pressure to inflate. Would you also ignore teaching letters as well as LOR?

In addition, what about statements of teaching philosophy? It's all well and good to say "here's my philosophy of teaching, and here's what I do", but do we actually know that the person does what she says she does, or, assuming she does it, does it well?

Long story short: besides # of peer-reviewed publications and teaching evaluations, is there anything that committees should use to assess the scholarly and pedagogical quality of a candidate?

(Frankly, I'm not sure why we should trust our own assessments of a candidate's work, unless we're experts; and even if we're experts, we might have our own biases -- biases against people who don't take the positions we take, or who work with people we don't like, etc.)

Robert Gressis

Oh, one other thing. According to this article (I don't know how to hyperlink, so ...
http://www.nytimes.com/roomfordebate/2012/09/17/professors-and-the-students-who-grade-them/students-confuse-grades-with-long-term-learning), good teaching evaluations can often be inversely correlated to deep learning. Ugh.

Marcus Arvan

Hi Rob: I do think the same considerations speak against teaching observations. I think it is better to judge people on a large body of work--i.e. their teaching record.

On teaching statements, I think the important thing is to determine whether the person puts their philosophy into practice. For instance, I not only claim to be a demanding teacher--my students regularly make comments to that effect in their evaluations (viz. "hardest grader I've ever had").

Finally, I think skepticism about teaching evaluations is overblown. What the research shows is that *numerical* scores can be inflated by things unrelated to good teaching (e.g. being an easy grader, being entertaining, etc.). The empirical research also shows that demanding teachers can be punished for being demanding.

However, despite all of this, there is a way to cross-check whether numerical scores are actually based on (1) bad teaching, or (2) good teaching. Namely: the substance of student comments. Allow me to explain.

Consider on the one hand a candidate who has high scores but whose student evaluations say, "Easy!", "Entertaining", etc. One has reason to believe that this person's high scores are the result of them being a poor teacher who just tries to make students happy.

Now consider a teacher who has high scores *and* whose student comments suggest a very different picture (viz. "Hardest professor I've ever had...but SO worth the challenge!", "Daily homeworks were a pain in the ass, but really challenged me to read carefully", etc.).

These types of comments are evidence that the instructor isn't just coasting by on being nice, entertaining, etc.--but is instead getting high marks despite doing things (being a demanding grader) that tend to lead to low marks with typical teachers.

I say: while teaching evaluations can be misleading, when looked at carefully--taking into account student comments--they can give a pretty good picture of what the teacher is really like.

Robert Gressis

Hi Marcus,

But the study I linked to (I realize the link is broken; here it is again:



said that the faculty who provide the most deep learning for students tend to get lower reviews than the faculty who grade easier. Sure, it's nice when there are stellar teachers like yourself, who get ultra-high numerical scores while being incredibly demanding, but I think that people like you are few and far between. If the study I linked to is right, then many teachers who teach really well are also ones who don't come off that great at first.

Marcus Arvan

Hi Rob: Good point--but there are ways to measure deep learning. Indeed, I think this might be one of those areas where the push for "outcomes assessment" may be helpful.

In my department, part of our annual measure of outcomes assessment involves each instructor in the department giving the same (rather difficult!) multiple-choice test of comprehension of philosophical ideas and arguments to our students. Furthermore, although we don't measure this, many of us tend to have the same students in our classes semester after semester--and given the differences in faculty specialization, they're often very different students for different faculty (I, for instance, have some majors that have taken >7 of my courses that have only taken one or two courses from other instructors--and other instructors have majors who I've never encountered repeatedly take their classes). By measuring outcomes longitudinally across different instructors, it may be possible to devise a pretty darn objective measure of which instructors are really improving student learning more than others. For what it is worth, our annual assessments strongly suggest just this. And so I would suggest that in addition to using teaching evaluations--weighting not only quantitative scores but also student-comments--search committees might request some form of longitudinal data demonstrating sustained, deep student learning.

Obviously, this would put a lot more work on the shoulders of candidates to gather such data--but if we really want to hire the best people, I think this may be the way to go. What could be a better measure of teaching quality than demonstrable longitudinal improvements in student comprehension of complex philosophical ideas and arguments combined with other measures?

Robert Gressis

That would be a good thing to do; our department collects longitudinal data as well. The problem, though, is that for the longitudinal data to be most useful, it requires something like what your department does, and which my department does not do -- giving the same multiple choice test to all the students year after year (although, by the fourth time the student has taken the test, surely some of her gain is due to simply being familiar with the test rather than learning). However, not all departments do this, so right now, unless you know a fair bit about how a department does things, you just don't know how informative its teaching evaluations are.
