
Comments


Shen-yi Liao

I am very sympathetic to much of this post.

However, I also remember a past discussion in which you seemed to argue the opposite position with respect to using journal publications as an imperfect predictor. See http://philosopherscocoon.typepad.com/blog/2014/04/on-assigning-numerical-values-to-publications-based-on-venue-or-how-not-to-counteract-bias.html Do you think some of the worries you raised there would apply to the process you describe here?

On a different note, I'd also support a broader conception of 'future success' that is not primarily about the candidate, but about the contribution a candidate can make to a scientific community (at the university, in their field, etc.). That is, I think calling for more algorithmic methods is compatible with also taking a more social view of academic progress, in which candidates are measured not only by their own success, but by what they can contribute to others' successes.

Marcus Arvan

Hi Shen-yi: Thanks so much for your comment, and for drawing attention to my earlier worries!

Indeed, I do think those worries still apply, not just with publication rates but also with respect to teaching evaluations (given recent studies showing gender-bias effects in student evaluations).

However, I am optimistic that the types of gender/race/ethnicity biases that might be at play in publication, teaching evals, etc., might be statistically counteracted in an algorithmic process. For one, once we know the statistics about bias in student evaluations, search committees could implement an algorithmic "normalization" procedure where candidate evaluation averages are corrected for bias (according to the best empirical measures of those biases). In turn, I think a similar procedure could be applied to publication rates.
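
For concreteness, here is a minimal sketch (in Python) of what that kind of normalization step could look like. The group labels and bias offsets below are purely hypothetical placeholders, not actual estimates from the empirical literature:

```python
# Illustrative sketch only: adjust a candidate's raw evaluation average by a
# hypothetical, empirically estimated bias offset for their demographic group.
# The numbers here are made-up placeholders, not real estimates.

ESTIMATED_BIAS = {
    "reference_group": 0.0,
    "group_a": -0.30,  # hypothetical average penalty on a 5-point scale
    "group_b": -0.15,
}

def normalized_eval(raw_average: float, group: str) -> float:
    """Return a bias-corrected evaluation average: add back the estimated
    penalty a group faces, so averages are compared on a level playing field."""
    correction = -ESTIMATED_BIAS.get(group, 0.0)
    return raw_average + correction

# Example: a raw 4.1 from a candidate in "group_a" is treated as roughly 4.4.
print(round(normalized_eval(4.1, "group_a"), 2))
```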

Although it would take some careful science to normalize things properly, the empirical literature (in my view) tells us this is the way to go--and, at any rate, more objective measures in general appear (given our overall evidence) to be *less-biased*, better predictors than individuals or committees, whose biases may go totally unchecked/uncontrolled!

recent grad

It would be interesting for a department to run the two processes simultaneously to see how they match up. (This would be easier, perhaps, with a more narrow search.) They could have their applicants code their CVs in certain ways and hire a student to enter it into a system. They'd have to decide ahead of time what to do with any divergent result--e.g., they might ignore the results of one process or use it only as a way to take a closer look at a few overlooked candidates--but it would at least be interesting to hear what happens.

BTW, thanks for the post. I think I'm in agreement with most of it. It frustrates me that Wood et al. seem not to seriously entertain the idea of a division of expertise or worries about individual blind spots.

Derek Bowman

Marcus,

What is it about this research that would plausibly confine the application of its results to the hiring process? Wouldn't the same mechanisms lead a carefully designed algorithmic method to outperform individual human judgment on the selection of a spouse or of one's friends? Why, given the limits of individual human cognition, don't we offload more of our important decisions on algorithms? In democratic contexts, wouldn't we be better off devising an algorithm for choosing elected officials?

You do seem to confirm one worry I've had when you've made other, briefer, references to this research: it appears that the result is that if we define "success" in ways that can be easily mathematically measured (e.g. number of publications), then algorithmic methods outperform human judgment. But if - given the problems with peer review - we're not sure that future number of publications is a good measure of success, don't we have reason to be doubtful about the predictive power of such algorithms over harder-to-quantify notions of success?
(See, e.g., your recent discussions here and here: http://philosopherscocoon.typepad.com/blog/2015/09/stanley-on-peer-review.html and http://philosopherscocoon.typepad.com/blog/2015/09/small-potatoes-and-big-potatoes-should-we-have-different-reviewing-standards-for-each.html )

This question is even more pressing with respect to evaluating candidates for teaching jobs, given what appears to be mounting empirical evidence that student evaluations are both unreliable and systematically biased along race and gender lines. Yet what other quantifiable measure is available for generating an algorithmic decision procedure that weights teaching ability?

Perhaps these are simply naive questions that reflect my lack of familiarity with the relevant research, in which case I hope you'll be able to set me straight.

Marcus Arvan

Hi Derek: Thanks so much for your comment! You raise a bunch of good questions. Here are my thoughts in reply.

Your first three questions were: "What is it about this research that would plausibly confine the application of its results to the hiring process? Wouldn't the same mechanisms lead a carefully designed algorithmic method to outperform individual human judgment on the selection of a spouse or of one's friends? Why, given the limits of individual human cognition, don't we offload more of our important decisions on algorithms?"

My reply to all of them is simple: we increasingly *are* offloading such decisions to algorithms in ever-greater areas of human life, including the very areas you mention (dating, friendships, etc.).

First, dating websites are increasingly popular, in part because many of them use algorithms to match people--more effectively than old-fashioned dating (see e.g. http://www.nytimes.com/2015/02/17/science/the-science-of-finding-romance-online.html ). Similarly, Facebook uses complex algorithms to bring new "friends" together--algorithms that determine what content people are presented with on Facebook in response to their web behavior. This content in turn brings people with similar interests, personalities, and propensities together, creating online friendships. I, for one, in recent years have met far more people I "get along with" on social media than anywhere else. So, indeed, it seems, we are increasingly learning the predictive value of algorithms in most, if not all, areas of human life, including dating, marriage, friendships, etc.

Your next question is: "In democratic contexts, wouldn't we be better off devising an algorithm for choosing elected officials?"

My reply is two-fold: First, there are already philosophers (such as Alex Guerrero) who defend something not too far away from this. Guerrero defends a "lottocratic" system, but I don't think a more algorithmic system is too far away from that. And indeed, I think something closer to what you are suggesting already takes place in China, where people work their way up the political ladder by performance evaluations, many of which, I gather, are highly quantitative in nature. Second, I think democracy is a unique case, as, if there is any good justification at all for democracy, the best justification is a *moral* one (i.e. it is intrinsically just). If this is right, then although an algorithmic procedure might indeed do a better job of electing people (in terms of results), there still might be sufficient moral reasons to prefer democracy.

In terms of the concerns I have raised about peer-review before, I don't think it's a perfect system, by any means. I have made my concerns about it pretty clear. Still, for all that, (A) it is the system that in fact determines professional success as a researcher, and (B) empirical psychology still indicates that imperfect algorithms outperform individual judgers or committees. In other words, everything I have said here is consistent with the notion that peer-review is flawed, and should be improved. I would say: (1) it should be improved, but (2) empirical psychology still suggests its outputs are more likely to be better predictors of future success than individual judgers.

Finally, you are right that these issues are all the more pressing when it comes to teaching. However, I have some ideas here.

First, recent research indicates student evaluations *are* (weakly) related to teaching quality, but also biased against women and minorities (https://www.insidehighered.com/news/2016/01/11/new-analysis-offers-more-evidence-against-student-evaluations-teaching ). So, I think statistically renormalized teaching evaluation scores (correcting for racial, ethnic, and gender biases) are likely to have some predictive value.

Second, to address your main question ("what other quantifiable measure is available for generating an algorithmic decision procedure that weights teaching ability?"), there is actually a very good and, if I recall, empirically validated alternative: namely, certain methods of assessment in academic departments that measure how (A) a given instructor's students perform in (B) later courses with different instructors. If, for instance, my students tend to go on to perform better on exams, philosophy papers, and final grades in other philosophy courses (compared to, say, the averages in my department), those are grounds for thinking that my students have *learned* more from me than from the typical instructor in my department. This is, as I understand it, an emerging area of academic assessment--and I think it is probably precisely the direction we should go. If anything would be reliable empirical evidence of teaching quality, it is longitudinal comparisons of how an instructor's students perform in future courses relative to other instructors' students.
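
To give a rough sense of how such a longitudinal comparison might be computed, here is a small illustrative sketch; the record format, instructor names, and grades are invented for the example, not real data or a validated assessment method:

```python
# Rough sketch: for each instructor, average how their former students perform
# in *later* courses taught by *other* instructors, and compare that average
# to the departmental mean. All names and grades below are invented.

from collections import defaultdict
from statistics import mean

# Each record: (student, earlier-course instructor, grade in a later course
# taught by a different instructor)
records = [
    ("s1", "Instructor A", 3.7), ("s2", "Instructor A", 3.3), ("s3", "Instructor A", 3.9),
    ("s4", "Instructor B", 3.0), ("s5", "Instructor B", 2.8), ("s6", "Instructor B", 3.2),
]

later_grades = defaultdict(list)
for _student, instructor, grade in records:
    later_grades[instructor].append(grade)

dept_mean = mean(g for grades in later_grades.values() for g in grades)

for instructor, grades in later_grades.items():
    # A positive difference suggests this instructor's students carry more
    # learning forward than the departmental average.
    print(instructor, round(mean(grades) - dept_mean, 2))
```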

Marcus Arvan

Hi recent grad: thanks for your comment! I agree it would be interesting to run the two methods simultaneously. It would also be good to study (as I-O psychologists do study) the post-hire performance of candidates selected through both methods.

postdoc

I actually applied algorithmic methods for selecting my spouse. When we first started dating I liked her all right but wasn't blown away. I noticed, though, that I tended to fall for women who in the end just made me miserable. This was probably due to issues in my childhood. So, I devised an algorithm for things I wanted in a mate (for example, I quantified how trusting she was using various metrics). And the girl in question scored very highly. So, I stuck with her. We're now married and madly in love and it's a great relationship. If I had gone with my gut, I probably wouldn't be in a happy and loving relationship. I would have continued selecting women who were not compatible with me.

Derek Bowman

Marcus,

Thanks for the detailed reply.

On Facebook and online dating:
These are examples of using algorithms to make suggestions, not decisions. Presumably you don't empower Facebook to choose your "friends" for you - adding all and only "friends" suggested by Facebook. ("Hey, why won't you friend me on Facebook?" "Sorry, you haven't been suggested to me by Facebook's algorithm.") And I also presume that you don't rely on Facebook to tell you who to be actual real life friends with.

In the online dating experiences I'm familiar with, you don't rely on an algorithm to determine (a) whether your date went well; (b) whether to have a second date; (c) whether and when to get married; (d) whether and when to get divorced.

On democracy:
Nothing in the moral argument for democracy blocks the inference - which perhaps you're ready to endorse - that each voter should defer to an established algorithmic procedure in selecting how to cast her vote.

On peer review:
Am I right that the kind of "future success" that has been empirically verified is just "future number of publications"? But that only measures success - at being a good philosopher and researcher - if you think that number of publications is in itself a measure of, or reliable proxy for, philosophical success.

But that last clause is the very assumption I was calling into question. It appears to me that you face a bit of a dilemma at this point. Either you think that quantitative peer-review success is an imperfect but better measure of philosophical quality than the individual judgments of philosophical experts or you don't.

If you don't, then it's unclear how the empirical support for algorithmic decision making is applicable to this case.

But if you do, then why shouldn't you defer to that system in determining what counts as good philosophy? Shouldn't the inference be "Well, I thought groundbreaking big-idea work was philosophically more important, but the more-reliable-than-me system of peer review says otherwise, so I guess nuts-and-bolts research programs are generally philosophically better."

Trevor Hedberg

One assumption of this discussion appears to be that these job searches are designed with the aim of hiring the "best" candidate. I'm not sure that's what they are really designed to do. I think they're aiming to find someone who's "good enough" at doing the job and also a worthwhile colleague. When you hire someone, you aren't just hiring someone to do a job: you are also hiring someone to be an office colleague. You will share office space with this person, have conversations with them about teaching, research, etc., suffer through administrative meetings with them, and perhaps even share the occasional meal or cup of coffee. Thus, it's probably important to search committees that they hire someone who will not only do the job well but will also be a colleague that they will enjoy interacting with in these various ways. If this is the real aim, then a candidate's qualifications and likely effectiveness at doing the job are just a threshold that one must pass, and having the best credentials is not sufficient for getting the job.

There are no doubt disadvantages to having "likability" or "collegiality" (or whatever we elect to call it) play a role in the hiring process. Perhaps the most obvious is that it's tough to make holistic and accurate judgments about a person after interacting with them for only 1-2 days. (I don't think you could glean anything about this with Skype interviews or the like, since the interactions are so short and contrived.) But my suspicion is that those who value the "human" element of the hiring process value it because they want some reassurance that they are hiring someone that they will like and not someone who, according to an algorithmic process, is likely the best researcher and teacher but could also be a belligerent and insufferable person.

The posts on this blog have made me aware of a myriad of problems with the conventional hiring process, but I have to admit that I'm torn on this particular issue. I'm not sure whether (1) collegiality should play a role in hiring decisions or (2) collegiality should be primarily measured by how prospective candidates interact with members of the search committee during their campus visit. Regarding (2), if collegiality should play a role in the hiring process, are there any other ways to measure it besides the campus visit? I struggle to see how that feature of a person would be detectable through the materials in a standard application dossier.

Lovely Colleague

Trevor,
Your point is no small matter. Indeed, I suspect that if you have a decent relationship with your colleagues - you can have a coffee together, or a meal, and be civil with each other, even enjoy each other's company - then when you have to disagree with them over some important college-related issue, it will be easier to do it and still get along afterwards. And inevitably in 20+ years of a career together, disagreements will arise. Collegiality is a very important quality.

BLS

I am truly puzzled by this discussion. Wood is talking about nothing other than philosophical judgment.

Is there an objective measure of good philosophy?

Wood is only saying that he does not regard venue as an objective measure alone. Rather, he utilizes his own judgment in determining how good the argument is.

Is this honestly a job we can farm out to 'experts in the field'? That seems very strange. If I read an argument in a prestigious journal that seems terrible, what am I to do? If I read an unpublished piece or a piece in a lower-tier journal that seems amazing, what am I to do? Your idea seems to be that I should farm out my judgment to some objective metric.

The selection of each feature of the metric also involves judgment. What do I rely on for those features? Some other metric. Who gets to decide that this metric is the measure of what is of value? Other people, apparently. What if I have good reasons to think they are wrong?

Hiring someone whose CV impresses you because of venue is fine--that's what mostly happens anyway.

Philosophy is not at all unique in this way. A curator for a museum show will not farm out the job of deciding which art is good. A casting director for a Broadway show will pick a draw--but also someone they think will do well in the performance. A principal may consider a teacher's rapport with students or the quality of their ideas on how to motivate students.

The similarity here is that there is no objective measure of 'good' used in almost any field. We want someone good. To decide what is good, sports has data, sales have data, perhaps even farming has data. Philosophy does not have such data--and the creation of a metric already builds in an idea of your goals and what is good. A principal could pick the teacher whose students do best on standardized tests--but perhaps she thinks there is more to teaching than this. Some want to lock in judgment to quantifiable metrics in every domain--and we see the problem this creates in K-12 education, where a particular power structure can then dictate our goals. In the case of philosophy, subsuming individual judgment to a metric would also lock us into a value system that we might want to dissent from.

Like Wood, I don't want to subsume my judgment to the profession as a whole. Maybe it's self-indulgent in some way. Perhaps it is some 'whim', but how can I do otherwise? It goes to the heart of what I think I am doing all the time. If I were to subsume my judgment about hiring in the way suggested, there are certain universities I would have to hire from exclusively. I'd have to prefer the fields and ideas that seem to be in vogue, etc. I'd have to look at how people get the prestige publications--they come from certain universities and they take up certain questions--and if everyone does this we'd have even less intellectual diversity than we have now. My philosophical judgment tells me that these fields and ideas are not the bottom line on good philosophy (even if some of it is very good). I try not to be hard-headed about it--I don't think I know it all. Rather, I listen to my colleagues and try to take their ideas seriously.

Bob

Like everyone, I agree that there are serious problems in the hiring process. I think incorporating objective measures, in some ways, might be helpful. However, I would be extremely cautious about going too far with this.

Algorithms are designed by people, and therefore can come along with some of the same biases as people making decisions themselves.

Citing studies and pointing to empirical measures is an incredibly abused academic practice. We are asked to take a study as evidence, because it is 'empirical'. When doing this we seem to forget that many studies are unreliable. In fact, a study on psychological studies recently demonstrated the majority are unreliable:

http://www.theguardian.com/science/2015/aug/27/study-delivers-bleak-verdict-on-validity-of-psychology-experiment-results

Lastly, I see nothing wrong with Wood's stance. When he reviews a published paper, it may have been published on the good word of a single philosopher. The suggestion that Wood wants to replace the peer-review system with himself presupposes that a published paper has been reviewed by hundreds of people in the profession. But this is unlikely. Papers are often published with the approval of only 1 or 2 philosophers. I think Wood can reasonably suspect he might have better judgment than the 1 or 2 philosophers who approved the relevant publication.

Marcus Arvan

Hi Trevor: Thanks so much for your comment and questions!

Like "Lovely Colleague", I think collegiality is absolutely crucial--and probably one of the things job-candidates may fail to adequately appreciate. I also don't disagree with you that campus visits may be (somewhat) helpful in this regard--though I will voice some reasons for healthy skepticism shortly. Indeed, let me be clear about what I am and am not suggesting. I am *not* suggesting that the entire hiring process be automated, and people hired "sight unseen." I don't know many, or indeed any, empirical researchers in I-O psychology who think that. Of course one ultimately should meet a candidate in person. What empirical research shows, however, is that (A) selections procedures should be *more* algorithmic than they currently are, especially when it comes to (B) whittling down candidates to a group of finalists. This is all I am suggesting--that instead of search committee members going through the usual process of subjectively analyzing dossiers, and carrying out preliminary interviews (which are notoriously non-predictive), the initial narrowing of a candidate pool should to a short list of finalists should be more regimented, following an algorithmic process of "counting publications" (weighted, say, by journal stature), a normalized weighting of teaching evaluations (counteracting gender/race/ethnic bias), etc. One could also (potentially) "score" letters of recommendation for positive, neutral, and negative remarks regarding collegiality.

This is the kind of thing I-O Psychologists use empirical results to advocate, and it is the kind of thing that major corporations, government organizations, etc., are increasingly using to very good effect.

Also, on the collegiality issue: it is notoriously difficult to measure, and *interviews* in particular (as I note in the OP) have been found consistently *unable* to track important elements of collegiality--in particular, deception. After all, here's the thing about interviews and campus visits: they are highly artificial situations where a person "puts on a good face," which may not be at all "who they are" on a day-to-day basis. And indeed, I know of horrible instances where search committees misjudged candidates. Just the other year, I was at a conference talking about collegiality issues, and one person who had been on a search committee a few years prior told me they had hired someone who said and did "all of the right things" on the campus visit. Yet, when that person was actually hired into the job, it became clear that they didn't want to be at the university, and were just looking at it as a stepping-stone to a better, research-oriented department. The person ultimately left the position, and the department lost the tenure-track line--an absolute disaster for the department.

So, to address your question: how else can collegiality be measured? Actually, there are a number of ways. Indeed, there are empirically confirmed correlates of bad work-behavior. In particular, although extroverts tend to come across as very charming in interviews, some extroverts--dishonest, narcissistic extroverts--can "put on a good show" in an interview yet be among the worst people of all to work with (https://www.researchgate.net/profile/In-Sue_Oh/publication/255948921_Are_Dishonest_Extraverts_More_Harmful_than_Dishonest_Introverts_The_Interaction_Effects_of_Honesty-Humility_and_Extraversion_in_Predicting_Workplace_Deviance/links/09e41513f581d2a280000000.pdf ). Conversely, conscientiousness is one of the best psychological predictors of career success (and indeed, as someone who works with colleagues, I can say: day-to-day conscientiousness is a very important component of "collegiality").

And yes, there are ways to measure these things. It's what I-O Psychologists do with corporations and the government. I suspect many people will recoil instinctually at the notion of using personality tests--yet they are very predictive, especially when combined with more subjective means (e.g. meeting a person).

Finally, although you may be right that search committees are not necessarily looking to hire "the best candidate" (only someone "good enough"), my major point *wasn't* that we should always hire the best person. Rather, it was that algorithmic means are known to be both (A) more predictive than subjective judgment, and (B) less biased (in terms of race, gender, etc., at least when normalized properly). Search committees do want to hire someone who will get tenure: people who will publish and get good teaching reviews. My claim--following decades of empirical research--is simply that, when it comes to those kinds of things, algorithmic processes are known to be generally less biased and more predictive (things that, in my view, any good search committee wanting to make a successful hire should value).

Marcus Arvan

Hi Derek: Thanks for your reply, and follow-up questions.

You write: "These are examples of using algorithms to make suggestions, not decisions. Presumably you don't empower Facebook to choose your "friends" for you - adding all and only "friends" suggested by Facebook."

As you can see in my reply to Trevor, I think you are absolutely right. My claim is not that the entire hiring process should be automated--that we should merely use algorithms to hire people. I don't know any I-O Psychologist who would advocate that. What I am claiming is that the empirical research indicates that we should use algorithmic means in early stages of selection to whittle down candidates to semi-finalists or finalists. That's all. This is (roughly) what online dating sites do, as well as (sort of) what Facebook does.

Next, you write: "On democracy: Nothing in the moral argument for democracy blocks the inference - which perhaps you're ready to endorse - that each voter should defer to an established algorithmic procedure in selecting how to cast her vote."

You're right! Nothing blocks that conclusion--but it's also an empirical question I don't think we have the right to *decide* by fiat. Real, live human beings are (as we all know) pretty terrible at selecting good rulers. I, for one, would not be surprised at all if empirical science--at some point in the near or distant future--became able to predict "good ruling/political abilities" better than common citizens. Indeed, some already argue that good US Presidents tend to have certain traits, and bad ones other traits. I, for one, would *hope* that science might improve our ability to select political representatives, better than the totally haphazard process we now have!

You write: "Am I right that the kind of "future success" that has been empirically verified is just "future number of publications"? But that only measures success - at being a good philosopher and researcher - if you think that number of publications is in itself a measure of, or reliable proxy for, philosophical success."

Indeed, this is a very difficult question. However, first, many/most people in the discipline *do* think journal publications are (relatively) good proxies of philosophical value/contributions. Secondly, since we are talking about hiring, one needs to ask what kind of measure of success committees are looking for. Here, I think, is one thing most, if not all, committees are looking for: hiring someone likely to get tenure. Here's another, at least for research departments: they want to hire someone who will be influential in the field. Well, journal publications are surely good proxies of both. Publications (including the quality of venue) are large determinants of tenure decisions, and of influence in the discipline.

I'm not saying these are the "right" measures of success. But, actually, this isn't the point. The point is that the empirical research on selection shows that, in general, *whatever* measure of success one adopts, algorithmic measures tend to better predict it. Again, look at the meta-study above. It shows that, across a vast variety of predictands, algorithmic methods work as well or better than subjective judgment in terms of making reliable predictions.

Marcus Arvan

Hi Bob: The concerns you raise about "psychological studies" (citing The Guardian) do not apply, to the best of my knowledge, to this particular area of I-O Psychology. The superiority of algorithmic selection-processes is, in my understanding, one of the single most robust, consistent findings in the field--one supported by meta-study after meta-study (the best available measure of whether a finding is a statistical artifact--though, if I am wrong, I am happy to be corrected!). As my spouse tells me, there are few things of consensus in her discipline--but this is one of them.

Also, your point about Wood misunderstands the empirical claim here. Someone like Wood may, indeed, be capable of judging the quality of *one paper* better than a distributed network of experts (and indeed, as you note, that one paper may only have been vetted by one or two reviewers not as reliable as Wood). But this is the wonderful thing about statistics. Statistics don't deal with isolated, one-off cases--and nor (for the most part) do career advancement and success (such as tenure). Statistics deal with large numbers of cases--and what the statistics tell us is that if you want to predict someone's success (on any given predictand), you are better off *not* appealing to your own judgment on a single case (which may be accurate *in that single case*), but rather, appealing to algorithmic measures of multiple cases. Allow me to explain how this is relevant.

As we all know, sometimes flukes happen. A person can come up with one great idea--a great paper--but then have trouble coming up with more good ideas. I've seen cases of this--cases where someone publishes a paper in a high-ranking journal, but they never do it again. Some of these people don't get tenure, because they "never lived up to their promise." What statistics do is better enable us to predict the future. Empirical psychology says: if you want to know whether someone is likely to accomplish X, look at how many times they have accomplished X in the past. Someone who has multiple publications in top-ranked journals will be more likely to continue doing so, statistically speaking, than the "brilliant" person with one great paper. Again, I've seen cases of this. In another academic field at another university (that I won't mention), an "obviously brilliant" person who blew the committee away was hired over a "less brilliant" candidate who had an incredible publication record. The "more brilliant" person who was hired went on to publish little, and the less brilliant person continued publishing article after article in top journals. Hence the fallibility of human reasoning: we have all kinds of biases, including salience biases where we are apt to pay attention to *one* piece of work (or how smart someone is on their feet) over a clear, objective record of success. The empirical research shows that the objective record is *better* at predicting the future. This is why every major league baseball team has now adopted this "moneyball" approach, and why an increasing number of industries are starting to use more rigorous, less-subjective selection methods. It works.

Finally, you are certainly right that an algorithm is only as good as the judgment of the person who designs it. But this is my point! Instead of haphazardly selecting candidates through methods that are known to be poorly predictive, we should study and implement the use of algorithms that do predict future success. Further, and this is the critical thing, subjective judgments are *worse* than algorithms. Yes, an algorithm is only as good as the person who makes it, but whoever that person is, they are better off using an algorithm than their own judgment--or so the science suggests.

Bob

Thanks for the reply, Marcus. I have one follow-up question. Wood said that he would read the writing sample and use his own judgment. You point out (I think) that although Wood might have better judgment on that sample, a series of publications is a better indicator of quality than Wood's single reading of one paper. I get that.

But the question, with Wood, I think, is what to do with the writing sample? Are you suggesting we do away with writing samples? It seems like you think it is better just to look at the CV and judge writing quality by the list of publications. I just did not realize people were suggesting we do away with the writing sample. The writing sample seems to be the part of the job dossier with which people have the least amount of criticism. But if we are going to have a writing sample, then there is nothing wrong with a search committee member using his or her own judgment regarding that single paper.

39th most influential externalist 1993-7

The major objection to this seems to lie in the definition of success: what a writing sample, or a committee's judgment of a paper's value apart from its publication venue, can contribute is precisely a choice as to which qualities in a philosopher should be valued. And conceptions of success that differ from the status quo's seem impossible to put into institutional practice without making space for subjective judgment.

Obvious example based on the kind of thing Marcus has discussed many times before on this site - an algorithm can't capture the fact that philosopher A's two 15-thousand word publications offer new perspectives on wide fields of inquiry, while philosopher B's 5 papers, each in slightly swankier journals, are all of the 'me-too' margin-fiddling variety. We know that philosopher B is more "successful" by the reward-standards of the field as a whole, and will win under any algorithmic assessment, but either I personally or my department as a whole might attach higher value to the kind of work A does. We might reject the conception of success that defines the field as a whole, and we might want, by giving A-type philosophers a chance over B-type philosophers, to change the field.

In this case, the algorithmic approach seems intrinsically conservative, and not to leave much room for alternative conceptions of philosophical quality or success to take root.

Pendaran Roberts

I think the distinction between 'margin fiddling' and 'new perspectives on wide fields of inquiry' is probably difficult to discern in practice.

What looks like a large magnum opus may turn out to have no effect on the discipline. What looks like margin fiddling could have profound implications. Look at Gettier! Was that not margin fiddling? A short paper providing some counterexamples to a theory of knowledge?

I wouldn't be so quick to think you can discern the difference.

A smart algorithm that looks at number of publications, journal prestige, length of publications, and publishing rate is most definitely going to do way better on average than any search committee. All the scientific research predicts that this is true and it also just seems obvious!

So, yes I would advocate throwing out writing samples. We also need to get rid of recommendations. Am I not right that the science shows these are useless? I suspect the science suggests applying an algorithm to the CV is the best hiring practice, and that not even interviews are effective.

I am not sure how to measure teaching. Perhaps just count classes taught for now. Or just look at whether they've taught the things you want. From what I've read student evaluations don't measure teaching effectiveness.

From http://www.npr.org/sections/ed/2014/09/26/345515451/student-course-evaluations-get-an-f

The paper compared the student evaluations of a particular professor to another measure of teacher quality: how those students performed in a subsequent course. In other words, if I have Dr. Muccio in Microeconomics I, what's my grade next year in Macroeconomics II?

Here's what he found. The better the professors were, as measured by their students' grades in later classes, the lower their ratings from students.

"If you make your students do well in their academic career, you get worse evaluations from your students," Pellizzari said. Students, by and large, don't enjoy learning from a taskmaster, even if it does them some good.

Lauren

The suggestion that departments should focus on publications (nearly exclusively, it seems, at least in the initial stages) seems to apply largely to departments that have one thing they want: someone who publishes a lot of articles. And if that's the only thing a department wants, then perhaps this makes sense. But I suspect there are very few (if any) departments for whom this is the case; in addition to publishing a lot, I know my R1 institution wants someone whose research fits in well with existing faculty (research that doesn't duplicate but enhances existing work, and that existing faculty find interesting), someone who will engage students well, someone who gets along and will be a team player in the department, someone who can teach classes no one right now wants to teach, etc. And while there may be empirical measures for some of these things, for some of them there aren't (e.g., how interesting the research is to the existing faculty members). But the more difficult thing is that even if search committees could all agree on the factors that are important, I just don't think you could convince most search committees to agree on precise weightings, which it seems the empirical argument requires.

Although I will stick up for teaching evaluations a bit more than the previous commenter: I've read a lot of the teaching research, and well-designed teaching evaluations are actually (mostly) correlated with teaching effectiveness (contra the NPR article, which doesn't consider meta-analyses on the subject). There are some factors that influence student evaluations, but for the most part, we know what those are (and possibly could adjust for them).

Marcus Arvan

Hi Lauren: thanks for weighing in! Do you have any links to those meta-analyses of student evaluations and teaching effectiveness? I recall coming across them before as well, but haven't been able to hunt them down. I think it would be good to present them here, as the public narrative that student evaluations don't track performance has really become dominant.

Fool

One interesting meta-analysis is by Linda Nilson, which suggests that the extent to which evaluations track effectiveness has significantly declined over time, to the point where TODAY they're basically useless.

Nilson, Linda B. "Time to raise questions about student ratings." To Improve the Academy: Resources for Faculty, Instructional and Organizational Development (2012): 213-228.

Pendaran Roberts

Here's a short abstract of Linda Nilson's position. https://listserv.nd.edu/cgi-bin/wa?A2=ind1306&L=POD&F=&S=&P=34370

Carrell and West’s article (2010) is just one of several recent studies showing that student ratings are no longer positively related to student learning. Others include Johnson (2003), Marks, Fairris, & Beleche (2010), and Weinberg, Hashimoto, & Fleisher (2009). These studies find no relationship, a negative relationship, or a meaninglessly weak positive relationship between student ratings and learning. In his 2009 meta-analysis, Clayson (2009) could not find one study first published after 1990 (Feldman used data from the 1980s in his articles and chapters published as late as 2007) that documented a positive relationship between student learning and student ratings.

This lack of relationship is quite important because the link between student ratings and learning has been the basis of the validity and utility of ratings for decades. As Cohen (1981) very clearly stated: “It [teaching effectiveness] can be further operationalized as the amount students learn in a particular course. … [I]f student ratings are to have any utility in evaluating teaching, they must show at least a moderately strong relationship to this index.” At the time Cohen wrote this, there was such a relationship. But students’ values and motivations have changed since then, enough to undermine the relationship.

A couple of other compelling reasons exist for questioning the validity and utility of student ratings. They are more biased by extraneous variables than they used to be, and their connection to factual reality is suspect. For more on this topic, including the relevant studies, please see my 2012 article: Time to Raise Questions about Student Ratings. Pp. 213-228 in To Improve the Academy: Resources for Faculty, Instructional, and Organizational Development, Vol. 31, edited by J. E. Groccia & L. Cruz. San Francisco: Jossey-Bass.

Pendaran Roberts

Part of Nilson's actual paper can be read here: https://books.google.co.uk/books?id=91XbztjZaUoC&pg=PT216&lpg=PT216&dq=%22time+to+raise+questions+about+student+ratings%22&source=bl&ots=T4aAkhrZOI&sig=Tt6Yc26Gi1w8KrfRK7hz7PmaR3c&hl=en&sa=X&redir_esc=y#v=onepage&q=%22time%20to%20raise%20questions%20about%20student%20ratings%22&f=false

Grad student

I found Wood's post helpful for this reason: Wood is honest. He gives you a picture of the kind of person you should expect to look at your file. For better or worse, these people are somewhat (justifiably?) arrogant. They will make decisions based on subjective judgments and intuitions. They think they have intuitions when it comes to who possesses that illusory intellectual "spark." Sure, this all seems off. Sure, perhaps it is the case that the peer-review system is all we have to defend against such subjective and unreliable judgments. But for the time being, these are the people who will look at our files. So, I think it is helpful to see how they think. And I also agree with some of what BLS said above. Since no one has addressed BLS's remarks, I'll quote them again. I'd be particularly interested in your view on BLS's thoughts, Marcus.

BLS says:

Wood is only saying that he does not regard venue as an objective measure alone. Rather, he utilizes his own judgment in determining how good the argument is.

Is this honestly a job we can farm out to 'experts in the field'? That seems very strange. If I read an argument in a prestigious journal that seems terrible, what am I to do? If I read an unpublished piece or a piece in a lower-tier journal that seems amazing, what am I to do? Your idea seems to be that I should farm out my judgment to some objective metric.

The selection of each feature of the metric also involves judgment. What do I rely on for those features? Some other metric. Who gets to decide that this metric is the measure of what is of value? Other people, apparently. What if I have good reasons to think they are wrong?

Hiring someone whose CV impresses you because of venue is fine--that's what mostly happens anyway.

Philosophy is not at all unique in this way. A curator for a museum show will not farm out the job of deciding which art is good. A casting director for a Broadway show will pick a draw--but also someone they think will do well in the performance. A principal may consider a teacher's rapport with students or the quality of their ideas on how to motivate students.

Someone

Okay, so just to warn you, this is a bit of a bitter rant, but I guess my personal experience might be interesting to people in a similar position.

I finished my PhD a few months ago at a very well respected department, with four publications all in the best journals in my area, loads of good teaching experience and glowing letters of reference both from inside the department and from research collaborators at other universities, and I've now applied for something close to fifty jobs (research fellowships, tenure-track jobs, lectureships, everything really, all around the world). I've spent an age working on my application materials, often using helpful advice from this site, and I've had just about zero success. I'm actually okay with this as a general fact. I can accept that there are loads of similarly qualified people, many of whom are probably smarter and more professional than me. The thing that gets me is that it is impossible to tell where you are going wrong. For example, there are a couple of prestigious research fellowships I didn't get (I'm fine with that) which went to people with no peer-reviewed publications (I'm not fine with that). I simply do not understand how such a decision could possibly be justified. If you have two people at the same stage of their career, in the same area of research, applying for a research fellowship, surely you have to give the job to the one with the best research record! (most/best publications, conference presentations, etc.) Similarly, I am aware of people in the same research area as me with tenure-track jobs at top research universities who have fewer publications than I do currently. So I guess publications aren't generally very decisive factors. I find this very concerning, since publishing in good journals is the only legitimate way to prove yourself as a researcher.

wise guy

Someone,
You should ask a senior person in your department or your sphere of influence to review your c.v., your application letter (a typical example), and any other supporting material that YOU send and control.
I reviewed someone's file a few years ago, and I was able to aid them in fixing some things that were not at all obvious to them. It is not worth going into the details, but candidates often present their material ineffectively, and include a lot of information that is irrelevant (and this suggests that the candidate either has bad judgment or is professionally immature). Given what else you say about yourself, you should not give up hope.

Someone

Hi Wise Guy. Thanks for the advice and encouragement. It's much appreciated.

Marcus Arvan

Hi Someone: I agree with Wise Guy's advice. However, I would also add that cases like yours are one of the (many) reasons I am pushing so hard for rethinking selection/hiring methods. It seems to me, from your self-description, that you should be getting interviews in spades on the basis of your record--and, I think the algorithmic approach I'm advocating would have that result (it would be far more meritocratic than the status quo).

wise guy

Someone,
If Marcus would be open to this, I welcome you forwarding your material to him - to be forwarded to me - and then I would give you feedback. I do not want to give my e-mail on the blog.
Many people of my cohort were on the market 5 years before getting a Tenure Track job ... and many of us are full professors now. It was not easy. Indeed, getting tenure was easier.

Marcus Arvan

wise guy: I would absolutely be open to that, if "Someone" is interested! Someone: if you'd like me to forward your materials to wise guy for feedback, please email them to me and I will serve as the intermediary between you two...

Marcus Arvan

Oh, and I know it's easy enough to find, but my email is [email protected].

someone

Dear Marcus and wise guy,

I'm so sorry for the delayed response, and thanks so much for your kind offers of help. It's greatly appreciated. I will email Marcus now.

Analytic philosopher

Very thought-provoking post(s)! Thanks for pointing me to it(them).

My sense is that you're right, current mainstream hiring procedures in philosophy are amateur-ish and could use substantial overhauling. In addition to the problems you mention, see for instance:

https://qz.com/675152/here-are-google-amazon-and-facebooks-secrets-to-hiring-the-best-people/

and

https://qz.com/180247/why-google-doesnt-care-about-hiring-top-college-graduates/

Although I share your worries about some of the problems you mention (especially when it comes to amateur-ish use of interviews, the overvaluing of quick-thinking 'smartness', and I would say, even more importantly, the overvaluing of never admitting one is wrong, and the use of personal judgements to hire friends of friends with no significant publication record), and although I do think more reflection on hiring procedures is necessary, I am less worried about some other problems you mention and less convinced about your key positive proposals. A few points:

(1) The Allen Wood passage you start from ends with this admission: "However, about this issue, I am in the minority." If Wood is right (as I suspect he is), then current mainstream hiring procedures are not as far from those you advocate as your piece suggests;

(2) although efficiency improvements are certainly possible, contemporary analytic philosophy doesn't strike me as having a problem with the quantity and perceived prestige of the contributions being churned out, or with the correlation between people who publish in high-profile venues and people employed by high-profile departments. Your suggestion might improve on these factors, but things already seem pretty okay;

(3) where philosophy does have a big problem is in the actual quality of what gets published in supposedly prestigious places, and in the poor correlation between philosophical quality and (the quantity of publications and) the perceived prestige of publishing venues and departments. I worry that the policies you advocate wouldn't go in the right direction...unless of course the algorithms were radically different from those you seem to assume, which seem to me simply to reflect the status quo;

(4) stricter procedures about who gets hired could be helpful for counteracting current cronyism practices. But, where search committees are not interested in hiring friends of friends and are really interested in hiring the best philosopher, your suggestions would not change contemporary analytic philosophy in as dramatic a fashion as is desirable. The peer-review-cum-commonly-accepted-journal-rankings system in philosophy is not doing well. It had, it seems to me, some positive effects when peer-reviewed publications started to be valued in the second half of the 20th century. Now, however, the system is too rigged to be very useful...if the quality of the philosophy actually produced is the goal one wants the system to foster.
