
Brian Weatherson

How are you grouping people into the buckets? Is it by home department at time of publication (i.e., as reported in the journal) crossed with PGR rank at the time of publication? I remember for some projects it was a pain to find out just where every author was at the time their article was *accepted*, but for this project it probably matters where they were at the time the article was published, and that's clearly printed on the article, which is nice.

And are you using Google Scholar citation counts? That's probably the easiest, though there is something to be said for using the Web of Science counts that Kieran Healy used.

Marcus Arvan

Hi Brian: That's right. I used Google Scholar for citation counts, and whenever possible, the home department at time of publication as reported in the journal. In cases where the journal did not list the author's affiliation at time of publication, I did my best to track it down.

Christopher Stephens


Thanks for this. It looks interesting. I wanted to ask about this claim of yours "In other words, it looks like if you are from a small college or foreign unranked university, almost no one will cite you even if you publish in Mind or Phil Review."

This doesn't follow from the fact that of those that received fewer than 10 citations, the majority were from small or foreign departments. We'd also need to know how many of those that get lots of citations are from small or foreign departments, right?

John Schwenkler

Hi Marcus,

Following up on Christopher Stephens's question, wouldn't we also need to know what overall percentage of the papers in your sample were written by authors from small/foreign institutions?

Also, I wonder if you tested for statistical significance in the trend you describe with your initial set of numbers. And even if the trend is significant, it seems misleading to summarize it by saying that "authors from Leiter top-10 departments were cited about 2x more often than people from lower-ranked departments", since in fact this isn't true at all for two of the comparison groups (11-20 and 50+), and we'd need to know the size of each group in order to make this overall claim, right?

Marcus Arvan

Hi Chris,

There were 49 articles in the 50+-ranking data set.

Of those, 15 were by authors from foreign/small colleges.

There were 12 articles in the full data set with <10 citations.

6 of those were foreign/small schools.


40% of all papers written by authors from small/foreign-unranked schools were cited <10 times.

Only 8% of papers written by authors from larger/non-foreign-unranked schools were cited <10 times.
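The two conditional proportions at issue in this exchange point in different directions, and both can be checked from the counts above. Here is a minimal Python sketch that uses only the counts reported in this comment; the variable names are illustrative, not part of the original spreadsheet.

```python
# Counts reported above; variable names are illustrative only.
small_foreign_total = 15       # 50+ papers by authors at small/foreign schools
low_cited_total = 12           # papers in the full data set with <10 citations
low_cited_small_foreign = 6    # papers that fall in both groups

# Share of low-cited papers that come from small/foreign schools.
share_of_low_cited = low_cited_small_foreign / low_cited_total

# Share of small/foreign papers that are low-cited (the 40% figure above).
share_of_small_foreign = low_cited_small_foreign / small_foreign_total

print(f"P(small/foreign | <10 citations) = {share_of_low_cited:.0%}")
print(f"P(<10 citations | small/foreign) = {share_of_small_foreign:.0%}")
```

The second figure is the one that bears on the claim about how often small/foreign authors go uncited; the first only says where the low-cited papers come from.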

Marcus Arvan

Hi John,

You're right. If we're looking at medians, it's more like 1.5-2x--so I've changed the post to indicate that, so as not to mislead.

I haven't had a chance to do significance testing yet, but hope to soon!


Brad

Can you give us the raw data, please?

Marcus Arvan

Brad: Here you go... https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnwyMDEzcGNwY3xneDozZjcyNDRiMTY3NmQ5MmVl



Brad

Thanks Marcus,
Just to clarify something on your data sheets, have you coded all programs that are not ranked in the top 50 as 51?
By the way, that was generous of you to share.

Marcus Arvan

Hi Brad: Thanks! Yes, I wasn't quite sure how to code those given that they are "unranked", so I gave them all '51' for the sake of categorization.


Brad

One more question ... when I open the googledoc of the data, it seems you did not list the authors of the Mind papers. Is that correct? (though you do list the affiliation)

Marcus Arvan

Yes, I just started to code by article title, citations, and department rank after a while.

Shen-yi Liao

The hypothesis that "top journals" papers by people from "top schools" receive more citations is a very intriguing one. Thank you, Marcus, so much for sharing the data. That really allowed anyone who is interested in this topic, like me, to build on existing work and explore this question.

After some exploration, using a variety of frequentist statistical tests with conventional thresholds of significance, I did not find support for the hypothesis. The SPSS outputs and spreadsheets are available at https://www.dropbox.com/sh/yisnoy223uy33vt/AABVWQpqDAIfMftB_UeTYFC6a?dl=0 (I'll probably delete this in a couple of weeks, so interested parties should save the documents.)

To start, I was worried that the use of ordinal ranks from PGR might make some smaller differences bigger. So I added the mean evaluation alongside the ordinal ranks (and in the process noted a few corrections).

Then, I looked for correlations. As the scatterplot in the spreadsheet file shows, there isn't really a clear correlational relationship between citation count and mean evaluations of institutions. Although I forced a linear trendline that skews positive, the correlational tests are not statistically significant and show a relatively weak relationship anyway, rho = 0.141, p = 0.069.*

Next, I tried to compare groupings of institutions, as you did in the blogpost. Any grouping here is bound to be a bit arbitrary, so I tried two: {top 8 / other ranked / unranked} and {top 10 / other ranked / unranked}. Either way, the nonparametric test says that there is not enough evidence to reject the null hypothesis, which says that the distribution of citations is the same across the groupings, p = 0.234 and 0.116. Inspecting the bar graph, the null result is basically because the variance of citations is so high even within the "top schools" grouping.

All in all, I think the upshot is that we do not yet have evidence that "top journals" papers by people from "top schools" receive more citations. Reporting only mean and median here has the potential to mislead because the variance in citation rates is so high, such that we might eyeball some pattern in the means that fails to properly account for the high variance.


* In the documents, I report both parametric and non-parametric tests. I think we should be extremely skeptical of parametric tests because citation counts are certainly not normally distributed. So I'll only report non-parametric tests here.
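For readers who want to see what the rank-based correlation above involves, here is a minimal pure-Python sketch of Spearman's rho: rank both variables (averaging ranks for ties), then compute the ordinary Pearson correlation on the ranks. The function names and the sample numbers are made up for illustration; this is not the SPSS procedure used for the outputs linked above, and it does not compute a p-value.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical illustration data, NOT the data set discussed here.
mean_scores = [4.8, 4.0, 3.5, 3.5, 2.9, 2.4]
citations = [60, 12, 45, 8, 30, 5]
print(round(spearman_rho(mean_scores, citations), 3))
```

Because only the ranks enter the calculation, a handful of very highly cited papers cannot dominate the result the way they can in a parametric correlation on raw counts.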


Thanks Shen-yi,
These are very useful analyses. There may be more in the data still (as I am sure you know). It may, for example, be worth looking at sub-populations in the data, for example, all those papers cited 0 times (or under 4 times etc.). We may discover interesting things. Though at this point I have no hypothesis in mind.
Your analyses, though, caution us against drawing inferences that really are not supported by the data. Thanks again.

Marcus Arvan

Shen-yi: Thanks for your analysis. I'm a bit confused by your outputs, though. After correcting the three errors you found in my spreadsheet, I calculated Spearman's rho and the result came out statistically significant: r=.155, p=.047.

Did I make a mistake somewhere?

Output: https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnwyMDEzcGNwY3xneDpjMmVkMTFlYTIzYTk0MmI

Corrected data: https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnwyMDEzcGNwY3xneDozZjcyNDRiMTY3NmQ5MmVl

Greg Frost-Arnold

Thanks Shen-yi! That's very helpful; I appreciate the time and care you put into this.

I just was wondering whether you think the null hypothesis (that there is no correlation between dept rank and citation count) should really be considered the default position, which we should only reject if we get data with p=.05 or lower.

I lean towards 'no,' for the following reason: the more prestigious one's department, the more 'visible' one is in the profession; and the more visible one is, the more likely one is to be cited.

So the one bit of your comment that I am wondering about is: "we do not yet have evidence that "top journals" papers by people from "top schools" receive more citations." To say that we don't have evidence for p, when we do not have the data to reject the null hypothesis that competes with p, seems too strong/ narrow a conception of 'having evidence for,' since sometimes the null should not be the default position. Or perhaps I have misunderstood something?

Shen-yi Liao

Marcus: The correlation I was testing was between the PGR mean ratings and citation count, and not between the PGR ordinal rankings and citation count. That might explain the discrepancy.

FWIW I think the mean ratings are more indicative of relative prestige than ordinal rankings, for basically the usual reasons about cardinal vs ordinal rankings. But, even if one were undecided between the two measures of prestige, it's still indicative of its evidential strength that the detected effect is not robust across different measures.

Shen-yi Liao

Greg: Definitely. As I noted, I was going with the standard interpretation of null hypothesis significance testing and the conventional threshold for significance.

I think what you're interested in is probably best done with Bayesian methods. Indeed, my own prior is that there probably are prestige effects all over the place!

BayesFactor -- http://pcl.missouri.edu/bayesfactor -- is generally good, but I can't find its equivalent for nonparametric tests, at least on the web. Let me know if you can think of good ways to do this!

Marcus Arvan

Shen-yi: Thanks for your reply.

Your point is well-taken. However, I would argue that it is much more appropriate in this context to test by PGR ordinal ranking (numerical ranking in the PGR) versus the mean scores of programs by PGR referees (which you seemed to do). [For the difference, see http://www.philosophicalgourmet.com/overall.asp ]

Here's why: the average reader of a journal article is not very likely to know a PGR ranked program's mean score by evaluators (I had to look them up myself just now!). In contrast, a reader of a journal article is very likely to know where a program *ranks* in the PGR. Another way to put this is: the average journal reader likely has no idea that NYU's mean Leiter score is 4.8, Harvard U's mean score 4.0, etc. They *are* likely to know that NYU is the #1-ranked school, or at least in the top-5.

So, it seems to me, if we want to test for a "prestige effect", we need to test for what people know: (1) Leiter rank--which is an ordinal variable (and which I tested), not (2) PGR mean score (which you tested).

For these reasons, I would suggest that the rank-based analysis I performed is more appropriate to the type of data and hypothesis being tested.

Brian Weatherson

A couple of questions/suggestions.

1. Why did you use current PGR rank rather than PGR rank at the time of publication? It doesn't make a huge difference, but there are some interesting moves that change a few values at the margins.

2. I think it would make sense to split out grad students and faculty for this kind of investigation. As it stands, the model assumes that a grad student at, say, the #15 school has more prestige/network power/etc than a faculty member at, say, the #25 school. That doesn't feel intuitively right to me. Quick thought experiment to check this: if someone graduates from #15 (whoever that is, I haven't looked it up) and gets a job at #25, it feels like they are becoming more network central, are more likely to get invited to conferences, colloquia etc, not less.

The more splits you do, the more data you need to get significant results. But I think this split would be worthwhile.

Wesley Buckwalter

The other troubling thing is just how infrequently papers are cited in these "2 top journals" over this five year period as a whole.

I'd be really interested to see if the trend you discovered holds up with more data points as you continue to investigate more journals with more citable documents in that period, or if it interacts with various prestige 'rankings' of the journals themselves too.

Brian Weatherson

This is a defence of Shen-yi's suggestion of using mean scores rather than ranks, in response to Marcus's objection that people don't know what the mean scores are.

I'm not sure that people know the relative ranks particularly well either. Off the top of your head, do the PGR ranking of: Duke, Wisconsin, UC Riverside, Edinburgh and Sydney.

I spend a lot of time worrying about this stuff for recruitment, placement, hiring, funding etc, and I don't have any idea what the ranking of those five is. I think they are all top 50 and not top 20, but even that I'm not confident about, and beyond that I have no idea.

The nice thing about using mean scores is not that people know them, it's that they collapse the distinctions that should be collapsed, e.g., between the relative prestige of the kind of schools I started this comment with.

Marcus Arvan

Hi Brian,

My suggestion wasn't that people know Leiter ranks particularly well. The claim was a purely comparative one: that they plausibly--and broadly--know PGR rankings better than PGR mean scores, at least on the whole. And, while this is an empirical issue, there seem to me to be reasons to think it is probably true.

If you were to ask me what a given department's PGR mean score was off the top of my head, I would probably say I have no clue. Okay, I might estimate NYU and Rutgers as a 5 on 5-point scale, but beyond that, I would have real difficulty estimating.

On the other hand, if you were to ask me whether Rutgers, NYU, Oxford, Harvard, and Pittsburgh are top-10, I would know that right off the top of my head--and so too, I submit, would just about anyone else in the discipline who reads journal articles.

Things get quite a bit less clear after the top-10--I don't know off the top of my head precisely where Wisconsin, Riverside, UNC, Arizona etc. are--but still, I bet you the average person in the discipline would give broadly accurate estimations (within an admittedly large margin of error). For example, I have no idea what Arizona's mean PGR score is, but having gone there I know its general ranking (10-20), as well as schools in the same general area of the PGR (Duke, Michigan, Brown, CUNY).

So, it seems to me, if we want to examine prestige effects, PGR rank is a good way to do it--as people plausibly know PGR rank better (if very fallibly) than mean evaluation scores.


guy

You inadvertently proved Brian's point, Marcus. Michigan is 4th, and Duke is 24th. They are not ranked in the same general area.
I think Brian's and others' point is that there are no real differences between departments with means of 4, but there are differences between these and those ranked 3.5.

Marcus Arvan

guy: My point wasn't that people are infallible in making PGR categorizations off the cuff. My point was that although a person may be highly fallible in this regard, there are reasons to think that, at a broader level, they can make relatively accurate judgments. So, for instance, even if I mistakenly categorize Michigan as outside of the top-10, many of my other judgments about the top 10 are accurate. Similarly, even if not all of my judgments about 11-20 are accurate, many of them are broadly accurate (I will not *tend* to place 11+-ranked schools in the top-10). Etc.

That's all. And the claim then is that, given how widely publicized the PGR rankings are in the discipline, people are apt to know PGR rankings better, on average, than PGR evaluation scores.

John Schwenkler

Hi Marcus,

I still don't follow you re: ranks vs. mean scores. First of all, as Brian says I'm sure that very few people *explicitly* know either of these for most departments. Nor, and this is even more important, is it likely that explicit knowledge would be the primary force behind these prestige effects, as opposed to an implicit sense of which departments are better than others (and to what extent). Finally, even without both of these points it remains that all we do by using mean scores rather than ranks is capture *more* information about how these departments are perceived, i.e. not just Department X being n spots better than Y, but also X being better than Y to such-and-such a degree, where differences in degree always entail corresponding differences in rank but not vice versa, as the ordinal measure is more coarse-grained. I find it hard to believe that many philosophers' sense of the excellence of departments doesn't have this more fine-grained character, which is why it's better to use mean scores, and not just rankings, as a proxy for it.

Brian Weatherson

Here's another way of putting the point I was trying to make.

Let's say I picked two schools at random from the PGR, and asked a bunch of philosophers which had a higher PGR rank. Which of the following two factors do you think would be a better predictor of their accuracy?

(A) One of the schools is at least 10 places above the other?
(B) One of the schools has a mean score at least 0.3 above the other?

I think it would be (B). That's not because people know the scores. No one does, outside of real obsessives. It's because large differences in mean scores are better correlated with felt difference in prestige than large differences in rank are.

Note it's a common (and good!) complaint about the PGR that the 'league table' presentation magnifies what are in reality very small differences between a few dozen schools outside the top 10-20.

And I should say that my view that (B) is more relevant than (A) is just a conjecture. It's an empirical question whether rank or score better correlates with the intuitive prestige you're trying (rightly) to measure. In the spirit of what you're doing here, I think we should all be fairly modest about the right way to measure things. (This is to agree with another of Shen-yi's points; we should be cautious about claims that only show up as validated on one way of cutting up the data.)

Marcus Arvan

Hi Brian: Interesting--I would have thought (A). On the big issue, though, you're right. We should be fairly modest about the right way to measure things. We should, at this point, simply say: on one way of cutting up the data, there's a statistically significant relationship, and on another, there isn't (though it is worth noting that, even on Shen-yi's analysis, the data are *close* to reaching the level of statistical significance--something which should also be considered. When things are statistically significant on one measure but close to significant on another, that is some reason to think that the non-significant result is likely a result of low statistical power, and that a larger sample might generate a significant result).

Shen-yi Liao


I'm really looking forward to having more data and seeing how all this turns out.

I just want to say that I think we're disputing not just the existence of an effect, but also the magnitude of the effect. One of your initial takeaways (which got picked up at Daily Nous and elsewhere) is that "authors from Leiter top-10 departments were cited about 1.5-2x more often than people from lower-ranked departments". Instead, when I look at my scatterplot with the trendline (which I've now revised to switch the axes and add the trendline formula and R-squared), I see a much weaker relationship. Roughly, a paper gets 2 more citations for every additional unit in PGR mean score--that is, for moving the prestige distance from Leeds to MIT.

Now, I don't think the trendline story is quite right either, because I'm not convinced that some kind of correlational relationship is the best way to model the variables we're interested in. Nevertheless, that's just an instance to show that we're having a difference of interpretation beyond yes/no statistical significance.

Again, I think this is all super early stages, and as Brian Weatherson and others note, there are probably many many mediating variables that are missing right now. Getting more data would certainly be helpful! (Hence why I added the PGR means data to your PGR rank data.)

Marcus Arvan

Hi Shen-yi: I'm looking forward to that, as well.

In terms of magnitude of effect, we do disagree greatly on how to interpret trends in the data.

Given the sheer amount of variance in citations (the variance in the sample is very, very high), I do not think it is right to measure magnitude by a trend-line fit to a scatterplot (as you suggest we should). The less variance in a sample, the more representative a trend line is of actual trends in the data (i.e. an incremental trend line will represent true, incremental trends). In contrast, the more variance in a sample, the less sensitive the trend line is to variances that significantly influence the trends (larger variances in citation counts in different parts of the data).

Here is why I think this is important. When focusing on the trend line, you suggest that the net benefit of moving in the PGR from Leeds (#47) to MIT (#15) is only +2 citations. That sounds really small.

But now let's look at the variance (focusing on large variations):

(1) Of the 31 papers by authors from top-10 programs, 10 (or 31%) had over 50 citations.

(2) Of the 32 papers by authors from 11-20 programs, only 5 (or 16%) had over 50 citations.

(3) Of the 20 papers by authors from 21-30 programs, 4 (20%) had over 50 citations.

(4) Of the 17 papers by authors from 31-40 programs, 2 (11.7%) had over 50 citations.

[sample from 41-50 programs was too small]

(5) Of the 49 papers by authors from 50+ programs, 11 (or 22%) had over 50 citations.

Similarly, there is the other big variance I mentioned earlier in the 50+-ranked data set.

(6) 40% of all papers written by authors from small/foreign-unranked schools were cited <10 times.

(7) Only 8% of papers written by authors from larger/non-foreign-unranked schools were cited <10 times.

In other words, although there is still a lot of noise in the data, I think that when we pay attention to large variances the trends are stronger--and that paying attention to such variance is important. A paper with 50 citations is a *much* bigger "mover" in the discipline than, say, a paper with 20 citations--and so in terms of measuring trends, it matters how many "super-cited" papers there are. Similarly, the fact that people from small and foreign schools hardly get cited is a huge variance too.
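The "over 50 citations" shares in items (1)-(5) can be recomputed directly from the counts given there. The short Python sketch below does only that; the bucket labels and variable names are illustrative. Note that the recomputed shares can differ by a point or so from the rounded figures quoted in the list (10 of 31, for example, works out to about 32%).

```python
# (bucket label, papers in bucket, papers with over 50 citations),
# as listed in items (1)-(5) above.
buckets = [
    ("top-10", 31, 10),
    ("11-20", 32, 5),
    ("21-30", 20, 4),
    ("31-40", 17, 2),
    ("50+", 49, 11),
]

# Share of each bucket's papers that have more than 50 citations.
shares = {label: high / total for label, total, high in buckets}

for label, share in shares.items():
    print(f"{label:>6}: {share:.1%}")
```

With buckets this small, a single paper moves a share by several percentage points, which is worth keeping in mind when comparing adjacent groups.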

Shen-yi Liao


That's really helpful. I just want to re-emphasize that I'm not at all committed to the trendline. (Indeed, I said it's probably not quite right!) I just suggested that as one way to interpret the data. And I'm attracted to the idea you suggest, where citation counts do not have a linear relationship to being "movers" in the field; in which case we should do something like log-transform the citation data and look for statistical models for that.

My main point in the latest comment was just to highlight that statistical significance is not all that's at issue. We should also care about other (potential) disagreements that might come elsewhere in the statistical model, including of course the range of reasonable ways to interpret the data.
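As a rough illustration of the log-transform idea mentioned above, here is a small Python sketch. The citation counts are made up, and the use of log(1 + x) rather than a plain logarithm is an assumption added here so that papers with zero citations remain defined; it is only one of several reasonable choices.

```python
import math

# Hypothetical citation counts, NOT taken from the data set discussed here.
citations = [0, 3, 8, 20, 50, 400]

# log1p(x) = log(1 + x), so papers with zero citations stay defined.
log_citations = [math.log1p(c) for c in citations]

for raw, logged in zip(citations, log_citations):
    print(f"{raw:>4} citations -> {logged:.2f}")
```

The transform keeps the ordering of papers but compresses the long right tail, so one heavily cited outlier has less leverage on any model fitted afterwards.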

Joshua Knobe

This whole conversation has been extraordinarily helpful and productive, and it's wonderful to see how willing people are to go after these questions using rigorous methods.

I have just one tiny addition to the many helpful points that have already been made. In existing studies of the sociology of philosophy, there has been a tendency to focus on papers from just a few journals (e.g., Mind and Phil Review), but my sense is that those journals are in some ways a little bit idiosyncratic and unrepresentative of what our discipline is really like these days.

So in some earlier work I did on this topic, I turned instead to the data from Google Scholar Metrics, which lists the most highly cited papers in philosophy over the past five years. I just put up a spreadsheet with the information for all of those papers at:


If any of you want to do any further research on these topics, you can always just use those data. No doubt, many of the conclusions one would reach looking just at those specific journals would also emerge looking more broadly in this way. However, I think there are also some ways in which those specific journals are pretty idiosyncratic, meaning that one can arrive at a more accurate understanding of the discipline as a whole using this other method.
