This year, I developed a course in experimental philosophy for third-year students at Oxford Brookes. Classes have now ended, the first large end-of-term essays are coming in, and they look promising so far. In this post, I want to explain how I taught experimental philosophy and how I approached the subject.
What I wanted to do in the course
At our university, as in most UK institutions, we have 12-week semesters, and the last week is usually devoted to writing up papers, so that leaves only 11 two-hour teaching sessions. Nevertheless, I wanted to teach:
- An overview of experimental philosophy
- Basics of statistics
- Some elementary notions of experimental design
This was quite an ambitious set of goals. Of course, given that I had only 22 contact hours to teach this course, I had to be careful in selecting appropriate exercises and readings.
Why teaching experimental philosophy is a good idea
Only 14 students enrolled in this course (our typical class size is about double for the second and third years), probably because it was offered late and also because some students worried about the mathematics involved in the statistics.
However, I think experimental philosophy is a good course to offer because of the transferable skills it provides for the job market. Being able to design a simple questionnaire and conduct statistical tests are attractive graduate skills.
Beyond that, I think experimental philosophy also gives students the chance to do philosophy a bit differently: to engage with people by having them participate in an experiment or code data, and to reflect on practices in our discipline, such as reliance on intuitions. For both these reasons, I think an experimental philosophy course is best suited to final-year students (UK undergraduate degrees take three years).
The basic setup of the course
The course consisted in part of a theoretical introduction to the field of experimental philosophy, including the following topics:
- What is experimental philosophy? (Positive, negative, descriptive programmes)
- Intuitions of ordinary people and why they matter
- Are philosophers experts?
- The experimental philosophy of morality
For the statistics/practical part of the course, I made a selection of concepts and tests that are useful and relatively simple.
In spite of recent criticisms levelled against it, I decided to stick with significance testing, but I paid attention to the pitfalls of this practice (p-hacking, etc.) and included information about controlling for multiple tests. I also gave the students the correct definition of a p-value (from the American Statistical Association's recent statement) and emphasized that p-values and effect sizes are different things.
- Designing experiments and collecting data (within/between experimental designs, and some mixed designs, getting approval from ethics committee, designing an information sheet, different forms of surveying)
- Basics of statistical inference – types of variables, levels of measurement, the z-distribution, testing for normality, the central limit theorem
- Three kinds of t-test, interpreting a p-value, confidence intervals
- Chi-square tests, cross-tabulations, and Fisher’s exact test
- Pearson’s correlation, elements of linear regression, scatter plots
For each of these, I carefully explained the statistical concepts involved. For the chi-square test, for example, I covered what the chi-square distribution is, what the degrees of freedom are and how to calculate them, what the chi-square statistic means, and the critical chi-square values for different alpha levels.
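To make the degrees of freedom and critical values concrete, here is a minimal sketch in Python (the course itself used SPSS, so this is only an illustration). It covers just the case of one degree of freedom, as in a 2x2 table, where the critical chi-square value is the square of the two-sided critical z-value. The function names are invented for this example.

```python
from statistics import NormalDist


def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)


def critical_chi_square_df1(alpha: float) -> float:
    """Critical chi-square value for df = 1 only.

    With one degree of freedom the chi-square statistic is a squared
    standard normal variable, so the critical value is the square of
    the two-sided critical z-value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2


if __name__ == "__main__":
    print("df for a 2x2 table:", degrees_of_freedom(2, 2))
    for alpha in (0.10, 0.05, 0.01):
        value = critical_chi_square_df1(alpha)
        print(f"alpha = {alpha:.2f}: critical chi-square = {value:.3f}")
```

For tables with more than one degree of freedom this shortcut no longer applies, and a printed table or a statistics package is needed.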
I regularly introduced little quizzes in which I probed students' statistical understanding. For example: suppose a lecturer, Kevin, is very concerned about recent findings that women find philosophy courses less appealing and interesting. He wants to run a statistical test on student evaluations in which he compares how interesting men and women find his course on a Likert scale from 1 to 7. He wants to make sure that, if there is an effect, he detects it. Should he set the alpha level higher or lower than .05? Motivate your choice.
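One way to reason about this quiz is in terms of statistical power, the probability of detecting an effect that really exists. The sketch below is a rough illustration rather than the method used in class: it uses a normal approximation to a two-group comparison, and the effect size and group size are hypothetical values chosen only to show the direction of the relationship. Likert data are ordinal, so the approximation is crude.

```python
from statistics import NormalDist


def approx_power(effect_size_d: float, n_per_group: int, alpha: float) -> float:
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal (z) approximation to the t-test, which is reasonable
    for moderately large groups but is not exact.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the effect is real.
    shift = effect_size_d * (n_per_group / 2) ** 0.5
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    # HYPOTHETICAL values: a medium effect (d = 0.5) and 30 students per group.
    for alpha in (0.01, 0.05, 0.10):
        power = approx_power(effect_size_d=0.5, n_per_group=30, alpha=alpha)
        print(f"alpha = {alpha:.2f}: approximate power = {power:.2f}")
```

Under these assumptions, power rises as alpha rises, so a higher alpha level makes Kevin less likely to miss a real effect, at the cost of a greater risk of a false positive.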
What the students did
The students did four small tasks and wrote a large essay. For example, they had to attempt to replicate the Knobe effect. The Knobe effect, first described by Joshua Knobe in 2003, is the tendency of people to judge whether an action is intentional based on the perceived moral character of that action. In Knobe's original study, the majority of participants who heard about a CEO who knowingly started a program that would maximize profits but also harm the environment (as an unintended but foreseen side-effect) said the CEO harmed the environment intentionally. If "harmed" is changed to "helped", by contrast, most participants think that the CEO did not help the environment intentionally. We pooled the data from all the students (140 responses in total). Knobe's original findings replicated beautifully. The students wrote the null and alternative hypotheses, performed a chi-square test and a t-test (the t-test was on how much blame or praise the CEO deserved, with the condition as the independent variable), and reported the results in APA (American Psychological Association) style.
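For readers who want to see what the chi-square part of this task involves, here is a minimal sketch in Python (the students used SPSS). The counts are invented purely for illustration and are not the class data. The function name is also made up. The sketch reports the phi coefficient alongside the p-value, since the two answer different questions: whether an association is detectable, and how strong it is.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied.
    Returns (chi_square, p_value, phi).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    # For df = 1 the upper-tail probability has this closed form.
    p_value = erfc(sqrt(chi_square / 2))
    # Phi is an effect size for 2x2 tables (0 = no association).
    phi = sqrt(chi_square / n)
    return chi_square, p_value, phi


if __name__ == "__main__":
    # HYPOTHETICAL counts, NOT the data collected in the course.
    # Rows: harm condition, help condition.
    # Columns: "intentional", "not intentional".
    made_up_counts = [[30, 10], [12, 28]]
    chi_square, p_value, phi = chi_square_2x2(made_up_counts)
    print(f"chi-square(1) = {chi_square:.2f}, p = {p_value:.5f}, phi = {phi:.2f}")
```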
Throughout the classes, students were also tasked with inventing new experimental designs. For example, they had to come up with a factorial design with two factors for a variation of the Knobe effect experiment. The first factor is the familiar harm/help condition. Students came up with interesting second factors, such as replacing harm or help to the environment with a different side-effect (for example, poverty in developing countries, or something specific such as polar bears), or varying the gender of the CEO.
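A two-factor design like this can be written out as the full set of cells it produces. The sketch below is only an illustration in Python: the factor names, the level labels, and the balanced assignment scheme are assumptions made for the example, not part of the course materials.

```python
import random
from itertools import product


def factorial_conditions(factors):
    """List every combination of factor levels (one dict per cell)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def assign_balanced(participant_ids, conditions, seed=0):
    """Randomly assign participants so cell sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {
        pid: conditions[i % len(conditions)]
        for i, pid in enumerate(shuffled)
    }


if __name__ == "__main__":
    # HYPOTHETICAL 2x2 design: side-effect valence crossed with CEO gender.
    design = {
        "side_effect": ["harm", "help"],
        "ceo_gender": ["female", "male"],
    }
    cells = factorial_conditions(design)
    for cell in cells:
        print(cell)
    assignment = assign_balanced(range(20), cells)
    print(assignment[0])
```

This is a between-subjects assignment, where each participant sees one cell only. A within-subjects or mixed design would need a different scheme.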
The large essay was a replication of Shaun Nichols' experimental study on the genealogy of norms. Nichols argued that emotions can influence cultural evolution, and used the survival of norms to the present day to test this. He took an etiquette book, Erasmus' On Good Manners for Boys (1530), distilled 61 rules from it, and had these coded by coders. He hypothesized that rules prohibiting disgusting actions (e.g., "there should be no collection of mucus in the nostrils") would have a better survival rate than rules prohibiting non-disgusting actions (e.g., "only cut bread with a knife").
Students received the dataset, which I in turn had received from Shaun Nichols, and they had it coded by coders: one to code whether the action mentioned was an elicitor of core disgust (something involving bodily fluids), and a second to judge whether the rule was part of contemporary manners. This is where I had the one major glitch in my course, because it turned out that many coders weren't careful or had not understood the instructions. To avoid many students ending up with suboptimal datasets, I selected the mode (most common) response for each coding. In this way, all students had the same dataset. Nichols' original findings were replicated, but the results were not as extreme as in his original study, perhaps as a result of cultural differences in norms between the UK and the US.
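Taking the mode of each coding can be sketched as follows. This is an illustration in Python, not the procedure I actually ran. The rule labels and codes are invented, and the tie-breaking rule (the first code encountered wins) is an assumption of the example.

```python
from collections import Counter


def modal_code(codes):
    """Return the most common code.

    Ties go to the code encountered first, which is how
    Counter.most_common orders elements with equal counts.
    """
    return Counter(codes).most_common(1)[0][0]


def consensus_dataset(codings):
    """Map each rule to the modal code across all coders."""
    return {rule: modal_code(codes) for rule, codes in codings.items()}


if __name__ == "__main__":
    # HYPOTHETICAL codings: each list holds different coders' answers
    # to "does this rule involve core disgust?" for one rule.
    made_up_codings = {
        "rule_01": ["yes", "yes", "no", "yes"],
        "rule_02": ["no", "no", "no", "yes"],
    }
    print(consensus_dataset(made_up_codings))
```

With an even number of coders, ties are possible, so a real analysis would want an explicit rule for them.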
Feedback from students
Students were very positive about this course. Many of them enjoyed doing the statistical tests, and one student even mentioned in their evaluation that they wanted to learn more about statistics than I was able to offer. They were also encouraged by the fact that they could list some knowledge of SPSS, statistics and experimental design on their CVs.
I assigned experimental philosophy papers, such as Knobe’s side effect paper, but also thoughtful, theoretical papers such as Jennifer Nagel’s Intuitions and Experiments: A Defense of the Case Method in Epistemology (PPR, 2012).
In addition to experimental philosophy papers, I used two books. Both books were very helpful, although I could not use the materials on working with R in Sytsma and Livengood, given that I went with SPSS.
- Justin Sytsma and Jonathan Livengood, 2015. The Theory and Practice of Experimental Philosophy.
- Sarah Boslaugh, 2013. Statistics in a Nutshell.