What does the experimental evidence actually say about the stability of moral intuitions?

Suppose you are sitting at your desk, reflecting on a moral question. Now suppose that as you are reflecting on this question, you happen to be looking around at a somewhat disgusting scene. Perhaps there is a half-eaten apple on the desk, or a bad smell in the room, or maybe you just didn’t have an opportunity to wash your hands.

I sometimes encounter the claim that experimental studies have shown that people’s moral intuitions can be pushed around in surprising ways by subtle situational factors like these. It is then sometimes suggested that philosophers need to think more about the deeper philosophical implications of this kind of ‘instability’ in our moral intuitions.

This claim strikes me as a serious misrepresentation of the present state of the empirical literature. In fact, it might be more accurate to say that existing studies provide evidence that these factors do not influence people’s moral intuitions. At the very least, it would be hard to deny that a whole bunch of recent studies suggest that people’s moral intuitions are surprisingly stable.

As many of you know, one of the most important things going on in psychology these days is the attempt to replicate certain widely-known existing findings. Work in social psychology has led to numerous findings that showed a certain cuteness/flashiness/clickbaitiness. Many of these findings have become highly influential within philosophy. However, in a number of cases, more careful systematic work over the past few years has shown that these findings consistently fail to replicate. It is now widely thought that many of these supposed effects actually don’t exist at all, and that the original findings were instead the result of publication bias or questionable research practices.

Many researchers now hold precisely this view about the findings philosophers sometimes invoke to demonstrate the instability of moral intuitions. More specifically, a series of early studies in social psychology were taken to show that people’s moral intuitions could be affected by incidental manipulations of disgust. This work got a lot of attention, and it was widely seen as a challenge to a certain methodology in moral philosophy. But follow-up work now seems to suggest that this effect doesn’t actually exist. I worry that this more recent work is not sufficiently well-known in philosophy, but it strikes me as very important and worth examining.

First, a direct replication found no effect of cleanliness on moral judgment. Second, an influential meta-analysis indicates that, after correcting for publication bias, there is no significant effect of incidental manipulations of disgust on moral judgments. Third, a recent study checked for moderation with an enormous sample size and, again, found no effect at all.

These results are truly surprising. Naively, prior to reading any empirical work, I would have assumed that making people feel more disgusted would change their moral judgments. These studies suggest that this naive understanding is mistaken. People’s moral intuitions are more stable than one would have thought.

Of course, it would be possible for philosophers to challenge this experimental evidence, and I would be very interested to hear such challenges. However, what we should absolutely not do is to simply ignore it. That is to say, it would definitely be a big mistake for us to continue writing metaphilosophical papers on the implications of empirical results suggesting that moral intuitions are less stable than one would have thought without engaging with the evidence suggesting that moral intuitions are actually more stable than one would have thought.

[Cross-posted at Experimental Philosophy. A whole series of recent experimental philosophy studies suggest that people’s intuitions are surprisingly stable and invariant across demographic groups. For another recent example, see here.]

23 Replies to “What does the experimental evidence actually say about the stability of moral intuitions?”

Edouard Machery says:

August 21, 2016 at 4:58 pm

Order effects replicate just fine.
Joshua Knobe says:

August 21, 2016 at 5:43 pm

Hi Edouard,
Great to hear from you! I certainly agree. Order effects are real. More generally, there is definitely an important difference between effects that involve actually changing the questions participants receive (order effects, framing effects) and effects that only involve changing the external situation (dirty desks, cleanliness priming).
In my view, these different effects are importantly different. No one ever thought that moral intuitions were completely stable, so no one thought there was anything interesting or surprising about the mere existence of order effects. What makes the literature on order effects so exciting is that people are offering serious explanatory hypotheses about why these effects arise and what they show about moral cognition.
By contrast, the effects of manipulations of the external situation were widely regarded as intrinsically interesting and surprising. Many people thought that these effects showed that moral intuitions were less stable than they had previously assumed. (Full disclosure: I was one of the people who thought that.) However, more recent work points in the opposite direction, indicating that moral intuitions are less impacted by these factors than one might have expected.
Clearly, there is no chance that moral intuitions will turn out to be stable in all ways. Overall, though, I think one message that has been coming through in the recent experimental literature is that intuitions are much more stable, at least in certain respects, than anyone would have guessed in the absence of serious empirical work.
Edouard Machery says:

August 21, 2016 at 11:32 pm

“so no one thought there was anything interesting or surprising about the mere existence of order effects”: this does not seem quite right. Just re-read the many papers on order effects about trolley intuitions (including the papers by Eric and Fiery).
And in most of those papers there are *no* explanatory hypotheses – it’s only recently that these have been formulated.
And there is plenty of demographic variation: gender effects on moral intuition reproduce just fine (see Gawronski’s paper) as do age effects on political judgments, etc.
Admittedly, we have learned that priming may have much less influence on judgment, to say nothing of behavior, than what a decade of social psychology suggested. That’s important to take into account, but it would be a serious mistake to ignore the solid evidence of variation on the grounds that some lines of research turned out to be false positives.
Joshua Knobe says:

August 22, 2016 at 3:18 am

Hi Edouard,
Thanks once again for the super helpful comments! In my original post, I was just trying to discuss this one particular type of effect, but I really appreciate your efforts to situate this in a broader context.
I think we are actually completely in agreement about the empirical facts. Existing studies point to some ways in which people’s intuitions are unstable and also to some ways in which people’s intuitions are surprisingly stable. Philosophical work in this area should take account of this whole pattern.
Initial findings seemed to suggest that people’s intuitions could be pushed around to a really shocking extent by extremely subtle features of the situation. If these findings had held up, it might plausibly be claimed that they gave us reason to radically rethink our philosophical methodology. I would not say the same about the existence of order effects or about the fact that people of different political orientation have different moral views. However, I would be very open to listening to arguments for the opposite perspective.
By contrast, I find it quite surprising that people’s intuitions are not affected in the way I would have expected by manipulations of disgust. I have also been very struck by recent studies showing surprisingly similar moral intuitions across cultures. For example, I would never have predicted that we would find so much cultural uniformity in judgments about moral responsibility and about metaethics:
https://campuspress.yale.edu/joshuaknobe/files/2016/02/cultural-universal-2abcgkp.pdf
https://campuspress.yale.edu/joshuaknobe/files/2016/02/Relativism-1vzddhk.pdf
Perhaps most of all, I have been super impressed with your recent findings about cross-cultural uniformity in intuitions in a number of domains.
But that is not to disagree with anything you said in your recent comment. All of those effects really do exist, and I would be happy to see further philosophical exploration of them — just as long as this philosophical exploration also engaged with the growing body of evidence pointing to ways in which moral intuitions are far more stable than one would ever have expected.
Chandra Sripada says:

August 22, 2016 at 10:11 am

Hi Josh,
I wonder if some readers of this post are scratching their heads and thinking, “Why would Josh, a leading advocate of Xphi, post *this*? Isn’t he undermining the very rationale for doing XPhi?”
To those who might be thinking this, I guess it would be useful to underscore that the project of identifying instability in intuitions is really only a small part of Xphi. I saw one place where you (Josh) estimated it was about 1% of studies in XPhi.
So the questions you are raising about the instability of philosophical intuitions (or lack thereof) are important and fascinating. But whatever the answers turn out to be, the day-to-day business of doing XPhi should go on just fine.
Shelley Tremain says:

August 22, 2016 at 11:32 am

Hi everyone,
The “one place” to which Chandra refers may be in my Dialogues on Disability interview with Josh that was posted to the Discrimination and Disadvantage blog in June.
There’s lots of great stuff in the interview! You will find it here:
http://philosophycommons.typepad.com/disability_and_disadvanta/2016/06/dialogues-on-disability-shelley-tremain-interviews-joshua-knobe.html
Edouard Machery says:

August 22, 2016 at 11:53 am

I don’t think that this 1% figure should be taken very seriously. Sure, if you include all the articles published in JPSP and Cognition, you will get a very low figure, but you’d get a different figure if you focus on articles authored or co-authored by philosophers or articles that have had an impact in philosophy.
Brad Cokelet says:

August 22, 2016 at 2:01 pm

Interesting! This is certainly worth looking into, but perhaps you can comment on an initial reaction.
I too would think that there is a population whose moral judgments could be pushed around a bit by priming disgust, but I would think this effect would be isolated to specific classes of moral judgments. Do these psychologists target the kinds of moral judgments that someone like Nussbaum thinks are distorted by disgust? For example, do they focus on judgments about sexual ethics or eating human beings in dire circumstances? I would expect that those judgments could at least be markedly intensified (in a significant population) when subjects were primed for disgust.
Brad Cokelet says:

August 22, 2016 at 2:09 pm

I have looked a bit and the effects they were looking for look to be less intuitive. For example priming for disgust was thought to affect behavior in an ultimatum game. This does not seem likely. Priming for the honor module on the other hand would seem to affect that sort of case, but that is because one would expect honor sensitivity to connect to justice sensitivity, but not expect disgust sensitivity to connect to justice sensitivity. It seems that the studies should focus on cases in which we might be tempted to talk about norms of moral cleanliness and purity.
Brad Cokelet says:

August 22, 2016 at 2:21 pm

Never mind, I see that one of the studies focuses closely on cannibalism, etc.
Josh May says:

August 22, 2016 at 2:35 pm

Brad:
Great point. Previously, I had conceded that incidental disgust can influence moral judgments in the purity domain, which is unsurprisingly related to disgust. However, in their meta-analysis, Landy and Goodwin (2015) examined whether the domain of purity moderated the purported disgust effect, and they found that it did not:
“Contrary to the predictions described earlier, and also contrary to a handful of published results (e.g., Horberg et al., 2009; Seidel & Prinz, 2013), the mean effect size for nonpurity violations was, if anything, slightly larger than the mean effect size for purity violations (see Table 2), suggesting that the amplification effect is not restricted to moral transgressions involving bodily purity, sexual purity, or crimes against nature.” (530)
This may not be the last word on the matter. However, if you ask me, it makes the purported effect of incidental disgust on moral judgment even more dubious.
Brad Cokelet says:

August 22, 2016 at 2:57 pm

Josh: Thanks for weighing in!
I would be especially interested to hear whether you think these results cast doubt on the idea that there are some sub-populations (tribes someone might say) whose judgments about the morality or perversity of homosexual sex can be intensified by disgust priming.
It would also be interesting to have studies that go beyond asking “how wrong is x” and have them ask things like “how morally bad is x”, “how unnatural is x”, “how shameful would it be to x” and so forth. Do psychologists in these areas explore these parts of the thicker moral and quasi-moral terrain?
Josh May says:

August 22, 2016 at 3:17 pm

Brad:
I think we have less evidence about sub-groups, at least regarding incidental disgust. There seems to be plenty of evidence (although no meta-analysis I’m aware of) that conservatives are more disgust-sensitive. However, that doesn’t mean that conservatives are prone to have their moral judgments influenced by disgust that is *incidental* (irrelevant or unrelated to the act or character judged).
Many theories predict that people’s moral judgments are sensitive to disgust that is instead *integral* or bound up with intuitions or judgments (often unconscious) about the relevant act violating a norm. The surprising finding is supposed to be that a mere feeling alone can influence moral judgment, even if the information it carries (if anything) has nothing to do with what’s being judged.
My own view is that incidental disgust and other mere feelings tend primarily to be consequences of moral judgments, not causes. When they are causes, the feelings are bound up with more cognitive states that do the causal work. In other words, plot twist: the science supports rationalism!
On dependent measures: Some researchers do measure thicker moral judgments, but it doesn’t seem the norm. Do you think judgments employing thicker concepts are more likely to be influenced by incidental disgust?
Brad Cokelet says:

August 22, 2016 at 3:33 pm

Josh: Thanks, that is very interesting.
Yes, I was thinking the effects might vary if people left talk of ‘morally wrong’ or ‘wrong’ behind.
I have not thought this through but I am thinking that we might get stronger results if we asked about shame and grounds of shame (vice, unnatural, etc) rather than guilt and grounds of guilt (wrong, impermissible, etc). My hunch is that people tend to feel shame about being or acting disgusting, but that there is a less strong tendency to feel guilty about being disgusting. So disgust might pump up judgments connected to the shame family but not so much judgments connected to the guilt family.
Joshua Knobe says:

August 23, 2016 at 12:17 pm

Just wanted to chime in to reemphasize the very helpful point that Chandra makes above. People sometimes make it sound as though experimental philosophy, by its very nature, is on the side of demonstrating instability in intuitions. But on reflection, surely everyone would agree that this is a mistake.
First, experimental philosophy as a movement is obviously just on the side of believing whatever the experimental data turn out to show.
Second, and more importantly, there have been many, many experimental philosophy studies over these past two years that explore the surprising ways in which people’s intuitions turn out *not* to be unstable.
So if you are writing a paper about experimental philosophy, please do not think that it makes sense somehow to characterize the field as a whole as an attempt to demonstrate instabilities in people’s intuitions and then to mention in a brief footnote that ten recent x-phi studies provide evidence for the opposite view. There is nothing accurate about that. Rather, the right way to characterize the field is just that it aims to make progress on philosophical questions using experimental methods. Some experimental philosophy studies suggest that people’s intuitions can be influenced by irrelevant factors, but there are also lots of studies showing surprising ways in which people’s intuitions turn out not to be influenced by such factors.
Kevin Tobia says:

August 24, 2016 at 6:02 pm

Thanks for this really interesting post! In this discussion, it seems that whether people’s intuitions are “stable” turns on pre-experimental (naive) beliefs like whether we naively think that disgust cues or question-order would obviously affect intuitions. Some of the comments here suggest that the common expectation was that disgust cues and question-order would have an effect. But I wonder whether (before the experimentation) many philosophers would have predicted that. Philosophical ethics is full of claims that take the author’s moral intuitions as shared, stable … even true!
Perhaps part of the story here is that some experimental results that are surprising at the time of discovery are seen as very obvious and unsurprising years later. The initial discovery of almost any instability in moral intuitions was at odds with a certain traditional view of moral intuition. Over a relatively short time, there’s been a sea change: today people are less surprised by findings that intuitions are affected by certain factors. But now there’s a problem at the other extreme: intuitions aren’t actually influenced by some of these factors. As this post helpfully notes, the experimental literature shows that some intuitions are affected by some of these “irrelevant” factors, but other factors (e.g. disgust cues) seem overall to not have an effect.
So experimental philosophy has found some evidence in favor of intuition instability and other evidence in favor of stability. An important question (for philosophers) would be how to assess intuitions’ status or “in/stability” in light of this evidence: for example, what do we make of moral intuitions that are stable in the face of disgust cues but vary based on question order or framing?
Joshua Knobe says:

August 24, 2016 at 9:02 pm

Hi Kevin!
Wonderful to have you here. I have really learned a lot from your work on these issues, and I’m delighted that you are participating in this conversation.
A number of years ago, some philosophers argued that empirical findings about the factors that influence intuitions gave us reason to radically rethink our whole approach to philosophical methodology. I initially found this argument extremely convincing, but since subsequent work indicates that those empirical claims weren’t actually true, I am inclined to change my mind. The argument doesn’t seem nearly as convincing to me now as it did ten years ago.
As you rightly note, it is still clear that people’s intuitions can be influenced by the order of presentation, but I was thinking that this fact just doesn’t give us any reason at all to radically rethink our approach to philosophical methodology. To see why, it might be helpful to think in a more concrete way about what this order effect actually is.
Wiegmann & Waldmann (2014) examined the impact of order on intuitions about the trolley problem. For the Push case, intuitions did not differ depending on whether it was presented first (M = 2.4) or second (M = 2.45). For Switch, participants gave a slightly different answer when it was presented second (M = 3.51) as compared to when it was presented first (M = 4.40).
In other words, order had no effect in one condition, and in the other condition, the effect of order was less than one point on a six-point scale. I have only good things to say about existing research on this issue — Wiegmann & Waldmann is completely awesome, as is Schwitzgebel & Cushman — but I guess I was just assuming it was obvious that this effect in particular isn’t even remotely the sort of thing that should give us reason to radically rethink our approach to philosophical methodology. (Other effects really did seem to give us reason to do that, but the evidence now suggests that those other effects weren’t real.)
Of course, I would certainly be open to alternative views on this topic, so if anyone has a different perspective on the philosophical implications of this effect, I would love to hear about it.
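For readers who want to check the arithmetic behind the order-effect figures quoted above, here is a minimal sketch. It uses only the four means as reported in this comment; the dictionary layout and the function name `order_shift` are illustrative assumptions, not anything taken from Wiegmann & Waldmann's paper or materials.

```python
# Means as quoted in the comment above (Wiegmann & Waldmann 2014).
# The data layout and helper name are hypothetical, chosen for illustration.
reported_means = {
    "Push": {"first": 2.40, "second": 2.45},
    "Switch": {"first": 4.40, "second": 3.51},
}


def order_shift(case):
    """Absolute gap between the two presentation orders, rounded to 2 places."""
    means = reported_means[case]
    return round(abs(means["first"] - means["second"]), 2)


for case in reported_means:
    print(f"{case}: shift of {order_shift(case):.2f} scale points")
```

On these figures the Push case moves by 0.05 points and the Switch case by 0.89 points, which is the "less than one point on a six-point scale" gap described above. The sketch says nothing about statistical significance, since the comment reports only means.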
Kevin Tobia says:

August 25, 2016 at 8:11 am

Hi Josh! Nice to chat with you here! I agree wholeheartedly with what you write about experimental philosophy method: experimental philosophy shouldn’t be committed to the stability or instability of intuitions and it should go where the data goes. Thanks again for this post and for calling attention to these important findings.
It’s not entirely clear what is entailed by calls to “radically” rethink our approach to philosophical methodology. But I think that these findings have implications for philosophical methodology, and there are broad similarities between the implications of order effects and those of some of these other debunked effects.
I don’t know why disgust effects would entail a much more radical rethinking than the order effects do. Your post suggests two relevant features – the effect size (e.g. less than one point on a six point scale) and the type of effect (e.g. question-order vs. disgust). It can’t be effect size that’s distinctive since many of the original disgust findings reported similarly sized effects. So I assume the difference has something to do with the type of effect: situational effects seem more problematic than question framing or ordering effects. I’m not sure I agree with that. But even if situational effects are *comparatively* more troubling, the empirical refutation of that trouble doesn’t alleviate other (lesser) troubles.
Joshua Knobe says:

August 25, 2016 at 10:18 am

Hi Kevin,
As always, your thoughts on this stuff are super helpful and insightful. More generally, I feel like this whole conversation is going in exactly the right direction. We definitely need to keep thinking, along these same lines, about the whole pattern of data and what conclusions it supports.
Perhaps it would be good for me just to say more explicitly how I was understanding the role of order effects in this methodological discussion and then to see whether other people were understanding it in the same way. The way I understood it, the mere existence of order effects was never supposed to be a strong reason to change our philosophical methodology; rather, the key point was supposed to be that order effects were one example of a much broader phenomenon.
Philosophers always understood that the intuitions we have about a particular case can sometimes depend on which other cases we have considered. The actual order effect uncovered in these experiments is maybe a little smaller than one would have expected and certainly not a strong reason for philosophers to switch over to some different methodology.
The important point, I had thought, was that order effects were supposed to be one instance of a broader phenomenon that really did pose a very serious threat to traditional methodologies. Just to give one example, it was suggested that 26% of participants of Western descent ascribe knowledge in Gettier cases but that 56% of participants of East Asian descent actually ascribe knowledge in such cases (!). This is the kind of effect that really blows one’s mind, suggesting that an intuition we had taken to be absolute bedrock might actually be parochial to our particular culture. At least to me, it seemed extremely plausible that we should make very serious changes in our philosophical methodology in light of this effect. (Unfortunately, it now appears that the effect was not real.)
Then I guess I had assumed that the reason people were discussing the precise details of order effects in these methodological debates was that they thought that a rigorous investigation of order effects could help shed light on the broader phenomenon. In other words, it doesn’t seem plausible that we should make major changes in our methodology just in light of the fact that there are order effects, but if we believe on independent grounds that there is a serious threat to our methodology, it does seem plausible that we could investigate that threat by looking at order effects.
Kevin Tobia says:

August 25, 2016 at 2:02 pm

Hi Josh, thanks for *your* super helpful and insightful thoughts! I agree that it’s really useful to keep thinking about these topics and I’d also be interested to hear what others think.
Two responses to this latest post:
– First, there are arguments to be made from the existence of just one of these effects (e.g. order effects). If moral intuitions are really supposed to be formed reliably or deliver truths, the “mere” existence of order effects is still somewhat troubling. Clearly, that alone doesn’t justify a *radical* methodological reform, but it seems equally problematic to simply carry on in the face of such findings if it turns out that they are widespread and robust. I agree that it’s inappropriate to characterize moral intuitions as radically unstable, especially given that recent studies evince the falsity of some earlier empirical claims. But it also seems inappropriate to characterize moral intuitions as uncontroversially and entirely stable when we have these other results.
– Second, there is this broader argument that you mention, in which order effects are one example of a broader phenomenon that would be deeply problematic for intuitions. You’re right to point out that some of those demographic effects have not replicated and those refuted claims should not be propagated as true. But doesn’t this broader argument still have good evidence going for it? Experimental philosophy studies find that intuitions are affected by personality, native language, religious belief, question order, other question framing … [granted, some of these involve non-moral philosophical intuitions]
Joshua Knobe says:

August 25, 2016 at 3:44 pm

Hi Kevin,
Your second point really does a beautiful job of getting right to the heart of these difficult issues. It would be wonderful to have a paper that grapples in a serious way with all the evidence that people’s intuitions are surprisingly stable and then argues that, despite this, we still have reason to reach a negative conclusion about traditional philosophical methods. I would absolutely love to read such a paper and would definitely do what I could to promote it in the broader philosophical community.
In the meantime, let’s continue thinking concretely about what these effects actually are and whether they provide evidence for such a negative conclusion. To take up a second example, suppose we consider framing effects.
It has often been noted that experimental philosophy studies have uncovered striking effects in which people have different intuitions depending on whether a question is presented in the concrete or in the abstract. These effects have successfully replicated and definitely seem to be real, but I don’t think that they provide any reason at all to reach a negative conclusion about traditional philosophical methods.
The traditional method in moral philosophy was a kind of reflective equilibrium. First, one was supposed to collect a bunch of intuitions: intuitions about concrete cases and also intuitions about general principles. Second, one was supposed to find certain tensions or contradictions among these different intuitions. Third, one was supposed to use philosophical reflection to resolve the issues brought out by these tensions or contradictions.
Now, I think it would be possible for empirical results to provide evidence that this method is no good, and some early experimental philosophy studies actually did seem to be providing such evidence. However, we don’t get such evidence from the mere fact that there are tensions and contradictions between people’s different intuitions (e.g., concrete vs. abstract). On the contrary, defenders of the traditional method always specifically insisted on the existence of these tensions and contradictions. If we hadn’t found any tension at all between the concrete and the abstract, this would actually have been evidence *against* what traditional philosophers always said.
p.s. Please do not think that I am at all closed off to the possibility that existing findings could provide evidence for a negative conclusion about traditional philosophical methods. I would be very, very welcoming of arguments for that conclusion, as long as such arguments engaged in a serious way with the empirical data pointing in the opposite direction.
Josh May says:

August 25, 2016 at 5:45 pm

Kevin and Josh:
Great discussion! If I could just chime in briefly… Kevin, I think you’re right that if size matters then it should matter for disgust and order. However, while the effects of disgust turned out to be small, many people used to erroneously conceive of them as large. (for some choice quotes from philosophers and scientists, see my: http://philpapers.org/rec/MAYDDI-3) That mistake is largely what got me into writing about the topic.
In the end, I think size definitely matters for skepticism about the import of intuitions. The influence on the intuition must be a main basis of the intuition for it to be undermined. If, for example, order only slightly affects moral intuitions (see Demaree-Cotton: http://philpapers.org/rec/DEMDFE-2), and they are mainly based on non-arbitrary factors, then they aren’t necessarily untrustworthy. (Victor Kumar and I run this line in a draft paper.)
Also, perhaps one reason to think that order effects aren’t always damning is that sometimes order matters for rational updating on different sets of evidence. Horne and Livengood have an interesting paper on this issue (http://philpapers.org/rec/HOROEU). Some order effects don’t fit their mould, I think, but they make some interesting points.
Joshua Knobe says:

August 25, 2016 at 11:37 pm

Hi Josh!
This is a fantastic point. You are completely right to emphasize that effect size matters, and I really appreciate your efforts to draw attention to the fact that these effects are so small (a point that is also brought out, very helpfully, in the Demaree-Cotton paper you cite).
Not sure if people will find this helpful, but perhaps we can draw an analogy here with the use of p-values in statistics. The key idea behind this approach is that if you use it correctly, the frequency with which you make an error should be acceptably low. (Basically, if your hypothesis is false, you should mistakenly conclude that it is true 5% of the time.) Now, in actual fact, the rate of error has been far higher than that, and this has precipitated a major crisis. But suppose, counterfactually, that the opposite had happened. Suppose we had discovered that when researchers were testing a false hypothesis, they only mistakenly concluded that it was true 3% of the time. Of course, this finding would reveal that researchers sometimes make errors, but all the same, it would be taken as a completely astonishing vindication of statistical methods, showing that there was much less error than anyone would have anticipated.
I am thinking that existing findings about the influence of irrelevant factors on philosophical intuition are showing us something along these very lines. Philosophers always knew that demographics, order, mood, etc. had some amount of influence, but they thought that this amount was low enough to make their methodology acceptable. Then what the experimental studies showed was that the actual size of this influence was astonishingly low, far lower than anyone had anticipated.
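The p-value analogy above rests on the idea that a correctly applied test rejects a true null hypothesis about 5% of the time. The following is a rough simulation sketch of that nominal rate, not anything drawn from the studies discussed in this thread. Everything in it is an assumption made for illustration: the function name, the two-group design, the known unit variance, the sample sizes, and the use of a two-sided z-test.

```python
import math
import random
from statistics import NormalDist, fmean


def false_positive_rate(n_studies=4000, n_per_group=30, alpha=0.05, seed=0):
    """Share of simulated null studies that come out 'significant'.

    Both groups are drawn from the same standard normal distribution, so
    there is no real effect and every rejection is a false positive.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in means, assuming known unit variance.
    std_error = math.sqrt(2.0 / n_per_group)
    false_positives = 0
    for _ in range(n_studies):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = (fmean(group_a) - fmean(group_b)) / std_error
        # Two-sided p-value from the standard normal distribution.
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            false_positives += 1
    return false_positives / n_studies


print(false_positive_rate())
```

With honest, single-shot testing like this, the simulated rate should land close to the chosen `alpha`. The much higher error rates mentioned above would have to come from practices this sketch leaves out, such as selective reporting.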


Discover more from PEA Soup