Lessons from the shock machine
Stanley Milgram’s obedience research is one of psychology’s most famous experiments, but it might not be all that it seems, reports Gina Perry.
It’s a balmy summer evening in 1961 at Yale University in New Haven, Connecticut, as people hurry to the psychology department. They are paid volunteers for a study that has been advertised in a local newspaper as an investigation into memory and learning. Little do they know that, once inside, they will be ordered to torture a man.
Each volunteer is met in the laboratory by “Mr Williams”, a stern-looking man in a grey lab coat. He explains that the volunteer is to act as a “teacher” in the experiment. Their job is to punish incorrect answers by administering electric shocks to a “learner”, the affable “Mr Wallace”, in a room next door. Williams introduces the teacher to the learner and then, strapping the learner into a chair, attaches electrodes to his wrists. The teacher must administer an increasing level of voltage to the learner for each wrong answer by pushing a switch on a machine. Switches are labelled from 15 to 450 volts (V). At 75 V, the learner can be heard grunting in pain; at 120 V he complains loudly; at 150 V he begs to be released; at 285 V he screams in agony. Soon after, he falls silent.
But, as the volunteers learnt at the end of the experiment, the electric shock machine was a prop and both the experimenter and the learner were actors – Williams played by a 31-year-old biology teacher and Wallace by a 47-year-old accountant.
The exercise was not to study memory and learning at all, but an elaborate experiment to determine just how far an ordinary person will go in obeying orders given by an authority figure.
The results appeared to show that, confronted by the sounds of the learner’s shocking pain, and despite their own agitation and stress, more than half of these people followed the instructions of the lab-coated authority figure and administered what they believed to be dangerous and perhaps even fatal electric shocks to people just like themselves.
In the early 1960s, Stanley Milgram, the designer of the experiment, was an ambitious, untenured 27-year-old assistant professor of psychology at Yale. He was acutely aware that to make his academic mark, his research should shed light on the pressing issues of his generation. He succeeded beyond expectation. His findings unleashed a storm of controversy that catapulted him into the global spotlight.
Milgram had apparently discovered something profound and troubling about human nature – that in the face of unjust commands from an authority figure, our morals and values evaporate and we blindly obey.
But reactions to his “obedience to authority experiments”, the results of which were published in October 1963 in The Journal of Abnormal and Social Psychology, were deeply divided from the start. Noted Harvard psychologist Roger Brown admiringly called them the “most famous experiments in social psychology” while psychologist Bruno Bettelheim branded them “vile” and “in line with the human experiments of the Nazis”.
American psychology developed a deep ambivalence toward Milgram, according to his biographer, Thomas Blass. As a result, Milgram, although famous, would be passed over for tenure at Harvard and forced to accept a much less prestigious position at the City University of New York, where he worked for 17 years until his death in 1984.
But there is little disagreement about the impact of the studies. In the 50 years since they hit the headlines, academic interest has continued unabated. The implications of Milgram’s findings have been discussed in journals devoted to ethics, medicine, science, engineering, international criminal justice, law, business and public administration. The Milgram experiments have also been the subject of intense debate in Holocaust literature. Did the findings shed light on the atrocities committed by ordinary people in Nazi Germany?
Some argued that Milgram’s findings explained the behaviour of many ordinary Germans who had committed atrocities, because just like ordinary Americans, they were overwhelmed by the command to obey orders from an authority figure. Bernhard Schlink’s 1995 book The Reader makes a similar case, where Hanna, an illiterate but otherwise very ordinary character, commits atrocities because she is compelled to obey her superiors.
Others have argued that Milgram’s findings have only limited explanatory power when it comes to the Holocaust. Blass says: “He only accounts for part of what constituted the horror of the Holocaust, the faceless bureaucrat who in lockstep fashion was routinely carrying out what he was doing. But that clearly is not the whole story. Much of the horror of the Holocaust was the initiative that was involved in devising more and more ghastly ways of murdering victims.”
I first heard about Milgram’s experiments as an undergraduate studying psychology. I was intrigued even then by the question of what happened to the subjects in the experiments after they left Milgram’s lab. In 2006 I started to look into the human story of the research, seeking out and interviewing Milgram’s staff and subjects, to find out what impact the experiments had had on them, both in the immediate aftermath and ever since. I soon found inconsistencies between the stories I was hearing and the story of the research that I thought I knew so well. It prompted me to visit the archives at Yale where Milgram’s papers are kept to do some fact-checking. I found a troubling mismatch between the unpublished and published accounts of what transpired in Milgram’s lab that led me to question the methodology and findings of social psychology’s most famous experiment, and also the reliability of a scientist as the narrator of the story of his own research.
I was not the only one to have doubts. In the past half-century, many academics have put Milgram’s experiments under the microscope and found them wanting.
Psychologist Diana Baumrind published the first criticism of Milgram’s work in a high-profile American Psychological Association journal in June 1964. Her paper and Milgram’s riposte sparked one of the most intense debates in the history of psychology, one which eventually led to the 1973 revision of the American Psychological Association’s ethical guidelines.
Apart from criticising the ethical dimensions of the study, Baumrind challenged whether the findings could be generalised beyond the lab, particularly the parallel Milgram drew between his subjects and Nazi war criminals. Milgram’s description of subjects shaking, stuttering, trembling and arguing at times with the experimenter showed they could not be compared with SS guards who viewed their victims as inhuman, Baumrind wrote.
In 1968, psychiatrist Martin Orne and his colleague Charles Holland from the University of Pennsylvania, Philadelphia, published a paper in the International Journal of Psychiatry that highlighted the uncritical acceptance of Milgram’s research. “The extent to which scientific findings become generally accepted is only partly a function of the care with which they are obtained,” they wrote. “In large part acceptance depends upon the extent to which results fit the zeitgeist and the prejudices of the scientific community. The flair with which Milgram presents his findings and the effect they generate tend to obscure serious questions about their validity.”
Orne and Holland argued that Milgram’s subjects would have been alert to cues that the experiment was a hoax. And without proof that his subjects were convinced by the cover story, Milgram could make no “meaningful inference” from the lab regarding the world outside.
When psychologist Don Mixon enrolled in his PhD in the late 1960s, Milgram was one of his heroes. These days he is one of Milgram’s harshest critics. In a paper in the Journal of the Theory of Social Behaviour in 1972, Mixon challenged Milgram’s assertion that the administration of shocks was unequivocal evidence of subjects’ cruelty. He argued that Milgram had instead measured trust, that subjects had taken the word of someone they saw as a highly credible scientist who reassured them the procedure was safe. “In a way he found just the opposite of what he thought he found,” Mixon says.
Milgram responded to these methodological criticisms by citing evidence from a follow-up questionnaire completed by 658 of his 720 subjects a year after the experiments were over. It showed that 56% of subjects said they fully believed the learner was receiving painful shocks, while 24% said they thought he was “probably” getting the shocks.
Still, the controversy gained momentum. In 1993, Milgram’s widow donated his papers and recordings to Yale’s Sterling Memorial Library. Here, for the first time, researchers were able to study original sources including audio recordings, data and interview transcripts, and compare this unpublished material with Milgram’s published accounts of the research.
According to many commentators, Milgram’s experiments showed that most of us are flawed, based on the statistic that 65% of his subjects followed orders to inflict the maximum shock on another person. But this statistic applied only to the most widely reported first experiment, which involved just 40 people. Over time, the behaviour of all 24 groups, a total of 720 subjects, was conflated into the same statistic. In fact, in more than half of Milgram’s 24 experiments, fewer than 40% of people obeyed the experimenter.
But it’s not just a question of whether it was 65% or 40% of subjects who were willing to continue to the maximum voltage. The question is: did they really believe they were hurting anyone? In my research at Yale I found evidence to support Orne and Holland’s assertion that there were people who went to 450 V on the shock machine because they knew they were not torturing anybody. Many of the 652 subjects who returned the follow-up questionnaire made additional comments. Some explained what had made them suspicious: that the experimenter was in a room supervising them instead of in the room with the learner; that the learner and the experimenter stepped aside to let the teacher go through a doorway first (“why was I getting the red carpet treatment?”); that the experimenter could have run the experiment himself (“why did he need me to do it?”); and that the sound of the learner’s cries was coming from a speaker in the corner of the ceiling, suggesting it was a tape recording. Many mentioned their confusion at the calm demeanour of the experimenter in the face of the learner’s cries.
That Milgram’s subjects were suspicious is unsurprising when you consider that Candid Camera, a TV show that secretly recorded people’s reactions to bizarre situations, was hugely popular at the time.
Audio recordings of the experiments themselves reveal a constant process of manipulating the experimental situation to achieve the desired result: a high obedience rate. For example, Williams can be heard leaving the lab, pretending to check on the learner to reassure worried teachers. At other times, Williams repeats his commands to obey far more often than the four standard prompts Milgram later described. In some instances Williams commands teachers more than 25 times to continue with the experiment despite their protests and distress. When you listen to these tapes, the “obedience” originally associated with Milgram’s experiments comes across much more like bullying and coercion.
In 1974 Milgram published the first full account of his research, presenting a theory that had taken a decade to develop. In his book Obedience to Authority: An Experimental View, he argued that the tendency to obey authority, even when the commands of that authority conflict with our conscience, is a universal trait: “A substantial portion of people do what they are told to do, irrespective of the content of the act and without limitations of conscience, so long as they perceive that the command comes from a legitimate authority.”
His point was that you did not have to be a psychopath to comply with orders to torture or even kill. “Often it is not so much the kind of person a man is, as the kind of situation in which he finds himself that determines how he will act,” Milgram wrote. When faced with an authority figure who tells us to do something that conflicts with our conscience, he said, we enter a kind of twilight zone, an “agentic state”, relinquishing our will and blindly following orders.
Reviews of the book were generally positive, although some reviewers pointed out that Milgram had ignored disobedient subjects in his theory. “Of the 35% or more who disobeyed in the experiments … Milgram has nothing to say,” Columbia University professor Stephen Marcus noted in his January 1974 review in The New York Times.
When a scientific result is in dispute, the normal response is to repeat it. Indeed, there have been more than 20 attempts to replicate the experiment in nine countries since the mid-1960s. Blass was the first person to compile a picture of these attempted replications, in a 1999 paper that summarised the results of nine studies in six countries, comparing each study’s results with the equivalent Milgram condition. For example, he compared two studies with Milgram’s “voice feedback” condition, in which 40 subjects could hear the learner complaining and in which 62% obeyed. One 1969 study of 16 university students in South Africa found that 87.5% went to maximum voltage; the same scenario repeated with 50 Australian students in 1974 achieved 28% obedience.
When the term “replication” is used in science, it implies a careful repeat of the original experiment using the same methods and finding the same result. But the experiments conducted in the past 50 years differ from Milgram’s original in their procedures and in their samples, which varied in size, age and gender and predominantly comprised students. Quite apart from the large methodological variations in these “replications”, response rates have varied enormously.
In a second review in 2012, Blass averaged obedience rates from nine US studies conducted between 1967 and 1976 and compared this with the average from eight international studies conducted between 1968 and 1985. Blass argues that the two rates – 60.94% and 65.94% – indicated, overall, “a comparable finding”. When asked whether the variation in the results of these individual studies suggested that Milgram’s results hadn’t been replicated, Blass acknowledges the variability is “a puzzle that might look questionable, but if you look at the broader picture we end up with the same benchmark figures Milgram found”.
Averaging out data from such a range of international studies might seem simplistic, but it is one way of trying to make sense of the contradictory and confusing data from these attempted replications. The fact is we can’t come to any definitive conclusions about whether Milgram’s findings have been confirmed or rejected without resorting to generalisations and “benchmarks”.
The attempts to replicate Milgram’s experiments ended in the US in the mid-1970s when the revisions to the American Psychological Association’s ethical guidelines spelt an end to highly deceptive and stressful experiments. Similar guidelines were introduced in Europe within a decade.
Yet even in the early days, the research quickly escaped from academia into the public imagination to be referenced in novels, plays, films, performance art and songs.
In recent times, Milgram-style experiments have been born again in the form of reality television programs. In 2007 psychologist Jerry Burger from Santa Clara University conducted the experiment for the national American television show Primetime. As he told a reporter from ABC News at the time, “People have often asked the question, ‘Would we find these kinds of results today?’ and some people try to dismiss the Milgram findings by saying, ‘that’s something that happened back in the ’60s. People aren’t like that anymore.’” To gain ethics approval, Burger ran a modified version of Milgram’s protocol, stopping the experiment at 150 volts and predicting which subjects would have gone to maximum voltage. Primetime found that 65% of the 18 male subjects and 73% of the female subjects inflicted shocks on the “learner”.
But Burger’s study was very different from Milgram’s in ways that affected his results. In American Psychologist, Milgram’s research assistant, Alan Elms, pointed out that Burger skewed his results by rejecting people who were susceptible to stress, and might have screened out the potentially disobedient.
Furthermore, Burger’s results were based on a prediction of how people might behave rather than on their actual behaviour. In Elms’ view, Burger had removed the “most distinctive feature of Milgram’s basic research design” and “diminished the replication’s generalisability to any real-world issues”.
Nevertheless, Burger’s version made headlines around the world for reportedly uncovering the same troubling truth about human nature that Milgram had found 50 years earlier.