Why the Olson study fails to prove animal testing can predict human responses

In response to our tweet yesterday, a member of the animal experimentation community, Lars Dittrich, has today cited the Olson study, claiming it proves that animal testing holds predictive value for humans.

The Olson study is a favourite of animal modelers, but it fails on every count. Our senior doctor, Ray Greek, addresses the false claims made by the Olson study below, in his peer-reviewed paper written with Dr. Niall Shanks: Are Animal Models Predictive for Humans?

Below we quote a section of Shanks and Greek’s paper summarising the Olson study’s main errors. For those wanting to read Shanks and Greek’s complete analysis, please visit this link and scroll down, or type ‘Olson’ into the ‘find’ box.


‘The Olson Study, as noted above, has been employed by researchers to justify claims about the predictive utility of animal models. However, we think there is much less here than meets the eye. Here’s why:

1. The study was primarily conducted and published by the pharmaceutical industry. This does not, in and of itself, invalidate the study. However, one should never lose sight of the fact that the study was put together by parties with a vested interest in the outcome. If this were the only concern, perhaps it could be ignored. However, as we will now show, there are some rather more serious flaws.

2. The study says at the outset that it is aimed at measuring the predictive reliability of animal models. Later the authors concede that their methods are not, as a matter of fact, up to this task. This makes us wonder how many of those who cite the study have actually read it in its entirety.

3. The authors of the study invented new statistical terminology to describe the results. The crucial term here is “true positive concordance rate” which sounds similar to “true predictive value” (which is what should have been measured, but was not). A Google search on “true positive concordance rate” yielded twelve results (counting repeats), all of which referred to the Olson Study (see figure 5). At least seven of the twelve Google hits qualified the term “true positive concordance rate” with the term “sensitivity” – a well-known statistical concept. In effect, these two terms are synonyms. Presumably the authors of the study must have known that “sensitivity” does not measure “true predictive value.” In addition you would need information on “specificity” and so on, to nail down this latter quantity. If all the Olson Study measured was sensitivity, its conclusions are largely irrelevant to the great prediction debate.

4. Any animal giving the same response as a human was counted as a positive result. So if six species were tested and one of the six mimicked humans, that was counted as a positive. The Olson Study was concerned primarily not with prediction, but with retroactive simulation of antecedently known human results.

5. Only drugs in clinical trials were studied. Many drugs tested do not actually get that far because they fail in animal studies.

6. “…the myriad of lesser “side effects” that always accompany new drug development but are not sufficient to restrict development were excluded.” A lesser side effect is one that affects someone else. While hepatotoxicity is a major side effect, lesser side effects (which actually matter to patients) concern profound nausea, tinnitus, pleuritis, headaches and so forth. We are also left wondering whether there was any independent scientific validity for the criteria used to divide side effects into major side effects and lesser side effects.

7. Even if all the data is good – and it may well be – a sensitivity (i.e. true positive concordance rate) of 70% does not settle the prediction question. Sensitivity is not synonymous with prediction, and even if a positive predictive value of 70% is assumed, 70% is inadequate when predicting human response. In carcinogenicity studies, the sensitivity using rodents may well be 100%; the specificity, however, is another story. That is the reason rodents cannot be said to predict human outcomes in that particular biomedical context.
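The distinction drawn in points 3 and 7 between sensitivity and predictive value can be illustrated with a small calculation. The sketch below is only an illustration: the counts are hypothetical numbers chosen for the example, not data from the Olson Study, and the class and function names are our own. It shows that a sensitivity of 70% is compatible with a much lower positive predictive value once false positives are taken into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts comparing an animal test result with the human outcome.

    All numbers used below are hypothetical, for illustration only.
    """
    tp: int  # animal toxic, human toxic
    fp: int  # animal toxic, human not toxic
    fn: int  # animal not toxic, human toxic
    tn: int  # animal not toxic, human not toxic


def sensitivity(c: ConfusionCounts) -> float:
    # Of the effects that occur in humans, the fraction the animal test flagged.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Of the effects that do not occur in humans, the fraction the animal test cleared.
    return c.tn / (c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    # Of the effects flagged in animals, the fraction that occur in humans.
    return c.tp / (c.tp + c.fp)


def negative_predictive_value(c: ConfusionCounts) -> float:
    # Of the effects cleared in animals, the fraction that do not occur in humans.
    return c.tn / (c.tn + c.fn)


# Hypothetical example: 100 human-toxic and 900 human-safe outcomes.
counts = ConfusionCounts(tp=70, fn=30, fp=210, tn=690)

print(f"sensitivity = {sensitivity(counts):.2f}")
print(f"specificity = {specificity(counts):.2f}")
print(f"PPV         = {positive_predictive_value(counts):.2f}")
print(f"NPV         = {negative_predictive_value(counts):.2f}")
```

With these made-up counts the sensitivity is 0.70 while the positive predictive value is 0.25, because sensitivity ignores the false positives entirely. A study that reports only the first figure therefore says nothing about the second.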

The Olson Study is certainly interesting, but even in its own terms it does not support the notion that animal models are predictive for humans. We think it should be cited with caution. A citation search (also performed with Google on 7/23/08) led us to 114 citations for the Olson paper. We question whether caution is being used in all these citations.


Mark Kac stated, “A proof is that which convinces a reasonable man.” Even though the burden of proof is not on us to prove animal models are not predictive, we believe we have presented a proof that would convince a reasonable man.

There are both quantitative and qualitative differences between species. This is not surprising considering our current level of knowledge vis-à-vis evo devo, gene regulation and expression, epigenetics, complexity theory, and comparative genomics. Hypotheses generate predictions, which can then be proven true or false. “Predict” has a very distinct meaning in science and, according to some, is the foundation of science itself. Prediction does not mean retrospectively finding one animal that responded to stimuli like humans and therefore saying that the animal predicted human response, nor does it mean cherry picking data, nor does it mean occasionally getting the right answer.

When a concept such as “Animal models can predict human response” is accepted as true, it is not functioning as a hypothesis. We have referred to this as an overarching hypothesis but could have easily referred to it as an unfounded assumption. An assumption or overarching hypothesis might in fact be true but its truth must be proven. If a modality such as animal testing or using animals to predict pathophysiology in human disease is said to be a predictive modality, then any data generated from said modality should have a very high probability of being true in humans. Animal models of disease and drug response fail this criterion.

In medicine, even positive predictive values of .99 may be inadequate for some tests and animal models do not even roughly approximate that. Therefore, animal models are not predictors of human response. Some animals do occasionally respond to stimuli as do humans. However, how are we to know prospectively which animal will mimic humans? Advocates who maintain animals are predictive confuse sensitivity with prediction. Animals as a group are extremely sensitive for carcinogenicity or other biological phenomena. Test one hundred different strains or species and one is very likely to react like humans. But the specificity is very low; likewise the positive and negative predictive values. (Even if science did decide to abandon the historically correct use of the word predict, every time an animal-model advocate said animal species X predicted human response Y, she would also have to admit that animal species A, B, C, D, E and so forth predicted incorrectly. Thus justifying the use of animals because animal models per se make our drug supply safer or predict facts about human disease would not be true.)
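The remark about testing one hundred strains or species can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each species independently has the same small probability of reacting like humans; both the independence assumption and the 5% figure are ours, not claims from the paper, and the function name is hypothetical.

```python
def prob_at_least_one_match(p_match: float, n_species: int) -> float:
    """Probability that at least one of n_species reacts like humans.

    Assumes each species matches independently with probability p_match.
    This is a simplifying assumption made only for this illustration.
    """
    if not 0.0 <= p_match <= 1.0:
        raise ValueError("p_match must be between 0 and 1")
    if n_species < 0:
        raise ValueError("n_species must be non-negative")
    # Complement of "no species matches": 1 - (1 - p)^n
    return 1.0 - (1.0 - p_match) ** n_species


# Hypothetical per-species chance of matching the human response.
p = 0.05

for n in (1, 6, 100):
    print(f"{n:3d} species: P(at least one match) = {prob_at_least_one_match(p, n):.3f}")
```

Under these assumptions, a retrospective search across one hundred species finds a match about 99% of the time, even though any single species chosen in advance matches only 5% of the time. The first number describes the group's sensitivity; only the second bears on prospective prediction.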

Some have suggested we should not criticize animal models unless we have better suggestions for research and testing [27]. It is not incumbent upon us to postpone criticizing animal models as not being predictive until predictive models such as in silico, in vitro or in vivo are available. Nor is it incumbent upon us to invent such modalities. Astrology is not predictive for foretelling the future; therefore, we criticize such use even though we have no notion of how to go about inventing such a future-telling device.

Some have also suggested that animal models may someday be predictive and that we should so acknowledge. While this is true in the sense that anything is possible, it seems very unlikely, as genetically modified organisms have been seen to suffer the same prediction problems we have addressed [16, 81, 82, 83, 84, 85, 86, 87] and, as mentioned, different humans have very different responses to drugs and disease. Considering our current understanding of complex systems and evolution, it would be surprising if one species could be used to predict outcomes in another at the fine-grained level where our study of disease and drug response is today and to the accuracy that society demands from medical science.

There are direct and indirect consequences to this misunderstanding of what prediction means. If we did not allow on the market any chemical or drug that causes cancer, or is teratogenic, or causes severe side effects in any species, then we would have no chemicals or drugs at all. Furthermore, there is a cost to keeping otherwise good chemicals off the market. We lose: treatments, perhaps even cures; the income that could have been generated; and new knowledge that could have been gained from learning more about the chemical. These are not insignificant downsides. Since we now understand vis-à-vis personalized medicine that even humans differ in their response to drugs and disease, and hence one human cannot predict what a drug will do to another human, it seems illogical to expect to find predictive models in species completely different from humans. If we truly want predictive tests and research methods (and we do), it would seem logical to start looking intraspecies not interspecies.