Dealing with the Uncertainty in Scientific Results
Most evidence in the empirical sciences is statistical in nature, and scientists rely on a variety of statistical tests to distinguish valid scientific discoveries from spurious ones. Unfortunately, there is a growing recognition that many important research findings based on statistical evidence are not reproducible, raising the question of whether there is a gap between what these statistical tests ensure and the way they are used.
(Communications of the ACM, April 2017, "Building a Safety Net for Data Reuse")
I like to use the example of a hammer. If I hit you on the head with the hammer, how do I know it hurts? Do I need a randomized, double-blind study with 100 people to decide whether hitting you on the head with a hammer actually hurts you? No, probably not.
Why then do we need all these fancy statistical tests and meticulous study designs? Because many things of interest are not as black and white as a hammer blow to the head. We need more subtle measures to see whether something makes a useful difference. If something we do can improve an outcome by 10%, for example, that might be useful: most cases will continue on as before, but a 10% improvement in, say, the number of people not dying of cancer is quite possibly worth pursuing. The problem arises when we use these techniques to justify products, such as drugs, whose usefulness can only be argued through arcane numbers rather than by observing the improvement directly.
I like the statins example. I read an MD saying that statins are so effective that they should be in the water supply. Yet it takes treating roughly 300 people with statins to statistically detect that one fewer person had a heart attack. We can't tell which person was spared; we can only claim the number based upon studies of the drug. The irony is that while we can't tell who was helped, we can tell the group taking the statins apart from the control group taking a placebo. How? By all the side effects and reduced quality of life the statins group suffered.
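To make the arithmetic concrete, a figure like "300 people treated for one fewer heart attack" is just the reciprocal of a small absolute risk reduction. Here is a minimal sketch in Python; the event rates are hypothetical placeholders, not data from any actual statin trial.

```python
# Illustrative only: hypothetical event rates, not actual statin trial data.
def number_needed_to_treat(control_event_rate, treated_event_rate):
    """Number needed to treat = 1 / absolute risk reduction."""
    absolute_risk_reduction = control_event_rate - treated_event_rate
    return 1.0 / absolute_risk_reduction

# Suppose 1.0% of untreated people have a heart attack over the study period,
# versus about 0.67% of treated people: a tiny absolute difference.
nnt = number_needed_to_treat(0.010, 0.0067)
print(f"About {nnt:.0f} people must be treated for one fewer heart attack")
# -> roughly 300
```

The point of the sketch is that the headline number comes entirely from a fraction-of-a-percent difference that no individual patient could ever observe directly.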
I recall another study where the cancer survival rate was something like 33% with the tested treatment versus 31% without it. The quality of life for the treated population was notably worse than for the untreated one, due to the side effects of the treatment, which I believe was chemotherapy. Based upon the small difference in these results, I concluded that I would probably take the chance and not be treated.
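To see how such a small gap can still clear the bar of statistical significance, here is a rough sketch of a standard two-proportion z-test in Python. The 33% and 31% figures are the ones I recall from the study; the sample sizes are hypothetical.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Two-sided two-proportion z-test using a pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical trial sizes; only the 33% vs. 31% rates come from the text.
for n in (2000, 5000):
    z, p = two_proportion_z(0.33, n, 0.31, n)
    print(f"n = {n} per arm: z = {z:.2f}, p = {p:.3f}")
# With enough patients the 2-point gap becomes "statistically significant",
# yet the practical difference is still only about 2 people in 100.
```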
I had already done something similar in my life. I was told that I had thyroid cancer and needed to have my thyroid removed. After some terrified and frenzied research, I discovered that while taking the thyroid out was the standard approved practice, only 20-30% of the thyroids removed actually had cancer. That meant I had a 70-80% chance of having no cancer at all, and therefore no need for surgery or for a daily pill for the rest of my life. I went with the odds, kept my thyroid, and I have been cancer-free for over 5 years.
Statistics are a wonderful tool, and even simple averages have provided profound insights that helped us take projects to new levels of success. However, it is all too easy to mislead people and regulators with assurances based upon standard practices and statistically driven studies: studies where there may be little or no observable benefit beyond a barely significant, often unreproducible, statistical effect that nonetheless produces significant income when it persuades people to adopt these practices.
Do the numbers behind your project tell a realistic story, or are they simply the standard, acceptable, but meaningless numbers that everyone uses?