However, data will also exhibit idiosyncrasies that result from the randomness of the sampling process and say nothing about the process that generated them; these idiosyncrasies would disappear if we resampled new data from the process. Teasing apart the true properties of the process from these idiosyncrasies is a notoriously hard and error-prone task. Errors of this kind can be very costly and have contributed to wider concerns about the reproducibility of research findings, most notably in medical research. (Communications of the ACM, April 2017, "Guilt-Free Data Reuse.")
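A minimal sketch of this point, using only hypothetical pure-noise data: if we scan many features of a sample for the "strongest" pattern, we will always find one, but a fresh sample from the same process typically crowns a different winner. The feature counts and the model are illustrative assumptions, not anything from a real dataset.

```python
import random

def best_feature(sample_seed, n_features=50, n_rows=30):
    """Draw pure-noise data (every feature ~ N(0, 1)) and return the
    feature whose sample mean deviates most from zero -- an 'insight'
    that is entirely an idiosyncrasy of this particular sample."""
    rng = random.Random(sample_seed)
    best = None
    for f in range(n_features):
        vals = [rng.gauss(0, 1) for _ in range(n_rows)]
        deviation = abs(sum(vals) / n_rows)
        if best is None or deviation > best[0]:
            best = (deviation, f)
    return best[1]

# Resampling from the same process: the 'winning' feature changes from
# sample to sample, because the pattern belonged to the sample, not to
# the underlying process.
winners = {best_feature(seed) for seed in range(5)}
print(winners)
```

With 50 noise features and only 30 rows, each resample almost always elects a different champion, which is exactly the idiosyncrasy-versus-process distinction above.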
Big data is a great thing. Once we've mined the low-hanging fruit from big data, it makes sense to drill down and look for subtler things to exploit. For most of my career, however, once we leveraged the more obvious insights and improved the system accordingly, we almost always found ourselves in a new environment with new data. The point was that if we really changed things, the data we had been producing changed too, and we were once again on the lookout for low-hanging fruit.
I always figured that one day we'd get pretty sophisticated with our data and its management. It never happened, at least not in my projects or organizations. The domains in which we applied software were changing so rapidly that we were always looking for first principles rather than deep, subtle insights. The most complexity we faced was that a dozen first principles (averages, durations, rates, business rules, etc.) tended to interact in complex and subtle ways. Each of the basics was quite simple; the challenge was melding them all into a cohesive whole where managing any one insight didn't drag down the benefits of another.
Have you exploited all the basic insights your data supplied before drilling down into deeper and more elusive pathways?