In the same way that dark matter is an unseen but very large part of the cosmos (some estimates peg dark matter at 27 percent of the mass of the universe), dark data represents the unseen but very large part of the data that most corporations collect and store. It’s “dark” because corporations don’t use it for analysis, insight, or decision making. Data Divination, Spectrum.IEEE.org, February 2017.
I kept hearing that we would have all the defects fixed and be ready to ship by the end of the week. This went on week after week, in a project that would ultimately ship six to nine months late (depending upon whom you asked). We had been predicting that we would ship “this week” for months, and I couldn’t figure out why the best and brightest of this Fortune 50 company were stuck in this loop.
I pulled the project’s defect data from the database and computed two numbers: how long we were taking to fix the average defect, and how long we were taking to fix 95% of all defects. The killer stat was that it took us some six weeks to dispose of 95% of the defects found in any given week. On top of that, we had a steady, though declining, arrival rate of new defects each day. I showed these numbers to the VP in charge of software development. His response was that he had seen my data, but that if we didn’t ship in another week or two, we’d be out of business. We didn’t ship for two more months.
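For readers who want to try the same calculation on their own defect tracker, here is a minimal sketch in Python. It is not the analysis used on the original project. The records are invented for illustration, the names `defects`, `days_to_fix`, and `percentile_nearest_rank` are hypothetical, and the nearest-rank percentile is just one reasonable choice among several definitions.

```python
from datetime import date
from statistics import mean

# Hypothetical defect records as (date opened, date closed).
# These values are invented for illustration; real data would come
# from an export of your own defect tracking system.
defects = [
    (date(2017, 1, 2), date(2017, 1, 5)),
    (date(2017, 1, 2), date(2017, 1, 7)),
    (date(2017, 1, 3), date(2017, 1, 10)),
    (date(2017, 1, 4), date(2017, 1, 6)),
    (date(2017, 1, 4), date(2017, 1, 14)),
    (date(2017, 1, 5), date(2017, 1, 19)),
    (date(2017, 1, 5), date(2017, 1, 9)),
    (date(2017, 1, 6), date(2017, 1, 12)),
    (date(2017, 1, 6), date(2017, 1, 27)),
    (date(2017, 1, 6), date(2017, 2, 17)),
]


def days_to_fix(records):
    """Return the sorted number of days each defect stayed open."""
    return sorted((closed - opened).days for opened, closed in records)


def percentile_nearest_rank(sorted_values, pct):
    """Smallest value with at least pct percent of values at or below it."""
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[max(rank, 1) - 1]


durations = days_to_fix(defects)
average_days = mean(durations)
p95_days = percentile_nearest_rank(durations, 95)

print(f"Average time to fix: {average_days:.1f} days")
print(f"Time to fix 95% of defects: {p95_days} days")
```

The gap between the two numbers is the point: a handful of slow defects can leave the average looking manageable while the 95% figure, which is what governs when the work is actually done, sits weeks further out.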
What startled me, as the new guy at this organization, was that they had never used their defect data in this way before. They had a great automated defect tracking and management system, but never used the data for analysis, insight, or decision making. When it was finally used this way, the reaction was to deny it and ignore it, because we couldn’t admit that what we had been saying all along was flawed and not based on any objective data. This defect data, comprising thousands of defects per project and collected regularly over the entire project, readily reflected the project’s overall progress. It was easily modeled with simple statistics, and the model proved truly predictive of our actual progress.
What data does your project or organization have that, if made more available, might be very useful for understanding how your project is performing?