A mission failure often has multiple small triggers that combine in unsuspected ways … By meticulously fixing the small, relatively benign issues with the same determination as the larger issues, we make sure that serious problems become much less likely. Revving the Rover, Communications of the ACM, Feb 2013.
It's called quality. The small stuff matters. It is still tough to figure out exactly which of the small stuff matters most, but if we are not fixing the small stuff, it is amazing how the big stuff never gets much better.
It just looked odd. The coefficients in the earth model were different in different places in the system. I asked our resident statistician about it. He said they logically should all be the same. I asked my manager. He said that no one had submitted a defect or indicated that it was a problem, so don’t change anything.
Some time later I became the technical lead for the project. Once again I noticed that the way we modeled the shape of the earth for our space-based system differed from one part of the system to another. In the worst case, I figured, I could always change the coefficients back to what they were. So I “fixed” them. It wasn’t really much of a fix: I just made all the coefficients in the earth model equations match each other throughout the software. They only differed in the least significant digits, but the unexplained differences just annoyed me.
Disaster, right? Nope. Instead, our testing of the full live system simply gave better results. We located events of interest on our planet with greater accuracy. Because we tested the system by running a series of live data tapes, it was not unusual for each test run to produce slightly different results. We considered this normal because it was a live system with live, albeit recorded, data pumped into it. Now, however, the system was almost perfect at identifying the earth events that we knew were on the live tape. It appeared that what we had accepted as normal variation, due to the complexity of a live system using live data, was in fact due to slight differences in these coefficients. Since few people understood the coefficients, no one had been willing to make any changes.
By the time I moved on to the next job, I had come to realize that the coefficients were different because several different contractors had built this large space-based system, and over time changes to the earth model had not been propagated everywhere. In today’s software, this “earth model” would likely be its own module, and anyone needing it would call the same library functions. Many of the software engineering techniques now in place were motivated by finding inconsistencies such as these in large projects.
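To make the "its own module" idea concrete, here is a minimal sketch of what a single shared earth model could look like. Everything in it is an assumption for illustration: the names `Ellipsoid`, `WGS84`, and `geodetic_to_ecef` are hypothetical, and the WGS 84 ellipsoid is only a stand-in, since I am not describing the model or the equations the original system actually used.

```python
"""A hypothetical single home for earth-shape constants (illustrative only)."""
import math
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: callers cannot quietly tweak a coefficient
class Ellipsoid:
    semi_major_axis_m: float
    inverse_flattening: float

    @property
    def flattening(self) -> float:
        return 1.0 / self.inverse_flattening

    @property
    def eccentricity_squared(self) -> float:
        f = self.flattening
        return f * (2.0 - f)


# WGS 84 defining parameters, used here as a stand-in (an assumption,
# not the coefficients from the system in the story).
WGS84 = Ellipsoid(semi_major_axis_m=6378137.0, inverse_flattening=298.257223563)


def geodetic_to_ecef(lat_deg, lon_deg, height_m, model=WGS84):
    """Convert latitude, longitude (degrees) and height (m) to ECEF x, y, z (m)."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    e2 = model.eccentricity_squared
    # Prime-vertical radius of curvature at this latitude.
    n = model.semi_major_axis_m / math.sqrt(1.0 - e2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - e2) + height_m) * math.sin(lat)
    return x, y, z
```

The point is less the math than the layout: every subsystem imports the same constants and calls the same function, so a change to the earth model happens in one place and the least significant digits cannot drift apart between contractors.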
For years, people had noticed this “small” issue and left it alone because few of them considered it a problem. It became part of “this is just how it works” when in fact it was a defect in the software.
In another example, we just knew that dropped calls on our mobile phones were normal in many places. It wasn’t our phones’ issue; it was a system-wide issue that we just had to deal with. For various reasons, our customer decided to release their internal raw call data to us. From that data we discovered that our own software was making assumptions that were not true. Once we corrected them, our mobile phones became some of the best in the world at making and holding calls. We had seen the issues for years, but just wrote them off as normal and acceptable variations.
When we see small things going bad, they are often at the root of bigger things not working right. Once we eliminated or minimized these small niggling issues, the overall system “suddenly” worked very well. This also works with the human dimension. Once we discouraged or just got rid of quirky management behavior, teams that had underperformed for years “suddenly” delivered projects on time and with good quality.
For examples, see The Leap
Small stuff is often an indicator of, or pointer to, bigger issues. Often, “stamping out” those small issues, as much of a waste of time as it seems to be, can clean up the noise in our system and propel our team or project to exceptional results.
What small problems do you see in your project that might be worth spending some time investigating and fixing?