Are We Measuring Project Quality Or Just Activity?
Things must not have been going too well on our project. The VP of Quality and the VP of Test both showed up at one of our project meetings and wanted to know if all of our defects had been prioritized correctly.
I told them that the review process was ongoing, as usual.
They liked neither our process nor my answer.
The problem they saw was that I was taking too much time and not immediately reprioritizing the defects.
They were sure of this because they were not seeing the spike in defects that occurred as the quality review team reviewed and increased the defect priorities. The periodic spikes were taken as indicators of Quality doing its job — identifying quality problems. The Test VP also liked it because higher priority defects showed that his team was finding important issues that needed fixing.
They were mad.
The Quality Team Reviewed Our Defect Prioritization
They told me my job was to immediately implement the new priorities.
I responded that in the previous product (which shipped on time with good quality) we always first reviewed the recommended changes. This they clearly took as rank insubordination (see initiative or insubordination for more on this notion).
Humorously, seeing that I was going to lose this one, during this conversation I messaged my team and told them to just apply all the new priorities without a review. Simultaneously the VP of Quality was messaging my Product VP. At about the same time my staff confirmed all the priorities were updated, I got a message from my VP telling me to “just do it.”
I informed the VPs that all defects were now prioritized per their recommendations.
We had a real-time defect tracking system, so they immediately pulled up the trends to see the impact.
There was no change in the priority defect trend.
The expected spike did not stand out in the normal ups and downs of the overall trend.
So what happened?
The Quality Reviews Had No Measurable Impact
What happened was that the normal process disposed of the defects, at their original priority, faster than the review team could review the priorities. This highlighted a simple truth: the recommended priority changes made no difference to how fast defects were disposed of or to which ones got worked on. The only thing they did was introduce a periodic spike in the trend, driven by the batches of new priorities the review team set.
One review team member admitted in confidence that the team had agreed to always increase some defect priorities at each daily review. This was to show management that they were doing the job they were given.
I had at one time, again humorously, suggested that we should just write a script that randomly upgraded priorities and this would save hundreds of staff hours. I don’t think anyone was amused.
The Quality Team Just Believed That Quality Was Not Good
Why didn’t the VPs of Quality and Test see the data as we did? This is always hard to analyze, but in a nutshell those organizations had a belief that the development and product teams were “tweaking” the reported defects to claim quality was better than what quality and test found. They saw the periodic defect spikes after a review as evidence of this.
The fact that the data never showed any increase in how fast or how many defects were fixed, even when their priorities were updated fast enough to cause a noticeable spike, was never addressed. As long as the review caused periodic spikes in the trend, Quality felt this showed they were contributing.
Quality Had No Effective Measure Of Quality
By the way, this product shipped on time and even got recognition from our field test team (another test organization – see the best test team) for the best quality product the team had ever seen.
Quality is not always self-evident, especially on large projects or products. We need to have clear and objective methods of measuring quality. Too often what passes for quality checks can be driven more by emotion and habit than by an objective measure. Knowing our project or product quality requires hard data, not just beliefs.
Are the quality measures on your project objective and do they truly measure quality?
8 thoughts on “Are We Measuring Project Quality Or Just Activity?”
Terry,
Absolutely. When I was a new software development director I was sitting in on one of my first director level meetings and listening to everyone beat up the test director for allowing so many defects to be released to the customer. After a few moments of listening, I spoke up and said “No, this sounds like a development problem – my problem. Test can’t prevent defects; they can only tell us how we are doing.”
Everyone, including the test director, looked at me in amazement. We went on to improve our software production, delivering on time with good quality, because we focused on development (per your examples, I’ve used most of these at one time or another). They were stuck until then because they were putting all the focus and pressure on quality and not looking at the root causes. (More on this example at: http://pmtoolsthatwork.com/almost-a-great-test-organization/).
It can and does work as you say; too many people just don’t believe in it enough to understand it and to be determined enough to make it work.
Great comment, thanks.
You cannot TEST quality into a product!
The Deming approach to Quality is first and foremost a corporate and personal commitment to quality as a “Constancy of purpose”. Everyone must be aware of and prevent the inclusion of defects in a product during development/production.
Tom McCabe, back in 1976, introduced Cyclomatic Complexity (v(G)) to the software development world. This method demonstrated that overly complicated software was untestable and, thus, unmaintainable. One lesson I took from that effort is to codify testability/maintainability requirements in a Project Scope. Defects, defined as the insertion of complicated code (v(G) > 10), were returned to developers as is, with direction from Quality to reduce complexity. Software testing, using ‘probes’, was mandated to ensure 100% module level test coverage. Finally, code coverage graphics were included as part of Final Product Delivery documents.
Think this story doesn’t apply to today’s software? WRONG! v(G) is computed using the number of decision points and paths within a given module. There is zero reason to think it can’t work today.
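As a rough illustration of how v(G) can be computed from decision points, here is a minimal Python sketch. It is not McCabe's tooling or any particular product: the function name `cyclomatic_complexity`, the `sample` snippet, and the choice of which syntax nodes count as decision points are all assumptions made for this example, and real tools differ on the details.

```python
import ast

# Syntax nodes counted as decision points in this sketch (an assumption;
# real complexity tools differ in exactly what they count).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate v(G) as (number of decision points) + 1."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b" adds one extra branch, "a and b and c" adds two.
            decisions += len(node.values) - 1
    return decisions + 1


# Hypothetical module used only to exercise the counter.
sample = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

complexity = cyclomatic_complexity(sample)
print("v(G) =", complexity, "(over threshold)" if complexity > 10 else "(ok)")
```

The sample has two `if` statements, one `for` loop and one `and`, so four decision points and v(G) = 5, well under the v(G) > 10 threshold mentioned above.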
Then, too, the old DeMarco saying “The quality of any product is directly related to the quality of the process used to develop it.” is still true today. When was the last time the development process itself was tested or even reviewed? How many chances to insert defects into the product present themselves during the process? When was the last time product requirements were tested to ensure they are Specific, Measurable, Achievable, Realistic and Testable (SMART)?
These are lessons learned the hard way by engineers over a long period of time. It seems, however, we aren’t learning from the past and are doomed to repeat it.
More comments from around the web:
Sébastien HIRSCH • Yes, you’re right: the knowledge from the last project gives a good overview of what went well and what did not.
But what is important is to know, during the project, whether the level of Quality is good.
What I do is use a mix of the net price of the product and the deadline in terms of the time schedule (during the project, before industrialisation).
After industrialisation it’s easy because we have PPM, net price target, delivery time….
And you, how do you measure the quality level?
Bruce Benson • Sébastien,
I had just answered a similar question over in a measurements forum, so let me just repeat it with a little editing here:
We simply divided the total defects found in past projects by the total number of new features in those products (features were pretty arbitrary in size but generally were units of approved functionality). This gave us our historical perspective on quality.
I recall that the number at the time was about 27 defects/feature. I could use that to get a general ballpark of the total defects we would see in a new or updated product during development. We also kept a history/trend line (with highs/lows) of how defects came in over the schedule of the project. This allowed us to compare our current progress to how things had proceeded in the past. The key here was to answer the question “when will we achieve sufficient quality where the customer will take our product.”
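The arithmetic behind that ballpark can be sketched in a few lines of Python. The project history below is made up for illustration (chosen so the rate comes out at 27 defects/feature), and the function names `defects_per_feature` and `projected_defects` are assumptions for this example, not anything from the tracking system we used.

```python
def defects_per_feature(past_projects):
    """Historical rate: total defects divided by total new features.

    past_projects is a list of (defects_found, new_features) pairs.
    """
    total_defects = sum(defects for defects, _ in past_projects)
    total_features = sum(features for _, features in past_projects)
    if total_features == 0:
        raise ValueError("need at least one feature in the history")
    return total_defects / total_features


def projected_defects(rate, planned_features):
    """Ballpark total defects to expect for a new or updated product."""
    return rate * planned_features


# Hypothetical history: (defects found, new features) for three past products.
history = [(540, 20), (810, 30), (270, 10)]

rate = defects_per_feature(history)          # 1620 / 60 = 27.0
estimate = projected_defects(rate, 15)       # 27.0 * 15 = 405.0
print(f"{rate:.1f} defects/feature, expect about {estimate:.0f} defects")
```

The estimate is only a ballpark. Its use is comparing how defects are arriving on the current project against how they arrived in the past.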
Trending defects was also very useful to see if our overall quality was going up or down. One big quality improvement was highlighted when everyone concluded our test team had not been able to do their testing because … they had only reported one defect after a week of testing (dozens would have been normal, and these teams tested many products at a time so were very practiced and thorough).
It turned out that the results were accurate – only one defect. More senior management (including test management) remained skeptical until all testing was completed and produced the same kind of single digit counts (100s were normal for completed testing). This result was much more compelling and convincing as to improved quality than the typical density quality metrics that we used.
More comments from around the web:
Capers Jones • Bruce,
It is fairly easy to measure quality. You keep track of all bugs found during inspections, static analysis, and testing. These are normalized to defects per function point and the current U.S. average is about 5.0 bugs per function point with a range from between 2.0 and 7.0.
After the software is released you measure customer-reported defects at a fixed interval: 90 days is most common.
Then you calculate your defect removal efficiency (DRE). If your team found 90 bugs and customers reported 10 in the first three months your DRE is 90%. The current U.S. average is about 85%.
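The DRE calculation described here can be written out as a short Python sketch. The function name `defect_removal_efficiency` is an assumption for this example; the 90 and 10 are the figures from the comment above.

```python
def defect_removal_efficiency(found_before_release, reported_by_customers):
    """DRE = defects found before release / all defects known so far.

    reported_by_customers is the count from a fixed window after release
    (for example the first 90 days).
    """
    total = found_before_release + reported_by_customers
    if total == 0:
        raise ValueError("no defects recorded, DRE is undefined")
    return found_before_release / total


# The example from the text: the team found 90, customers reported 10.
dre = defect_removal_efficiency(90, 10)
print(f"DRE = {dre:.0%}")
```

Note that the figure depends on the reporting window: defects that customers find after the window closes are not counted.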
Top projects with a full complement of inspections, static analysis, and formal tests will hit 99% in DRE. Projects using only testing seldom top 85% because most forms of testing are only about 35% efficient or find 1 bug out of 3. That is why you need so many kinds of testing.
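To see why a single 35% efficient stage is not enough, here is a small Python sketch of how removal stages might stack up. It rests on a strong simplifying assumption that is not in the comment above: each stage independently catches the same fraction of whatever defects are still present. Real stages overlap and vary in efficiency, so treat the numbers as illustrative only. The function names are made up for this example.

```python
def combined_efficiency(stage_efficiencies):
    """Fraction of defects removed after running all stages in sequence.

    Assumes (as a simplification) that each stage independently removes
    its stated fraction of the defects still remaining.
    """
    remaining = 1.0
    for efficiency in stage_efficiencies:
        remaining *= 1.0 - efficiency
    return 1.0 - remaining


def stages_needed(target, per_stage):
    """Number of equally efficient stages needed to reach the target."""
    if not 0 < per_stage < 1 or not 0 < target < 1:
        raise ValueError("target and per_stage must be between 0 and 1")
    stages = 0
    remaining = 1.0
    while 1.0 - remaining < target:
        remaining *= 1.0 - per_stage
        stages += 1
    return stages


print(f"one stage at 35%:    {combined_efficiency([0.35]):.1%}")
print(f"three stages at 35%: {combined_efficiency([0.35] * 3):.1%}")
print("stages to reach 85%:", stages_needed(0.85, 0.35))
print("stages to reach 99%:", stages_needed(0.99, 0.35))
```

Under that assumption it takes five 35% stages to pass 85% and eleven to pass 99%. That fits the point that the top projects combine inspections and static analysis with formal tests rather than relying on testing alone.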
Most hi-tech companies such as IBM, Raytheon, Motorola, and the like use these or similar measures and have been doing so since the 1970s. You can find data in most of my books, and a lot of data in my most recent book The Economics of Software Quality, Addison Wesley, 2011.
If you want to see some of the data send an email with a valid return address to capers.jones3@gmail.com.
Regards,
Capers Jones
Bruce Benson • Capers,
In one of the places I worked, where we did not use FPs, we simply divided the total defects found by the total number of new features in the product (features were pretty arbitrary in size but generally were units of approved functionality).
I recall that the number at the time was about 27 defects/feature. I could use that to get a general ballpark of the total defects we would see in a new or updated product during development. We also kept a history/trend line (with highs/lows) of how defects came in over the schedule of the project. This allowed us to compare our current project to how things had proceeded in the past. The key here was to answer the question “when will we achieve sufficient quality where the customer will take our product.”
At one place where we had a customer reported defect metric, I could never break the code on how it was computed. However, using and trending quality via defect reports was a great way to dynamically and predictably see how we were doing prior to release.
I’ve used function points on and off since the early 90s (saved my butt on several occasions). Some organizations, however, just need something more accessible (for example, using their own numbers such as features) to get them moving in the right direction.
As always, thanks for collecting and sharing these kinds of numbers.
More comments from around the web:
Glen Manaker • A question to open Pandora’s box… Whose responsibility should it be to set the quality goals for a product/project? I have had the experience of being instructed to set goals, and when they were proposed, the respective managers said they were unreasonable. When managers were then approached to identify possible goals, they would not commit. However, it was still my responsibility.
What has your experience been?
Bruce Benson • Glen,
We usually had a history, past products/projects, that we used to set goals and expectations. Our goals were usually an updated projection of what we could do and when we could do it by (and how good it would be) based upon our benchmarks.
If we were given a “blank sheet” to work with — it always made me wonder and I would go dig up past data. More than once I was told not to consider our past performance because that project or projects in question did not do very well (or were claimed too dissimilar to be used). I would then ask what has changed since doing those projects and after some awkward “uh, ums” I would get something like “but you guys are going to do it right this time.”
So my approach was never to go to anyone with a clean slate and ask about goals. Instead I did my research (often after many late hours and weekends) and went to the same people and laid out goals based upon the data I had collected. I would ask them if it looked reasonable to them and would they support it (or did they have other ideas). From this we always got reasonable goals. This was because the goals were based upon what we had been able to do in the past, and often it was the first time anyone had been presented with a realistic goal for consideration.
The bigger challenge was usually more senior management (or someone who wanted to lead the project!) taking issue with goals that were noticeably different than similar goals on past projects (that were generally not greatly successful). In organizations with subpar project (product development) performance, seeing a realistic goal was often a shock. The realistic goal often admitted that we were not as fast and nimble as we liked to claim, and admitting that was a heresy.
The bottom line in my experience was that I had to help them set realistic goals, which meant I had to get real smart real fast, and that was a challenge. After doing it for a while, it is still a lot of work, but I have great confidence that we can set an aggressive but realistic and attainable goal.
Good question and example. Thanks.
More comments from around the web:
Mitchel Weissman • It’s a poor project manager that doesn’t schedule in Quality checks. In my 19 years in Quality I have heard PMs complain and complain that they can’t meet their goals due to stoppages because of quality issues. My reply was that if it is in your contract book, you have to meet it.
Stage one of every project starts with a contract book, so to say quality is not self evident is just a game of finger pointing.
Bruce Benson • Mitchel,
Amen! However 😉 I’ve seen quality checks/criteria piled on struggling project after struggling project. These were attempts to help “fix” these projects but they didn’t succeed (we did not pass them, nor did they block the project for long).
In a particular case, once we finally set a realistic schedule instead of the typical “aggressive” schedule we never meet, we passed our quality checks almost effortlessly. In fact what slowed us down a bit was quality (testing in this case) not being ready to test because they didn’t really expect to start testing on time.
So one lesson learned by me is that just because we can’t pass a quality check, doesn’t mean the root cause is centered on “doing quality harder.” Sometimes our process will produce really good quality, if we allow the process to work correctly.
Good feedback.
More comments from around the web:
Mike Murphy, PMP • More amazing tales of quality measurement:
The company that periodically closes any defect more than 2 years old so that suddenly the backlog looks much smaller (and in an amazing coincidence, the auto-close always happens just prior to one of the quarterly reviews with executive management). Never mind that defects have been arriving at a constant 125 per week for years on end, and that the size of the team to fix defects hasn’t changed – declare success on decreasing backlog!!!
The company that redefines test exit criteria to be “the number of defects open on planned ship date”, completely ignoring detailed exit criteria reviewed and approved by the management team.
The company that sets up performance goal for the developers based on # defects fixed, and performance goal for testers based on # defects found……can you see the tsunami of defects coming?
The company that decides to motivate developers to fix faster by giving bonus points for defects closed, points that the developers can use to ‘buy’ company merchandise (do developers really want cheapo Top Flite golf balls with company logo on them????).
The company that reports defect priority pie chart at each monthly review, as if that pie chart actually provides meaningful data, “Who cares about the numbers from last month, or trends over last 6 months – we’ve got 25 priority A and 75 priority B defects!” Almost as if the person doing the reporting was super proud to have figured out how to create a pie chart in EXCEL! What was most interesting was the deafening silence from senior management when the pie chart was presented – seems each may have been afraid to be the one to ask, “is that chart useless or what?”
The company that creates a trending chart showing future plans to decrease the backlog using ‘planned’ productivity increases in the fix rate to demonstrate management’s leadership. Of course, the details of the plan for increasing productivity remained a closely guarded secret……..and inevitably, the ‘planned’ productivity increases never materialized……..which leads to the “close defects older than 2 years” method….. bringing us full circle!
Oh, the list could go on and on. 🙂
Bruce Benson • Mike,
These are priceless and I’ve lived through enough of them myself to see the painful humor in them (“deafening silence,” disappearing defects, misplaced rewards system, etc.).
Comments from around the web:
Sébastien HIRSCH • Measuring project quality is not so easy, but the goal is to set a real target. A lot of the time, at the beginning, we set an optimistic level of quality, but it is often over-quality. Will the customer pay for the over-quality? I think not, which is why it is very relevant to set the right level. But the same question remains: what is the realistic target, and how can we track it?
Bruce Benson • Sebastien,
Where possible, I like to base the expected quality on past project quality (projects completed in the past). This way I have an idea of what is possible and I can set expectations that are in line with past demonstrated performance.
Agreed, it is not always easy, but often I found it is easier than we think to get a good estimate of quality if we dig into past performance.