Constantly changing requirements are the bane of production at scale. Balancing the stability a production environment needs against the steady flow of bugs and enhancements is the key challenge in operating and developing enterprise software.

Reading through Freedom's Forge by Arthur Herman, I found another hidden gem. Bill Knudsen is talking with someone from the Army Air Forces about incorporating field changes into the B-24 Liberator. At the time, the typical manufacturing loop was: hand-built aircraft went out to the field, pilots and crew would make recommendations, the AAF would collect them and submit them back to the manufacturers, and those changes would be in the next aircraft off the line. 100,000 man-hours of work went into each B-24, and each one could be substantively different from the previous one. As a result, there were no complete blueprints of a B-24, and spare parts needed to be modified for specific planes.

Knudsen quickly saw this was the core of Willow Run's problems. Mass production can't happen in the face of constantly changing requirements. Every time new machine dies were made and put into the line, the engineers would have to start over to meet a change request. The changes couldn't be ignored; these flaws were discovered by the men whose lives were at risk. Knudsen's solution was simple and clear: create modification centers to deal with the changes after the plane rolled off the line. Let the manufacturing line incorporate the changes at its own pace, which it knows how to do. The modifications, planned and known, could be done before the planes were sent into the field.

300 words to set the stage: Why is WWII mass production meaningful to software? Because the principle applies: if you constantly change requirements, you will be stuck in the world of craftwork software, spending thousands of man-hours writing and rewriting code. Software modularity and mass production require the same things: standardized components that can be modified in post, with every modification going through a process of incorporation into the next revised design.
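To make the modification-center idea concrete in software terms, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `Design`, `ModificationCenter`, `build`, and the change labels are hypothetical and don't come from Herman's book or any real tool. It only shows the shape of the loop: the line builds a frozen baseline, field changes are applied to finished units in post, and the same changes are folded into the next revision at a planned cut-over.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Design:
    """A frozen baseline: the line builds exactly this until the next revision."""
    revision: int
    components: tuple[str, ...]


@dataclass
class ModificationCenter:
    """Hypothetical stand-in for Knudsen's modification centers."""
    pending: list[str] = field(default_factory=list)

    def request(self, change: str) -> None:
        # Field feedback is recorded here, not pushed onto the running line.
        if change not in self.pending:
            self.pending.append(change)

    def modify(self, unit: tuple[str, ...]) -> tuple[str, ...]:
        # Post-production: patch a finished unit with every known change.
        return unit + tuple(self.pending)

    def incorporate(self, design: Design) -> Design:
        # Planned cut-over: fold the queued changes into a new baseline.
        revised = Design(design.revision + 1,
                         design.components + tuple(self.pending))
        self.pending.clear()
        return revised


def build(design: Design) -> tuple[str, ...]:
    # The line only ever builds the current frozen design.
    return design.components


# Usage with made-up component and change names.
baseline = Design(1, ("airframe", "engines"))
center = ModificationCenter()
center.request("field-change-1")
shipped = center.modify(build(baseline))     # patched after the line
next_design = center.incorporate(baseline)   # becomes revision 2
```

The point of the split is that `build` never sees a change request directly; the line's inputs move only when `incorporate` produces a new revision.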

This is the real challenge of software development: how do I ship a product, fix bugs, and address requests for enhancements, all while moving forward with delivering on the original requirements?

Agile methodologies, DevOps, Kanban, and others are symptoms of this process problem. 800-page requirement documents developed over 2+ years are symptoms of this process problem. Change control boards and operational reviews are symptoms of this process problem.

But Matt, we do those things to solve development problems! How can they be symptoms?

I say these are symptomatic because they don't address the real underlying process problem; they are workarounds from particular points of view. Agile views the problem as not iterating fast enough, heavy requirements see it as incomplete specification, and endless CCBs think recording every operational change will fix things. But the problem remains: "No software survives first contact with its end users".

These methods are all band-aids on short-term failures. Failure is a bad word, but failure is where learning actually happens. Edison famously said, "I've not failed. I've just found 10,000 ways that didn't work." We need to fail to improve software, and we need a wide variety of failures across a broad spectrum to improve software environments.

The solution is a combination of approaches: taking advantage of distributed code management to run main line feature development in parallel with operationally highlighted requirements, refactoring the complete architecture when major changes are required, and tying developers and administrators at the hip in separate but equal operational and engineering teams.

There's no easy button for software. Either you do it yourself, which means you need to take on the balancing act between SLAs and new features, or you buy it from someone else and hope you have a loud enough voice (or deep enough pockets) to keep them responsive. Either way, there's no way around SLA management. And the "right" answer is different in every situation.

There are folks in the Fed working on that tightrope now. There are areas where the government absolutely needs to be on the leading edge of innovation because there's a business driver to solve the problem. Accumulo is a great example there; cell-level HBase security doesn't solve a private sector problem. Without a procurement to design it, it wasn't going to happen outside the NSA. Now that it's been developed and released, they can be a consumer for all the interesting things built on top of that technology. But there will still be government-first innovation that will happen on top of that platform resulting from those new capabilities.

Build versus buy, cloud versus traditional, private versus public, red versus blue. The answer is always yes, because these are only part of the answer; each can be made to work.

Photo courtesy of www.af.mil