Why is Software So Complex?

Q: Why is most modern software so mindbogglingly complex, with multiple layers of abstraction stacked on each other? Why do they not make simple, efficient software like they used to earlier?

This question was asked on Quora recently. I answered that it is due to a few major reasons:

1. Code Maintenance

There’s an old humor article that has circulated online for many years, titled, “If Architects Had to Work Like Programmers.” It’s written as if it’s a letter from a homebuyer who wants an architect to build a house. Here’s an excerpt:

“Please design and build me a house. I am not sure of what I need, you should use your discretion. My house should have between two and forty-five bedrooms. Just make sure the plans are such that the bedrooms can be easily added or deleted. When you bring the blueprints to me, I will make the final decision of what I want. Also bring me the cost breakdown for each configuration so I can arbitrarily pick one.”

This is humorous because it sounds like the way programmers are given their software requirements. Because software can be modified after it is created, the employer assumes it’s easy to do that, and that they don’t need to be specific about what they want.

People in the software development field over many years have tried their best to accommodate this, by creating more and more abstractions so that the pieces of software can be altered, combined, upgraded, or swapped out more easily.

The employer wants this because it enables them to get software on an affordable schedule, without forcing the employer to write detailed specifications that they probably don’t know in advance anyway.

The programmer wants this because they want to remain employed.

2. Code Reusability

A good way to increase code quality without increasing the schedule is to write less bespoke code, but instead use more code that has been written and tested well prior to your project. We call these libraries, frameworks, templates, or code generators.

You’ve probably used Lego toys, where you can build elaborate models using simple reusable bricks or other specialty pieces. You can build practically anything by using enough bricks. There are also some specialized shapes, and lots of guides showing you how to combine them to build the desired models.

Image: Queen Mary model in Lego.

It’s a similar concept to code reusability. Software development then becomes an activity of learning all the different pieces and ways to use them together.

The code is reusable because of abstractions. Like the Lego pieces that use standard dimensions and connecting buttons so they can be fastened to other pieces.

3. Features, Features, Features

I once developed an app for a manager who was very decision-challenged. I would often ask him, “do you want the app to behave this way or that way?” about two alternatives that were mutually exclusive. For example, do you want the report to arrange categories of data in rows or in columns?

He would always answer, “both.” He didn’t know how to choose, and he was afraid of making the wrong choice. So he asked me to implement both alternatives, and make the software configurable. He wanted to keep his options to change his mind later as often as he wanted to.

This at least doubled the work to implement the code, and doubled the testing needed to ensure it works.

But it was worse than that: every time he said “both,” this doubled the number of test cases, because I had to ensure that each new feature worked with every combination of alternatives for the past features.

Programmers can’t say “no” when their employers want some features. They can say, “okay, but here’s what it’ll cost in time and money, do you still want it?”

How do Software Engineers cope with stress?

The typical sources of stress for software engineers are not caused by technology. They’re caused by managers and projects.

Work tasks are not described clearly.

If I’m given unclear work requirements, I ask for more details. I make it clear that I can’t give an estimate for the cost or the time of completion until I know the full scope of work. In a professional software engineering environment, a significant amount of time should be spent estimating the work based on complexity of the task. You can get yourself into a stressful obligation if you agree to a deadline before knowing the requirements of the project.

Be careful about this. Managers often insist that the software engineer decide on the estimate much too early, and then they use that against the engineer later, saying it was an estimate that the engineer had made, so they can’t claim it was imposed on them.

When you are asked for an estimate before you know the scope of work, remember to use this standard response: “I’ll have to get back to you.”

Schedule is too aggressive. Deadlines are impossible.

At one job I joined, I was hired to get a software project back on track after it had fallen behind schedule and the previous team lead had quit. I knew generally what the project was, but I didn’t know how much was done and how much still needed to be done. On my first day, a marketing person told me he wanted to present this software to customers at an annual conference, which was happening in two weeks. He wanted me to promise to finish the software by that time. I told him I would need two weeks just to learn the current state of the project and make an estimate for the completion. It was a little bit stressful to tell him no, but it would have been much worse to make a promise and then fail to deliver.

Finishing on schedule requires long work hours and few breaks.

Don’t fall into the trap of letting management bully you into working until you are exhausted. If you do that, you will make more mistakes, and your code will need to be scrapped and written over. You are not a machine — and even machines need time for maintenance, cleaning, repair, etc. If you keep yourself healthy and your mind fresh, you will be able to concentrate better and produce better quality work. You will have a better chance of finishing the work on time.

New requirements are added late in the project, but the workers are held to the original schedule.

When they want to add more features, tell them you’ll evaluate the new requirements to see if they can be added with minimal interruption to the schedule. Do your best and make an honest effort to do that. But sometimes the new feature requires major changes to the current code design, which is partially implemented already.

Approach the product manager and let them know this. Present to them the following options:

  • Postpone their new feature idea until “phase 2” (that is, a future revision of the software).
  • Cut some other requirements from the current project that are time-consuming and not yet implemented.
  • Make a compromise to reduce the complexity from one or more features, to make them take less time to implement.
  • Extend the project deadline to give enough time for the extra features.

If they still demand that they want all the features and no change to the schedule, that’s not realistic. Politely tell them that they need to choose one of the options or they will be disappointed. This makes the tough choice their responsibility, which reduces your stress.

Unscheduled work and alerts interrupt and spoil concentration.

This is a great source of stress because it’s difficult to resume work that requires concentration after an interruption. This has been studied a lot. It’s not just the time it takes to do the unscheduled task. It also takes time to shift your focus between tasks. If this happens several times per day, you can lose all your productivity for the whole day.

If software engineers are expected to be on call, or to help with unscheduled analysis, troubleshooting, or technical support, then they should make it clear that any schedule estimates are unpredictable. Your stress comes from uncertainty that you can be productive enough to meet your deadline. You can mitigate this stress by insisting that the deadline must be extended every time you are interrupted.

Software engineers are expected to understand and be productive with any type of technology with no time for training.

The best way to cope with this is to fib a little bit and add some time to every estimate, to allow for research, self-training, debugging, and getting answers from technical support. It’s unfortunately a reality that management can’t justify budget for training time, if they are already paying high salaries to software engineers. So you have to include the necessary training time with engineering estimates.

One way to hide training as engineering is to schedule part of the project to implement a “prototype” or a “proof of concept.” These basically mean you’re going to be practicing, and the result will be an unoptimized implementation, intended to be scrapped and redone before the final deadline.

The Private Option

There’s a famous case of a fumbled rollout of a website: HealthCare.gov, the federal health insurance exchange used by independent insurance customers in about two-thirds of states in the USA.

These days, an updated version of HealthCare.gov functions fine, so you may be wondering what the hubbub was about when it was launched.

Poor Debut

Proponents said that a slow rollout is not unexpected. People who managed the health insurance exchange in Massachusetts that served as the model for the Affordable Care Act say that the same initial bugs and slow adoption affected their program too.

The site has performance and scalability problems, has an overly complex user experience, and sometimes calculates wrong answers. The result is that of the 100,000 people who signed up for independent health insurance after October 1 2013, fewer than 27,000 used the federal exchange.

Why Did it Fail?

HealthCare.gov had a major obstacle: they had to handle several times the originally anticipated demand. The original plan was for each US state to implement their own health insurance exchange, to serve people in their respective state, and HealthCare.gov would handle those who couldn’t. It was assumed that only a small minority of states would rely on HealthCare.gov, and these would be the states with smaller populations. As it turned out, a majority of states refused to implement their own exchange web sites. In December 2012, when the states were required to have blueprints describing their solution, reportedly 25 states didn’t meet that deadline.

By the time of the rollout of the ACA, only 14 states were signing people up using their own state-run exchange, whereas the rest of the states–more than two-thirds–were relying on the federal exchange. These include some of the highest population states like Texas and Florida, and 20 states who had taken federal money to plan their state exchanges, but ultimately also relied on the federal exchange.

The Private Option

A few young programmers created an alternative web site they call HealthSherpa.com in their spare time, after the ACA debut on October 1 2013. Their web site is a prototype effort to make a more streamlined portal for people to find the health insurance plans they’re eligible for. It seems to work, and it’s very fast. It uses raw data that is accessible publicly from the federal government.

It’s a valid question then: why didn’t the federal government—or any of the states—employ a small team of web experts to throw together such a site for a fraction of the cost?

HealthSherpa.com doesn’t have all the functions that HealthCare.gov is supposed to. It doesn’t do credit checks, it doesn’t actually even sign anyone up for health care. It just allows consumers to find the data that pertains to them, and then it links to the websites for the respective insurance carriers. And HealthSherpa.com doesn’t create the data—it might be true that part of the effort behind HealthCare.gov has created the raw data that HealthSherpa.com uses.

Also, HealthSherpa.com isn’t (yet) serving tens of millions of users, as HealthCare.gov is supposed to do. I work for Percona, a company that offers consulting and support for database operations, which is just one aspect of web site scalability. Scalability for a web site is complex, much more difficult than most people appreciate.

But it’s worth noting that even with these limitations, there’s a pretty big difference between three guys throwing together a working website in a few days and major federal IT contractor CGI Federal, which spent $174 million after announcing that it had won the contract in December 2011 (i.e. 22 months before the go-live deadline of October 1 2013) and still failed to implement a site that could handle the demand.

Conclusion

So here are some hindsight views on the HealthCare.gov project:

  • They should have anticipated the demand from all 50 states. This may have been over-engineering, since the intention was to serve only a minority. But they had no control over which states would agree to create their own exchanges, and every reason to think there would be political resistance to doing so.
  • They should have had a beta test period. No large-scale web site can handle the load of millions of users on its first day, not even sites implemented by major web experts like Google and Amazon. Those companies restrict enrollment to a limited subset of their users, sometimes by invitation only. They leave enough time to work out the problems before going fully public.
  • They should have provided raw data only, not the whole web site. Let other entrepreneurs innovate the best way to search the data. Maybe someone would even create a Facebook game for selecting your insurance.
  • They should have set the deadline after scoping the project.

C Pointers Explained, Really

While I was in college, a friend of mine complained that he was confused while programming in C, struggling to learn the syntax for pointers.

He gave the example of something like: *x=**p++ being ugly and unreadable, with too many operations layered on each other, making it hard to tell what was happening.  He said he had done a bit of programming with assembly language, but he wasn’t accustomed to the nuances of C.

I wrote the following explanation on our student message board, and I got a lot of good feedback.  Some people said that they had been programming in C for years, but not until they read my post did they finally understand pointers.  So here it is, unearthed from my backups and slightly edited.  I hope it helps someone again…


Message 1956 (8 left): Thu Jan 25 1990 2:44am
From: Bill! (easterb@ucscb)
Subject: Okay

Well, if you know assembly, you have a head start on many of the CS freshpersons here. You at least know about memory maps: RAM is a long long array of bytes. It helped me to learn about pointers if I kept this in mind. For some reason, books and instructors talking about pointers want to overlook this.

When I have some code:

main()
{
  int n;
  int *p;

There is a place in my memory that looks like this:

            :
Address:    :
         |-----|
0x5100   |     | n is an integer, one machine word big
         |-----|
0x5104   |     | p is a pointer, also one word big
         |-----|
0x5108   |     | other unused memory
         |-----|
            :
            :

Let’s give these variables some values. I set n to be the number 151.

n = 151;

I set the pointer p to point to the integer n.

p = &n;

That says, “the value of the variable p is assigned the address of the variable n”.

           :
Address:   :    Value at that address:
         |----|
0x5100   | 151| n
         |----|
0x5104   |5100| p
         |----|
0x5108   | ?  |
         |----|
           :
           :

Now I want to print out the value of n, by two ways.

printf("n is %d.\n", n);
printf("n is %d.\n", *p);

The * operator says, “give me the object at the following address.” The object’s type is the type that the pointer was declared as. So, since we declared “int *p”, the object pointed at will be _assumed_ by C to be an int. In this case, we were careful to make this coincide with what we were pointing at.

Now I want to print out the memory address of n.

printf("n is located at $%x.\n", &n);
printf("n is located at $%x.\n", p);

The & operator says, “tell me the address where the following object starts.” In this case, it is hex 5100 (I put a ‘$’ before it, to conform to the Assembly notation I am used to). Notice the _value_ of p is an address.

Hm. Does p have an address? Sure. It is a variable, and all variables have their own address. The address of p is hex 5104.

printf("p is located at $%x.\n", &p);

Here we are taking the address of a pointer variable, using the & operator.

main()
{
char name[] = "Bill";
char *p;
int *q;

Now we have an array to play with. Here’s how memory looks now:

       |---|
0x5100 |'B'| "name" is an address constant that has value hex 5100
       |---|
0x5101 |'i'| char: 1 byte
       |---|
0x5102 |'l'| char: 1 byte
       |---|
0x5103 |'l'| char: 1 byte
       |---|
0x5104 |   | char: 1 byte
       |---|
0x5105 |   | p is a pointer: 1 word
       |---|
0x5109 |   | q is a pointer: 1 word
       |---|

p = name;

We set p to the value of name. Now p has value hex 5100 too. We can use the * dereferencing operator on p, and get the character ‘B’ as a result.

Now what happens if I do this:

++p;

The pointer p is incremented. What value does it have now? Hex 5101. Pretty simple.

Now let’s try something irresponsible:

q = name;

But q is a pointer to int! If we dereference q, it will take the word (typically 4 bytes) beginning at address “name” (which is hex 5100) and try to convert it to an int. ‘B’, ‘i’, ‘l’, ‘l’ converted to an int will be some large number, dependent on the byte ordering of your machine’s architecture. On ucscb, it becomes 1114205292. (To see how, line up the binary representation of the ASCII values for those 4 characters, run the 32 bits together, and read the resulting binary number as an integer.)

What we have just seen here is a key issue of pointers that I mentioned earlier: C assumes that what they are pointing at is an object of the type that the pointer was designed to point at. It is up to the programmer to make sure this happens correctly.

++q;

The int pointer is incremented. What value does it have now? Hex 5104. Huh?!? The answer is simple if you accept the above paragraph. It gets incremented by the size of the object it _thinks_ it is pointing at. It’s an int pointer, so incrementing it makes it advance a number of bytes equal to the size of an int.

Now print the dereferenced value of q (i.e. the value of the object q is pointing to). Well, it’s pointing at a null byte, and then the first 3 bytes of the char *p. Now we’re all messed up. Nice going. Try to convert _that_ to an integer representation. Well actually, C will do it happily. But it’ll be another weird number.

main()
{
  int n;

  n = 151;
  f(n);
}

f(x)
int x;
{
  printf("%d.\n", x);
}

Here is a simple program that passes an int “by value”. That is, it copies the value of n into the new variable x!

       |---|
0x5100 |151| n is an integer
       |---|
0x5104 |151| x is another integer
       |---|

When we mention x, we are using the value at location 5104, and we can change it, read it, whatever, and it won’t affect n, the int at location 5100.

But what if we want to have f() modify the value and then have that new value be available in main()? C does this by passing the variable “by reference”.

main()
{
int n;

  n = 151;
  f(&n);
}

f(x)
int *x;
{
  printf("%d.\n", *x);
  *x = 451;
}

Pass the _address_ of n, and declare x as a _pointer_ to int. Actually, this is still passing by value, but the value being passed is the address, not the number.

       |----|
0x5100 | 151| n is an integer
       |----|
0x5104 |5100| x is a pointer to int
       |----|

Now in f(), when we make use of *x, we are referring to the value at location 5100. This is the location of n. After the assignment “*x = 451;”, this is what we have:

       |----|
0x5100 | 451| n is an integer
       |----|
0x5104 |5100| x is a pointer to int
       |----|

x still points to location 5100, but we have changed the value of the object at that location.

Well, those are the basics. You mentioned things like “*x=**p++” being ugly and unreadable. Well, yeah, but here is a diagram that may help:

       |----| here is a word in memory with initial value 0. 
0x5100 | 0  | no variable name
       |----|
0x5104 | 12 | here is a value, a word in memory. no variable name.
       |----|
0x5108 |5104| Here is an int pointer, pointing at the previous word.
       |----|
0x511c |5108| here is p, a pointer to int pointer.
       |----|
0x5120 |5100| here is x, a pointer. guess where it's pointing.
       |----|

First let’s see what p and x were declared as:

int *x;  /* pointer to int */
int **p; /* pointer to pointer.
            The subordinate pointer is a pointer to int. */

You should know now what “*x” means. It means, “the value of location 5100.”

And you know what “*p” means, “the value of location 5108”. Now that value is another address! Okay, let’s dereference that address: “**p” and we find (by the declaration) an int.

Now “*x = **p” looks like, “this int at 5100 gets the value of that int at 5104.”

And what does “**p++” mean? Well, postfix ++ binds tighter than *, so this is equivalent to: *( *( p++ ) )

Or, “pointer to pointer to int, and by the way, after we’re done, p has been incremented. But we looked where it was pointing before it got incremented, so we don’t care. Let the next statement worry about it.”


This content is copyright 2012 by Bill Karwin.  I’ll share it under the terms of the Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0 Unported.

How to Save $100 Million

Last night I listened to an interesting interview on public radio, relating a story from the New Yorker magazine about Michigan Dr. Peter Pronovost saving millions of dollars and hundreds of lives of patients.

How did he do it? He taught doctors and nurses to use checklists to avoid mistakes in the intensive care units of hospitals. Mistakes that could put patients’ health or lives at risk.

What’s interesting about this story is that it’s an extremely low-technology solution to a type of problem that exists in virtually every field. In this case, it applies to medical care. But it easily applies to manufacturing. In Japan, they call it poka-yoke, or mistake-proofing. Check out this book too: “Mistake Proofing: Designing Errors Out.”

Do checklists and similar techniques mean hamstringing the creative process in these fields? Absolutely not! On the contrary, effective use of checklists can free our attention from repetitive details, so that we can devote more of our energy to innovation and creativity. We don’t need to keep the details of well-understood procedures in our short-term memory, if we write down the steps so that we can do them without burden, or delegate the work to a teammate.

Why not go the extra step and create technology to automate those procedures? Because a checklist doesn’t necessarily remove the requirement for human attention, to exercise good judgment and critical thinking. Some steps may require analysis, or may be performed conditionally based on the result of a previous step. It’s usually very expensive to make a machine that does that kind of analysis.

Procedures are inexpensive to modify and prototype when humans perform them. It might turn out that the whole procedure is revealed to be incorrect, and needs to be re-thought. If automation technology had been developed for that procedure, the cost of developing that technology would be wasted. If the procedure were merely a checklist, then we just need to re-train the operation staff and voilà!

Notice that books like David Allen’s “Getting Things Done” focus on non-technological methods for organizing and avoiding letting things fall through the cracks.

If checklists and other easy organizational techniques are such a good idea, why don’t we employ them more? In the article about Dr. Peter Pronovost, he remarked that it’s surprising that it has taken so long to adopt his methods, and if there were a drug that achieved the same positive results he does, it would be mandatory in every hospital. A clue to the explanation is in the words “if there were a drug.” Follow the money! The solution that is marketed most aggressively is not the one that is most cost-effective; it’s often the one that is least cost-effective, because its vendor stands to make the most money from that one.