Don’t Put the Cart Before the Horse

On April 2nd I made this undiplomatic statement (funny how Twitter practically encourages being provocative):

#ZF 2.0 is a great example of second-system syndrome.

Matthew Weier O’Phinney and I have a good working relationship. I think his work on the Zend Framework project has been amazing, both from a technology perspective and a marketing perspective. 
So when Matthew asked me to clarify my Tweet, I was happy to reply, in the spirit of constructive criticism. These thoughts apply to many projects–not just ZF–so I thought they would be of general interest. Here’s the content of my reply:

When I’ve reviewed project proposals or business plans, one thing I often advise people is that you can’t describe the value of a project in terms of how you implemented it. Users don’t want to hear about how you used XML, or dependency injection, or unit tests, or agile methodology, or whatever. They want to hear what they can do with this product.

After reading the roadmap for ZF 2.0, I observed that a great majority of the planned changes are refactoring and internal architectural changes. These are worthwhile things to do, but the roadmap says very little about the feature set, or the value to users.

What I’m saying is that implementation does not drive requirements. That’s putting the cart before the horse.

I admit that for a developer framework, this line is more blurry than in other products. Your users do care about the architecture more than they would for a traditional application. But that still doesn’t account for the emphasis on implementation changes in the roadmap, and the lack of specific feature objectives.

For instance, some goals for the controller are described in a list of four bullet items: lightweight, flexible, easy to extend, and easy to create and use custom implementations (which sounds close to easy to extend). Then it jumps right into implementation plans.

So how flexible does it need to be, and in what usage scenarios? What does lightweight mean? How will you know when it’s lightweight? Are there benchmark goals you’re hoping to meet?

Another example is namespacing. Yes, using namespaces allows you to use shorter class names. Is that the bottleneck for users of ZF 1.x? Do you need to create a namespace for every single level of the ZF tree to solve this? Would that be the best solution to the difficulties of using ZF 1.x?

The point is that the way to decide on a given implementation is to evaluate it against a set of requirements. You haven’t defined the requirements, or else you’ve defined the requirements in terms of a desired implementation.

My view is that requirements and implementation are decoupled; a specific implementation should never be treated as one of the requirements, only a means of satisfying the requirements.

Bill Karwin

Announcing Awk on Rails

Awk on Rails is a new kind of web application development framework, with a distinction that no other framework has: Awk on Rails is fully POSIX compliant.

Awk on Rails brings the best practices of modern web application development to the ALAS stack (Apache, Linux, Awk, Shell). This stack is entirely new to the field of web development, yet already brings decades of maturity.

  • Installation is a breeze — in fact, it’s unnecessary, because Awk on Rails uses commands and tools already provided by your operating system.
  • Develop web applications that leverage the power of high-speed interprocess I/O pipelining, utilizing POSIX regular expressions to optimize request routing through common gateway interfaces.
  • Generate your Awk on Rails application code–using awk! A sophisticated script-based front-end called wreak takes care of it for you.
  • You get unlimited flexibility to customize the base application scripts, using your choice of development environment: vi or emacs.
  • SQL? We got NoSQL! We don’t need no stinking SQL! Tired of being confused by relational databases? Manage your data in an “X-treme” non-relational data store exclusive to Awk on Rails. It’s called Hammock, and it’s based on the POSIX key-value system NDBM. To initialize your data store, it’s as simple as running the command: wreak hammock.
  • Design and render application views using the simple and popular M4 language. We all know we need to keep application design separate and free from logic. Awk on Rails can make sure this happens!
  • Embedded source code documentation is easy using a custom macro package. Create ready-to-typeset manuals with one simple command: nroff -Mawkdoc.
  • Awk on Rails comes with example applications to get you started, including a blogging & content management platform AwkWord, and a syndication provider AWRY.
  • Does it scale? Of course! Thanks to the power of Moore’s Law, you’ll stay ahead of the curve over the long haul.
  • Development, deployment, and distribution are all powered by a convenient set of three distinct software licenses. No other framework supports this many licenses! Contributing back to the Awk on Rails project? You get to sign and submit a fourth license — at no charge!

You will soon be able to download source for Awk on Rails and join its development community, at the social source repository. As soon as we figure out whether the licenses allow us to distribute our own source code, you may be able to use it in your projects too!

Look for future Awk on Rails developments and announcements in 2010.* Also look for an innovative cloud computing extension to Awk on Rails, called VaporWare.

Awk on Rails: Not Really Rapid, Not Exactly Agile, More Like Dodgy.

* Awk on Rails comes with no guarantee of release dates or timeliness of announcements. Check your calendars.

Quantity Over Quality

Alex Netkachov recently reported a list of micro-optimizations for PHP. Several other bloggers (Sebastian, Maggie, Pádraic) responded with appropriate messages, reminding people that proper application design usually counts more than micro-optimizations.

They are all correct.

When I was an intern, I emailed a C compiler developer to ask a question that had occurred to me regarding optimization: which is faster, ++i or i++, assuming either form will work in my case, as in the increment expression of a for() loop? His response (paraphrased):

“By emailing me this question, you have already wasted more computing resources than you will ever save by choosing one form over the other during your entire programming career.”

(I’m still not sure if he meant to emphasize that the performance difference between the two expressions was extremely small, or that he didn’t think very highly of my career prospects. I’ll prefer to assume the former.)

A list of performance factoids like the one Alex posted is missing the guidance and wisdom that software developers need to judge their importance. All of the responses from other bloggers are similarly qualitative, instead of quantitative.

I know that it’s hard to make quantitative statements with regard to optimization.

  • How much benefit can I get by replacing print with echo? It depends on how much printing you do in a given application — and also what else you’re doing in that application.
  • Can I benefit from caching page output or results of resource-consuming functions? Probably, but not if the content is 100% dynamic and must be re-calculated for each request.
  • Which of these micro-optimizations should I employ with greater priority? Which is the best use of my development time?

These micro-optimization tips are interesting and worth knowing, but they should also be taken with a grain of salt. Their importance varies, depending on the nature of your application. There are no magic words that are guaranteed to double performance in every application.

Finding the best way to optimize your code is your job as a software developer. You must use scientific measurement, as well as good judgment, experience, and intuition to get your job done most effectively.
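As a rough illustration of what "scientific measurement" can look like in practice, here is a minimal sketch that times two ways of building the same string. It is written in Python rather than PHP, and the two functions, the `measure` helper, and the workload are all hypothetical stand-ins; the only point is that you measure both variants on data resembling your own before deciding whether the difference matters.

```python
import timeit

def join_with_concat(words):
    """Build a string by repeated concatenation."""
    result = ""
    for word in words:
        result += word
    return result

def join_with_join(words):
    """Build the same string with str.join."""
    return "".join(words)

def measure(func, words, repeat=5, number=200):
    """Return the best per-call time in seconds over several runs."""
    timings = timeit.repeat(lambda: func(words), repeat=repeat, number=number)
    return min(timings) / number

# Hypothetical workload; substitute data from your own application.
words = ["x"] * 1000
concat_time = measure(join_with_concat, words)
join_time = measure(join_with_join, words)
print(f"concat: {concat_time:.2e} s per call")
print(f"join:   {join_time:.2e} s per call")
```

The numbers will differ from machine to machine and from workload to workload, which is exactly why a fixed list of tips cannot tell you which change is worth your development time.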

Accepting a job that failed The Joel Test

A user recently asked:

I’m about to accept a job offer for a company that has failed The Joel Test with flying colors.

Now, my question is how do I improve the conditions there. I am positive that within a few months I will be able to make a difference.

But where do I start? And how?

Don’t view yourself as the “new sheriff in town” who’s here to clean it all up in one year. The habits they have settled into have been a long time forming.

Watch and listen, and ask questions about the most severe and recurring pain points. Find out what bad habits have actually caused loss of work, late nights, quality problems, or lost customers. Try to quantify the cost of these bad habits.

Then at some point talk to your boss in a one-on-one meeting and make a proposal for how you could mitigate one specific risk that seems to be their biggest problem. It could be almost anything on the Joel Test, but I’d guess it’s most likely to be one of:

  • No source code control means the code is a mess, with lots of “commented out” sections. Can’t track which code changes were made for a given bug. It’s hard to do major features in parallel with ongoing maintenance. No way to roll back changes. No way to track which developer did what changes.

  • No build process means some code changes exist only on the live server. Developers are constantly pushing and pulling code to and from the live server. No one has a development environment that’s in sync with the live code, so it’s hard to reproduce bugs.

  • No bug database means some tasks “fall through the cracks” from time to time. Customers report bugs that fall into a black hole. Managers don’t know what’s being worked on. Employees have no record of their work when it comes time for annual reviews.

When presenting the solutions, don’t try to justify them with abstract concepts like “best practices” or “it’s the industry standard way” or anything so intangible. If those were enough to motivate this company, they would have done it by now.

Instead, focus on what is their deciding factor. I’d guess it’s probably related to how much time and money it costs the business to use best practices, versus how much it can save them. But you should find out if this is really their reason. It’ll take some setup work to establish these tools and practices, but you can explain the recurring benefits for quality, productivity, and predictability of the development work. All those can contribute to the business’ bottom line.

In one year, you’ll be doing extremely well if you can make just one change to help them. It’ll take a lot of patience to overcome a development culture that has been building for so long. Keep in mind that the rest of the team isn’t there by coincidence — they may actually be compatible with that level of disorganization.

I’m posting to my blog the questions I’ve answered on StackOverflow, which earned the “Good Answer” badge. This was my answer to “Accepting a job that failed The Joel Test.”

Unit Test Coverage

S.Lott writes in his blog about unit test code-coverage: how much is enough?

Effective tests should account not only for code paths, but also input values and other application state or external environment that may affect the behavior.

For example, it may be easy to get 100% code coverage from tests for a function like the following:

divide(x, y) { return x/y; }

But unless you test for division-by-zero (when the parameter y is zero), you haven’t tested sufficiently.

The code-coverage metric doesn’t reveal whether you’ve tested a good variety of input values. It only measures whether your tests have visited the given lines of code, not what values were in each variable at the time.
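To make the point concrete, here is a small sketch in Python (an assumption on my part, since the one-liner above is language-neutral pseudocode). The test names are hypothetical. The first test alone already executes every line of the function, so a coverage tool would report 100%; the second test adds nothing to that score, yet it is the one that checks the zero case.

```python
def divide(x, y):
    """Python rendering of the one-line divide function above."""
    return x / y

def test_divide_typical():
    # This single test executes every line of divide(): 100% line coverage.
    assert divide(6, 3) == 2

def test_divide_by_zero():
    # Coverage does not change when this test is added, yet it is the
    # test that pins down the behavior when y is zero.
    try:
        divide(1, 0)
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError")

test_divide_typical()
test_divide_by_zero()
```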

Likewise for other application state besides input parameters. Values in other application objects, the contents of databases or files, or the operating system environment can all affect the proper functioning of a class or function that you’re testing. These variations are not measured by code-coverage metrics.

It could be argued that if you’re testing for external state, you aren’t doing unit testing by its strict definition; you’re doing functional or system testing. Nevertheless, most people rely chiefly on unit testing tools, because automated unit testing tools that generate code-coverage metrics are pretty easy to use.

While it’s a worthwhile goal to try to get high code-coverage in your unit-testing, a score of 100% doesn’t guarantee that you’ve tested enough. Likewise, a score below 100% isn’t necessarily an indication of inadequate testing. Code-coverage is therefore not a goal in itself; it’s one way of measuring one type of testing.

Understanding Unfamiliar Databases

A user recently asked:

What kind of approaches and techniques can you employ to become familiar with an existing database if you are tasked with supporting and/or modifying it? How can you easily and effectively ramp up your knowledge of a database you have never seen before?

Here was my reply:

  • The first thing I do is create an Entity-Relationship Diagram (ERD). Sometimes you can simply describe the metadata with command-line tools, but to save time there are some tools that can generate a diagram automatically.

  • Second, I examine each table and column, and make sure I learn the meaning of what it stores.

  • Third, examine each relationship and make sure I understand how the tables relate to one another.

  • Fourth, read any views or triggers to understand custom data integrity enforcement or cascading operations.

  • Fifth, read any stored procedures. Also read SQL access privileges if there are any.

  • Sixth, read through parts of the application code that use the database. That’s where some additional business rules and data integrity rules are enforced.
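The first three steps can be partly automated by querying the database's own metadata. The following is only a sketch, assuming a SQLite database and Python's standard sqlite3 module; the `describe_schema` helper and the two example tables are hypothetical, and other database brands expose the same information differently (for example through INFORMATION_SCHEMA views).

```python
import sqlite3

def describe_schema(conn):
    """Return {table: {"columns": [...], "foreign_keys": [...]}} for a SQLite database."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(row[1], row[2])
                   for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows:
        # (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [(row[3], row[2], row[4])
                        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema

# Hypothetical two-table database, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT
    );
""")
for table, info in describe_schema(conn).items():
    print(table, info["columns"])
    for column, ref_table, ref_column in info["foreign_keys"]:
        print(f"  {column} -> {ref_table}.{ref_column}")
```

A listing like this is no substitute for a diagram, and it says nothing about what the columns mean, but it gives you a checklist of tables and relationships to work through.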

I’m posting to my blog the questions I’ve answered on StackOverflow, which earned the “Good Answer” badge. This was my answer to “What are the Best Ways to Understand an Unfamiliar Database?”

Verifying a Company Uses Best Practices

A user recently asked how to use the Joel Test in an interview, to confirm that a software company practices what they preach with regard to professional software development habits:

I’ve got an interview with a company that claims to score a 12 on the Joel Test. […] What are some ways of determining if they really implement all 12 points? Are there any particular questions I can ask?

It’s reasonable to say, “show me.” Ask them for examples and concrete details of their support for the Joel Test subjects. Since they claim they score all 12 points, they are obviously proud of it. People tend to like to show off, so they’ll probably be eager to share more details.

If you ask more specific questions, it’ll become apparent from their descriptions whether they really have those good practices.

We can think of many specific follow-up questions to the basic questions. Each numbered item below begins with the Joel Test question, and my follow-ups, er, follow:

  1. Do you use source control? What source control system do you use? Why did you pick that one? What is your branch/release policy? What are your tag naming conventions? Do you organize your tree by code vs. tests at the top with all modules under each directory, or do you organize by module at the top with code and tests under each module directory?
  2. Can you make a build in one step? What tools do you use to make builds? How long does it take to go from a clean checkout to an installation image? What would it take to modify the build? Is it integrated into your testing harness? What would it take to duplicate a build environment? Are the build scripts and tools also under source control?
  3. Do you make daily builds? What software testing tools do you use for daily builds? Do you use a Continuous Integration tool? If so, which one? How do you identify who “broke the build?” What is your test coverage?
  4. Do you have a bug database? What bug tracker software do you use? Why did you pick that one? What customizations did you apply to it? Can you show me trends of rate of bugs logged or bugs fixed per month? How does a change in source control get associated with the relevant bug?
  5. Do you fix bugs before writing new code? What is your bug triage process? Who is involved in prioritizing bugs? How many bugs did you fix in the last release of your product? Do you do bug hunts with bounties for finding critical bugs?
  6. Do you have an up-to-date schedule? Can I see it? How far are you ahead of/behind schedule right now? How do you do estimating? How accurate a method has that turned out to be?
  7. Do you have a spec? Can I read one? Do you have a spec template? Can I see that? Who writes the specs? Who reviews and approves the specs?
  8. Do programmers have quiet working conditions? Can I see the cubicle or work area for the position I’m interviewing for? (or an equivalent work area)
  9. Do you use the best tools money can buy? What tools do you use? Are you up to date on versions? What tools do you want that you don’t have yet? Why not?
  10. Do you have testers? How many? Can I meet one? Do testers do black-box or white-box testing?
  11. Do new candidates write code during their interview? What code would you like me to write? What are you looking for by seeing my code?
  12. Do you do hallway usability testing? How frequently? Can I see a report documenting one of your usability testing sessions? Can you give me an example of something you changed in the product as a result of usability testing?

Beware if their answers to the specific follow-up questions are evasive like, “um yeah, we are committed to doing more best practices and we’ll be looking to you to help us effect changes toward that goal.” If they’re so committed to it, why don’t they have anything to show for it yet? Probably because like many companies, when the schedule is in jeopardy, following “best practices” goes out the window.

I’m posting to my blog the questions I’ve answered on StackOverflow, which earned the “Good Answer” badge. This was my answer to “Administering the Joel Test.”

Do I really need version control?

A user recently asked:

I read all over the internet (various sites and blogs) about version control. How great it is and how all developer NEED to use it because is a god bless.

Here is the question: do I really need this? … I usually work alone (freelancer) and I had no client that asked me to use svn (but never is too late for this, right?). So, should I start and struggle to learn to use svn (or something similar?) Or it’s just a waste of time?

Here’s a scenario that may illustrate the usefulness of source control even if you work alone.

Your client asks you to implement an ambitious modification to the website. It’ll take you a couple of weeks, and involve edits to many pages. You get to work.

You’re 50% done with this task when the client calls and tells you to drop what you’re doing to make an urgent but more minor change to the site. You’re not done with the larger task, so it’s not ready to go live, and the client can’t wait for the smaller change. But he also wants the minor change to be merged into your work for the larger change.

Maybe you are working on the large task in a separate folder containing a copy of the website. Now you have to figure out how to do the minor change in a way that can be deployed quickly. You work furiously and get it done. The client calls back with further refinement requests. You do this too and deploy it. All is well.

Now you have to merge it into the work in progress for the major change. What did you change for the urgent work? You were working too fast to keep notes. And you can’t just diff the two directories easily now that both have changes relative to the baseline you started from.

The above scenario shows that source control can be a great tool, even if you work solo. Source control can solve many problems for you, such as the following:

  • You can use branches to work on longer-term tasks and then merge the branch back into the main line when it’s done.
  • You can compare whole sets of files to other branches or to past revisions to see what’s different.
  • You can track work over time (which is great for reporting and invoicing by the way).
  • You can recover any revision of any file based on date or on a milestone that you defined.

For solo work, Subversion is recommended. CVS is all but antiquated, and Git is more useful for distributed teams. A good book is Pragmatic Version Control Using Subversion by Mike Mason.

I’m posting to my blog the questions I’ve answered on StackOverflow, which earned the “Good Answer” badge. This was my answer to “Do I really need version control?”