Announcing Awk on Rails

Awk on Rails is a new kind of web application development framework, with a distinction that no other framework has: Awk on Rails is fully POSIX compliant.

Awk on Rails brings the best practices of modern web application development to the ALAS stack (Apache, Linux, Awk, Shell). This stack is entirely new to the field of web development, yet already brings decades of maturity.

  • Installation is a breeze — in fact, it’s unnecessary, because Awk on Rails uses commands and tools already provided by your operating system.
  • Develop web applications that leverage the power of high-speed interprocess I/O pipelining, utilizing POSIX regular expressions to optimize request routing through common gateway interfaces.
  • Generate your Awk on Rails application code–using awk! A sophisticated script-based front-end called wreak takes care of it for you.
  • You get unlimited flexibility to customize the base application scripts, using your choice of development environment: vi or emacs.
  • SQL? We got NoSQL! We don’t need no stinking SQL! Tired of being confused by relational databases? Manage your data in an “X-treme” non-relational data store exclusive to Awk on Rails. It’s called Hammock, and it’s based on the POSIX key-value system NDBM. To initialize your data store, it’s as simple as running the command: wreak hammock.
  • Design and render application views using the simple and popular M4 language. We all know we need to keep application design separate and free from logic. Awk on Rails can make sure this happens!
  • Embedded source code documentation is easy using a custom macro package. Create ready-to-typeset manuals with one simple command: nroff -Mawkdoc.
  • Awk on Rails comes with example applications to get you started, including a blogging & content management platform AwkWord, and a syndication provider AWRY.
  • Does it scale? Of course! Thanks to the power of Moore’s Law, you’ll stay ahead of the curve over the long haul.
  • Development, deployment, and distribution are all powered by a convenient set of three distinct software licenses. No other framework supports this many licenses! Contributing back to the Awk on Rails project? You get to sign and submit a fourth license — at no charge!

You will soon be able to download source for Awk on Rails and join its development community, at the social source repository SCCSHub.net. As soon as we figure out whether the licenses allow us to distribute our own source code, you may be able to use it in your projects too!

Look for future Awk on Rails developments and announcements in 2010.* Also look for an innovative cloud computing extension to Awk on Rails, called VaporWare.

Awk on Rails: Not Really Rapid, Not Exactly Agile, More Like Dodgy.

* Awk on Rails comes with no guarantee of release dates or timeliness of announcements. Check your calendars.

Rendering Trees with Closure Tables

I got a comment from a reader about the Naive Trees section of my presentation SQL Antipatterns Strike Back. I’ve given this presentation at the MySQL Conference & Expo in the past.

I’d also like to mention that I’ve developed these ideas into a new book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming. The book is now available in Beta and for pre-order from Pragmatic Bookshelf.

Here’s the reader’s question:

I would like to ask if there’s a way I can dump all the hierarchies in a single query using a closure table? For example I have a following tree:

rootree
– 1stbranch
– midbranch
– corebranch
– leafnodes
– lastbranch
– lastleaf

and I want to display it like:

rootree -> 1stbranch
rootree -> midbranch
rootree -> midbranch -> corebranch
rootree -> midbranch -> corebranch -> leafnodes
rootree -> lastbranch
rootree -> lastbranch -> lastleaf

The Closure Table is a design for representing trees in a relational database by storing all the paths between tree nodes. Using the reader’s example, one could define and populate two tables like this:

drop table if exists closure;
drop table if exists nodes;

create table nodes (
node int auto_increment primary key,
label varchar(20) not null
);

insert into nodes (node, label) values
(1, ‘rootree’),
(2, ‘1stbranch’),
(3, ‘midbranch’),
(4, ‘corebranch’),
(5, ‘leafnodes’),
(6, ‘lastbranch’),
(7, ‘lastleaf’);

create table closure (
ancestor int not null,
descendant int not null,
primary key (ancestor, descendant),
foreign key (ancestor) references nodes(node),
foreign key (descendant) references nodes(node)
);

insert into closure (ancestor, descendant) values
(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (1,7),
(2,2),
(3,3), (3,4), (3,5),
(4,4), (4,5),
(5,5),
(6,6), (6,7),
(7,7);

What we need to do is find all the descendants of the root node 1, then for each of these descendant nodes, list its ancestors in order, separated by an arrow. We can use MySQL’s useful GROUP_CONCAT() function to build this list for us.

select group_concat(n.label order by n.node separator ‘ -> ‘) as path
from closure d
join closure a on (a.descendant = d.descendant)
join nodes n on (n.node = a.ancestor)
where d.ancestor = 1 and d.descendant != d.ancestor
group by d.descendant;

Here’s the output in the MySQL client. It looks like what the reader asked for:

+-------------------------------------------------+
| path |
+-------------------------------------------------+
| rootree -> 1stbranch |
| rootree -> midbranch |
| rootree -> midbranch -> corebranch |
| rootree -> midbranch -> corebranch -> leafnodes |
| rootree -> lastbranch |
| rootree -> lastbranch -> lastleaf |
+-------------------------------------------------+

I do assume for the purposes of ordering that all of a node’s ancestors have a lower node number. You could alternatively use a pathlength column to the closure table and sort by that.

The Closure Table design is nice compared to the Nested Sets (or Preorder Traversal) design, because it supports the use of referential integrity. By using indexes, the EXPLAIN report shows that MySQL query optimizer does a pretty good job on it (I’ve omitted a few columns for brevity):

+-------+--------+-------------------+--------------------------+
| table | type | ref | Extra |
+-------+--------+-------------------+--------------------------+
| d | range | NULL | Using where; Using index |
| a | ref | test.d.descendant | |
| n | eq_ref | test.a.ancestor | |
+-------+--------+-------------------+--------------------------+

Speaking on SQL Injection at MySQL Conference

O'Reilly MySQL Conference & Expo 2010

I’m speaking this year at the MySQL Conference & Expo 2010 in Santa Clara. Be sure to get your early registration discount by Feb 22! If you miss that deadline, get 25% off with this discount code: mys10fsp

I’m presenting a talk on SQL Injection Myths and Fallacies. This may seem like a topic that’s been done to death, but it’s important for all developers to understand it. This reminds me of a story:

My mother volunteers with the League of Women Voters. One of their activities is helping college students register to vote. So every year they set up a table on campus and help young people fill out the forms.

One day one of the women expressed frustration: “We’ve been doing this for ten years! When are these students going to learn how to register to vote for themselves?!”

The rest of the group looked at her blankly. Finally someone said calmly, “You realize that every year a new class of students becomes eligible to vote, right?

The woman who complained felt suitably embarrassed.

I’m going to cover the basics about SQL Injection, but I’ll also show how much of the advice about SQL Injection (even advice from noted security experts) misses the whole picture. I’ll also give some new techniques for remedies, that I seldom see in books or blogs. Come on by!

What is QEP?

In the context of database programming, QEP is an acronym for Query Execution Plan.

The database server analyzes every SQL query and plans how to use indexes and order tables to produce the result in the most efficient way.

You can get a report of the QEP for a SELECT query using the EXPLAIN command in MySQL. This is an important tool to analyze your SQL queries and detect inefficiencies.

Consultant Ronald Bradford uses the acronym QEP freely, but this term is not ubiquitous and doesn’t appear in the MySQL documentation. Googling for QEP yields some confusing results:

Remember: if you are using an uncommon acronym in your writing, define it when you use it. Even if you have used the acronym in past blog posts, define it again in a new blog post. Or at least link to the original post where you defined it.

Free Software vs. Gratis Software

A lot of folks are unclear on the subtleties of free software and open source. Mike Hogan writes a blog article”Is Hybrid Licensing of OSS Hypocrisy?” to try to shed some light on this. With respect, I think he has missed part of it.

We’re talking about two orthogonal things here. One is open-source versus closed-source, and the other is whether we charge money for software licenses or not. As Mike points out, the former is about development methodology. The latter is about business model.

There ought to be nothing wrong with charging money for open-source software. In fact, there isn’t — even the GPL permits this. This is related to the origin of open-source in the Free Software movement. “Free software” to them is about what the user is licensed to do with that software, not whether they paid anything to license it. In fact, some companies package free software and charge for it (presumably with some added value).

The marketing interpretation of “free software” is the “free chips & salsa” concept Mike mentions. It’s a way of encouraging the market to adopt this product. It’s a loss leader, usually followed by upselling the customer with other products or services.

There’s also a case for no-charge versions of closed-source software. Typically these have either limited features (“crippleware”) or they expire after a limited time (e.g. “demo”).

Few companies have found a way to use open source methods to develop full products they then charge money for. But this could simply be because it’s hard to drive adoption of any software product, regardless of whether it’s open-source or closed-source.

On the issue of hybrid licensing, I see this as no hypocrisy; I see it as more freedom. If I develop some code, offer it under the GPL license, and you use my code as part of your project, then you are obligated to license your project with a GPL-compatible license. This is termed the “viral” nature of the GPL license, and it’s clearly intended to promote the free software movement.

What if you don’t want to adopt a GPL-compatible license for your project? Well, no one is forcing you to use my code. But my code is really amazingly good, and you want it. You want it so much that you’re willing to give me money if I grant you a license to use it under different terms. If I’m willing to do that, now you have more freedom — you can choose to contribute your own code to the body of free software in the world, or you can choose not to. But the latter choice may have a different price tag associated with it.

This should still promote the principles of the free software movement. It would be wrong to charge someone for their freedom. But in the hybrid license model, you can avoid paying for a license simply by joining the movement, by spreading the freedom. If you want to stick to your closed-source model, you can pay for the privilege of using my code in that way.

I don’t see any hypocrisy in a software maker using a hybrid licensing model, as long as they are consistent and honest about it.

I’m Speaking on SQL at OSCON

OSCON 2009

Early Registration has been extended to June 23. Save up to $250!

Enter my friends-of-speaker discount code “os09fos” when you register, and save an additional 20%! Just because you read my blog.

Practical Object-Oriented Models in SQL

Wednesday July 22, 5:20pm.

SQL is from Mars, Objects are from Venus.

This talk is for software developers who know SQL but are stuck trying to implement common object-oriented structures in an SQL database. Mimicking polymorphism, extensibility, and hierarchical data in the relational database paradigm can be confusing and awkward, but they don’t have to be.

  • Polymorphism: Suppose your blog supports comments, but then your comments need to reference multiple types of content, for example news, blog articles, and videos. What then?
  • Extensibility: We’ve all designed customizable software, allowing customers to extend a data model with new data attributes. See how to design flexible systems, while using efficient SQL queries.
  • Hierarchies: Tree-structured data relationships are common, but working with trees in SQL usually implies recursive queries. There are a few solutions to solve this more cleanly.
  • ActiveRecord Dos and Don’ts: Web development frameworks have popularized the use of design patterns, but when it comes to multi-table queries, complex views, and assignment of OO responsibilities, ActiveRecord falls short as a one-size-fits-all Domain Model.

BoF: Meet Authors from Pragmatic Bookshelf

Wednesday July 22, 7:00pm

Gather with published and upcoming authors of programming books from the industry favorite publisher, Pragmatic Bookshelf. Join this informal chat about programming, writing books, job hunting, and career development.

Agenda:

  • Author introductions, books, OSCON presentations.
  • Experiences working with a publisher.
  • How does authoring a book aid a tech career?
  • What tech books would you like to see?

Pragmatic Bookshelf authors attending OSCON include:

  • Ian Dees is presenting “Testing iPhone Apps with Ruby and Cucumber” at OSCON (Wednesday 10:45am). Ian authored the book “Scripted GUI Testing with Ruby.”
  • Bill Karwin is presenting “Practical Object-Oriented Models in SQL” at OSCON (Wednesday 5:20pm). Bill is currently writing a book “SQL Antipatterns.”
  • Other Prag authors are attending OSCON, and plan to be at this BoF.

EAV FAIL

The photo above illustrates (by counter-example) an important characteristic of a normalized database: each logical “type” of attribute belongs in a separate column.

Just because three values happen to be numeric doesn’t mean it makes sense to SUM() them together. But if dissimilar attributes are stored in the same column, it’s tempting to treat them as compatible in this way.

This also shows a fallacy of the Entity-Attribute-Value antipattern. In this design, all attribute values are stored in a single column.

CREATE TABLE EntityAttributeValue (
entity VARCHAR(20) NOT NULL,
attribute VARCHAR(20) NOT NULL,
value VARCHAR(1000) NOT NULL,
PRIMARY KEY (entity, attribute)
);

INSERT INTO EntityAttributeValue (entity, attribute, value)
VALUES
('New Cuyama', 'Population', '562'),
('New Cuyama', 'Ft. above sea level', '2150'),
('New Cuyama', 'Established', '1951'),

SELECT SUM(value) FROM EntityAttributeValue
WHERE entity = 'New Cuyama';

The Entity-Attribute-Value design does not support or conform to rules of database normalization.

To be clear, the proper way to design a database is to put different attributes in different columns. Use column names, not strings, to identify the attributes.

CREATE TABLE Cities (
city_id SERIAL PRIMARY KEY,
city_name VARCHAR(100) NOT NULL,
population INT UNSIGNED NOT NULL,
feet_altitude SMALLINT UNSIGNED NOT NULL,
year_established SMALLINT UNSIGNED NOT NULL
);

SQL Antipatterns Strike Back! Slides

I presented my tutorial at the MySQL Conference & Expo today. I have fun preparing it and presenting it, and I got many good questions and comments from the audience. Thanks to everyone for coming and participating!

I have uploaded my slides with a Creative Common 3.0 license to my SlideShare account: http://www.slideshare.net/billkarwin

For those who did not get to see my tutorial, I’m presenting some selections from it during a 45-minute session at the MySQL Camp on Wednesday at 2:00pm, under the title “Practical Object-Oriented Models in SQL.”

See you next year!

Oracle buying Sun

Stunning news today that Oracle has offered to buy Sun Microsystems. This is sending the MySQL community reeling, as they begin their MySQL Conference & Expo today. Everyone’s talking about whether this change is good for MySQL.

My first thought is: it’s not over till it’s over. These deals have a way of falling through at the last minute. Just look at Microsoft’s attempts to acquire Yahoo!. I’m not saying that it will fall through in this case. Just don’t count it as a done deal until the agreements are signed, and the shareholders and the SEC have their say.

My second thought is: it depends on how well the two companies can integrate. In any acquisition, there’s a merger not only of assets but of goals, strategies, and corporate culture. Not to mention people. People matter.

I worked for InterBase during part of the 1990’s. InterBase was an RDBMS that was acquired by Borland in 1991, as part of their acquisition of Ashton-Tate. Borland was very interested in Ashton-Tate for its dBase product, but Borland also got InterBase in the deal (InterBase had been acquired only a couple of months before AT’s merger with Borland). InterBase wasn’t in Borland’s strategy and it wasn’t what they valued as part of the acquisition. As a result, it was an unwanted step-child for over ten years (despite having a revenue matching Borland C++Builder).

What does this tell us about Oracle’s plans for MySQL? Nothing for certain. My point is that it depends on what Oracle values as part of the acquisition. Is it Java? Is it the line of enterprise hardware? Is it XFS or OpenSolaris or NetBeans or Glassfish? Any of these are likely candidates. But MySQL does not jump to the head of the list as the likeliest “jewel in the crown” that motivated Oracle to make this offer.

I hate IBM System i

In his blog, Vadim reports that a new storage pluggable engine for MySQL has appeared in the source tree, to support IBM DB2 for i as a back-end.

This reminds me that I hate the IBM System i platform (aka IBM Power Systems, aka iSeries, aka AS/400).

Don’t get me wrong — I’m sure it’s terrific technology. I’m sure IBM supports many businesses with it and they’re happy (although I do wonder why they need to keep re-branding the product line). But my fate is not aligned favorably with respect to System i.

At several companies I have worked for, the business development people struck an ill-conceived deal with IBM, to “support System i.” Meetings were had. Agreements were signed. Commissions were paid.

Then it came time to do the work and fulfill the partner agreement. At my last job, my manager came to me and said, “by the way, in your spare time, make sure your work-in-progress supports the IBM platform.”

I know nothing about the IBM System i. I have never seen one face-to-face. I have never seen any documentation for it. I enjoy command-line interfaces, but using the System i made MPE/ix seem friendly.

By the way, in spite of the “Universal” moniker, DB2 on the System i is, as far as I can tell, a completely different database implementation, with the brand name “DB2” tacked on as an afterthought.

Here are some suggestion for the System i business development folks at IBM: when you make a deal with small companies to support your platform, make sure they have enough machines to do development and testing. Include electronic documentation so everyone can have access to it. Perhaps even offer some training as part of the deal. And then ask your new partner for a project plan that details such things as:

  • Which products they promise to support on the System i.
  • When they promise to do the work and have the solution ready.
  • Who they will assign to do the work, not ask to do it in their spare time.