Speaking on SQL Injection at MySQL Conference

O'Reilly MySQL Conference & Expo 2010

I’m speaking this year at the MySQL Conference & Expo 2010 in Santa Clara. Be sure to get your early registration discount by Feb 22! If you miss that deadline, get 25% off with this discount code: mys10fsp

I’m presenting a talk on SQL Injection Myths and Fallacies. This may seem like a topic that’s been done to death, but it’s important for all developers to understand it. This reminds me of a story:

My mother volunteers with the League of Women Voters. One of their activities is helping college students register to vote. So every year they set up a table on campus and help young people fill out the forms.

One day one of the women expressed frustration: “We’ve been doing this for ten years! When are these students going to learn how to register to vote for themselves?!”

The rest of the group looked at her blankly. Finally someone said calmly, “You realize that every year a new class of students becomes eligible to vote, right?

The woman who complained felt suitably embarrassed.

I’m going to cover the basics about SQL Injection, but I’ll also show how much of the advice about SQL Injection (even advice from noted security experts) misses the whole picture. I’ll also give some new techniques for remedies, that I seldom see in books or blogs. Come on by!

What is QEP?

In the context of database programming, QEP is an acronym for Query Execution Plan.

The database server analyzes every SQL query and plans how to use indexes and order tables to produce the result in the most efficient way.

You can get a report of the QEP for a SELECT query using the EXPLAIN command in MySQL. This is an important tool to analyze your SQL queries and detect inefficiencies.

Consultant Ronald Bradford uses the acronym QEP freely, but this term is not ubiquitous and doesn’t appear in the MySQL documentation. Googling for QEP yields some confusing results:

Remember: if you are using an uncommon acronym in your writing, define it when you use it. Even if you have used the acronym in past blog posts, define it again in a new blog post. Or at least link to the original post where you defined it.

Free Software vs. Gratis Software

A lot of folks are unclear on the subtleties of free software and open source. Mike Hogan writes a blog article”Is Hybrid Licensing of OSS Hypocrisy?” to try to shed some light on this. With respect, I think he has missed part of it.

We’re talking about two orthogonal things here. One is open-source versus closed-source, and the other is whether we charge money for software licenses or not. As Mike points out, the former is about development methodology. The latter is about business model.

There ought to be nothing wrong with charging money for open-source software. In fact, there isn’t — even the GPL permits this. This is related to the origin of open-source in the Free Software movement. “Free software” to them is about what the user is licensed to do with that software, not whether they paid anything to license it. In fact, some companies package free software and charge for it (presumably with some added value).

The marketing interpretation of “free software” is the “free chips & salsa” concept Mike mentions. It’s a way of encouraging the market to adopt this product. It’s a loss leader, usually followed by upselling the customer with other products or services.

There’s also a case for no-charge versions of closed-source software. Typically these have either limited features (“crippleware”) or they expire after a limited time (e.g. “demo”).

Few companies have found a way to use open source methods to develop full products they then charge money for. But this could simply be because it’s hard to drive adoption of any software product, regardless of whether it’s open-source or closed-source.

On the issue of hybrid licensing, I see this as no hypocrisy; I see it as more freedom. If I develop some code, offer it under the GPL license, and you use my code as part of your project, then you are obligated to license your project with a GPL-compatible license. This is termed the “viral” nature of the GPL license, and it’s clearly intended to promote the free software movement.

What if you don’t want to adopt a GPL-compatible license for your project? Well, no one is forcing you to use my code. But my code is really amazingly good, and you want it. You want it so much that you’re willing to give me money if I grant you a license to use it under different terms. If I’m willing to do that, now you have more freedom — you can choose to contribute your own code to the body of free software in the world, or you can choose not to. But the latter choice may have a different price tag associated with it.

This should still promote the principles of the free software movement. It would be wrong to charge someone for their freedom. But in the hybrid license model, you can avoid paying for a license simply by joining the movement, by spreading the freedom. If you want to stick to your closed-source model, you can pay for the privilege of using my code in that way.

I don’t see any hypocrisy in a software maker using a hybrid licensing model, as long as they are consistent and honest about it.

I’m Speaking on SQL at OSCON

OSCON 2009

Early Registration has been extended to June 23. Save up to $250!

Enter my friends-of-speaker discount code “os09fos” when you register, and save an additional 20%! Just because you read my blog.

Practical Object-Oriented Models in SQL

Wednesday July 22, 5:20pm.

SQL is from Mars, Objects are from Venus.

This talk is for software developers who know SQL but are stuck trying to implement common object-oriented structures in an SQL database. Mimicking polymorphism, extensibility, and hierarchical data in the relational database paradigm can be confusing and awkward, but they don’t have to be.

  • Polymorphism: Suppose your blog supports comments, but then your comments need to reference multiple types of content, for example news, blog articles, and videos. What then?
  • Extensibility: We’ve all designed customizable software, allowing customers to extend a data model with new data attributes. See how to design flexible systems, while using efficient SQL queries.
  • Hierarchies: Tree-structured data relationships are common, but working with trees in SQL usually implies recursive queries. There are a few solutions to solve this more cleanly.
  • ActiveRecord Dos and Don’ts: Web development frameworks have popularized the use of design patterns, but when it comes to multi-table queries, complex views, and assignment of OO responsibilities, ActiveRecord falls short as a one-size-fits-all Domain Model.

BoF: Meet Authors from Pragmatic Bookshelf

Wednesday July 22, 7:00pm

Gather with published and upcoming authors of programming books from the industry favorite publisher, Pragmatic Bookshelf. Join this informal chat about programming, writing books, job hunting, and career development.

Agenda:

  • Author introductions, books, OSCON presentations.
  • Experiences working with a publisher.
  • How does authoring a book aid a tech career?
  • What tech books would you like to see?

Pragmatic Bookshelf authors attending OSCON include:

  • Ian Dees is presenting “Testing iPhone Apps with Ruby and Cucumber” at OSCON (Wednesday 10:45am). Ian authored the book “Scripted GUI Testing with Ruby.”
  • Bill Karwin is presenting “Practical Object-Oriented Models in SQL” at OSCON (Wednesday 5:20pm). Bill is currently writing a book “SQL Antipatterns.”
  • Other Prag authors are attending OSCON, and plan to be at this BoF.

EAV FAIL

The photo above illustrates (by counter-example) an important characteristic of a normalized database: each logical “type” of attribute belongs in a separate column.

Just because three values happen to be numeric doesn’t mean it makes sense to SUM() them together. But if dissimilar attributes are stored in the same column, it’s tempting to treat them as compatible in this way.

This also shows a fallacy of the Entity-Attribute-Value antipattern. In this design, all attribute values are stored in a single column.

CREATE TABLE EntityAttributeValue (
entity VARCHAR(20) NOT NULL,
attribute VARCHAR(20) NOT NULL,
value VARCHAR(1000) NOT NULL,
PRIMARY KEY (entity, attribute)
);

INSERT INTO EntityAttributeValue (entity, attribute, value)
VALUES
('New Cuyama', 'Population', '562'),
('New Cuyama', 'Ft. above sea level', '2150'),
('New Cuyama', 'Established', '1951'),

SELECT SUM(value) FROM EntityAttributeValue
WHERE entity = 'New Cuyama';

The Entity-Attribute-Value design does not support or conform to rules of database normalization.

To be clear, the proper way to design a database is to put different attributes in different columns. Use column names, not strings, to identify the attributes.

CREATE TABLE Cities (
city_id SERIAL PRIMARY KEY,
city_name VARCHAR(100) NOT NULL,
population INT UNSIGNED NOT NULL,
feet_altitude SMALLINT UNSIGNED NOT NULL,
year_established SMALLINT UNSIGNED NOT NULL
);

SQL Antipatterns Strike Back! Slides

I presented my tutorial at the MySQL Conference & Expo today. I have fun preparing it and presenting it, and I got many good questions and comments from the audience. Thanks to everyone for coming and participating!

I have uploaded my slides with a Creative Common 3.0 license to my SlideShare account: http://www.slideshare.net/billkarwin

For those who did not get to see my tutorial, I’m presenting some selections from it during a 45-minute session at the MySQL Camp on Wednesday at 2:00pm, under the title “Practical Object-Oriented Models in SQL.”

See you next year!

Oracle buying Sun

Stunning news today that Oracle has offered to buy Sun Microsystems. This is sending the MySQL community reeling, as they begin their MySQL Conference & Expo today. Everyone’s talking about whether this change is good for MySQL.

My first thought is: it’s not over till it’s over. These deals have a way of falling through at the last minute. Just look at Microsoft’s attempts to acquire Yahoo!. I’m not saying that it will fall through in this case. Just don’t count it as a done deal until the agreements are signed, and the shareholders and the SEC have their say.

My second thought is: it depends on how well the two companies can integrate. In any acquisition, there’s a merger not only of assets but of goals, strategies, and corporate culture. Not to mention people. People matter.

I worked for InterBase during part of the 1990’s. InterBase was an RDBMS that was acquired by Borland in 1991, as part of their acquisition of Ashton-Tate. Borland was very interested in Ashton-Tate for its dBase product, but Borland also got InterBase in the deal (InterBase had been acquired only a couple of months before AT’s merger with Borland). InterBase wasn’t in Borland’s strategy and it wasn’t what they valued as part of the acquisition. As a result, it was an unwanted step-child for over ten years (despite having a revenue matching Borland C++Builder).

What does this tell us about Oracle’s plans for MySQL? Nothing for certain. My point is that it depends on what Oracle values as part of the acquisition. Is it Java? Is it the line of enterprise hardware? Is it XFS or OpenSolaris or NetBeans or Glassfish? Any of these are likely candidates. But MySQL does not jump to the head of the list as the likeliest “jewel in the crown” that motivated Oracle to make this offer.

I hate IBM System i

In his blog, Vadim reports that a new storage pluggable engine for MySQL has appeared in the source tree, to support IBM DB2 for i as a back-end.

This reminds me that I hate the IBM System i platform (aka IBM Power Systems, aka iSeries, aka AS/400).

Don’t get me wrong — I’m sure it’s terrific technology. I’m sure IBM supports many businesses with it and they’re happy (although I do wonder why they need to keep re-branding the product line). But my fate is not aligned favorably with respect to System i.

At several companies I have worked for, the business development people struck an ill-conceived deal with IBM, to “support System i.” Meetings were had. Agreements were signed. Commissions were paid.

Then it came time to do the work and fulfill the partner agreement. At my last job, my manager came to me and said, “by the way, in your spare time, make sure your work-in-progress supports the IBM platform.”

I know nothing about the IBM System i. I have never seen one face-to-face. I have never seen any documentation for it. I enjoy command-line interfaces, but using the System i made MPE/ix seem friendly.

By the way, in spite of the “Universal” moniker, DB2 on the System i is, as far as I can tell, a completely different database implementation, with the brand name “DB2” tacked on as an afterthought.

Here are some suggestion for the System i business development folks at IBM: when you make a deal with small companies to support your platform, make sure they have enough machines to do development and testing. Include electronic documentation so everyone can have access to it. Perhaps even offer some training as part of the deal. And then ask your new partner for a project plan that details such things as:

  • Which products they promise to support on the System i.
  • When they promise to do the work and have the solution ready.
  • Who they will assign to do the work, not ask to do it in their spare time.

Virtually Speaking

The word virtually is overused. In many cases using the word virtually simply means not. For example:

I have finished virtually all of my homework.

This new surgical procedure is virtually pain-free.

In Modern American Usage, Bryan A. Garner counts virtually as a weasel-word. Weasel-words are so named because of the habit of weasels to attack birds’ nests, and eat their eggs by sucking the meat out, leaving an empty shell.” Likewise, words such as virtually “have the effect of rendering uncertain or hollow the statements in which they appear.”

So be careful using virtually, or other words that similarly diminish the words around them. Other weasel-words commonly used by writers today include significantly, obviously, very, and quite.

Hello EclipseCon 2009

No, I am not attending EclipseCon — but my smiling face apparently was on Tuesday. StackOverflow founder and CodingHorror blogger Jeff Atwood emailed me to let me know he displayed my StackOverflow user profile page during his keynote at EclipseCon.

I don’t know what the context was in which he showed my profile. Maybe he just needed an example of an SQL geek who has too much time on his hands.

I hope a video of the keynote will be made available. If I do find one, I’ll link to it in this blog.