Oracle buying Sun

Stunning news today that Oracle has offered to buy Sun Microsystems. This is sending the MySQL community reeling, as they begin their MySQL Conference & Expo today. Everyone’s talking about whether this change is good for MySQL.

My first thought is: it’s not over till it’s over. These deals have a way of falling through at the last minute. Just look at Microsoft’s attempts to acquire Yahoo!. I’m not saying that it will fall through in this case. Just don’t count it as a done deal until the agreements are signed, and the shareholders and the SEC have their say.

My second thought is: it depends on how well the two companies can integrate. In any acquisition, there’s a merger not only of assets but of goals, strategies, and corporate culture. Not to mention people. People matter.

I worked for InterBase during part of the 1990’s. InterBase was an RDBMS that was acquired by Borland in 1991, as part of their acquisition of Ashton-Tate. Borland was very interested in Ashton-Tate for its dBase product, but Borland also got InterBase in the deal (InterBase had been acquired only a couple of months before AT’s merger with Borland). InterBase wasn’t in Borland’s strategy and it wasn’t what they valued as part of the acquisition. As a result, it was an unwanted step-child for over ten years (despite having a revenue matching Borland C++Builder).

What does this tell us about Oracle’s plans for MySQL? Nothing for certain. My point is that it depends on what Oracle values as part of the acquisition. Is it Java? Is it the line of enterprise hardware? Is it XFS or OpenSolaris or NetBeans or Glassfish? Any of these are likely candidates. But MySQL does not jump to the head of the list as the likeliest “jewel in the crown” that motivated Oracle to make this offer.

I hate IBM System i

In his blog, Vadim reports that a new storage pluggable engine for MySQL has appeared in the source tree, to support IBM DB2 for i as a back-end.

This reminds me that I hate the IBM System i platform (aka IBM Power Systems, aka iSeries, aka AS/400).

Don’t get me wrong — I’m sure it’s terrific technology. I’m sure IBM supports many businesses with it and they’re happy (although I do wonder why they need to keep re-branding the product line). But my fate is not aligned favorably with respect to System i.

At several companies I have worked for, the business development people struck an ill-conceived deal with IBM, to “support System i.” Meetings were had. Agreements were signed. Commissions were paid.

Then it came time to do the work and fulfill the partner agreement. At my last job, my manager came to me and said, “by the way, in your spare time, make sure your work-in-progress supports the IBM platform.”

I know nothing about the IBM System i. I have never seen one face-to-face. I have never seen any documentation for it. I enjoy command-line interfaces, but using the System i made MPE/ix seem friendly.

By the way, in spite of the “Universal” moniker, DB2 on the System i is, as far as I can tell, a completely different database implementation, with the brand name “DB2” tacked on as an afterthought.

Here are some suggestion for the System i business development folks at IBM: when you make a deal with small companies to support your platform, make sure they have enough machines to do development and testing. Include electronic documentation so everyone can have access to it. Perhaps even offer some training as part of the deal. And then ask your new partner for a project plan that details such things as:

  • Which products they promise to support on the System i.
  • When they promise to do the work and have the solution ready.
  • Who they will assign to do the work, not ask to do it in their spare time.

The Next-Gen Databases

A user recently asked:

I’m learning traditional Relational Databases (with PostgreSQL) and doing some research I’ve come across some new types of databases. CouchDB, Drizzle, and Scalaris to name a few, what is going to be the next database technologies to deal with?

SQL is a language for querying and manipulating relational databases. SQL is dictated by an international standard. While the standard is revised, it seems to always work within the relational database paradigm.

Here are a few new data storage technologies that are getting attention currently:

  • CouchDB is a non-relational database. They call it a document-oriented database.
  • Amazon SimpleDB is also a non-relational database accessed in a distributed manner through a web service. Amazon also has a distributed key-value store called Dynamo, which powers some of its S3 services.
  • Dynomite and Kai are open source solutions inspired by Amazon Dynamo.
  • BigTable is a proprietary data storage solution used by Google, and implemented using their Google File System technology. Google’s MapReduce framework uses BigTable.
  • Hadoop is an open-source technology inspired by Google’s MapReduce, and serving a similar need, to distribute the work of very large scale data stores.
  • Scalaris is a distributed transactional key/value store. Also not relational, and does not use SQL. It’s a research project from the Zuse Institute in Berlin, Germany.
  • RDF is a standard for storing semantic data, in which data and metadata are interchangeable. It has its own query language SPARQL, which resembles SQL superficially, but is actually totally different.
  • Vertica is a highly scalable column-oriented analytic database designed for distributed (grid) architecture. It does claim to be relational and SQL-compliant. It can be used through Amazon’s Elastic Compute Cloud.
  • Greenplum is a high-scale data warehousing DBMS, which implements both MapReduce and SQL.
  • XML isn’t a DBMS at all, it’s an interchange format. But some DBMS products work with data in XML format.
  • ODBMS, or Object Databases, are for managing complex data. There don’t seem to be any dominant ODBMS products in the mainstream, perhaps because of lack of standardization. Standard SQL is gradually gaining some OO features (e.g. extensible data types and tables).
  • Drizzle is a relational database, drawing a lot of its code from MySQL. It includes various architectural changes designed to manage data in a scalable “cloud computing” system architecture. Presumably it will continue to use standard SQL with some MySQL enhancements.

Relational databases have weaknesses, to be sure. People have been arguing that they don’t handle all data modeling requirements since the day it was first introduced.

Year after year, researchers come up with new ways of managing data to satisfy special requirements: either requirements to handle data relationships that don’t fit into the relational model, or else requirements of high-scale volume or speed that demand data processing be done on distributed collections of servers, instead of central database servers.

Even though these advanced technologies do great things to solve the specialized problem they were designed for, relational databases are still a good general-purpose solution for most business needs. SQL isn’t going away.

I’m posting to my blog the questions I’ve answered on StackOverflow, which earned the “Good Answer” badge. This was my answer to “The Next-Gen Databases.”

SQL Antipatterns Tutorial at the MySQL Conf & Expo 2009

My tutorial proposal was accepted, so I’ll be speaking April 20 at the MySQL Conference & Expo 2009 in Santa Clara.

My tutorial is “SQL Antipatterns Strike Back.” SQL Antipatterns are frequent blunders committed by software developers, both novice and expert.

I gave a similar tutorial last year, and I think it was well-received. I’m keeping and improving the most interesting topics from last year’s tutorial, but over half of this year’s tutorial will be all-new!

The new topics in my tutorial include:

  • Polymorphic Associations
  • Storing Tree-Structured Data
  • Index Shotgun
  • Using NULL
  • Goldberg Machines
  • Using JOIN (…or not)
  • See No Evil
  • Magic Beans
  • Diplomatic Immunity

The improved topics from last year include:

  • Entity-Attribute-Value
  • Metadata Tribbles
  • ENUM
  • Ambiguous GROUP BY
  • Using HAVING instead of WHERE
  • Parameter Facade
  • Phantom Side Effects

Some of these topic names are meant to be mnemonics, just like names of design patterns and anti-patterns. You’ll have to attend the tutorial to see the full meaning of these SQL Antipatterns!

Enough is enough, PlanetMySQL

Lately has shown an increasing amount of commercial promotion for Sun products. Many recent articles included in this supposedly MySQL-related feed aggregator have nothing to do with MySQL.

Now I see an article included in the feed from ComputerWorld, about database researcher David Dewitt getting a job at Microsoft. Not only is this non-MySQL-related content, but more than half of the text in this article’s RSS summary is an advertisement for “Expedition Week” on the National Geographic Channel! Here’s what I saw in my RSS reader:

Microsoft hires noted database researcher to help scale SQL Server
from Planet MySQL by ComputerWorld

David DeWitt retired from the Computer Science Department at the University of Wisconsin last year, but he’s already returned, this time as head of a new database research center located on the Madison campus and funded primarily by Microsoft.

Presented By:
Expedition Week Continues Tonight

Seven nights of one great discovery after another continues tonight at 9P e/p only on National Geographic Channel. From the ancient pyramids to the ocean depths, from lost cities to outer space, travel with the latest generation of intrepid explorers as they make one great discovery after another. Expedition Week, only on National Geographic Channel.

Ads by Pheedo

I’m removing planetmysql from my RSS reader. So long — and no thanks for all the Glassfish.

Building MySQL from Bazaar

I wanted to try using MySQL source from its new VCS, Bazaar, and I saw Daniel Fischer’s extremely helpful post “Getting started with Bazaar for MySQL code“. Thanks Daniel!

Downloading using Bazaar is indeed slow, as Daniel warns. I am constrained to 1MBps ADSL, and the intial download took quite a bit of time. I think over an hour. But subsequent merges and other operations are quicker.

Building the source seems conventional, but I have run into Bug #37075 running tests. There seems to be a regression that is exhibited by th LIMIT tests with prepared statements. Here are the steps to reproduce:

$ configure --with-innodb --with-ssl
$ make
$ cd mysql-test
$ perl mysql-test-run limit

Here’s the output:


main.limit [ fail ]

--- /Users/bill/workspace/bzr/mysql-server/mysql-test/r/limit.result 2008-06-26
01:04:04.000000000 +0300
+++ /Users/bill/workspace/bzr/mysql-server/mysql-test/r/limit.reject 2008-06-26
23:47:59.000000000 +0300
@@ -107,6 +107,7 @@
set @a=14632475938453979136;
execute s using @a, @a;
set @a=-14632475938453979136;
execute s using @a, @a;
ERROR HY000: Incorrect arguments to EXECUTE

mysqltest: Result length mismatch

Warnings from just before the error:
Error 1292 Truncated incorrect DECIMAL value: ''

Aborting: main.limit failed in default mode.
To continue, re-run with '--force'.

This affects MySQL 5.1.27 (revno 2662) and MySQL 6.0.6 (revno 2671). But the tests pass in MySQL 5.0.66.

ActiveRecord does not suck

I’ve been reading a few blog postings such as Kore Nordmann’s ActiveRecord sucks and Mike Seth’s ActiveRecord sucks, but Kore Nordmann is wrong.

ActiveRecord is fine.  It is a tool that does just what it’s designed to do.  What sucks is when developers try to make it do other things than what it’s intended to do.

I worked for Zend, managing the Zend Framework project through its 1.0 release.  I also completed the implementation and documentation of Zend_Db and its related components. To set the record straight, Zend_Db does not implement the ActiveRecord pattern. It implements the Table Data Gateway and Row Data Gateway patterns, which taken together offer similar value as the ActiveRecord pattern.

I totally agree with Mike Seth that MVC should not be taken as “ActiveRecord-View-Controller.” I tried to make this point in documentation, screencasts, conference presentations, and in many mailing list messages, but I met with little success.

Unfortunately, the Ruby on Rails drumbeat that Models are simply database table wrappers has established momentum.  The term “Model” has (incorrectly) become synonymous in many developers’ minds with “ActiveRecord.”  Since Models by this definition are tied to database access and largely implementing various query techniques, Ruby on Rails development forces you to write a large amount of code in the controller classes that should properly be written in Model classes.

This has a few consequences. Unit-testing controller classes becomes very complex, since that’s where the majority of your application code resides.  To test a controller class you need to mock HTTP requests, and sift through HTML output.  This is fine, but it results in more work since testing the controller class is so important and complex.  If the application code were separated into a true Model, then unit-testing the controller would simply be testing whether the HTTP request had been communicated to the Model correctly.  Testing the behavior of the Model would be much more straightforward unit-testing of a class API in isolation, requiring no mock HTTP requests or scraping HTML output.

Also, unit-testing Rails-style Model classes is difficult, since the Model is coupled with the database.  We start to see unfortunate things like database fixtures as being necessary before you can execute the simplest tests against your Model class.  This makes testing Models time-consuming, error-prone, and run slowly.

If developers were to separate Models and Controllers properly, and separate data access components from Models, unit-testing all of these classes could be done more simply, and with greater isolation from other classes.  This makes it easier to diagnose defects, when they occur.  Isn’t this the point of unit tests?

A Model is a class that provides a logical component of your application domain.  Models are products of OO design, which is a development activity I see get very little attention in the developer blogosphere or the developer tools market.  Developers seem more enchanted by evangelizing their favorite code editor or debugger, or by squeezing more performance out of their database, than by mental analysis to make sure their OO architecture is modeling its application requirements well.

A single Model class may be backed by a database table, or multiple database tables, or perhaps even no database tables.  Data persistence should be an internal implementation detail within a Model; the external API of the Model class should reflect its logical OO requirements, not the physical database structure.
(update) What I often tell people is that the relationship between a Model and an ORM class should be “HAS-A” rather than “IS-A.”  The latter is the assumption of Rails and other frameworks who are enticed by ActiveRecord.  If the Model uses ORM objects instead of inheriting from ORM classes, then you can design the Model to contain all data and code for the domain it’s supposed to model — even if it takes multiple database tables to represent it.

Many developers have complained that Zend Framework provides no base Model class. Of course it doesn’t provide a base Model class! That’s your job. This complaint is like saying that Microsoft Word sucks because it doesn’t provide finished documents for you.

So I wouldn’t say ActiveRecord sucks.  I would say that people are expecting ActiveRecord to magically solve their OO design for them, in a false quest to avoid doing the design themselves.

Speaking at the MySQL Conference

It’s time for the 2008 MySQL Conference. This year should be especially interesting.

Sun announced they would acquire MySQL on Jan. 16 – fewer than eight weeks ago (I bet it doesn’t seem that way to the folks involved). The deal was closed only Feb. 26, and I’m sure there are still months of work to go, to fully integrate the two companies. Exciting times!

As always, database server technology and connectivity is the focus of the MySQL Conference. The sessions are organized into a dizzying 17 tracks, only two of which aren’t specific to some technology aspect of using MySQL. I’m looking forward to seeing many of the sessions.
One in particular that caught my eye is Beat Vontobel’s talk, “The Lost Art of the Self Join” in which he says he will solve a Sudoku puzzle in SQL. I gave a talk at OSCON 2006, “SQL Outer Joins for Fun and Profit,” in which I also used SQL to solve Sudoku puzzles. It’ll be interesting to see Beat’s solution.
This year I’m giving a tutorial, “SQL Antipatterns.” This is my way of describing many best practices of database and query design, by counter-example. After answering SQL questions on newsgroups for many years, I have seen a lot of counter-examples.
I’m busy over the next couple of weeks preparing my presentation slides. Here’s my complete outline of topics for the MySQL Conference Tutorial:
Logical Database Antipatterns
  • Jaywalking – using a comma-separated list of values in a string.
  • Entity-Attribute-Value.
  • Multi-column Attributes.
  • Metadata tribbles.
Physical Database Antipatterns
  • ID Required.
  • Phantom Files.
  • ENUM Antipattern.
  • Readable Passwords.
Query Antipatterns
  • Ambiguous GROUP BY ((also discussed in recent blog posts by Carsten and Roland Bouman)
  • Using HAVING instead of WHERE.
  • Poor Man’s Search Engine (LIKE and REGEXP).
  • Implicit Columns in SELECT and INSERT.
Application Development Antipatterns
  • SQL Injection.
  • Parameter Façade.
  • Pseudokey Neat Freak.
  • Session Coupling.
  • Phantom Side Effects.
It’s not too late to register for the MySQL Conference!

“Pure” mysqlnd interface feedback

After I posted my idea about a pure PHP mysqlnd driver, I received an email from Ulf Wendel from the MySQL AB team who works on the mysqlnd driver. He told me I could post his comments since he doesn’t have a Blogger account. I’ll include his comments below in blockquotes and then I’ll comment below that.

Ulf Wendel writes:

What is “native”?

PHP and the Borg[1] seem to be relatives. Whenever a real PHP hacker find a useful C library in the universe he speaks the famous words “resistance is futile”. A little later, the C library is available to PHP has become a PHP extension. This is one of the secrets of the early success of PHP: PHP is extentable. Extensions allow all the PHP drones to use the collective power of C libraries. As of today the PHP Function Reference shows 189 extensions [2]. Guess how many of them are written in C and how many of them are based on a C library…

To make it short: with PHP you can choose between C and PHP. Same with Lua: a scripting language (Lua) and C (the language Lua is implemented in) can be mixed whenever appropriate. That’s the nature and the secret of both PHP and Lua. Whenever your coding bees hit a limitation of the simple to use scripting language you use the C level. Implementing the MySQL Client/Server Protocol[3] is such an example. It would be slow.

Compare that to Java. Is it common to extend the Java programming language? No, not really, very few people start hacking a Java virtual machine. And once you have hacked one virtual machine, what about portability and the other virtual machines out there. Alternatives? Well, if you enjoy cross-compiling, maybe… That’s why you would never want to write a JDBC driver in any other language but Java. And, Java is more of a compiled language than PHP is, therefore it is fast enough.

Native for PHP can mean both: C and PHP. If its C – like with mysqlnd – you have to ask what external dependencies exist. The MySQL native driver for PHP (mysqlnd) gets all the infrastructure it needs from PHP itself. The driver is part of PHP, therefore mysqlnd runs on all platforms that run PHP.

Why using C?

You list some disadvantages of choosing C:

a) Platform-dependence not given

Mysqlnd runs on all platforms that run PHP. Mysqlnd is a part of PHP, it does use in particular:

– PHP memory management (limits really work now!)
– PHP data structures (less copy operations and memory savings)
– PHP Streams

b) Communication protocol inspection not possible

The MySQL native driver for PHP does use PHP Streams. PHP Streams feature hooks, aka Stream Filters [4]. With a little hacking you could expose the internal stream to the userland (PHP scripts). However, we favour MySQL Proxy[5] and therefore we have not implemented it.

To sum up: technically its possible but we have not enabled it. Tell us why you need it inside PHP and you have a fair chance to see it implemented.

Last but not least: have you ever worked on a raw binary network stream in PHP. Do your really want to know about the details of the protocol which Andrey started to love while he was implementing it? Or do you want to use something that is already there and might be the future standard: MySQL Proxy[5].

c) Deployment problems with no access to php.ini

First, with mysqlnd foundations have been laid to enable MySQL support by default: no license issue, no version issues, no external library dependencies. Its up to and its community to decide if mysqlnd should be added to the default PHP configuration.

Second, if you choose a hosting service that does not configure/compile PHP as you need it, you do something wrong – honestly.

Anyway: all the above is minor stuff. The main reason is maximum integration into PHP for the best performance and easy deployment with no license issues.


Now my comments to Ulf’s comments:

First, thank you very much Ulf for your reply to my blog posting. I appreciate getting information “from the source” and it’s important to get more information about libmysqlnd out to the community.

I want to reiterate that I think Libmysqlnd is the right solution for the PHP community, given the requirements of providing high-performance, quality extension with a PHP-compatible license. I look forward to libmysqlnd being part of the standard PHP distribution if the PHP community approves it.

I’m not lobbying to change libmysqlnd! I’m just supposing that a MySQL connector written in PHP code might also be interesting, even if it were not the preferred connector for MySQL server. It would be useful in a few circumstances, and also could be a debugging tool.

The performance advantage of C over of PHP is important. It stands to reason that an implementation of the MySQL protocol in a scripting language would be quite a bit slower.

However, it would be interesting to try it and measure the actual difference in performance, if for no other reason than understanding exactly how much performance advantage is achieved by using C. I understand from a user’s comment that libraries exist in other scripting languages that implement the MySQL Protocol. Net::MySQL for Perl, Ruby/MySQL, and an unreleased Python library.