A migrator's guide to Drizzle
By Andrew Hutchings
The stable release of Drizzle has generated a lot of interest in migrating previous MySQL web sites to Drizzle. The good news for people attempting such migrations is that this isn't incredibly difficult in many cases; this article will describe what to look out for and how to go about converting a web site or any other database related project.
The Advantages of Drizzle
Before looking into the migration process, something should be said of the advantages that migrating to Drizzle might give. For starters, Drizzle has very strong data integrity; this means that Drizzle does not assume what is meant by a query. If a query/data cannot be understood, then an error occurs.
Drizzle has microsecond precision support for the TIMESTAMP data type (by using 'TIMESTAMP(6)') as well as related functions such as 'NOW()'. There are also some new data types in the form of UUID and native BOOLEAN.
The source code for Drizzle has gone through a massive re-factor and testing process, during which many bugs and flaws that were in MySQL have been removed and some of the locks which could hinder performance with many connections have also gone. Every single branch goes through a very rigorous testing process before it is merged in to make sure it is stable; these include multiple performance regression (and indeed, enhancement) tests.
Drizzle has been created by companies that actually use the product rather than companies wanting to sell products/services on top of it. As such, everything is open source and the development and testing of every commit is open to the public. The libdrizzle client library is MySQL compatible and is BSD licensed; this makes it easier to integrate into commercial as well as open source products. Even the documentation uses a Creative Commons Share Alike 3.0 licence which makes it easy to distribute.
In the future, Drizzle will have native sharding in the protocol library and multi-tenancy; a lot of the groundwork is already in place for these features. Put simply, multi-tenancy is like having multiple instances of Drizzle running from one daemon. This is great news for virtual hosting and database-as-a-service because it means that multiple installations of the database on one server would no longer be required to isolate users.
The minor cost of all the new features and re-factoring is that Drizzle is no longer completely compatible with MySQL. Many things are still the same, but there are also several differences which are covered in this article.
Drizzle speaks the MySQL wire protocol, which means that the PHP/Python/Perl/other-language MySQL plugins will in most cases work with Drizzle straight out of the box. The one thing to note is that by default the MySQL port and socket listeners are not installed by default if using the Debian/Ubuntu and RedHat/Fedora packages – this means that the application may need to connect to 127.0.0.1 (instead of 'localhost' which tries to use the socket) on port 4427 to talk to the Drizzle server. In the future port 4427 will be used for new protocol features (such as sharding), whilst keeping port 3306 and the Unix socket fully compatible with MySQL.
Related to this, MySQL has its own native authentication, and in 5.5 started introducing a plugin API for authentication. In Drizzle, authentication is off by default but there are plugins included to implement it using PAM, flat files and several other similar methods.
What Isn't Supported
If an application uses MySQL Stored Procedures/Functions, these cannot yet be migrated to Drizzle. In the beginning it was decided that Drizzle would not support these because this logic should be in the application layer, especially in cloud environments where resources are at a premium. After a lot of feedback it is now a planned feature for the next GA release to have multiple plugins for different languages and there is a Google Summer of Code (GSoC) project for this. Views and triggers have internal implementations and can be made as plugins for the brave, but for most users these cannot be migrated. Just before the GA release the number of included storage engines was reduced; this is because some had not been supported in a while and a few were not functioning correctly any more.
The MyISAM engine is only supported for temporary tables. In fact, in Drizzle7 there are no non-transaction engines for main tables, so the tables will need converting to InnoDB. This also means full-text indexes are not supported. The remaining engines for user tables are InnoDB, HailDB and PBMS. Then for temporary tables there is also Memory and MyISAM.
As far as data types go, almost all the types from MySQL are supported with caveats due to strictness and attempts at being closer to the SQL standard which will be explained later on. The one data type that is not yet supported in Drizzle is SET. Again there is a GSoC project open this year to implement this along with a new TUPLE data type.
MySQL supports many different character sets. This has all sorts of uses in the enterprise, but as far as the web goes the one true character set is UTF-8. As such Drizzle only supports two character sets: binary and UTF-8. For many web sites this should actually make life a little simpler, but during migration this could cause a small hiccup which will be covered later in this article.
MySQL has many INT types such as TINYINT, SMALLINT, MEDIUMINT, INT and BIGINT. In Drizzle this has been simplified, there is a 4-byte int called 'INT' and an 8-byte int called 'BIGINT'. There were several reasons for this change; first and foremost the multiple types are not part of the SQL specification. Sometimes as an app grows, a larger range of values comes to be needed; this means an ALTER TABLE to another type is needed which can take a very long time to execute on large data sets. Although arguably the biggest reason is that most storage engines (InnoDB included) actually store the smaller types 4-byte aligned. So there is no storage saving by using them.
The TEXT and BLOB types also have several size specifiers in MySQL (TINY, MEDIUM and LONG), but these end up only saving a few bytes in the row header for a data type which in many cases is used to store between kilobytes and megabytes of data. In Drizzle there is just one single BLOB/TEXT type.
In MySQL, ENUM supports an empty value on top of those specified in the list; it will also use an empty value for unknown values if strict mode is not used. Drizzle does not accept empty values for ENUM.
There has been a lot of time and effort gone into re-implementing the TIME and DATE based data types. There are three points to make here as far as migrating goes:
- TIME range is between 00:00:00 and 23:59:59 in Drizzle which is much smaller than MySQL's TIME range. If anything outside these values are needed then it is recommended that INT is used.
- DATE/DATETIME/TIMESTAMP types do not accept '0000-00-00', the first valid date is '0001-01-01'. In most cases NULL can be used instead of '0000-00-00'.
- Drizzle only supports one timezone, UTC.
It may also be useful to point out the existence of TIMESTAMP(6) which is capable of storing times with microsecond resolution.