MySQL databases have been offered by most web hosting companies for a couple of years. This is the main reason I still have to deal with this product, which claims to be an “enterprise ready” relational database.
Unfortunately these companies chose MySQL and did that when MySQL was a pile of crap (pre-5 versions).
Now, they are stuck with antiquated versions (4.0.x) and no reasonable upgrade paths.
In the following, I’m going a little into details about why MySQL, at least in pre-5 versions, in my point of view is NOT to be considered a serious database product.
NOTE: When I refer to “MySQL” in the text below, I generally mean pre-5 versions and typically 4.0.x versions. Flame me – but this is what is offered to me by several hosting companies…
For some reason, the MySQL people originally thought we do not really need foreign key constraints. They are bad for performance and generally just a nuisance, right ?
WRONG ! Foreign key constraints help ensuring referential integrity. Of course there is a slight performance hit and they can be a cause of hard-to-debug deadlocks, but they ARE very important for the integrity of your data !
Yes, I know, using the InnoDB database engine allows you to utilize foreign keys. But, for some reason, not all hosting companies allow this particular database engine.
And, why do I have to use a specific “database engine” in order to get a feature that really is a core part of any decent database product ?
Trans-what ? Again, the MySQL designers originally thought that transactions are for “big enterprise” systems only. A nice little database don’t really need to care about transactions.
Again, they are bad for performance, the root-cause of deadlocks and not really necessary in the typical simple web-application; right ?
WRONG again ! Transaction management is the very core part that makes ACID a reality. That you can do a “batch” of insert/update/delete’s and have them all complete if they are all valid [commit] – or have them all fail if just one fails [rollback].
As soon as you have more than one insert/update/delete statement as part of a “logical unit of work”, transaction management WILL help you !
But, they can be a serious cause of pain (especially if the isolation level is not right) and are probably hard to implement – which I guess is why MySQL developers initially pretended they are NOT important…
Triggers and Stored Procedures
Well, triggers and stored procedures are basic elements of any serious RDBMS. Not so for MySQL. I don’t mind that too much, as I personally prefer business logic to be placed in the “middle tier”.
But, for web-/middle-tiers with poor transaction management, it actually makes good sense to implement business logic with ACID requirements in the database as stored procedures.
Now, given that MySQL does not have foreign key constraints nor any transactional support, the lack of stored procedures makes it really REALLY difficult to ENSURE referential integrity and implementing ACID/”logical unit of work” a serious PAIN in the BALLS !
Again, selecting the right “database engine” can help on some of theses issues – but seriously, this stuff should be BASIC in ANY “database engine” !
Okay, I have to admit it: I LOVE VIEWs.
Views take a “simple” select statement and make it look like a table.
They are good for performance – the database engine can “pre-prepare” the SQL and this way introduce a performance gain.
They are good for documentation – views often tell a lot more about aggregates and important relations between different tables than an up-2-date ER-diagram.
They are good for application development with a “thin” middle tier – they can help keep most of the complex SQL in the database.
Views are the most reusable SQL component of any serious database, ranging from providing complex relations and aggregates between lots of tables to just helping to filter out soft-deleted entries from a single table.
Unfortunately, they are non-existent in MySQL…
Now, not having stored procedures and views to help generate aggregate information, forces me to hardcode the SQL in my applications. Sometimes very complex SQL that are repeated in several places of the applications. And…
Subselects are great.
They are useful as aggregates: SELECT a, b, (SELECT max(z) FROM …), d FROM…
They are useful as “tables”: SELECT a.a, a.b, b.a, b.b FROM table_a a, (SELECT a, b FROM x WHERE…) b…
They are useful as pseudoviews: SELECT * FROM (… some very complex query …) x WHERE x.y=1 and x.name like ‘%udby%’…
Now, A friend of mine (he’s a DBA consultant) tells me, that most subselects can (often better) be implemented as joins. True.
But what about this: DELETE FROM t_a a WHERE a.foreign_id IN (SELECT id FROM t_foreign WHERE deleted IS NOT NULL)
Or this: UPDATE t_b b SET b.x = (SELECT y FROM t_c c WHERE c.id=b.id) WHERE…
Or this: INSERT INTO t_b (foreign_id,…) VALUES ((SELECT id FROM t_foreign f WHERE…),…)
Need I say more ? Or is it obvious that subselects are not an option in 4.0.x versions of MySQL ? (Probably hard to implement too)
The skizofrenic TIMESTAMP datatype
I have been working with “professional” RDBMS’ since about 1994, so I should know my way around. I thought…
For this feature I was designing, where I was forced to use MySQL, I needed a “good precision” created-timestamp column. In the original spec, there was no need for any other timestamp columns… Rows were basically WORM rows, where the exact time of creation was important.
So the experienced database designer chose the MySQL TIMESTAMP data type for this column; simply because he thought it would give him greater precision than the other date/time types. Would make sense.
Unfortunately, I had not read the fucking manual and therefore did not know ANYTHING about the very strange implementation of the TIMESTAMP column type in MySQL.
Initially, it looked good. New rows where given a “created” timestamp. The first surprise was that the precision of the TIMESTAMP type was not any better than the DATETIME type. I would have expected something like milli- or microsecond precision… But it worked and it was ok for a start. And the specs said nothing about subsecond precision… Would’ve been nice though…
Then, the specs changed (they always do): there was a need for a “record first seen” timestamp and a “soft deleted” timestamp.
Now, I prefer these kinds of timestamp columns in the following order: created [updated [deleted]]:
- A row is always created (or it wouldn’t be there in the first place).
- Having a “latest updated timestamp” is optional: sometimes it is not needed; sometimes the rows are never updated (WORM).
- And the idea of “soft deletion” is only for paranoid nerds like me.
So, the 3 timestamp columns went into the table as the last 3 columns in the following order: created, seen, deleted.
Good, this is the way I usually design this kind of stuff…
Now problems appeared: When the “seen” column was updated, the “created” column would follow, even if it was never specified in the UPDATE statement !!! [you would never expect this to happen in an RDBMS without support for triggers].
Second, the nice SQL that I made to select all the “not deleted entries” (…WHERE deleted is null) returned nothing.
Perkele !! (Pardon my Finish)
After bugging around with these “features” I finally sat down and RTFM… Here’s the deal:
- The first TIMESTAMP column in a table is always (mis-)used as a “last update” timestamp. BTW: it does not change if you do an UPDATE with no changed data… (now, who’s in charge of the updating ?)
- A TIMESTAMP column cannot contain the value “null”. If you explicitly set it to null, it is given the value of the current date/time. The opposite does not count, however. You cannot set it to “null” by specifying the current date/time 🙂
- You can attempt to put just about any garbage into a TIMESTAMP column. If the data does not evaluate to a valid TIMESTAMP, the nice MySQL database will just put the “special” value ‘0000-00-00 00:00:00’ into the column… So much for data integrity…
- If you do NOT want the first TIMESTAMP field in your rows to change at any UPDATE statement, you WILL have to specify the column in the UPDATE statement… Say what ??? Alas, if I DO NOT want to change the value of a field in an UPDATE statement, I HAVE TO specify the field with its current value !!? Did I just get it backwards – or is this a serious bad decision from the MySQL designers ?
The TIMESTAMP column type is no more precise than the DATETIME type.
But, the DATETIME type can contain null values and does not change it’s value at obscure times. PLEASE do not ever use the TIMESTAMP type unless you really like to be screwed seriously in the rear end…
Repeat after me: use DATETIME, use DATETIME, use DATETIME…
This is the worst hack I’ve ever seen. Normally you would have triggers to help maintain “updated” timestamp columns. Now, MySQL (in pre-5 versions) does not have triggers. So, the “clever” people at MySQL decided to implement the TIMESTAMP column type in its own special way. Very special way. Not according to specs. Not according to expected behavior. Not in a portable way – not even portable between pre-4.1.2 and 4.1.2+ versions of MySQL:
Before MySQL version 4.1, the first TIMESTAMP column is always “auto-update-on-change-unless-you-are-very-specific”.
Since MySQL version 4.1.2 (what about 4.1 and 4.1.1 ?) YOU can specify which (and only one) column is the “auto-update-on-change-unless-you-are-very-specific” column. Atleast this is a little better. It looks as if the DBA is back in charge – just by looking at the DDL you will be somewhat informed.
Still a pity that the TIMESTAMP column does not allow subsecond precision. Not even in 5+ versions… The fact that it cannot contain dates before 1970-1-1 and after “some date in 2037” looks as if it is actually implemented as a 32-bit integer containing MILLISECONDS since 1970-1-1 (unix timestamping). If this is the truth – why cannot I have millisecond precision ? (Never mind; it looks as if we are going to get rid of MySQL before the year 2038 🙂 )
Serious errors are “sometimes” ignored – e.g. I did an SQL that was a UNION of 3 different SELECT’s; looked good on paper… Tried it, but somehow it did never return anything from the second SELECT… Strange… At last, I checked each SELECT on it’s own, and it appeared I had a syntax error in the second SELECT. Now, the nice MySQL way is to just ignore this SELECT with the error… Great !!!
After this experience, how can you possibly trust any complex SQL you have dreamt up – maybe you have a serious typo somewhere, but you’ll never know; MySQL will happily ignore it for you…
It is as if the philosophy behind the MySQL implementation is: “lets make it easy to get a lot of data IN as fast as possible, and lets make it simple to get it OUT again as fast as possible. If the data that gets IN is actually consistent or close to the truth really does not matter. As long as it gets IN. Retrieving it… Hmmm… Lets just return what we can interpret and ignore the queries that we don’t understand.”
That’s ok with me. BUT DON’T FUCKING PRETEND TO BE AN RDBMS, let alone being “enterprise ready” !!!
I cannot recommend using MySQL in any serious project at all. MySQL should be used for collecting statistical information only. And for toys, prototyping etc. The lack of transaction support, stored procedures, foreign key constraints (and others) and the ability to accept bad data makes it a candidate for never-ending debug scenarios trying to fix data inconsistency faults !
The fact that MySQL have actually addressed some of these shortcomings in the 5+ releases does not make me rest. I cannot take it seriously. Only a complete rewrite of this [beep] can make me calm down. How can these people claim to take this rather basic [RDBMS] stuff seriously now (at versions 5+), when they never did before ?
Why do I care ? Well, I care simply because I’m sometimes FORCED to implement solutions hosted at one of these web-hotels that gives you MySQL support… No version information mentioned on any of the front pages. BUT, when you HAVE married with them, you can figure out for yourself that they are STILL on some antiquated 4.0.2x version of MySQL.
And, we all now know that they have cornered themselves !
Just for the fun of it, assume you are in charge of the MySQL installation of a large web-hosting company.
You have thousands of customers using your MySQL installation. Nobody demands anything on the customers, except perhaps some max on the size of the data they store in the database.
There is absolutely NO WAY the hosting company can know anything about all the customer’s specific (mis-)use of the version specific features of the MySQL installation.
How about upgrading it ?
From version 4.0.20 to 5.1.14 ? Who can tell which customers will get into trouble ? How many customers specific scripts will have to be rewritten ? What is this going to cost the customers ?
This is a no-go. It’s not possible.
The hosting companies have ONE solution: put up a new set of hardware with the newer versions of (a serious RDBMS) installed. The hosting companies will give the customers a grace period (1 ? 2 years ?) where they can change all their applications to use the new RDBMS system. And, after the grace period, the old MySQL systems will be shut down. Or maintained for customers that are willing to pay an extra large fee just to be able to stay on a crappy SQL server…
There are other OSS alternatives out there. My favorite is: PostgreSQL. It has all the features needed. And it has had it from back when we were all wet behind our ears…
Update 2007-04-20: One of the hosting companies I have to deal with just recently decided to upgrade their database installation from 4.0.21 to 4.1.11, saying they are in the process of upgrading to version 5. That is an improvement 🙂