Now, I’ve written some ranting about why I hate MySQL, now it is time to write about another Open Source RDBMS, PostgreSQL.
In this blog I am going through why I think PostgreSQL is a great database – comparable with the most advanced commercial database products.
I have worked mostly with DB2 and Oracle in my professional life. Although DB2 and Oracle are quite different, there are a lot of features they have in common, that you get to expect from a professional database.
Then when I started looking at MySQL some years ago, I got quite surprised to find that it lacked most of the data-integrity features that you would expect any decent RDBMS to have, and that the SQL dialect and supported data types were quite different from the ordinary.
As a freelance developer and an OSS enthusiast, I wanted a serious Open Source RDBMS and somehow found PostgreSQL. This was back when it was a version 8-beta something. The first PostgreSQL version to have a native windows install.
I will start with repeating some of the perhaps-obvious features that PostgreSQL has:
- Lots of native data-types: integer types, arbitrary precision numeric types, time-, date- and timestamp types (with or without time zone), network types, geometric types, Booleans, arrays and user defined types. With no “surprises” in the implementation.
- Triggers, stored procedures, user defined functions.
- Transactions; 2 out of 4 standard isolation levels supported.
- Indexes; on any column type and functional indexes…
- Table spaces.
And worth mentioning, most of the features have been core features since the beginning.
Generally, compared to most other non-OS RDBMS in the market, PostgreSQL looks most like Oracle – if you are using Oracle already, switching to PostgreSQL is almost a no-brainer, unless you use partitioning or other Oracle specific features.
Now, I’ve written an entry called “Why I Hate MySQL” and I will start this entry trying to do a 1-1 comparison between the entries…
First I will tell about foreign keys. A foreign key is when one entity (the child) has a reference to a key in another entity (the parent). The “reference to a key” in the child entity is called the foreign key, as it is a key to a foreign entity.
An example could be the Order – Order-Line relation in an order-entry system.
The Order (parent) entity contains all the details about the order in general: reference to the Customer, possible Delivery-Address, payment details, entry timestamp, status etc.
The Order-Line (child) contains details about a Product being ordered: reference to the Product, quantity, Size, Color, packing options, pricing etc. And a reference to the Order entity.
This is pretty basic stuff and you can implement it in just about any RDBMS. Even MySQL…
The fun starts when the RDBMS actually knows about the relationship. The RDBMS can then help you ensure that no Order-Line exists without a corresponding Order and that you cannot delete the Order without also deleting the corresponding Order-Lines. This is about data integrity; that you cannot have orphan children dangling in the system.
Only problem with foreign keys are that they can hurt performance and even be a source of hard to debug deadlock problems, as the RDBMS will sometimes have to lookup an Order when operations are done on an Order-Line and vice versa.
With PostgreSQL, RDBMS managed foreign keys are possible, and you have all the extended options with cascade-deletes etc. And PostgreSQL has had these for yearsâ€¦
Transactions are important; very important. If you are going to ensure some kind of referential integrity and try to maintain “controlled redundancy”, transactions are strictly necessary.
Take as example the classical accounting scenario: we have an Account entity with, among other things, a calculated balance. And we have a corresponding Account-Posting entity, which contains account history. Let us have some (PostgreSQL specific) DDL:
CREATE TABLE account ( id serial NOT NULL, balance NUMERIC(10,2) NOT NULL DEFAULT 0, created TIMESTAMP NOT NULL DEFAULT now(), updated TIMESTAMP, CONSTRAINT account_pkey PRIMARY KEY (id) );
CREATE TABLE account_posting ( id serial NOT NULL, account_id INTEGER NOT NULL, amount NUMERIC(10,2) NOT NULL, created TIMESTAMP NOT NULL DEFAULT now(), CONSTRAINT account_posting_pkey PRIMARY KEY (id), CONSTRAINT account_posting_account_id_fkey FOREIGN KEY (account_id) REFERENCES account (id) );
Lots of details are left out for clarity, but I think everyone gets the picture (and we have a nice example of a foreign key constraint as well 😉 ).
Now, take the following sequence of events, trying to transfer “:amount” from “:src_account_id” to “:dst_account_id”:
INSERT INTO account_posting (account_id, amount) VALUES (:src_account_id, -(:amount)); INSERT INTO account_posting (account_id, amount) VALUES (:dst_account_id, :amount); UPDATE account SET balance = balance - amount, updated = now() WHERE id = :src_account_id; UPDATE account SET balance = balance + amount, updated = now() WHERE id = :dst_account_id;
This all looks good on paper… Now, just for the sake of discussion, assume that just one of the 4 statements fails! And we are in big trouble – at least one account will have a bad balance! (E.g. assume we have an RDBMS without foreign key constraints and that either src_account_id or dst_account_id refers to a non-existing account…)
This is where transactions come into the picture:
BEGIN; INSERT INTO account_posting (account_id, amount) VALUES (:src_account_id, -(:amount)); INSERT INTO account_posting (account_id, amount) VALUES (:dst_account_id, :amount); UPDATE account SET balance = balance - amount, updated = now() WHERE id = :src_account_id; UPDATE account SET balance = balance + amount, updated = now() WHERE id = :dst_account_id; commit;
Now, if anything goes wrong, all 4 statements are rolled back, and no accounts are modified.
This is possible with PostgreSQL…
Functions, Stored Procedures and Triggers
The ability to create “user defined functions” is an integral part of most modern RDBMS’. Basically it enables developers to place important logic as close to the data as possible.
This can be misused, but can also be a very important feature for ensuring data integrity. And for solutions where there is no “middle tier” or where the middle tier has poor transaction management, putting business logic into functions and stored procedures actually makes sense.
I don’t actually fancy neither too much – I prefer to work with an “Enterprise Ready” middle tier – but if you are forced to use PHP or the like, they can be a great help.
I have seen triggers misused greatly, where most business logic was placed in triggers – magic happens whenever you touch anything in the database, but no one can figure out how or why.
Triggers should never update anything but “its own” entity and perhaps its parent or children. Or to maintain historical or statistical information.
A typical example of a trigger is one that maintains “update” timestamps on entities, implementing “soft deletes” or performing non-trivial checks on data.
One of my favourite PostgreSQL triggers is this (7.4 dialect):
CREATE FUNCTION ts_update() RETURNS "trigger" AS ' BEGIN NEW.updated := ''now''; RETURN NEW; END; ' LANGUAGE plpgsql;
It can be used on any table having a timestamp column named “updated”, e.g:
CREATE TRIGGER trig_update BEFORE UPDATE ON any_table FOR EACH ROW EXECUTE PROCEDURE ts_update();
PostgreSQL gives the developer the ability to create functions, stored procedures and triggers. PostgreSQL is born with SQL and its own “plpgsql” language. Other languages can be installed, e.g. tcl, Perl, JAVA, PHP, Ruby and Python. Even C can be used if you insist 🙂
In PostgreSQL, a stored procedure can return a result-set, which can be used in e.g. a join. The following example returns all dates within the given period (PostgreSQL-8 dialect):
CREATE FUNCTION period(from_date DATE, to_date DATE) RETURNS SETOF DATE AS $BODY$ DECLARE CURRENT DATE := from_date; BEGIN while CURRENT <= to_date loop RETURN NEXT CURRENT; CURRENT := CURRENT + 1; END loop; RETURN; END;$BODY$ LANGUAGE 'plpgsql' IMMUTABLE;
Views are like named sub-selects. They are great for a large number of uses:
- They can help de-normalize and “humanize” heavily normalized data.
- They can help hide soft-deleted entities.
- They can give limited, even read-only, access to selected rows and columns to low-security users.
- They can help generate aggregated and other “view-related” information for presentation purposes.
- They are good for capturing often-used queries that are reused in different places of the application.
So, I really like views a lot. And I actually prefer to create views from queries that I have spent time creating and optimizing – they are also a great documentation tool.
And, in applications developed with a “weak” middle-tier, they can help reduce the amount of complex SQL hardcoded into the application.
PostgreSQL supports views. Almost any possible query can be turned into a view. Even joins with results from stored procedures can be used in views.
PostgreSQL also supports updateable views, further enhancing the soft-delete scenarios and making “time travel” implementations relatively easy.
Okay, I’m not going into details here. Need I say the PostgreSQL naturally supports all the different kinds of sub-selects you will ever need?
There are lots of nice features in modern “Enterprise Ready” relational databases that I have not covered here. And there are lots of nice features in PostgreSQL as well.
Now, having worked mostly with Oracle the last couple of years, there are only 2 features I sometimes miss:
- Advanced partitioning. It is possible to implement something that looks like it in PostgreSQL, but the implementation leaves a lot to be desired, and there is far too much manual work involved.
- XML processing – having the database know about the structure of XML content, enabling you to do XPath searches, transformations etc.
As far as I know, XML support is planned in the coming version 8.3 release of PostgreSQL.
So, are you already an Oracle user and are looking for a good Open Source alternative, you should definitely consider PostgreSQL. It comes with native Windows installer that also gives you the absolutely amazing graphical pgAdmin tool that is far better than any GUI tool I have ever seen from Oracle 🙂
PostgreSQL has a far smaller memory footprint and takes a lot less space on your harddrive (not that it matters too much these days). The windows installation is a few-minutes no-brainer!
If you need a web-based GUI tool, there is a great PHP based tool called phpPgAdmin.
PostgreSQL has only one thing against it – its name… It is a little hard to pronounce – at least for a Danish guy.
Update 2007-05-23: People start complaining that I compare PostgreSQL version 8 with MySQL version 4 and me not being fair. But I compare PostgreSQL in general with pre-5 versions of MySQL. And this is what is offered to me by hosting companies per se.
And, I’m not a marketing or product person. I’m a blogger and I blog about what I like and dislike. I like PostgreSQL (any supported version) and I dislike MySQL (any version at all).