Schema.module Considered Mandatory

In the last couple of days, two unrelated patches were committed to Drupal 6 with a common flaw: they caused inconsistencies between the database schema as defined by each module's hook_schema() function and the actual layout of the database. They are not the first two; I've personally found and fixed at least a dozen such mistakes, usually via a follow-up patch after the bug had already been committed to CVS. What makes these bugs so irritating is that they are trivial to detect and usually very easy to fix before the initial patch goes in. Fixing them afterward takes a lot more work (and, more importantly from my narrow point of view, my work). So I want to put an end to it.

Therefore, with no authority whatsoever, I hereby declare the following law:

Every developer who submits or reviews any patch that modifies a Drupal 6 (or later) .install file must install schema.module and, using its Compare report, assert upon pain of public humiliation that the patch introduces no database schema inconsistencies into either a fresh install or an upgrade from the previous core version.

It's really easy, folks. Just install the module and visit Administer > Site building > Schema. "Compare" is the default tab. If no Mismatch or Missing tables are reported, you're all set. If there are errors, open the appropriate box and look at what they are. Almost always, you changed hook_schema() without writing a hook_update_N() function or wrote a hook_update_N() function without changing hook_schema().
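To make the idea behind the comparison concrete, here is a minimal Python sketch of the check. It is an illustration under stated assumptions, not schema.module's actual code: plain dictionaries stand in for the structure that hook_schema() declares and for the layout read back from the database, and the table name example_item and the function compare_schemas() are hypothetical.

```python
# Illustrative sketch only; not schema.module's implementation.
# Plain dictionaries stand in for the declared schema and the real layout.

def compare_schemas(declared, actual):
    """Report declared tables that are missing from, or differ in, the database."""
    report = {"missing": [], "mismatch": {}}
    for table, spec in declared.items():
        if table not in actual:
            report["missing"].append(table)
            continue
        reasons = []
        declared_fields = spec.get("fields", {})
        actual_fields = actual[table].get("fields", {})
        # Walk the union of column names so drift in either direction is caught.
        for name in sorted(set(declared_fields) | set(actual_fields)):
            if name not in actual_fields:
                reasons.append(f"{name}: declared but not in database")
            elif name not in declared_fields:
                reasons.append(f"{name}: in database but not declared")
            elif declared_fields[name] != actual_fields[name]:
                reasons.append(f"{name}: properties differ")
        if spec.get("primary key") != actual[table].get("primary key"):
            reasons.append("primary key differs")
        if reasons:
            report["mismatch"][table] = reasons
    report["missing"].sort()
    return report


# The declared schema gained a 'weight' column, but no update function
# added it to sites that were upgraded rather than freshly installed.
declared = {
    "example_item": {
        "fields": {
            "iid": {"type": "serial", "not null": True},
            "title": {"type": "varchar", "length": 255, "not null": True},
            "weight": {"type": "int", "not null": True, "default": 0},
        },
        "primary key": ["iid"],
    }
}
upgraded_site = {
    "example_item": {
        "fields": {
            "iid": {"type": "serial", "not null": True},
            "title": {"type": "varchar", "length": 255, "not null": True},
        },
        "primary key": ["iid"],
    }
}

report = compare_schemas(declared, upgraded_site)
```

In this sketch the report flags example_item as a mismatch because the weight column is declared but absent from the upgraded site, which is the "changed hook_schema() without writing a hook_update_N()" mistake. The same check also catches a primary key that exists on fresh installs but was dropped on upgraded sites.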

I know many of you are wondering, "Why is this so important?" There are several reasons.

1. Quality. Good software is squeaky-clean. Database inconsistencies, where table columns have different properties depending on whether the system is a clean install or an upgrade, or on which database engine it uses, are just sloppy. Poor quality in one area leads to poor quality in others, and soon we have nothing but junk to work with.

2. Performance. Suppose a database update accidentally drops the primary key on an important table (this has happened more than once during D6 development). Most people test new versions with small sites, so the problem may not be noticed until late in the release cycle (or worse, post-release), when upgraders report horrible slowness.

3. Functionality. The Schema API allows Drupal to use and manipulate its database schema in a variety of ways that make all kinds of new functionality possible. For this to work reliably, Drupal must have access to an accurate representation of the database layout. For example, the new drupal_write_record() API function is Drupal's first step towards data-driven database access. It constructs INSERT or UPDATE SQL statements based on the schema data structure. If the schema structure does not accurately reflect the database, drupal_write_record() can produce wrong (perhaps very subtly wrong) SQL that will lead to tricky bugs and data loss.
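The third point can be illustrated with a small Python sketch of data-driven SQL generation. This is not Drupal's drupal_write_record() implementation; the function build_insert(), the table example_item, and the generic "?" placeholders are all assumptions made for illustration. It shows only the general technique: the column list of the statement comes from the schema structure, so a stale schema silently changes the SQL.

```python
# Illustrative sketch only; not Drupal's actual drupal_write_record() code.

def build_insert(table, schema, record):
    """Build a parameterized INSERT from the fields the schema declares."""
    fields = schema[table]["fields"]
    # Skip auto-increment ("serial") columns and values the record lacks.
    columns = [
        name for name, spec in fields.items()
        if spec["type"] != "serial" and name in record
    ]
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, [record[name] for name in columns]


# Schema that matches the database: 'weight' is declared.
accurate = {
    "example_item": {
        "fields": {
            "iid": {"type": "serial"},
            "title": {"type": "varchar"},
            "weight": {"type": "int"},
        }
    }
}
# Stale schema: the database has 'weight', but the declaration was never updated.
stale = {
    "example_item": {
        "fields": {
            "iid": {"type": "serial"},
            "title": {"type": "varchar"},
        }
    }
}

record = {"title": "First item", "weight": 5}

good_sql, good_args = build_insert("example_item", accurate, record)
bad_sql, bad_args = build_insert("example_item", stale, record)
```

With the stale schema, the weight value never reaches the statement and nothing raises an error, so the row is saved without it. In the opposite case, where the schema declares a column the database lacks, the generated statement names a nonexistent column and the query fails at run time.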

So if you write Drupal core patches, maintain a core branch, or are the esteemed founder of this project (note who committed the two patches above), take a stand! No more schema inconsistencies in Drupal core!

P.S.: An additional solution, of course, is to have a suite of regression tests that is run against Drupal after applying each "RTBC" patch, before that patch is committed. Schema.module's comparison report could be one of those tests. I plan to write such tests with the simpletest framework very shortly, but I do not know how long it will be until they are run automatically before every patch is committed. So in the meantime there is no excuse not to run the Compare report yourself.

P.P.S.: I've run the D5 to D6 upgrade process hundreds of times on both MySQL and PostgreSQL. I've automated the task to make it quick and easy, and I will post an article about how to do so soon.