The DX Files: Abandon Anonymous Arrays of Attributes

This is part three of my series, The DX Files: Improving Drupal Developer Experience. This time, I’m suggesting changing some of Drupal’s most basic data structures and APIs by replacing anonymous arrays with well-defined data structures. I fully expect lots of disagreement.

Many of Drupal’s APIs (Form API, Schema API, etc.) use PHP arrays to represent complex structured data. For example, here is a Form API data structure:

<?php
$form['author'] = array(
  '#type' => 'fieldset',
  '#access' => user_access('administer nodes'),
  '#title' => t('Authoring information'),
  '#collapsible' => TRUE,
  '#collapsed' => TRUE,
  '#weight' => 20,
);
$form['author']['name'] = array(
  '#type' => 'textfield',
  '#title' => t('Authored by'),
  '#maxlength' => 60,
  '#autocomplete_path' => 'user/autocomplete',
  '#default_value' => $node->name ? $node->name : '',
  '#weight' => -1,
  '#description' => t('Leave blank for anonymous.'),
);
?>

Some of the downsides to this representation include:

  • Developer IDEs cannot provide auto-completion or a similar form of assistance while the code is being written.
  • Invalid form properties (those that begin with “#”) cannot be identified at compile- or run-time.
  • It is awkward to associate default values or other automatic behaviors with array structures.
  • Functions that operate on specific kinds of form elements, such as textfield_validate(), are not assured they are being passed an “array of the right type.”

An alternative representation uses typed data structures, specifically a PHP class but without any methods (basically, what C calls a struct). For example:

<?php
$form = new Form();

$form->elements['author'] = $author = new FieldsetElement();
$author->access = user_access('administer nodes');
$author->title = t('Authoring information');
$author->collapsible = TRUE;
$author->collapsed = TRUE;
$author->weight = 20;

$author->elements['name'] = $name = new TextfieldElement();
$name->title = t('Authored by');
$name->maxlength = 60;
$name->autocomplete_path = 'user/autocomplete';
$name->default_value = $node->name ? $node->name : '';
$name->weight = -1;
$name->description = t('Leave blank for anonymous.');
?>

The second version of the code is almost identical to the first except for the change in representation; it is no harder to write, and the conversion can mostly be handled by search-and-replace. However, the object representation addresses (or can address) all the problems with the array representation and provides a variety of other benefits.
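One of those problems, invalid properties going undetected, can be sketched roughly as follows. This is only an illustration under assumptions: the base class FormElement is given a hypothetical __set() guard that Drupal does not provide, and the property list for FieldsetElement is invented for the example.

<?php
// Hypothetical base class, not part of Drupal: rejects undeclared properties.
class FormElement {
  public function __set($name, $value) {
    // __set() is only invoked for properties that are not declared
    // (or not accessible), so declared properties are unaffected.
    throw new InvalidArgumentException(
      'Unknown property "' . $name . '" on ' . get_class($this)
    );
  }
}

class FieldsetElement extends FormElement {
  public $title = '';
  public $collapsible = FALSE;
  public $collapsed = FALSE;
  public $weight = 0;
}

$author = new FieldsetElement();
$author->collapsible = TRUE;   // Declared property: accepted.

$caught = NULL;
try {
  $author->colapsible = TRUE;  // Misspelled property: rejected at run time.
}
catch (InvalidArgumentException $e) {
  $caught = $e->getMessage();
}
?>

With the array representation, the equivalent typo ('#colapsible') would simply be stored and silently ignored.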

As an example, class representation solves the problem of functions not being sure about the kind of data they are passed. In the file that defines text field elements, we might have code like:

<?php
class TextfieldElement extends FormElement {
  // Maximum length; NULL means no limit.
  public $maxlength = NULL;

  // AJAX path for auto-completion; NULL means no auto-complete.
  public $autocomplete_path = NULL;
}

function textfield_validate(TextfieldElement $element) {
  // $element is guaranteed to be a TextfieldElement.
}
?>

Now, textfield_validate() is guaranteed to be passed a TextfieldElement.
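To make the guarantee concrete, here is a minimal, self-contained sketch of what happens when the wrong kind of element is passed. The class bodies and the validation logic are assumptions made up for the example; the behavior shown (a TypeError on a mismatched class) is that of PHP 7 and later, while PHP 5 raised a catchable fatal error instead.

<?php
class FormElement {
  public $title = '';
  public $weight = 0;
}

class TextfieldElement extends FormElement {
  public $maxlength = NULL;
  public $default_value = '';
}

class FieldsetElement extends FormElement {
  public $collapsible = FALSE;
}

// Hypothetical validator body: checks the default value against maxlength.
function textfield_validate(TextfieldElement $element) {
  if ($element->maxlength === NULL) {
    return TRUE;
  }
  return strlen($element->default_value) <= $element->maxlength;
}

$name = new TextfieldElement();
$name->maxlength = 60;
$name->default_value = 'admin';
$ok = textfield_validate($name);

$rejected = FALSE;
try {
  // A fieldset is a FormElement but not a TextfieldElement.
  textfield_validate(new FieldsetElement());
}
catch (TypeError $e) {
  $rejected = TRUE;
}
?>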

Note: Yes, we could also change functions like textfield_validate() into methods of class TextfieldElement. Gotta start somewhere, though. Baby steps!

Comments

I'm on the same page as you

I'm on the same page as you and was just looking at Drupal, today, wondering what it would take to make some of these changes and what the impact (performance, barrier to entry) would be. Reading this 'baby step' is like a breath of fresh air. Thanks.

I'd strongly support this

I'd strongly support this syntactic move to objects for Form API, actually.

In addition to all the disadvantages you mentioned, another is it's completely impossible to document those attributes using any of the existing standard tools like Doxygen. The result is the monstrous Forms [sic] API Reference which is something I whipped up in Dreamweaver back when I was a total noob, because there wasn't a viable alternative. The result is an endlessly scrolling document that's nearly impossible to edit. The alternative that has been suggested is making a separate site to document FAPI, with all form API properties and attributes nodes with CCK fields. Either way, we're special-casing this stuff because we're using this weird syntax.

Beware though, this is not as simple as "find and replace" -- it touches a LOT of things.
Anything dealing with changing the way forms work is going to touch 90% of Drupal. And for example, hook_form_alter() is run through drupal_alter(), a centralized function for altering "thingies" in Drupal. Some examples of "thingies" that can be altered are menus and links. So these would also need to be translated into objects unless we go with some HORRIBLY developer unfriendly $object param that magically gets cast to an array somewhere deep inside. I'm sure there are a lot more of these kinds of things that I'm not thinking of, too.

I find, with irony, that I

I find, with irony, that I like the use of arrays, especially since I was teaching OOP around 1990. I hated array use at first but have come to really appreciate it. I could certainly get used to using objects again, but one thing I really appreciate is the ability to search and replace for ['bar'] (including the quotes) in $foo['bar'], which is more precise than searching for ->bar in $foo->bar, because the latter could also match $foo->barista. Yes, you may view this concern as esoteric, and it is, but a move to objects is not without cons, which my point is meant to illustrate. I'm sure there are other benefits to sticking with arrays, but they are not coming to mind in the 5 minutes I have to comment on this.

I fully agree, and I would

I fully agree, and I would be surprised if it happens.

Being a long time PEAR proponent, I was shocked at the non-intuitive architecture in Drupal, but I've grown to appreciate the lower barrier to entry for non-developers.

However, I think making use of the SPL, in particular iterators, could provide multiple interfaces to the same strong data structure. I find when teaching people Drupal (especially experienced developers), the worst part of the learning curve is remembering all this stuff like #default_value, etc...

I'm not saying we should re-write QuickForm because it is a hog, but at least changing the FAPI structure to make it more intuitive would be a good start.

So now writing

So now writing $form['markupelement']['#markup'] = t('text') would require $markupelement = new Markupelement; $markupelement->markup = t('text'); $form->elements['markup'] = $markupelement? Are you sure this is so much better DX? Functions already know quite well what they are passed: it's $form['#type']. Oh and $form['#type'] can be dynamically changed while changing the class is not so easy... I would heartily recommend thinking more on this before posting another megapatch to the issue queue which then gets abandoned quickly.

This really wouldn't be that

This really wouldn't be that hard to mock up. Just need to add a method for turnFormIntoArray or whatever.

From your example:

<?php
$form = new Form();

$form->elements['author'] = $author = new FieldsetElement();
$author->access = user_access('administer nodes');
$author->title = t('Authoring information');
$author->collapsible = TRUE;
$author->collapsed = TRUE;
$author->weight = 20;
?>

Just turn that into:

<?php
function my_form_callback() {
  $form = new Form();

  $form->elements['author'] = $author = new FieldsetElement();
  $author->access = user_access('administer nodes');
  $author->title = t('Authoring information');
  $author->collapsible = TRUE;
  $author->collapsed = TRUE;
  $author->weight = 20;

  return $form->turnFormIntoArray();
}
?>
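For what it's worth, the conversion method itself could look something like the sketch below. Everything in it is an assumption: the class layout, the $type property, and the rule that NULL means "not set" are invented for illustration (which also means a property that is legitimately NULL would be dropped), and plain strings stand in for t().

<?php
// Hypothetical classes; the names follow the examples in this thread.
class FormElement {
  // Written to '#type' in the array form; subclasses override it.
  public $type = NULL;
  public $title = NULL;
  public $weight = NULL;
  // Child elements, keyed by name.
  public $elements = array();

  public function turnFormIntoArray() {
    $array = array();
    foreach (get_object_vars($this) as $property => $value) {
      if ($property === 'elements' || $value === NULL) {
        // Children are handled below; unset properties are skipped.
        continue;
      }
      $array['#' . $property] = $value;
    }
    foreach ($this->elements as $key => $child) {
      $array[$key] = $child->turnFormIntoArray();
    }
    return $array;
  }
}

class Form extends FormElement {}

class FieldsetElement extends FormElement {
  public $type = 'fieldset';
  public $collapsible = NULL;
  public $collapsed = NULL;
}

function my_form_callback() {
  $form = new Form();
  $form->elements['author'] = $author = new FieldsetElement();
  $author->title = 'Authoring information';
  $author->collapsible = TRUE;
  $author->weight = 20;
  return $form->turnFormIntoArray();
}

$result = my_form_callback();
?>

Here $result['author'] ends up holding '#type', '#title', '#weight' and '#collapsible' keys, which is the shape the existing Form API code expects.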

I don't have time, but does someone want to make the proof of concept module to test this out?? *cough* EATON *cough*

The potential benefits here

The potential benefits here far outweigh the potential drawbacks. Certainly this would be perceived as raising the barrier to entry a bit, but the possibility for better and more standardized documentation as webchick mentions is a big mitigating factor there. Actually, for the most basic use-case (and the one that new developers will be encountering most) it's a minor syntactic change (as Barry points out). Is "new TextfieldElement" significantly harder to remember than "'#type' => 'textfield'"? For a developer who is just starting out I would wager it's about the same. The vast majority of devs don't write field-level validators or anything else of that nature, so they would never have to touch the potentially more-confusing OO parts, but for devs who *do* need to muck around with that sort of thing, the power and clean extensibility that an approach such as this offers is a big advantage.

So now writing

So now writing $form['markupelement']['#markup'] = t('text') would require $markupelement = new Markupelement; $markupelement->markup = t('text'); $form->elements['markup'] = $markupelement?

I think with the use of intelligently-crafted constructors, it could be more like $form->elements['markup'] = new Markupelement(t('text'));.
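A rough sketch of such a constructor, assuming a hypothetical MarkupElement class and a minimal FormElement base (neither exists in Drupal, and a plain string stands in for t()):

<?php
// Hypothetical classes illustrating the constructor idea.
class FormElement {
  public $weight = 0;
  public $elements = array();
}

class MarkupElement extends FormElement {
  public $markup;

  // The one property almost every caller sets becomes a constructor argument.
  public function __construct($markup = '') {
    $this->markup = $markup;
  }
}

$form = new FormElement();
$form->elements['markup'] = new MarkupElement('text');
?>

That keeps the common case to a single statement, much like the array version.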

Oh and $form['#type'] can be dynamically changed while changing the class is not so easy

In this instance, if the form-handling functions were defined to expect a superclass for fields, FormElement, and all specific element classes inherited from FormElement, then basic attributes and operations on the field would be the same regardless of the specific element in question (e.g. if the attribute title is defined in the class FormElement then it's guaranteed to be present in anything that inherits from FormElement and the handling functions can count on that). I agree that it would lose *some* malleability, but for any significant change in field type there is already some special-casing that must be done, and so changing to OO would just be a shift in concept, not in difficulty (existing codebase notwithstanding of course :p).

Another benefit that just dawned on me here is that a FormElement class with default values means that those defaults are available to a developer as soon as the field is instantiated. With the current system any default values for the attributes are not present until after the array is submitted to whatever function is supplying the defaults internally.

Don't get me wrong, I'm not saying this isn't a massive undertaking that would require some serious thinking, I just feel that it's definitely worth exploring because there are some major benefits to be had!

Seems like a good proposal

Seems like a good proposal to me. The IDE friendliness is a big deal for productivity and for learning curve. It is great to have your editor guide you as you code, especially for newbies.

I'd be happy if this pattern started to get used in the fields patch, and then spread to other array based areas.

Webchick is absolutely right

Webchick is absolutely right (in her comment above). Don't kid yourself, Barry: implementing this proposal would indeed quickly lead to re-writing 90% of Drupal's core APIs, in one hell of a spectacular domino show. Here's what my crystal ball tells me:

  1. Let's change the form API so it uses objects of attributes, instead of arrays of attributes.
  2. Hey, now we've got objects, let's write proper classes and constructors and inheritance systems for our form elements.
  3. If we're gonna do this properly, we really should start encapsulating all form behaviour within these classes...
  4. This means that all standard Drupal functions that are used by the form API for array processing, they'll need to be changed to process objects instead...
  5. But those functions are also used by the menu API, the mail API, the block API, the node rendering API, the schema API...
  6. Let's just go the whole hog, then, and convert all those APIs to use objects instead of arrays. And let's convert them properly, too - because, you know, objects are conceptually different to arrays, and if we don't embrace that then we're just wasting our time.
  7. ... what about the theme system...

While I agree with the concept of abandoning anonymous arrays of attributes, the enormity of the task makes me cautious. You REALLY need to consider, above all else, the amount of work involved vs the concrete benefits. From what I can see, the amount of work involved is potentially no less than that of re-writing Drupal in Java. And the only concrete benefit is a marginal improvement in DX (plus, whether this is an improvement is itself open to debate - personally, I think it is).

Don't tell me the abstract benefits. Don't tell me about all the amazing new potential that Drupal will have, once its underlying data structures have been made more OO. I understand these benefits as well as the next developer, and don't get me wrong - I want to see them come to fruition. But abstract benefits alone do not justify this amount of work, not to mention this steep a learning curve for developers who are used to the current Drupal paradigm. This big a change needs to be justified with clear, concrete examples of new features that are currently difficult or impossible to implement.

Baby steps is the key ingredient here, and also the key challenge. This proposal will get into core, only if we manage to solve the difficult problem of how to implement it one step at a time, and to avoid the domino effect.

I am cautiously in favor,

I am cautiously in favor, actually. I started thinking along similar lines shortly after GoPHP 5, and chx and I demoed some possible thinking along similar lines in Barcelona.

In particular, I would support this iff we are properly leveraging SPL. Using rudimentary OOP here (just classes and properties) doesn't really give us much, and makes the syntax much more verbose. However, leveraging the ArrayAccess interface instead of $form->elements[] would give us a cleaner syntax with no loss of functionality. Heck, it may even make defaults easier to handle. :-)

<?php
class FormElement implements ArrayAccess {
  //...

  function offsetGet($index) {
    if (!isset($this->children[$index])) {
      $this->children[$index] = new FormElement;
    }
    return $this->children[$index];
  }
}

$form = new FormElement;
$form->type = 'fieldset';
$form['author']->type = 'text';
$form['author']['status']->type = 'checkbox';
?>

(I just tested the above and it does in fact work, which is quite cool.)
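For anyone who wants to try this, a fuller sketch that spells out all four methods the ArrayAccess interface requires might look like the following. The $children property and the auto-creating offsetGet() come from the snippet above; the other three method bodies are assumptions, and the mixed type declarations need PHP 8.0 or later.

<?php
class FormElement implements ArrayAccess {
  public $type = NULL;
  protected $children = array();

  public function offsetExists(mixed $offset): bool {
    return isset($this->children[$offset]);
  }

  // Creates the child on first access so nested elements can be chained.
  public function offsetGet(mixed $offset): mixed {
    if (!isset($this->children[$offset])) {
      $this->children[$offset] = new FormElement();
    }
    return $this->children[$offset];
  }

  public function offsetSet(mixed $offset, mixed $value): void {
    if ($offset === NULL) {
      // Supports $form[] = $element.
      $this->children[] = $value;
    }
    else {
      $this->children[$offset] = $value;
    }
  }

  public function offsetUnset(mixed $offset): void {
    unset($this->children[$offset]);
  }
}

$form = new FormElement();
$form->type = 'fieldset';
$form['author']->type = 'text';
$form['author']['status']->type = 'checkbox';
?>

One design consequence worth noting: because offsetGet() creates missing children, merely reading $form['typo'] adds an empty element, whereas isset($form['typo']) goes through offsetExists() and does not.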

I observed a while ago that FAPI is actually 98% compatible with SimpleXML's syntax. We could almost use SimpleXML directly (or subclass off it, or simply provide a very very similar syntax) were it not for array properties like #options. (Sheesh, imagine being able to serialize Forms to XML easily. That would be just weird. :-) ) Of course, SimpleXML puts the properties in [] and the children in ->, so that's backward from what Barry is suggesting here.

Another benefit? "Get me the fully loaded and defaults-set and altered form structure for this form, but do NOT render or process it yet" is still more than one line of code, which is a problem.

I don't see why drupal_alter() would be an issue. drupal_alter() shouldn't care about what sort of data structure it's passing on to be altered. There are some wacky hacks in it now to work around reference handling in PHP 4, but there are big TODOs around them to say "remove in Drupal 7". (Has anyone done that yet?) Of course, since objects in PHP 5 are passed around as handles and therefore behave much as if they were passed by reference, that problem goes away anyway.

All that said, however, chx has a point. Arrays are extremely expressive. They're also extremely easy to manipulate in weird and unexpected ways. I've done all sorts of unholy things to forms that while I'm sure are possible using an object syntax would be 3-4 times more verbose. There's also a knock-on effect to nearly everything in Drupal. And let's not forget that any major overhaul of that sort is a 6 month full time job if you already have buy-in from the committers. :-)

So, maybe. If done right, could be good. But doing it right would be a herculean task.

Another problem is

Another problem is hook_elements. Currently you can add defaults to any already defined element with great ease, just return $types['sometype']['#myproperty'] = 'foo'. How are you going to deal with that?

Not for the first time I feel like the lone guardian of the last nice little garden against the incoming barbarian OOP hordes.

Moving in this direction is

Moving in this direction is inevitable. But there is a lot of inertia to overcome with both PHP and Drupal having deep roots in functional programming. I think it will take a lot of supporters being very vocal over an extended period.

It's not either/or, chx. I

It's not either/or, chx. I can definitely see how hook_elements could remain array based while making the interface to forms themselves OOP. It just requires code deeper in the system inside the objects to do the translation. I guess I can go either way on that.
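As a rough illustration of what that translation layer might look like, here is a sketch. The function element_apply_defaults(), the hook implementation mymodule_elements(), and the TextfieldElement properties are all hypothetical, and the policy of ignoring undeclared properties (such as a brand new '#myproperty') is just one possible choice that would need discussion.

<?php
// Hypothetical element class; not part of Drupal.
class TextfieldElement {
  public $size = NULL;
  public $maxlength = NULL;
}

// What a hook_elements()-style hook might still return: plain arrays.
function mymodule_elements() {
  $types = array();
  $types['textfield']['#size'] = 60;
  $types['textfield']['#maxlength'] = 128;
  return $types;
}

// Copies array-style defaults onto an object, stripping the leading '#'.
function element_apply_defaults($element, $type, array $types) {
  if (!isset($types[$type])) {
    return $element;
  }
  foreach ($types[$type] as $key => $value) {
    $property = ltrim($key, '#');
    // Only fill in declared properties that have not been set yet.
    if (property_exists($element, $property) && $element->$property === NULL) {
      $element->$property = $value;
    }
  }
  return $element;
}

$name = new TextfieldElement();
$name->maxlength = 60;
element_apply_defaults($name, 'textfield', mymodule_elements());
?>

The explicitly set maxlength is kept, while size picks up the module-supplied default.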

There's nothing wrong with OOP if done right. Standing astride the river of development and yelling "stop!" just because there's a class involved doesn't make any sense. It's not like classes are some evil invading Mongolian horde.

@Harry Slaughter: Vocal, you

@Harry Slaughter: Vocal, you see, that's the problem. Those folks I try to speak out for are usually not vocal, exactly because they are not trained programmers, and the OOP guys are programmers by graduation -- and as I have said many times, they find this crap intuitive because that's what was beaten into them (I was there too, though I did not graduate) and no one else does. I will say this over and over and over...

Hey everyone, pull

Hey everyone, pull yourselves together. Re-read the very first paragraph of my post:

I’m suggesting changing some of Drupal’s most basic data structures and APIs by replacing anonymous arrays with well-defined data structures.

Nowhere in that sentence does it say switching to a class-based design. I'm proposing using TYPED DATA STRUCTURES. The fact that the word "class" is involved is because that's the data structure mechanism PHP provides.

Even if this does not happen

Even if this does not happen now, you are going to make fields objects without discussing it with anyone, which I simply can't fight because you simply do not listen to reason. A few of us gave you a number of arguments against and asked for at least one specific use case that is currently hard or impossible with Form API and is easy after this change. Jaza also asked for one, but there was no reply.

Is reworking all of Form API

Is reworking all of Form API really the most parsimonious solution to these problems? Here's the original set of problems, plus webchick's extra one:

  1. Developer IDEs cannot provide auto-completion or a similar form of assistance while the code is being written.
  2. Invalid form properties (those that begin with “#”) cannot be identified at compile- or run-time.
  3. It is awkward to associate default values or other automatic behaviors with array structures.
  4. Functions that operate on specific kinds of form elements, such as textfield_validate(), are not assured they are being passed an “array of the right type.”
  5. There's no good standard tool for documenting big array-based PHP data structures like the ones in Form API.

I'm a Ruby developer in my spare time so I don't think #4 is a real problem. :) What does textfield_validate() care if its argument is a "real" textfield or not? If it looks like a textfield, and acts like a textfield, it's a textfield. Let textfield_validate() check the #type and the required properties, and if it finds them, it shouldn't worry further.

Is #2 a real problem? We could take a stab at solving it by creating a "strict" version of Form API that sends up errors when it finds a form element which contains properties that aren't part of its #type's official property list. You could turn that module on during development, then turn it off in production. In fact, thanks to the miracle of hook_form_alter(), I'm pretty sure that we could write a module that does that, right now. Has anyone seen the need for that module before? If not, it might be a strong hint that #2 isn't a significant real-world problem.
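A very rough sketch of the checking logic such a "strict" module might contain is below. The function names and the whitelist are hypothetical, and a real module would need the complete property list for every element type plus a hook_form_alter() implementation to call the checker.

<?php
// Hypothetical whitelist of known properties per element type.
function strict_fapi_allowed_properties() {
  return array(
    'fieldset' => array('#type', '#title', '#collapsible', '#collapsed', '#weight'),
    'textfield' => array('#type', '#title', '#maxlength', '#default_value', '#weight'),
  );
}

// Walks a form array and returns messages for unknown properties.
function strict_fapi_check(array $element, $path = 'form') {
  $errors = array();
  $allowed = strict_fapi_allowed_properties();
  $type = isset($element['#type']) ? $element['#type'] : NULL;
  foreach ($element as $key => $value) {
    $is_property = is_string($key) && $key !== '' && $key[0] === '#';
    if ($is_property) {
      // Elements without a known #type are not checked.
      if ($type !== NULL && isset($allowed[$type]) && !in_array($key, $allowed[$type], TRUE)) {
        $errors[] = "$path: unknown property $key for type $type";
      }
    }
    elseif (is_array($value)) {
      // Keys without a leading '#' are child elements.
      $errors = array_merge($errors, strict_fapi_check($value, "$path/$key"));
    }
  }
  return $errors;
}

$form = array();
$form['author'] = array(
  '#type' => 'fieldset',
  '#title' => 'Authoring information',
  '#colapsed' => TRUE,  // Deliberate typo.
);
$form['author']['name'] = array(
  '#type' => 'textfield',
  '#maxlength' => 60,
);
$errors = strict_fapi_check($form);
?>

Run on this form, the checker reports the single misspelled property on the fieldset and nothing else.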

#3: Would it be useful to provide some official helper functions that create form elements for you? You could say

<?php
$form['author'] = form_create_fieldset(array(
  '#title' => t('Author'),
));
$form['author']['name'] = form_create_textfield(array(
  '#title' => t('Name'),
  '#maxlength' => 60,
));
?>

The form_create_X() functions could set defaults for you. They could check for invalid form properties. They could even (god help us) call a hook that lets modules override bits of its functionality and set defaults of their own. And they could be given their own Doxygen documentation, which would help out with problem #5.
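For instance, one of these helpers might be sketched like this. The function is hypothetical (it does not exist in Drupal), and the default values and the list of accepted properties are assumptions chosen for the example.

<?php
// Hypothetical helper: fills in defaults and rejects unknown properties.
function form_create_textfield(array $properties = array()) {
  $defaults = array(
    '#type' => 'textfield',
    '#title' => '',
    '#maxlength' => 128,
    '#size' => 60,
  );
  // Any key that has no default is treated as an invalid property.
  $unknown = array_diff_key($properties, $defaults);
  if ($unknown) {
    throw new InvalidArgumentException(
      'Unknown textfield properties: ' . implode(', ', array_keys($unknown))
    );
  }
  // Caller-supplied values win; the + operator keeps existing keys.
  return $properties + $defaults;
}

$name = form_create_textfield(array(
  '#title' => 'Name',
  '#maxlength' => 60,
));
?>

The result is still an ordinary Form API array, so nothing downstream has to change.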

If this seems like overkill... well, then object constructors and an inheritance tree are even more overkill.

Problem #5. Here's a glib question: If the available docs tools, like Doxygen, do a bad job of documenting the existing Drupal API, why try to solve this problem by reworking 85% of the Drupal API? Why not focus on developing a better docs tool -- one that, for example, lets you document an array argument by individually documenting the various keys, both required and optional? Has this been tried in the past and found to be impossible?

That leaves #1. Now, I know that Java programmers look askance at any programming environment that doesn't autocomplete everything under the sun. And I'm an emacs user and someone who avoids Java like the flu, so I'm vulnerable to the accusation that I just don't know what I'm missing. But given the choice between solving #1 and an embrace of strict typing, a complete rework of Form API, and the accompanying reeducation of all Drupal coders, I'd vote to notfix #1. Let those who love Java port Drupal to Java. ;)
