XML Matters: Beyond the DOM (Tips and Tricks for Using the DOM with Ease)

Author: Eve Cole Update Time: 2009-07-07 16:20:21

Dethe Elza, Senior Technical Architect, Blast Radius

The Document Object Model (DOM) is one of the most commonly used tools for manipulating XML and HTML data, yet its potential is rarely fully exploited. By taking advantage of the DOM and making it easier to use, you gain a powerful tool for XML applications, including dynamic Web applications.

This issue features a guest columnist, friend and colleague Dethe Elza. Dethe has extensive experience in developing Web applications using XML, and I would like to thank him for helping me introduce XML programming using the DOM and ECMAScript. Stay tuned to this column for more of Dethe's columns.
- David Mertz

DOM is one of the standard APIs for processing XML and HTML. It's often criticized for being memory-intensive, slow, and verbose. Still, it's the best choice for many applications, and it's certainly simpler than SAX, XML's other major API. DOM is gradually appearing in tools such as web browsers, SVG browsers, OpenOffice, and so on.

The DOM is great because it's a standard, is widely implemented, and is built into other standards. As a standard, its handling of data is programming-language agnostic (which may or may not be a strength, but at least it makes the way we handle data consistent). The DOM is now not only built into Web browsers but is also part of many XML-based specifications. Since it's already part of your toolkit, even if you only use it occasionally, it's time to take full advantage of what it brings to the table.

After working with the DOM for a while, you'll see patterns develop: things you want to do over and over again. Shortcuts help you cope with the DOM's verbosity and write self-explanatory, elegant code. Here's a collection of tips and tricks that I use frequently, along with some JavaScript examples.

insertAfter and prependChild

The first trick is that there is no trick.

The DOM has two methods for adding a child node to a container node (usually an Element, but possibly a Document or DocumentFragment): appendChild(node) and insertBefore(node, referenceNode). It seems like something is missing. What if I want to insert a node after a reference node, or prepend a child node (making the new node the first in the list)? For many years, my solution was to write the following functions:

Listing 1. The wrong way to insert after and prepend
function insertAfter(parent, node, referenceNode) {
    if (referenceNode.nextSibling) {
        parent.insertBefore(node, referenceNode.nextSibling);
    } else {
        parent.appendChild(node);
    }
}
function prependChild(parent, node) {
    if (parent.firstChild) {
        parent.insertBefore(node, parent.firstChild);
    } else {
        parent.appendChild(node);
    }
}

In fact, the extra checks in Listing 1 are unnecessary: insertBefore() is already defined to behave like appendChild() when the reference node is null. So instead of the functions above, you can use the ones in Listing 2, or skip them and just call the built-in method directly:

Listing 2. The right way to insert after and prepend
function insertAfter(parent, node, referenceNode) {
    parent.insertBefore(node, referenceNode.nextSibling);
}
function prependChild(parent, node) {
    parent.insertBefore(node, parent.firstChild);
}

If you're new to DOM programming, it's important to point out that, although you can have multiple pointers pointing to a node, that node can only exist in one location in the DOM tree. So if you want to insert it into the tree, there is no need to remove it from the tree first as it will be removed automatically. This mechanism is convenient when reordering nodes by simply inserting them into their new positions.

Thanks to this mechanism, if you want to swap the positions of two adjacent nodes (call them node1 and node2, with node2 immediately following node1), you can use either of the following:

node1.parentNode.insertBefore(node2, node1);
or
node1.parentNode.insertBefore(node1.nextSibling, node1);
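The automatic removal described above can be illustrated outside a browser with a small stand-in. The sketch below is only an approximation: MiniNode and names() are hypothetical helpers written for this example, not part of the DOM, and MiniNode mimics just two DOM rules (a null reference node means "append", and inserting a node that already has a parent moves it). Unlike a real DOM, it does not raise an error when the reference node is not a child of the parent.

```javascript
// MiniNode is a hypothetical stand-in for a DOM node (an assumption made so
// the example runs without a browser). It mimics one DOM rule: inserting a
// node that already has a parent removes it from its old position first.
function MiniNode(name) {
    this.name = name;
    this.parentNode = null;
    this.childNodes = [];
}

MiniNode.prototype.removeChild = function (node) {
    var i = this.childNodes.indexOf(node);
    if (i !== -1) {
        this.childNodes.splice(i, 1);
        node.parentNode = null;
    }
    return node;
};

// As with the DOM method, a null reference node means "append".
MiniNode.prototype.insertBefore = function (node, referenceNode) {
    if (node.parentNode) {
        node.parentNode.removeChild(node);
    }
    var i = referenceNode ? this.childNodes.indexOf(referenceNode) : -1;
    if (i === -1) {
        this.childNodes.push(node);
    } else {
        this.childNodes.splice(i, 0, node);
    }
    node.parentNode = this;
    return node;
};

// Helper for inspecting the child order.
function names(parent) {
    return parent.childNodes.map(function (n) { return n.name; }).join(',');
}

var list = new MiniNode('list');
var a = new MiniNode('a');
var b = new MiniNode('b');
var c = new MiniNode('c');
list.insertBefore(a, null);
list.insertBefore(b, null);
list.insertBefore(c, null);
var before = names(list);         // "a,b,c"

// Swap the adjacent nodes a and b. b is moved, not copied.
a.parentNode.insertBefore(b, a);
var afterSwap = names(list);      // "b,a,c"

// A null reference node appends, so this moves b to the end.
list.insertBefore(b, null);
var afterMove = names(list);      // "a,c,b"
```

The list never grows beyond three children, because each insertion relocates an existing node instead of duplicating it.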

What else can you do with DOM?

DOM is widely used in web pages. If you visit the bookmarklets site (see Resources), you'll find many creative short scripts that can rearrange pages, extract links, hide images or Flash ads, and more.

However, Internet Explorer does not define the Node interface constants that are used to identify node types, so a DOM script written for the Web should first check for those constants and define them if they are missing.

Listing 3. Making sure Node is defined
if (!window['Node']) {
    window.Node = new Object();
    Node.ELEMENT_NODE = 1;
    Node.ATTRIBUTE_NODE = 2;
    Node.TEXT_NODE = 3;
    Node.CDATA_SECTION_NODE = 4;
    Node.ENTITY_REFERENCE_NODE = 5;
    Node.ENTITY_NODE = 6;
    Node.PROCESSING_INSTRUCTION_NODE = 7;
    Node.COMMENT_NODE = 8;
    Node.DOCUMENT_NODE = 9;
    Node.DOCUMENT_TYPE_NODE = 10;
    Node.DOCUMENT_FRAGMENT_NODE = 11;
    Node.NOTATION_NODE = 12;
}

Listing 4 shows how to extract all the text contained in a node:

Listing 4. Inner text
function innerText(node) {
    // is this a text (3) or CDATA (4) node?
    if (node.nodeType == 3 || node.nodeType == 4) {
        return node.data;
    }
    // otherwise, collect the text of every child recursively
    var i;
    var returnValue = [];
    for (i = 0; i < node.childNodes.length; i++) {
        returnValue.push(innerText(node.childNodes[i]));
    }
    return returnValue.join('');
}

Shortcuts

People often complain that the DOM is too verbose and that simple tasks require a lot of code. For example, if you wanted to create a <div> element containing some text in response to a button click, the code might look like this:

Listing 5. The "long road" to creating a <div>
function handle_button() {
    var parent = document.getElementById('myContainer');
    var div = document.createElement('div');
    div.className = 'myDivCSSClass';
    div.id = 'myDivId';
    div.style.position = 'absolute';
    div.style.left = '300px';
    div.style.top = '200px';
    var text = "This is the first text of the rest of this code";
    var textNode = document.createTextNode(text);
    div.appendChild(textNode);
    parent.appendChild(div);
}

If you create nodes this way frequently, typing all this code quickly becomes tiring. There has to be a better way, and there is. Here's a utility that helps you create elements, set their attributes and styles, and add a text child node. All parameters except name are optional.

Listing 6. Function elem() shortcut
function elem(name, attrs, style, text) {
    var e = document.createElement(name);
    var key; // declared locally so the loops below don't create a global
    if (attrs) {
        for (key in attrs) {
            if (key == 'class') {
                e.className = attrs[key];
            } else if (key == 'id') {
                e.id = attrs[key];
            } else {
                e.setAttribute(key, attrs[key]);
            }
        }
    }
    if (style) {
        for (key in style) {
            e.style[key] = style[key];
        }
    }
    if (text) {
        e.appendChild(document.createTextNode(text));
    }
    return e;
}

Using this shortcut, you can create the <div> element from Listing 5 much more concisely. Note that the attrs and style parameters are given as JavaScript object literals.

Listing 7. An easy way to create a <div>
function handle_button() {
    var parent = document.getElementById('myContainer');
    // 'class' is quoted because it is a reserved word in older engines
    parent.appendChild(elem('div',
        {'class': 'myDivCSSClass', id: 'myDivId'},
        {position: 'absolute', left: '300px', top: '200px'},
        'This is the first text of the rest of this code'));
}

This utility can save you a lot of time when you want to quickly create a large number of complex DHTML objects. The pattern here is that if you have a specific DOM structure that you need to create frequently, you should write a utility to create it. Not only does this reduce the amount of code you write, it also cuts down on repetitive cutting and pasting (a common source of errors) and makes the code easier to follow when you read it.
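As a rough illustration of that pattern, here is a sketch of a utility for one recurring structure, a bulleted list. The names makeList() and fakeDocument are hypothetical and exist only for this example. The document is passed in as a parameter (an assumption of this sketch, not something the listings above do) so that the same function can run against a browser's document or against a stub.

```javascript
// makeList() is a hypothetical helper: it builds <ul><li>...</li></ul> from
// an array of strings, using only createElement(), createTextNode(), and
// appendChild().
function makeList(doc, items) {
    var ul = doc.createElement('ul');
    for (var i = 0; i < items.length; i++) {
        var li = doc.createElement('li');
        li.appendChild(doc.createTextNode(items[i]));
        ul.appendChild(li);
    }
    return ul;
}

// fakeDocument is a stub for demonstration only. It implements just the
// factory methods and appendChild() that makeList() relies on.
var fakeDocument = {
    createElement: function (name) {
        return {
            nodeName: name,
            childNodes: [],
            appendChild: function (child) {
                this.childNodes.push(child);
                return child;
            }
        };
    },
    createTextNode: function (text) {
        return { nodeName: '#text', data: text, childNodes: [] };
    }
};

var menu = makeList(fakeDocument, ['Home', 'About', 'Contact']);
```

In a browser, the call would presumably be makeList(document, [...]), with the result handed to appendChild() on some container.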

What's next?
The DOM does not make it easy to find the next node in document order. Here are some utilities to help you move forward and backward between nodes:

Listing 8. nextNode and prevNode
// return next node in document order
function nextNode(node) {
    if (!node) return null;
    if (node.firstChild) {
        return node.firstChild;
    } else {
        return nextWide(node);
    }
}
// helper function for nextNode()
function nextWide(node) {
    if (!node) return null;
    if (node.nextSibling) {
        return node.nextSibling;
    } else {
        return nextWide(node.parentNode);
    }
}
// return previous node in document order
function prevNode(node) {
    if (!node) return null;
    if (node.previousSibling) {
        return previousDeep(node.previousSibling);
    }
    return node.parentNode;
}
// helper function for prevNode()
function previousDeep(node) {
    if (!node) return null;
    while (node.childNodes.length) {
        node = node.lastChild;
    }
    return node;
}
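
One thing to keep in mind is that nextNode() keeps walking through the rest of the document once it leaves the subtree it started in. If you only want the nodes under one element, a bounded variant may help. The following is a minimal sketch: nextNodeWithin() and link() are hypothetical names introduced here, and the tree is built from plain objects (an assumption, so the example runs without a browser) that carry only the firstChild, nextSibling, and parentNode pointers the walk needs.

```javascript
// nextNodeWithin() is a hypothetical variant of nextNode(): it returns the
// next node in document order, or null once the walk would leave root.
function nextNodeWithin(node, root) {
    if (!node) return null;
    if (node.firstChild) return node.firstChild;
    // Climb until this node or one of its ancestors has a next sibling,
    // stopping at root so the walk never escapes the subtree.
    while (node && node !== root) {
        if (node.nextSibling) return node.nextSibling;
        node = node.parentNode;
    }
    return null;
}

// link() wires up the parent and sibling pointers for a plain-object tree.
// It is only a stand-in for a real DOM tree.
function link(name, children) {
    var node = { name: name, parentNode: null, firstChild: null, nextSibling: null };
    children = children || [];
    for (var i = 0; i < children.length; i++) {
        children[i].parentNode = node;
        children[i].nextSibling = children[i + 1] || null;
    }
    node.firstChild = children[0] || null;
    return node;
}

var section = link('section', [link('h1'), link('p', [link('em')])]);
var body = link('body', [section, link('footer')]);

var visited = [];
for (var n = section; n; n = nextNodeWithin(n, section)) {
    visited.push(n.name);
}
// visited is now ["section", "h1", "p", "em"]; footer lies outside the subtree
```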

Using the DOM with ease
Sometimes you may want to iterate through the DOM, calling a function on each node or collecting a value from each node. In fact, these ideas are so general that DOM Level 2 already includes an extension called DOM Traversal and Range, which defines objects and APIs for iterating over all the nodes in the DOM, applying functions to them, and selecting ranges within the DOM. Because these APIs are not defined in Internet Explorer (at least not yet), you can use nextNode() to do something similar.

Here, the idea is to create some simple, general-purpose tools and then assemble them in different ways to achieve the desired effect. If you know functional programming, this will look familiar. The Beyond JS library (see Resources) takes this idea further.

Listing 9. Functional DOM utilities
// return an Array of all nodes, starting at startNode and
// continuing through the rest of the DOM tree
function listNodes(startNode) {
    var list = new Array();
    var node = startNode;
    while (node) {
        list.push(node);
        node = nextNode(node);
    }
    return list;
}
// The same as listNodes(), but works backwards from startNode.
// Note that this is not the same as running listNodes() and
// reversing the list.
function listNodesReversed(startNode) {
    var list = new Array();
    var node = startNode;
    while (node) {
        list.push(node);
        node = prevNode(node);
    }
    return list;
}
// apply func to each node in nodeList, return new list of results
function map(list, func) {
    var result_list = new Array();
    for (var i = 0; i < list.length; i++) {
        result_list.push(func(list[i]));
    }
    return result_list;
}
// apply test to each node, return a new list of nodes for which
// test(node) returns true
function filter(list, test) {
    var result_list = new Array();
    for (var i = 0; i < list.length; i++) {
        if (test(list[i])) result_list.push(list[i]);
    }
    return result_list;
}

Listing 9 contains four basic tools. The listNodes() and listNodesReversed() functions could be extended to take an optional length, similar to Array's slice() method; I leave this as an exercise for you. Also note that map() and filter() are completely generic and work with any list, not just lists of nodes. Now I'll show you a few ways to combine them.

Listing 10. Using functional utilities
// A list of all the element names in document order
function isElement(node) {
    return node.nodeType == Node.ELEMENT_NODE;
}
function nodeName(node) {
    return node.nodeName;
}
var elementNames = map(filter(listNodes(document), isElement), nodeName);
// All the text from the document (ignores CDATA)
function isText(node) {
    return node.nodeType == Node.TEXT_NODE;
}
function nodeValue(node) {
    return node.nodeValue;
}
var allText = map(filter(listNodes(document), isText), nodeValue);

You can use these utilities to extract IDs, modify styles, find certain nodes and remove them, and more. Once the DOM Traversal and Range APIs are widely implemented, you can use them to modify the DOM tree without first building the list. Not only are they powerful, but they also work in a similar way to what I highlighted above.
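
To make the first of those uses concrete, here is a small sketch of extracting IDs. It assumes an engine with the built-in Array map() and filter() methods (ECMAScript 5 and later) and uses them in place of the article's own functions. The node list is made of plain objects so the example runs without a browser; hasId() and getId() are hypothetical helper names. Keep in mind that the built-in methods exist on real arrays, so a live NodeList would first need to be copied into an array, which is what listNodes() produces anyway.

```javascript
// Plain objects standing in for nodes. In a browser, an array like this
// could come from a function such as listNodes(document).
var nodes = [
    { nodeType: 1, nodeName: 'div',   id: 'header' },
    { nodeType: 3, nodeName: '#text', nodeValue: 'Welcome' },
    { nodeType: 1, nodeName: 'p',     id: '' },
    { nodeType: 1, nodeName: 'ul',    id: 'menu' }
];

// Keep only element nodes (nodeType 1) that carry a non-empty id.
function hasId(node) {
    return node.nodeType === 1 && Boolean(node.id);
}
function getId(node) {
    return node.id;
}

var ids = nodes.filter(hasId).map(getId);   // ["header", "menu"]
```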

The danger zone of the DOM
Note that the core DOM API gives you no way to parse XML data into a DOM or to serialize a DOM as XML. These functions are defined in the DOM Level 3 extension "Load and Save", but they are not fully implemented yet, so don't count on them for now. Each platform (a browser or another specialized DOM application) has its own way of converting between DOM and XML, but cross-platform conversion is beyond the scope of this article.

The DOM is not a very safe tool, especially because the DOM API lets you create trees that cannot be serialized as XML. Never mix the DOM1 non-namespace APIs and the DOM2 namespace-aware APIs (for example, createElement and createElementNS) in the same program. If you use namespaces, try to declare all of them on the root element and don't redefine a namespace prefix, otherwise things get very confusing. Generally speaking, as long as you stick to routine usage, you won't hit the edge cases that could get you into trouble.

If you have been using Internet Explorer's innerText and innerHTML for parsing, try using the elem() function instead. By building similar utilities, you gain convenience and keep the benefits of cross-platform code. Mixing the two approaches tends to go badly.

Some Unicode characters are not allowed in XML. DOM implementations let you add them anyway, but the result cannot be serialized. These characters include most control characters and the individual halves of Unicode surrogate pairs. This only happens if you try to include binary data in the document, but that is a gotcha of its own.
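
One way to guard against this is to check text before adding it to the tree. The sketch below tests characters against the Char production of XML 1.0; the function names isXmlChar() and findNonXmlChars() are hypothetical, and XML 1.1 has somewhat different rules that this example does not cover.

```javascript
// True if the code point matches the Char production of XML 1.0:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXmlChar(cp) {
    return cp === 0x9 || cp === 0xA || cp === 0xD ||
        (cp >= 0x20 && cp <= 0xD7FF) ||
        (cp >= 0xE000 && cp <= 0xFFFD) ||
        (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Returns the indexes (in UTF-16 code units) of characters that XML 1.0
// cannot represent. A well-formed surrogate pair encodes a code point at or
// above #x10000, which is always allowed, so only unpaired surrogates are
// reported.
function findNonXmlChars(text) {
    var bad = [];
    for (var i = 0; i < text.length; i++) {
        var cp = text.charCodeAt(i);
        var isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        if (isHigh && i + 1 < text.length) {
            var next = text.charCodeAt(i + 1);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                i++; // skip the low surrogate of a valid pair
                continue;
            }
        }
        if (!isXmlChar(cp)) bad.push(i);
    }
    return bad;
}

var controlChars = findNonXmlChars('ok\u0000\u0008');   // [2, 3]
var whitespace = findNonXmlChars('tab\tline\n');        // []
var loneSurrogate = findNonXmlChars('x\uD83D');         // [1]
```

A caller could refuse the text, strip the offending characters, or encode binary data (for example as Base64) before it reaches createTextNode().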

Conclusion
I've covered a lot of what the DOM can do, but there's so much more that the DOM (and JavaScript) can do. Study and explore these examples to see how they can be used to solve problems that may require client scripts, templates, or specialized APIs.

The DOM has its limitations and shortcomings, but it also has many advantages: it's built into many applications; it works the same way whether you use Java technology, Python, or JavaScript; it's simpler to use than SAX; and with the patterns described above, it is both simple and powerful to use. An increasing number of applications are beginning to support the DOM, including Mozilla-based applications, OpenOffice, and Blast Radius' XMetaL. More and more specifications require the DOM and extend it (for example, SVG), so the DOM is always around you. You'd be wise to use this widely deployed tool.