Troubleshooting Document Format
My mother's list holds dozens, if not hundreds, of recipes. If a fatal error occurs, debugging will be very difficult: you will be hunting line by line for the missing tag. With several levels of nesting, finding errors gets harder still.
But good help is available. Parsers, applications that read XML code and report well-formedness errors, are freely available online. The best of these is Lark, written by Tim Bray, technical editor and vocal advocate of the XML specification and one of the smartest people on the planet.
I'm using Lark to analyze the code below. Note that the closing </ingredients> tag is in the wrong position: it appears inside the last <item>, so "chocolate chips" and its closing </item> tag fall outside the ingredients list:
<?xml version="1.0"?>
<list>
<recipe>
<author>Carol Schmidt</author>
<recipe_name>Chocolate Chip Bars</recipe_name>
<meal>Dinner
<course>Dessert</course>
</meal>
<ingredients>
<item>2/3 C butter</item>
<item>2 C brown sugar</item>
<item>1 tsp vanilla</item>
<item>1 3/4 C unsifted all-purpose flour</item>
<item>1 1/2 tsp baking powder</item>
<item>1/2 tsp salt</item>
<item>3 eggs</item>
<item>1/2 C chopped nuts</item>
<item>
</ingredients>2 cups (12-oz pkg.) semi-sweet choc.
chips</item>
<directions>
Preheat oven to 350 degrees. Melt butter;
combine with brown sugar and vanilla in large mixing bowl.
Set aside to cool. Combine flour, baking powder, and salt; set aside.
Add eggs to cooled sugar mixture; beat well. Stir in reserved dry
ingredients, nuts, and chips.
Spread in greased 13-by-9-inch pan. Bake for 25 to 30 minutes
until golden brown; cool. Cut into squares.
</directions>
</recipe>
</list>
The following are the results returned by the analyzer:
Error Report
Line 17, column 22: Encountered </ingredients> expected </item>
... assumed </item>
Line 18, column 36: Encountered </item> with no start-tag.
With this information, finding the error is no longer a problem. So what does it mean for an XML file to be valid?
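A parser report like Lark's can be approximated with any non-validating XML parser. The sketch below is not Lark: it assumes Python's standard xml.etree.ElementTree module, and the helper name first_wellformedness_error is made up for illustration. It returns the line and column of the first error it hits, or None when the text is well formed.

```python
import xml.etree.ElementTree as ET


def first_wellformedness_error(xml_text):
    """Return (line, column, message) for the first error, or None if well formed.

    Hypothetical helper: this uses Python's bundled parser, not Lark, so the
    wording and positions of its messages will differ from Lark's report.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# The same mistake as in the listing: </ingredients> closes before </item>.
broken = (
    "<ingredients>\n"
    "<item>\n"
    "</ingredients>2 cups semi-sweet choc. chips</item>\n"
)
fixed = (
    "<ingredients>\n"
    "<item>2 cups semi-sweet choc. chips</item>\n"
    "</ingredients>\n"
)

print(first_wellformedness_error(broken))  # a (line, column, message) tuple
print(first_wellformedness_error(fixed))   # None
```

Unlike the Lark report above, which carries on after the first problem, this sketch stops at the first error, so a document with several mistakes has to be fixed and rechecked one error at a time.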
Implementing Validity
Eventually we will add information to our well-formed XML document. In fact, we have a lot left to do, and a crisis is still lurking: an XML file can be well formed and still be missing critical information. Take a look at the following example:
<recipe>
<author>Carol Schmidt</author>
<recipe_name>Chocolate Chip Bars</recipe_name>
<meal>Dinner <course>Dessert</course> </meal>
<ingredients> </ingredients>
<directions>Melt butter; combine with, etc. ... </directions>
</recipe>
This recipe doesn't include any ingredients, and because it is well formed, the Lark parser won't flag the problem. Anyone who has managed even the most benign of databases knows the mistakes we humans make: given the chance, we leave out critical information and add useless nonsense. That's why the designers of XML included the DTD, or Document Type Definition. A DTD provides a way to ensure that an XML document is more or less what you want it to be.
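To make the gap concrete, here is a minimal sketch, again assuming Python's standard xml.etree.ElementTree rather than any tool named in this article. The recipe with an empty <ingredients> tag parses without complaint, and only a hand-written check (the hypothetical recipes_missing_items) notices the missing items. A DTD lets you state that rule once instead of coding it by hand.

```python
import xml.etree.ElementTree as ET

RECIPE = """
<recipe>
<author>Carol Schmidt</author>
<recipe_name>Chocolate Chip Bars</recipe_name>
<meal>Dinner <course>Dessert</course> </meal>
<ingredients> </ingredients>
<directions>Melt butter; combine with, etc. ... </directions>
</recipe>
"""


def recipes_missing_items(root):
    """Return the names of recipes whose <ingredients> holds no <item>."""
    missing = []
    # iter() also visits the root itself, so a bare <recipe> root is covered.
    for recipe in root.iter("recipe"):
        ingredients = recipe.find("ingredients")
        if ingredients is None or ingredients.find("item") is None:
            missing.append(recipe.findtext("recipe_name", default="(unnamed)"))
    return missing


root = ET.fromstring(RECIPE)        # no exception: the document is well formed
print(recipes_missing_items(root))  # ['Chocolate Chip Bars']
```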
Let's look at a DTD used in recipes.
<!DOCTYPE list [
<!ELEMENT recipe (recipe_name, author, meal, ingredients, directions)>
<!ELEMENT ingredients (item+)>
<!ELEMENT meal (#PCDATA, course?)>
<!ELEMENT item (#PCDATA, sub_item*)>
<!ELEMENT recipe_name (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT course (#PCDATA)>
<!ELEMENT sub_item (#PCDATA)>
<!ELEMENT directions (#PCDATA)>
]>
The code may seem unfriendly at first, but it makes sense when you break it down. Let's explain it in detail:
<!DOCTYPE list [
This line says that everything enclosed in the square brackets is the DTD for a document whose root element is <list>. As we mentioned before, the root element contains all other elements.
<!ELEMENT recipe (recipe_name, author, meal, ingredients, directions)>
This line defines the <recipe> tag. The parentheses and commas mean that these five tags must appear within the <recipe> tag, in this order.
<!ELEMENT meal (#PCDATA, course?)>
This line needs detailed explanation. I have defined the following structure:
<meal>Here the meal name is mandatory
<course>One course name may appear, but it is not
mandatory</course>
</meal>
I do this because, the way I think about it, lunch doesn't necessarily call for a specific course, but dinner might include appetizers, main courses, and desserts. To implement this I specify #PCDATA, which stands for parsed character data (that is, non-binary data). Here, #PCDATA is text, for example "Dinner". One caution: the XML 1.0 Recommendation only allows text and child elements to be mixed in the form (#PCDATA | course)*, so a strictly conforming validator may reject the comma-separated form used here.
The question mark after "course" indicates that zero or one pair of <course> tags may appear within the <meal> tag.
Now let's look at the next line:
<!ELEMENT ingredients (item+)>
The plus sign here indicates that at least one pair of <item> tags must appear within the <ingredients> tag.
The last line we are interested in is:
<!ELEMENT item (#PCDATA, sub_item*)>
I added sub_item* as a safety measure. Besides requiring text for each item, I want to allow for items that break down into several parts. The asterisk indicates that any number of sub-items, including none, may appear within the <item> tag. I don't need any sub-items for the Chocolate Chip Bars recipe, but they are useful when an ingredient is complex.
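The three occurrence indicators behave like their regular-expression namesakes, which suggests a quick way to experiment with them. The following is a rough sketch, not a DTD validator: it assumes Python's standard re and xml.etree.ElementTree modules, the helper names are hypothetical, and it looks only at the order and number of child tags, ignoring text (#PCDATA) entirely.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Build a regex from a list of (child_name, indicator) pairs.

    indicator is "" (exactly one), "?" (zero or one), "+" (one or more),
    or "*" (zero or more), mirroring the DTD notation.
    """
    parts = ["(?:{} ){}".format(re.escape(name), ind) for name, ind in model]
    return re.compile("".join(parts))


def children_match(element, model):
    """Check an element's child tags, in order, against the model."""
    names = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(names) is not None


INGREDIENTS = [("item", "+")]  # like (item+)
ITEM = [("sub_item", "*")]     # the child part of (#PCDATA, sub_item*)
MEAL = [("course", "?")]       # the child part of (#PCDATA, course?)

empty = ET.fromstring("<ingredients> </ingredients>")
full = ET.fromstring("<ingredients><item>3 eggs</item></ingredients>")
print(children_match(empty, INGREDIENTS))  # False: item+ needs at least one
print(children_match(full, INGREDIENTS))   # True
```

The same helper accepts an <item> with zero or several <sub_item> children under ITEM, and rejects a <meal> with two <course> children under MEAL.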
Now let's put this together and see what we get.
Complete Example of DTD
Below is a complete example. I added another recipe to the file and annotated the DTD with comments. Notice that I used sub-items in the second recipe.
<?xml version="1.0"?>
<!--This starts the DTD. The first four lines address document structure-->
<!DOCTYPE list [
<!ELEMENT recipe (recipe_name, author, meal, ingredients, directions)>
<!ELEMENT ingredients (item+)>
<!ELEMENT meal (#PCDATA, course?)>
<!ELEMENT item (#PCDATA, sub_item*)>
<!--These are the remaining elements of the recipe tag -->
<!ELEMENT recipe_name (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT directions (#PCDATA)>
<!--The remaining element of the meal tag -->
<!ELEMENT course (#PCDATA)>
<!--The remaining element of the item tag -->
<!ELEMENT sub_item (#PCDATA)>
]>
<list>
<recipe>
<recipe_name>Chocolate Chip Bars</recipe_name>
<author>Carol Schmidt</author>
<meal>Dinner
<course>Dessert</course>
</meal>
<ingredients>
<item>2/3 C butter</item>
<item>2 C brown sugar</item>
<item>1 tsp vanilla</item>
<item>1 3/4 C unsifted all-purpose flour</item>
<item>1 1/2 tsp baking powder</item>
<item>1/2 tsp salt</item>
<item>3 eggs</item>
<item>1/2 C chopped nuts</item>
<item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
</ingredients>
<directions>
Preheat oven to 350 degrees. Melt butter;
combine with brown sugar and vanilla in large mixing bowl.
Set aside to cool. Combine flour, baking powder, and salt;
set aside. Add eggs to cooled sugar mixture; beat well.
Stir in reserved dry ingredients, nuts, and chips.
Spread in greased 13-by-9-inch pan.
Bake for 25 to 30 minutes until golden brown; cool.
Cut into squares.
</directions>
</recipe>
<recipe>
<recipe_name>Pasta with tomato Sauce</recipe_name>
<meal>Dinner
<course>Entree</course>
</meal>
<ingredients>
<item>1 lb spaghetti</item>
<item>1 16-oz can diced tomatoes</item>
<item>4 cloves garlic</item>
<item>1 diced onion</item>
<item>Italian seasoning
<sub_item>oregano</sub_item>
<sub_item>basil</sub_item>
<sub_item>crushed red pepper</sub_item>
</item>
</ingredients>
<directions>
Boil pasta. Sauté garlic and onion.
Add tomatoes. Serve hot.
</directions>
</recipe>
</list>
Now that there is a DTD, the document can be checked against the constraints the DTD sets. In other words, we can verify that the document is valid.
To do this, we need another tool: a validating parser. Microsoft's MSXML, a Java-based program, is easy to use and works well. The above document was checked by this program and no errors were found. But if I check a recipe that does not contain an item within the ingredients tag, it returns the following message:
ingredients is not complete. Expected elements [item].