Examples of parsing XML files with DOM and SAX in Java

Author：Eve Cole Update Time：2025-02-10 14:00:04

dom4j introduction
dom4j's project address: http://sourceforge.net/projects/dom4j/?source=directory

dom4j is an easy-to-use open source library for working with XML, XPath and XSLT. It runs on the Java platform, builds on the Java Collections Framework, and fully supports DOM, SAX and JAXP.

Using dom4j
After downloading the dom4j project, unzip it and add its jar (the version used here is dom4j-1.6.1.jar) to the class path, for example in Eclipse via Properties -> Java Build Path -> Add External JARs...

You can then program against the API it provides.

Program Example 1
The first program uses Java code to generate an XML document. The code is as follows:

    package com.example.xml.dom4j;

    import java.io.FileOutputStream;
    import java.io.FileWriter;

    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.io.OutputFormat;
    import org.dom4j.io.XMLWriter;

    /**
     * dom4j framework study: create an XML document with dom4j, then print and save it.
     */
    public class Dom4JTest1
    {
        public static void main(String[] args) throws Exception
        {
            // First way: create a document, then create a root element
            // Create the document with the helper class
            Document document = DocumentHelper.createDocument();
            // Create the root node and add it to the document
            Element root = DocumentHelper.createElement("student");
            document.setRootElement(root);

            // Second way: create the document and set its root element in one step
            Element root2 = DocumentHelper.createElement("student");
            Document document2 = DocumentHelper.createDocument(root2);

            // Add an attribute
            root2.addAttribute("name", "zhangsan");
            // Add child nodes: addElement returns the new element
            Element helloElement = root2.addElement("hello");
            Element worldElement = root2.addElement("world");

            helloElement.setText("hello Text");
            worldElement.setText("world text");

            // Output to the console
            XMLWriter xmlWriter = new XMLWriter();
            xmlWriter.write(document);

            // Output to a file
            // Format: indent with 4 spaces and start each element on a new line
            OutputFormat format = new OutputFormat("    ", true);
            XMLWriter xmlWriter2 = new XMLWriter(new FileOutputStream("student.xml"), format);
            xmlWriter2.write(document2);

            // Another way to output; remember to call flush(), otherwise the output file is empty
            XMLWriter xmlWriter3 = new XMLWriter(new FileWriter("student2.xml"), format);
            xmlWriter3.write(document2);
            xmlWriter3.flush();
            // close() also works
        }
    }

Console output of the program:

    <?xml version="1.0" encoding="UTF-8"?>
    <student/>

The generated XML document:

    <?xml version="1.0" encoding="UTF-8"?>
    <student name="zhangsan">
        <hello>hello Text</hello>
        <world>world text</world>
    </student>

Program Example 2
The second program reads an XML document, parses it, and prints its contents.

First, the document to be parsed (students.xml) is as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <students name="zhangsan">
        <hello name="lisi">hello Text1</hello>
        <hello name="lisi2">hello Text2</hello>
        <hello name="lisi3">hello Text3</hello>
        <world name="wangwu">world text1</world>
        <world name="wangwu2">world text2</world>
        <world>world text3</world>
    </students>

The program:

    package com.example.xml.dom4j;

    import java.io.File;
    import java.util.Iterator;
    import java.util.List;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.dom4j.Document;
    import org.dom4j.Element;
    import org.dom4j.io.DOMReader;
    import org.dom4j.io.SAXReader;

    /**
     * dom4j framework study: read and parse XML
     */
    public class Dom4JTest2
    {
        public static void main(String[] args) throws Exception
        {
            SAXReader saxReader = new SAXReader();
            Document document = saxReader.read(new File("students.xml"));

            // Get the root element
            Element root = document.getRootElement();
            System.out.println("Root: " + root.getName());

            // Get all child elements
            List<Element> childList = root.elements();
            System.out.println("total child count: " + childList.size());

            // Get the child elements with a specific name
            List<Element> childList2 = root.elements("hello");
            System.out.println("hello child: " + childList2.size());

            // Get the first child element with the specified name
            Element firstWorldElement = root.element("world");
            // Print its attribute
            System.out.println("first World Attr: "
                    + firstWorldElement.attribute(0).getName() + "="
                    + firstWorldElement.attributeValue("name"));

            System.out.println("Iteration output-----------------------");
            // Iterate over the children and print each name attribute
            for (Iterator iter = root.elementIterator(); iter.hasNext();)
            {
                Element e = (Element) iter.next();
                System.out.println(e.attributeValue("name"));
            }

            System.out.println("Use DOMReader-----------------------");
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Note: the fully qualified class name must be used here,
            // because org.dom4j.Document is already imported
            org.w3c.dom.Document document2 = db.parse(new File("students.xml"));

            DOMReader domReader = new DOMReader();
            // Convert the JAXP Document into a dom4j Document
            Document document3 = domReader.read(document2);

            Element rootElement = document3.getRootElement();
            System.out.println("Root: " + rootElement.getName());
        }
    }

After the code is run, the output is:

    Root: students
    total child count: 6
    hello child: 3
    first World Attr: name=wangwu
    Iteration output-----------------------
    lisi
    lisi2
    lisi3
    wangwu
    wangwu2
    null
    Use DOMReader-----------------------
    Root: students
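
For comparison, the same kind of query can be sketched with the DOM API that ships with the JDK, without dom4j. This is only a minimal sketch: the class name JaxpDomSketch is made up for illustration, and the XML is an inline copy of students.xml so that no file is needed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class JaxpDomSketch {
    // Inline copy of students.xml (assumption: same content as the file above).
    static final String STUDENTS_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<students name=\"zhangsan\">"
        + "<hello name=\"lisi\">hello Text1</hello>"
        + "<hello name=\"lisi2\">hello Text2</hello>"
        + "<hello name=\"lisi3\">hello Text3</hello>"
        + "<world name=\"wangwu\">world text1</world>"
        + "<world name=\"wangwu2\">world text2</world>"
        + "<world>world text3</world>"
        + "</students>";

    // Parses the XML into a DOM tree and returns its root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(STUDENTS_XML);
        System.out.println("Root: " + root.getNodeName());
        // getElementsByTagName searches all descendants, not only direct children
        System.out.println("hello child: " + root.getElementsByTagName("hello").getLength());
        Element firstWorld = (Element) root.getElementsByTagName("world").item(0);
        System.out.println("first World Attr: name=" + firstWorld.getAttribute("name"));
    }
}
```

Note that the W3C DOM returns NodeList objects rather than java.util.List, which is one reason dom4j's collection-based API is often more comfortable to use.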

Parsing XML with SAX
The steps for parsing XML with SAX are as follows.

Parsing with an XMLReader:
(1) Step 1: Create an instance of the factory class SAXParserFactory:
    SAXParserFactory factory = SAXParserFactory.newInstance();
(2) Step 2: Let the factory create a SAX parser, SAXParser:
    SAXParser parser = factory.newSAXParser();
(3) Step 3: Get an XMLReader instance from the SAXParser:
    XMLReader reader = parser.getXMLReader();
(4) Step 4: Register the handler you wrote with the XMLReader. The most important one is usually the ContentHandler:
    reader.setContentHandler(this);
(5) Step 5: Turn the XML document or resource into an InputStream that Java can process, then start parsing:
    reader.parse(new InputSource(is));
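
The five steps above can be put together in one small method. This is a minimal sketch using only JDK classes: the class name XmlReaderSteps and the method elementNames are made up for illustration, the XML comes from a String instead of a file, and the handler reads qName, which the JDK's default (non-namespace-aware) parser fills in.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class XmlReaderSteps {
    // Returns the names of all elements in document order.
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance(); // step 1
            SAXParser parser = factory.newSAXParser();                 // step 2
            XMLReader reader = parser.getXMLReader();                  // step 3
            reader.setContentHandler(new DefaultHandler() {            // step 4
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            InputStream is =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            reader.parse(new InputSource(is));                         // step 5
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return names;
    }
}
```

For example, elementNames("<a><b/><c>t</c></a>") yields the names a, b and c in that order.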

Parsing with a SAXParser:
(1) Step 1: Create an instance of the factory class SAXParserFactory:
    SAXParserFactory factory = SAXParserFactory.newInstance();
(2) Step 2: Let the factory create a SAX parser, SAXParser:
    SAXParser parser = factory.newSAXParser();
(3) Step 3: Turn the XML document or resource into an InputStream that Java can process, then start parsing:
    parser.parse(is, this);
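
The shorter SAXParser route can be sketched the same way. Again this is only an illustration under assumptions: SaxParserSteps and nameAttributes are hypothetical names, the input is a String, and the handler is passed directly to parse() instead of being registered on an XMLReader.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxParserSteps {
    // Collects the value of every "name" attribute in document order.
    static List<String> nameAttributes(String xml) {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                String value = atts.getValue("name"); // null if the attribute is absent
                if (value != null) {
                    values.add(value);
                }
            }
        };
        try (InputStream is =
                 new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            SAXParserFactory factory = SAXParserFactory.newInstance(); // step 1
            SAXParser parser = factory.newSAXParser();                 // step 2
            parser.parse(is, handler);                                 // step 3
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return values;
    }
}
```

Both routes deliver the same events; the XMLReader route is mainly useful when you also want to register other handlers, such as an ErrorHandler, separately.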
ContentHandler has already appeared several times, so here is a closer look. Before parsing begins, you register a ContentHandler with the XMLReader or SAXParser; it acts as an event listener. ContentHandler defines many methods, including:
// Sets a locator object that reports where in the document a content event occurred
public void setDocumentLocator(Locator locator)

// Handles the event fired when document parsing starts
public void startDocument() throws SAXException

// Handles the start of an element. The parameters give the namespace URI of the element, the element name, the attribute list and other information.
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException

// Handles the end of an element. The parameters give the namespace URI of the element, the element name and other information.
public void endElement(String namespaceURI, String localName, String qName) throws SAXException

// Handles the character content of an element. The content is passed in through the parameters.
public void characters(char[] ch, int start, int length) throws SAXException
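
To see when each of these callbacks fires, a handler can simply record every event. The sketch below is illustrative only: EventTrace, trace and startLines are made-up names, and whitespace-only text is skipped to keep the trace short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    final List<Integer> startLines = new ArrayList<>(); // line of each start tag
    private Locator locator; // may stay null if the parser supplies none

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
        if (locator != null) {
            startLines.add(locator.getLineNumber());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this event
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Parses the XML and returns the recorded event sequence.
    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.events;
    }
}
```

For the input <a><b>hi</b></a> the trace is startDocument, start:a, start:b, text:hi, end:b, end:a, endDocument, which shows that start and end events nest in the same way as the tags do.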
Two methods of XMLReader are also worth introducing:
// Registers the ContentHandler that handles XML document parsing events
public void setContentHandler(ContentHandler handler)

// Starts parsing an XML document
public void parse(InputSource input) throws IOException, SAXException

That covers the basics. Next come the parsing steps, reusing the code from the previous chapter. First, we create a Person class to store user information.

    package com.example.demo;

    import java.io.Serializable;

    public class Person implements Serializable {

        private static final long serialVersionUID = 1L;

        private String _id;
        private String _name;
        private String _age;

        public String get_id() {
            return _id;
        }

        public void set_id(String _id) {
            this._id = _id;
        }

        public String get_name() {
            return _name;
        }

        public void set_name(String _name) {
            this._name = _name;
        }

        public String get_age() {
            return _age;
        }

        public void set_age(String _age) {
            this._age = _age;
        }
    }

Next we implement a ContentHandler to parse the XML.
Implementing a ContentHandler generally takes the following steps:
1. Declare a class that extends DefaultHandler. DefaultHandler is a base class that provides an empty implementation of ContentHandler, so we only need to override the methods we care about.
2. Override startDocument() and endDocument(). Initialization before parsing usually goes in startDocument(), and cleanup goes in endDocument().
3. Override startElement(). The parser calls this method whenever it encounters a start tag. It typically checks the value of localName to decide what to do with the data.
4. Override characters(). This callback runs after startElement(), once the parser has read text content of the node. The text is the slice of ch[] that begins at start and is length characters long, and the parser may deliver the text of one node in several calls.
5. Override endElement(). This method is the counterpart of startElement() and is called when the parser reaches an end tag. Use it to restore or clear any per-element state.

First, create a new class that extends DefaultHandler and override the following methods:

    public class SAX_parserXML extends DefaultHandler {

        /**
         * Triggered at the start of the document. You can do initialization work here.
         */
        @Override
        public void startDocument() throws SAXException {
            super.startDocument();
        }

        /**
         * Triggered when the start tag of an element is parsed.
         */
        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
            super.startElement(uri, localName, qName, attributes);
        }

        /**
         * Triggered when text content is read.
         */
        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            super.characters(ch, start, length);
        }

        /**
         * Triggered when an end tag is read.
         */
        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(uri, localName, qName);
        }
    }

First, we create a list to hold the parsed Person data:

    List<Person> persons;

But where should it be initialized? In startDocument(): this event fires once at the start of the document, before any element is read, so it is the appropriate place.

    /**
     * Triggered at the start of the document. You can do initialization work here.
     */
    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        // Initialize the list
        persons = new ArrayList<Person>();
    }

Next, we start parsing the elements:

    /**
     * Triggered when the start tag of an element is parsed.
     */
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        // If this is a person tag, start storing its data
        if (localName.equals("person")) {
            person = new Person();
            person.set_id(attributes.getValue("id"));
        }
        curNode = localName;
    }

In the above code, localName is the name of the element currently being parsed, without any namespace prefix. According to the SAX API it is the empty string when the parser is not performing namespace processing. The steps are:
1. Determine whether the element is a person element.
2. Create a new Person object.
3. Get the id attribute and store it in the Person object.

curNode saves the name of the current element. It is used in characters():

    /**
     * Triggered when text content is read.
     */
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        super.characters(ch, start, length);
        if (person != null) {
            // Extract the text belonging to the current element
            String txt = new String(ch, start, length);
            // Check whether the current element is name
            if (curNode.equals("name")) {
                // Store the extracted value in the person object
                person.set_name(txt);
            } else if (curNode.equals("age")) {
                person.set_age(txt);
            }
        }
    }

Next is what to do when a tag ends:

    /**
     * Triggered when an end tag is read.
     */
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName);
        // If the person element has ended and person is not null, add it to the list
        if (localName.equals("person") && person != null) {
            persons.add(person);
            person = null;
        }
        curNode = "";
    }
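
One caveat: the SAX API allows the parser to report the text of a single element in several characters() calls, so storing only the latest chunk can lose part of the value. A common variation, sketched below, collects the text in a StringBuilder and reads it in endElement(). This is not the code used in the rest of this article: the class name BufferedTextHandler is made up, a String[] of {id, name, age} stands in for the Person class so the sketch is self-contained, and the persons XML layout is assumed from the handler code because the original file is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class BufferedTextHandler extends DefaultHandler {
    // Each row is {id, name, age}; a stand-in for the Person class.
    final List<String[]> rows = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String[] current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for this element
        if ("person".equals(localName)) {
            current = new String[] { atts.getValue("id"), null, null };
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            if ("name".equals(localName)) {
                current[1] = text.toString().trim();
            } else if ("age".equals(localName)) {
                current[2] = text.toString().trim();
            } else if ("person".equals(localName)) {
                rows.add(current);
                current = null;
            }
        }
        text.setLength(0);
    }

    // Parses the XML and returns one row per person element.
    static List<String[]> parse(String xml) {
        BufferedTextHandler handler = new BufferedTextHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that localName is filled in
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.rows;
    }
}
```

With a hypothetical input such as <persons><person id="23"><name>Li Lei</name><age>30</age></person></persons>, parse() returns a single row holding 23, Li Lei and 30.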

The parsing process is roughly as follows:
1. startElement is called at the start of an element.
2. characters is called next and can be used to obtain the text of the element.
3. endElement is called when the element ends.

After parsing is complete, we need a method that returns the list built during parsing:

    public List<Person> ReadXML(InputStream is) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Ask for namespace processing so that localName is filled in on a standard JVM as well
        factory.setNamespaceAware(true);
        try {
            SAXParser parser = factory.newSAXParser();
            // First way
            // parser.parse(is, this);

            // Second way
            XMLReader reader = parser.getXMLReader();
            reader.setContentHandler(this);
            reader.parse(new InputSource(is));
        } catch (Exception e) {
            e.printStackTrace();
        }
        return persons;
    }

The code above needs little explanation: pass in an InputStream and it parses the content. The complete code follows.

    package com.example.demo.Utils;

    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.List;

    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;

    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.DefaultHandler;

    import com.example.demo.Person;

    public class SAX_parserXML extends DefaultHandler {

        List<Person> persons;
        Person person;
        // Name of the current node
        String curNode;

        public List<Person> ReadXML(InputStream is) {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Ask for namespace processing so that localName is filled in on a standard JVM as well
            factory.setNamespaceAware(true);
            try {
                SAXParser parser = factory.newSAXParser();
                // First way
                // parser.parse(is, this);

                // Second way
                XMLReader reader = parser.getXMLReader();
                reader.setContentHandler(this);
                reader.parse(new InputSource(is));
            } catch (Exception e) {
                e.printStackTrace();
            }
            return persons;
        }

        /**
         * Triggered at the start of the document. You can do initialization work here.
         */
        @Override
        public void startDocument() throws SAXException {
            super.startDocument();
            // Initialize the list
            persons = new ArrayList<Person>();
        }

        /**
         * Triggered when the start tag of an element is parsed.
         */
        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) throws SAXException {
            super.startElement(uri, localName, qName, attributes);
            // If this is a person tag, start storing its data
            if (localName.equals("person")) {
                person = new Person();
                person.set_id(attributes.getValue("id"));
            }
            curNode = localName;
        }

        /**
         * Triggered when text content is read.
         */
        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            super.characters(ch, start, length);
            if (person != null) {
                // Extract the text belonging to the current element
                String txt = new String(ch, start, length);
                // Check whether the current element is name
                if (curNode.equals("name")) {
                    // Store the extracted value in the person object
                    person.set_name(txt);
                } else if (curNode.equals("age")) {
                    person.set_age(txt);
                }
            }
        }

        /**
         * Triggered when an end tag is read.
         */
        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(uri, localName, qName);
            // If the person element has ended and person is not null, add it to the list
            if (localName.equals("person") && person != null) {
                persons.add(person);
                person = null;
            }
            curNode = "";
        }
    }

Write some code to call this class (here inside an Android activity, where is is the InputStream of the XML file):

    List<Person> persons = new SAX_parserXML().ReadXML(is);
    StringBuffer buffer = new StringBuffer();
    for (int i = 0; i < persons.size(); i++) {
        Person person = persons.get(i);
        buffer.append("id:" + person.get_id() + " ");
        buffer.append("name:" + person.get_name() + " ");
        buffer.append("age:" + person.get_age() + "\n");
    }
    Toast.makeText(activity, buffer, Toast.LENGTH_LONG).show();

If the Toast shows the id, name and age of each person, the parsing was successful.

Summary:

DOM (Document Object Model) parsing: The parser reads the entire document and builds a tree structure that stays in memory. The code can then operate on this tree through the DOM interface.

Advantages: The entire document is in memory, so it is convenient to work with. Modification, deletion, rearrangement and similar operations are all supported.

Disadvantages: The entire document is read into memory, so many nodes that are never needed are kept, which wastes memory.

Use occasions: When the document, once read, needs to be operated on multiple times and hardware resources (memory, CPU) are sufficient.
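
The "operate on the tree" advantage can be sketched with the JDK's own DOM API. This is a minimal illustration, not code from the examples above: DomEditSketch and its methods are made-up names, and the XML is passed in as a String.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomEditSketch {
    // Reads the whole document into an in-memory tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Modifies the tree in place: change an attribute, delete nodes, add a node.
    static void edit(Document doc) {
        Element root = doc.getDocumentElement();
        root.setAttribute("name", "lisi");
        NodeList worlds = root.getElementsByTagName("world");
        // The NodeList is live, so remove from the end to keep indexes valid
        for (int i = worlds.getLength() - 1; i >= 0; i--) {
            Node world = worlds.item(i);
            world.getParentNode().removeChild(world);
        }
        Element extra = doc.createElement("hello");
        extra.setTextContent("added later");
        root.appendChild(extra);
    }

    // Serializes the tree back to text without the XML declaration.
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }
}
```

Because the whole tree stays in memory, edit() can be called any number of times before toXml() writes the result, which is exactly what an event-based parser cannot offer.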

SAX parsing emerged to solve the problems of DOM parsing. Its characteristics are:

Advantages: It does not need to load the entire document, so it uses fewer resources. SAX parsing is especially recommended in embedded environments such as Android.

Disadvantages: Unlike with DOM parsing, the data is not retained. If you do not save the data while handling an event, it is lost.

Use occasions: When the resources of the machine are limited.
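
As an illustration of saving only what you need while the events pass by, the following sketch computes a count and a total from the age elements without keeping any Person objects. It is a hedged example: AgeStats and scan are hypothetical names, and the persons XML layout is assumed from the handler code earlier in this article.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AgeStats extends DefaultHandler {
    int count;      // number of age elements seen
    long totalAge;  // sum of their values
    private final StringBuilder text = new StringBuilder();
    private boolean inAge;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("age".equals(localName)) {
            inAge = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inAge) {
            text.append(ch, start, length); // keep text only while inside <age>
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("age".equals(localName)) {
            // Assumes the age is a whole number; other input throws NumberFormatException
            totalAge += Long.parseLong(text.toString().trim());
            count++;
            inAge = false;
        }
    }

    // Streams through the XML once and returns the collected statistics.
    static AgeStats scan(String xml) {
        AgeStats stats = new AgeStats();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so that localName is filled in
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), stats);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return stats;
    }
}
```

The handler's memory use does not grow with the number of person elements, whereas a DOM parser would first build a node for every one of them.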