In Java, there are two classic ways to parse XML documents with the JDK's built-in APIs: DOM parsing and SAX parsing.
DOM parsing is powerful: it lets you create, read, update and delete nodes. While parsing, it loads the whole XML document into memory as a Document object tree, so it is best suited to small documents.
SAX parsing reads the document sequentially from beginning to end, element by element, without building a tree in memory. This makes modifying the document inconvenient, but it is well suited to large, read-only documents.
This article focuses on SAX parsing. DOM parsing will be covered in a later article.
SAX parses documents in an event-driven way. Put simply, it is like watching a movie in a cinema: you watch it from beginning to end and cannot rewind (whereas DOM lets you move back and forth through the document).
While watching, every plot twist, tearful scene, or brush of shoulders with the person next to you makes your brain react and process that information.
Similarly, while SAX parses, reaching the start and end of the document and the start and end of each element triggers a callback method. Inside these callbacks you handle the corresponding event.
These four methods are startDocument(), endDocument(), startElement() and endElement().
Reading the element tags alone is not enough, though. We also need the characters() method to process the text contained in each element.
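To see the order in which these callbacks fire, here is a minimal sketch that records each event in a list instead of printing it. The class name SaxEventTrace and the sample XML are assumptions for illustration. It uses the JDK's default SAXParser and reads the XML from a string through org.xml.sax.InputSource rather than from a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX callbacks in the order the parser fires them.
public class SaxEventTrace {

    public static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The text may arrive in one chunk or several; each chunk is one event here.
                events.add("characters \"" + new String(ch, start, length) + "\"");
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<book><title>Learning XML</title></book>")) {
            System.out.println(event);
        }
    }
}
```

The list always starts with startDocument and ends with endDocument, and each startElement is matched by an endElement in nesting order.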
Collecting these callback methods in one class gives us the trigger (event handler) we need.
Typically, the main method starts the parsing, while the trigger processes the document's content. This split between starting the parse and reacting to its events is what makes the approach event-driven.
In the trigger, parsing first starts the document, and then the elements are parsed one by one. The text content of each element is passed to the characters() method. The parser may split one element's text across several characters() calls.
After that, the element ends. Once all elements have been read, document parsing ends.
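Because the text of one element can arrive in several characters() calls, a common pattern is to append every chunk to a buffer and read the buffer only when the element ends. The sketch below assumes that text appears only in leaf elements such as title and author. TextCollector and its collect() helper are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records "name=text" for every element that has non-blank text.
public class TextCollector extends DefaultHandler {

    // Buffer for the current element's text; characters() may deliver it in several chunks.
    private final StringBuilder text = new StringBuilder();
    private final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // append every chunk, never overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            results.add(qName + "=" + value);
        }
        text.setLength(0); // parent elements do not see their children's text
    }

    public static List<String> collect(String xml) throws Exception {
        TextCollector collector = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        return collector.results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<book><title>Harry Potter</title><author>J K. Rowling</author></book>"));
    }
}
```

Printing the text directly inside characters(), as the trigger below does, is fine for demonstration. Buffering matters as soon as you need the complete value.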
Now let's create the trigger class. It must extend org.xml.sax.helpers.DefaultHandler, which provides default (mostly no-op) implementations of all the callbacks. We create SaxHandler and override the methods we need:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandler extends DefaultHandler {
    /* This method has three parameters:
       ch is the character array returned by the parser, which contains the element content
       start is the index of the first character to use, and length is the number of characters to use
       The text of one element may arrive in several calls */
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String content = new String(ch, start, length);
        System.out.println(content);
        super.characters(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("\n…………End parsing document…………");
        super.endDocument();
    }

    /* uri is the namespace URI, or empty if there is none or namespace processing is off
       localName is the name without a prefix, or empty if namespace processing is off
       qName is the qualified name (prefix:name), so this code prints qName */
    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        System.out.println("End parsing element " + qName);
        super.endElement(uri, localName, qName);
    }

    @Override
    public void startDocument() throws SAXException {
        System.out.println("…………Start parsing the document…………\n");
        super.startDocument();
    }

    /* uri, localName and qName have the same meaning as in endElement()
       attributes is the collection of the element's attributes */
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        System.out.println("Start parsing the element " + qName);
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                // getQName() returns the attribute name, getValue() its value
                System.out.print(attributes.getQName(i) + "=\"" + attributes.getValue(i) + "\"");
            }
        }
        System.out.print(qName + ":");
        super.startElement(uri, localName, qName, attributes);
    }
}
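The following sketch shows how the three name arguments differ. It parses a root element with a namespace prefix twice: once with the factory's default settings, and once after calling setNamespaceAware(true). The class name, the urn:example:books namespace, and the uri|localName|qName output format are assumptions for illustration. With namespace processing off, the SAX documentation says uri and localName are empty strings. Namespace processing is off by default in the factory that TestDemo uses, which is why SaxHandler reads the element name from qName.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: shows uri, localName and qName for the root element.
public class ElementNameDemo {

    /** Returns "uri|localName|qName" for the root element of the given XML. */
    public static String rootNames(String xml, boolean namespaceAware) throws Exception {
        StringBuilder out = new StringBuilder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // false is the factory default
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (out.length() == 0) { // only record the first (root) element
                    out.append(uri).append('|').append(localName).append('|').append(qName);
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<b:books xmlns:b=\"urn:example:books\"/>";
        System.out.println(rootNames(xml, false));
        System.out.println(rootNames(xml, true));
    }
}
```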
Next, create the main class that reads books.xml and hands it to the trigger:
import java.io.File;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class TestDemo {
    public static void main(String[] args) throws Exception {
        // 1. Instantiate the SAXParserFactory object
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // 2. Create a parser
        SAXParser parser = factory.newSAXParser();
        // 3. Get the document to parse, create the trigger (handler), and parse the document
        File f = new File("books.xml");
        SaxHandler dh = new SaxHandler();
        parser.parse(f, dh);
    }
}
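Running TestDemo prints output like the following. Whitespace-only lines, which characters() prints for the line breaks and indentation between elements, are omitted:
…………Start parsing the document…………
Start parsing the element books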
books:
Start parsing the element book
id="001"book:
Start parsing the element title
title:Harry Potter
End parsing element title
Start parsing the element author
author:J K. Rowling
End parsing element author
End parsing element book
Start parsing the element book
id="002"book:
Start parsing the element title
title:Learning XML
End parsing element title
Start parsing the element author
author:Erik T. Ray
End parsing element author
End parsing element book
End parsing element books
…………End parsing document…………
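The trigger above only prints events. In practice you often turn them into objects instead. The following is a minimal sketch. It assumes books.xml has the structure implied by the output above: a books root, and book elements with an id attribute and title and author children. The Book class and BookCollector are hypothetical, and the XML is inlined so that the example runs without a file on disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that builds one Book object per <book> element.
public class BookCollector extends DefaultHandler {

    public static class Book {
        public String id;
        public String title;
        public String author;

        @Override
        public String toString() {
            return id + ": " + title + " by " + author;
        }
    }

    private final List<Book> books = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Book current; // the <book> currently being parsed, or null

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("book".equals(qName)) {
            current = new Book();
            current.id = attributes.getValue("id"); // null if the attribute is missing
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            switch (qName) {
                case "title":
                    current.title = text.toString().trim();
                    break;
                case "author":
                    current.author = text.toString().trim();
                    break;
                case "book":
                    books.add(current);
                    current = null;
                    break;
                default:
                    break;
            }
        }
        text.setLength(0);
    }

    public static List<Book> parse(String xml) throws Exception {
        BookCollector collector = new BookCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        return collector.books;
    }

    public static void main(String[] args) throws Exception {
        // Assumed books.xml content, reconstructed from the sample output
        String xml = "<books>"
                + "<book id=\"001\"><title>Harry Potter</title><author>J K. Rowling</author></book>"
                + "<book id=\"002\"><title>Learning XML</title><author>Erik T. Ray</author></book>"
                + "</books>";
        for (Book book : parse(xml)) {
            System.out.println(book);
        }
    }
}
```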
To make the process even clearer, we can also rewrite SaxHandler so that it reproduces the original XML document.
The rewritten SaxHandler class:
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandler extends DefaultHandler {
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Print the text as received, including the whitespace between elements
        System.out.print(new String(ch, start, length));
        super.characters(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("\nEnd parsing");
        super.endDocument();
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // Rebuild the closing tag, e.g. </book>
        System.out.print("</");
        System.out.print(qName);
        System.out.print(">");
        super.endElement(uri, localName, qName);
    }

    @Override
    public void startDocument() throws SAXException {
        System.out.println("Start parsing");
        // Print a fixed XML declaration at the top of the restored document
        String s = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        System.out.println(s);
        super.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // Rebuild the opening tag with its attributes, e.g. <book id="001">
        System.out.print("<");
        System.out.print(qName);
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                System.out.print(" " + attributes.getQName(i) + "=\"" + attributes.getValue(i) + "\"");
            }
        }
        System.out.print(">");
        super.startElement(uri, localName, qName, attributes);
    }
}
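There is one caveat when restoring a document this way. The parser has already decoded entities, so text written as Tom &amp; Jerry reaches characters() as Tom & Jerry, and printing it unchanged produces malformed XML. The sketch below uses a hypothetical class, EscapingEchoHandler. It escapes &, <, > and double quotes before writing, and it collects the result in a StringBuilder instead of printing it. It assumes the input has no comments, CDATA sections or processing instructions that need to be preserved.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the restoring handler that re-escapes text and attribute values.
public class EscapingEchoHandler extends DefaultHandler {

    private final StringBuilder out = new StringBuilder();

    /** Escapes the characters that would otherwise break the re-serialized XML. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        out.append('<').append(qName);
        for (int i = 0; i < attributes.getLength(); i++) {
            out.append(' ').append(attributes.getQName(i))
               .append("=\"").append(escape(attributes.getValue(i))).append('"');
        }
        out.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(escape(new String(ch, start, length)));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }

    public static String echo(String xml) throws Exception {
        EscapingEchoHandler handler = new EscapingEchoHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echo("<title lang=\"en\">Tom &amp; Jerry</title>"));
    }
}
```

Because escaping is applied per chunk, it does not matter whether the parser reports the decoded & as a separate characters() call or as part of a longer run of text.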
The output now mirrors the original XML document, which makes the order in which the callbacks fire much easier to follow.