1 4 ways to parse XML files
There are four classic methods for parsing XML files. There are two basic parsing methods, one is called SAX and the other is called DOM. SAX is based on event stream parsing, and DOM is based on XML document tree structure parsing. On this basis, in order to reduce the amount of coding for DOM and SAX, JDOM emerged. Its advantage is that the 20-80 principle (Pareto principle) greatly reduces the amount of code. Under normal circumstances, JDOM is used to meet the simple functions to be implemented, such as parsing, creation, etc. But at the bottom level, JDOM still uses SAX (most commonly used), DOM, and Xanan documents. The other one is DOM4J, which is a very, very excellent Java XML API with excellent performance, powerful functions and extreme ease of use. It is also an open source software. Nowadays, you can see that more and more Java software is using DOM4J to read and write XML. It is particularly worth mentioning that even Sun's JAXM is also using DOM4J. For the specific use of the four methods, search Baidu and there will be many detailed introductions.
2 A brief introduction to XPath
XPath is a language for finding information in XML documents. XPath is used to navigate through elements and attributes in XML documents, and to traverse elements and attributes. XPath is a major element of the W3C XSLT standard, and both XQuery and XPointer are built on XPath expressions. Therefore, an understanding of XPath is the basis for many advanced XML applications. XPath is very similar to the SQL language for database operations, or JQuery, which makes it easy for developers to grab what they need in the document. DOM4J also supports the use of XPath.
3 DOM4J using XPath
When DOM4J uses XPath to parse XML documents, you first need to reference two JAR packages in the project:
dom4j-1.6.1.jar: DOM4J software package, download address http://sourceforge.net/projects/dom4j/;
jaxen-xx.xx.jar: Usually if this package is not added, an exception will be thrown (java.lang.NoClassDefFoundError: org/jaxen/JaxenException). The download address is http://www.jaxen.org/releases.html.
3.1 Interference of namespace
When processing xml files converted from excel files or other format files, we often encounter situations where no results can be obtained through XPath parsing. This situation is usually caused by the existence of namespaces. Taking the XML file with the following content as an example, if you perform a simple search through XPath=" // Workbook/ Worksheet / Table / Row[1]/ Cell[1]/Data[1] ", usually no results will appear. This is caused by the namespace (xmlns="urn:schemas-microsoft-com:office:spreadsheet").
Copy the code code as follows:
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel " xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
<Worksheet ss:Name="Sheet1">
<Table ss:ExpandedColumnCount="81" ss:ExpandedRowCount="687" x:FullColumns="1" x:FullRows="1" ss:DefaultColumnWidth="52.5" ss:DefaultRowHeight="15.5625">
<Row ss:AutoFitHeight="0">
<Cell>
<Data ss:Type="String">Code-typing rat</Data>
</Cell>
</Row>
<Row ss:AutoFitHeight="0">
<Cell>
<Data ss:Type="String">Sunny</Data>
</Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
3.2 XPath parsing xml files with namespaces
The first method (read1() function): Use the local-name() and namespace-uri() that come with XPath syntax to specify the node name and namespace you want to use. Writing XPath expressions is more troublesome.
The second method (read2() function): Set the XPath namespace and use the setNamespaceURIs() function.
The third method (read3() function): Set the namespace of DocumentFactory(), the function used is setXPathNamespaceURIs(). Writing XPath expressions in methods two and three is relatively simple.
The fourth method (read4() function): The method is the same as the third method, but the XPath expression is different (specifically reflected in the program). It is mainly to test the difference in XPath expression, mainly referring to the completeness, and whether it will affect the retrieval efficiency. Influence.
(The above four methods all use DOM4J combined with XPath to parse XML files)
The fifth method (read5() function): Use DOM combined with XPath to parse the XML file, mainly to test the performance difference.
Nothing speaks louder than code! Code decisively!
Copy the code code as follows:
packageXPath;
importjava.io.IOException;
importjava.io.InputStream;
importjava.util.HashMap;
importjava.util.List;
importjava.util.Map;
importjavax.xml.parsers.DocumentBuilder;
importjavax.xml.parsers.DocumentBuilderFactory;
importjavax.xml.parsers.ParserConfigurationException;
importjavax.xml.xpath.XPathConstants;
importjavax.xml.xpath.XPathExpression;
importjavax.xml.xpath.XPathExpressionException;
importjavax.xml.xpath.XPathFactory;
importorg.dom4j.Document;
importorg.dom4j.DocumentException;
importorg.dom4j.Element;
importorg.dom4j.XPath;
importorg.dom4j.io.SAXReader;
importorg.w3c.dom.NodeList;
importorg.xml.sax.SAXException;
/**
*DOM4JDOMXMLXPath
*/
publicclassTestDom4jXpath{
publicstaticvoidmain(String[]args){
read1();
read2();
read3();
read4(); //read3() method is the same, but the XPath expression is different
read5();
}
publicstaticvoidread1(){
/*
*uselocal-name()andnamespace-uri()inXPath
*/
try{
longstartTime=System.currentTimeMillis();
SAXReaderreader=newSAXReader();
InputStreamin=TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath//XXX.xml");
Documentdoc=reader.read(in);
/*Stringxpath="//*[local-name()='Workbook'andnamespace-uri()='urn:schemas-microsoft-com:office:spreadsheet']"
+"/*[local-name()='Worksheet']"
+"/*[local-name()='Table']"
+"/*[local-name()='Row'][4]"
+"/*[local-name()='Cell'][3]"
+"/*[local-name()='Data'][1]";*/
Stringxpath="//*[local-name()='Row'][4]/*[local-name()='Cell'][3]/*[local-name()='Data'][ 1]";
System.err.println("======uselocal-name()andnamespace-uri()inXPath====");
System.err.println("XPath:"+xpath);
@SuppressWarnings("unchecked")
List<Element>list=doc.selectNodes(xpath);
for(Objecto:list){
Elemente=(Element)o;
Stringshow=e.getStringValue();
System.out.println("show="+show);
longendTime=System.currentTimeMillis();
System.out.println("Program running time:"+(endTime-startTime)+"ms");
}
}catch(DocumentExceptione){
e.printStackTrace();
}
}
publicstaticvoidread2(){
/*
*setxpathnamespace(setNamespaceURIs)
*/
try{
longstartTime=System.currentTimeMillis();
Mapmap=newHashMap();
map.put("Workbook","urn:schemas-microsoft-com:office:spreadsheet");
SAXReaderreader=newSAXReader();
InputStreamin=TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath//XXX.xml");
Documentdoc=reader.read(in);
Stringxpath="//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
System.err.println("======usesetNamespaceURIs()tosetxpathnamespace====");
System.err.println("XPath:"+xpath);
XPathx=doc.createXPath(xpath);
x.setNamespaceURIs(map);
@SuppressWarnings("unchecked")
List<Element>list=x.selectNodes(doc);
for(Objecto:list){
Elemente=(Element)o;
Stringshow=e.getStringValue();
System.out.println("show="+show);
longendTime=System.currentTimeMillis();
System.out.println("Program running time:"+(endTime-startTime)+"ms");
}
}catch(DocumentExceptione){
e.printStackTrace();
}
}
publicstaticvoidread3(){
/*
*setDocumentFactory()namespace(setXPathNamespaceURIs)
*/
try{
longstartTime=System.currentTimeMillis();
Mapmap=newHashMap();
map.put("Workbook","urn:schemas-microsoft-com:office:spreadsheet");
SAXReaderreader=newSAXReader();
InputStreamin=TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath//XXX.xml");
reader.getDocumentFactory().setXPathNamespaceURIs(map);
Documentdoc=reader.read(in);
Stringxpath="//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
System.err.println("======usesetXPathNamespaceURIs()tosetDocumentFactory()namespace====");
System.err.println("XPath:"+xpath);
@SuppressWarnings("unchecked")
List<Element>list=doc.selectNodes(xpath);
for(Objecto:list){
Elemente=(Element)o;
Stringshow=e.getStringValue();
System.out.println("show="+show);
longendTime=System.currentTimeMillis();
System.out.println("Program running time:"+(endTime-startTime)+"ms");
}
}catch(DocumentExceptione){
e.printStackTrace();
}
}
publicstaticvoidread4(){
/*
*Same as read3() method, but the XPath expression is different
*/
try{
longstartTime=System.currentTimeMillis();
Mapmap=newHashMap();
map.put("Workbook","urn:schemas-microsoft-com:office:spreadsheet");
SAXReaderreader=newSAXReader();
InputStreamin=TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath//XXX.xml");
reader.getDocumentFactory().setXPathNamespaceURIs(map);
Documentdoc=reader.read(in);
Stringxpath="//Workbook:Worksheet/Workbook:Table/Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
System.err.println("======usesetXPathNamespaceURIs()tosetDocumentFactory()namespace====");
System.err.println("XPath:"+xpath);
@SuppressWarnings("unchecked")
List<Element>list=doc.selectNodes(xpath);
for(Objecto:list){
Elemente=(Element)o;
Stringshow=e.getStringValue();
System.out.println("show="+show);
longendTime=System.currentTimeMillis();
System.out.println("Program running time:"+(endTime-startTime)+"ms");
}
}catch(DocumentExceptione){
e.printStackTrace();
}
}
publicstaticvoidread5(){
/*
*DOMandXPath
*/
try{
longstartTime=System.currentTimeMillis();
DocumentBuilderFactorydbf=DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(false);
DocumentBuilderbuilder=dbf.newDocumentBuilder();
InputStreamin=TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath//XXX.xml");
org.w3c.dom.Documentdoc=builder.parse(in);
XPathFactoryfactory=XPathFactory.newInstance();
javax.xml.xpath.XPathx=factory.newXPath();
//Select the name attribute of all class elements
Stringxpath="//Workbook/Worksheet/Table/Row[4]/Cell[3]/Data[1]";
System.err.println("======DomXPath====");
System.err.println("XPath:"+xpath);
XPathExpressionexpr=x.compile(xpath);
NodeListnodes=(NodeList)expr.evaluate(doc,XPathConstants.NODE);
for(inti=0;i<nodes.getLength();i++){
System.out.println("show="+nodes.item(i).getNodeValue());
longendTime=System.currentTimeMillis();
System.out.println("Program running time:"+(endTime-startTime)+"ms");
}
}catch(XPathExpressionExceptione){
e.printStackTrace();
}catch(ParserConfigurationExceptione){
e.printStackTrace();
}catch(SAXExceptione){
e.printStackTrace();
}catch(IOExceptione){
e.printStackTrace();
}
}
}
PS: Here are several online tools for xml operations for your reference:
Online XML/JSON conversion tool:
http://tools.VeVB.COm/code/xmljson
Online formatting XML/Online compressing XML:
http://tools.VeVB.COm/code/xmlformat
XML online compression/formatting tool:
http://tools.VeVB.COm/code/xml_format_compress