Parsing XML in Java with XPath and dom4j

Author: Eve Cole  Updated: 2025-01-12 12:12:01

1 Four ways to parse XML files

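There are four classic ways to parse an XML file. The two basic ones are SAX and DOM: SAX parses the document as a stream of events, while DOM works on a tree representation of the whole XML document. Building on these, JDOM emerged to reduce the amount of code that DOM and SAX require; following the 80/20 rule (the Pareto principle), it greatly shortens the code for the most common tasks. JDOM is normally used to cover simple requirements such as parsing and creating documents, but under the hood it still relies on SAX (most commonly), DOM, and Xalan. The fourth is DOM4J, an excellent Java XML API with good performance, powerful features, and great ease of use; it is also open-source software. More and more Java software now uses DOM4J to read and write XML, and it is worth noting that Sun's JAXM also uses DOM4J. For details on how to use each of the four approaches, a Baidu search turns up plenty of in-depth introductions.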
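To make the event-stream versus tree distinction concrete, here is a minimal sketch that uses only the JDK's built-in javax.xml.parsers API (not JDOM or DOM4J). The class name SaxVsDom and the tiny <Table> document are made up for illustration. Both methods count Row elements: the SAX version reacts to start-tag events as the parser streams through the input, while the DOM version first builds the whole tree and then queries it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxVsDom {
    // Hypothetical sample document, used only for this illustration.
    static final String XML =
        "<Table><Row><Cell>a</Cell></Row><Row><Cell>b</Cell></Row></Table>";

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    /** SAX: the parser pushes events; we count Row start tags as they stream past. */
    static int countRowsWithSax(String xml) {
        AtomicInteger rows = new AtomicInteger();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    // The factory is not namespace-aware by default, so qName holds the tag name.
                    if ("Row".equals(qName)) {
                        rows.incrementAndGet();
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return rows.get();
    }

    /** DOM: the whole document is loaded into a tree first, which can then be queried. */
    static int countRowsWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(stream(xml));
            return doc.getElementsByTagName("Row").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SAX rows: " + countRowsWithSax(XML));
        System.out.println("DOM rows: " + countRowsWithDom(XML));
    }
}
```

The SAX handler never holds the document in memory, which is why SAX suits very large files; the DOM tree, in exchange, can be navigated freely once parsing is done.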

2 A brief introduction to XPath

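XPath is a language for finding information in an XML document. It is used to navigate through, and select, the elements and attributes of an XML document. XPath is a major element of the W3C XSLT standard, and both XQuery and XPointer are built on XPath expressions, so understanding XPath is the foundation for many advanced XML applications. XPath does for XML roughly what SQL does for databases (or what selectors do in jQuery): it lets developers easily pick out exactly what they need from a document. DOM4J also supports XPath.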
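As a quick illustration of XPath navigation, the sketch below evaluates a few XPath 1.0 expressions with the JDK's javax.xml.xpath API rather than DOM4J. The XPathBasics class and the <library> document are hypothetical; the expressions themselves are plain XPath 1.0, stepping from element to element, filtering with an attribute predicate, selecting an attribute, and calling a function.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathBasics {
    // Hypothetical sample document without namespaces, for illustration only.
    static final String XML =
        "<library>"
        + "<book lang='en'><title>XML Basics</title></book>"
        + "<book lang='ja'><title>XPath Primer</title></book>"
        + "</library>";

    /** Evaluates an XPath 1.0 expression against XML and returns its string value. */
    static String eval(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Walk element -> element, filtering on an attribute with a predicate.
        System.out.println(eval("/library/book[@lang='ja']/title")); // XPath Primer
        // Step onto an attribute node.
        System.out.println(eval("/library/book[1]/@lang"));          // en
        // Apply a function to every matching element in the document.
        System.out.println(eval("count(//book)"));                    // 2
    }
}
```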

3 Using XPath with DOM4J

To parse XML documents with XPath in DOM4J, you first need to reference two JAR packages in your project:

dom4j-1.6.1.jar: the DOM4J package, download address http://sourceforge.net/projects/dom4j/;

jaxen-xx.xx.jar: if this package is missing, an exception is usually thrown at run time (java.lang.NoClassDefFoundError: org/jaxen/JaxenException). Download address: http://www.jaxen.org/releases.html.
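Because a missing jaxen JAR only surfaces at run time as the NoClassDefFoundError above, a quick classpath check can help when diagnosing it. This is a hypothetical helper (JaxenCheck is not part of either library); it simply asks the class loader whether the named classes can be found.

```java
public class JaxenCheck {
    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate and load the class without running its static initializers.
            Class.forName(className, false, JaxenCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.jaxen.JaxenException is the class named in the NoClassDefFoundError message.
        System.out.println("jaxen on classpath: " + isOnClasspath("org.jaxen.JaxenException"));
        System.out.println("dom4j on classpath: " + isOnClasspath("org.dom4j.io.SAXReader"));
    }
}
```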

3.1 Namespace interference

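When processing XML files converted from Excel files or other formats, XPath queries often return no results at all. This is usually caused by a namespace. Taking an XML file with the following content as an example, a simple lookup with XPath="//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]" usually returns nothing. The cause is the default namespace (xmlns="urn:schemas-microsoft-com:office:spreadsheet"): in XPath 1.0 an unprefixed name such as Workbook only matches elements that are in no namespace, so it never matches these elements.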

```xml
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">
  <Worksheet ss:Name="Sheet1">
    <Table ss:ExpandedColumnCount="81" ss:ExpandedRowCount="687" x:FullColumns="1" x:FullRows="1" ss:DefaultColumnWidth="52.5" ss:DefaultRowHeight="15.5625">
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">Code-typing rat</Data>
        </Cell>
      </Row>
      <Row ss:AutoFitHeight="0">
        <Cell>
          <Data ss:Type="String">Sunny</Data>
        </Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>
```
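The sketch below reproduces this namespace problem with a trimmed-down version of the sample. As an assumption that keeps it free of extra JARs, it uses the JDK's namespace-aware DOM parser and javax.xml.xpath instead of DOM4J, and the class name is hypothetical. The root element reports the spreadsheet URN as its namespace, and the unprefixed path selects nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DefaultNamespaceDemo {
    // Trimmed-down version of the spreadsheet sample (assumption: one Row, one Cell).
    static final String XML =
        "<Workbook xmlns='urn:schemas-microsoft-com:office:spreadsheet'>"
        + "<Worksheet><Table><Row><Cell><Data>Sunny</Data></Cell></Row></Table></Worksheet>"
        + "</Workbook>";

    /** Parses the sample with namespace awareness switched on. */
    static Document parse() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Number of nodes an XPath expression selects from the parsed sample. */
    static int countMatches(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parse(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The root element really lives in the spreadsheet namespace...
        System.out.println(parse().getDocumentElement().getNamespaceURI());
        // ...so an unprefixed path (meaning "no namespace" in XPath 1.0) matches nothing.
        System.out.println(countMatches("//Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]")); // 0
    }
}
```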

3.2 Parsing an XML file that contains namespaces with XPath

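Method 1 (the read1() function): use the local-name() and namespace-uri() functions that are part of XPath itself to specify the node names and the namespace to match. The XPath expressions become noticeably more cumbersome to write.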
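Method 1 can also be tried without DOM4J: local-name() and namespace-uri() are standard XPath 1.0 functions, so the JDK's engine accepts the same kind of expression. The LocalNameDemo class and its two-row document are illustrative assumptions. The short expression ignores namespaces entirely; the strict one additionally pins the root element to the spreadsheet URI, mirroring the commented-out expression in read1().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalNameDemo {
    static final String SS = "urn:schemas-microsoft-com:office:spreadsheet";

    // Trimmed-down spreadsheet sample with two rows (illustrative data).
    static final String XML =
        "<Workbook xmlns='" + SS + "'><Worksheet><Table>"
        + "<Row><Cell><Data>Code-typing rat</Data></Cell></Row>"
        + "<Row><Cell><Data>Sunny</Data></Cell></Row>"
        + "</Table></Worksheet></Workbook>";

    // Short form: match by local name only, ignoring the namespace.
    static final String SHORT =
        "//*[local-name()='Row'][2]/*[local-name()='Cell'][1]/*[local-name()='Data'][1]";

    // Strict form: also require the root element to be in the spreadsheet namespace.
    static final String STRICT =
        "/*[local-name()='Workbook' and namespace-uri()='" + SS + "']"
        + "/*[local-name()='Worksheet']/*[local-name()='Table']"
        + "/*[local-name()='Row'][1]/*[local-name()='Cell'][1]/*[local-name()='Data'][1]";

    /** Parses the sample namespace-aware and returns the string value the expression selects. */
    static String eval(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval(SHORT));  // Sunny
        System.out.println(eval(STRICT)); // Code-typing rat
    }
}
```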

Method 2 (the read2() function): set the namespaces on the XPath object, using the setNamespaceURIs() function.

Method 3 (the read3() function): set the namespaces on the DocumentFactory, using the setXPathNamespaceURIs() function. With methods 2 and 3, the XPath expressions are comparatively simple to write.
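In the JDK's javax.xml.xpath API, the counterpart of the prefix-to-URI map used by methods 2 and 3 is a NamespaceContext set on the XPath object. The sketch below is a minimal map-backed implementation; the class and method names are hypothetical, and this is the JDK mechanism, not DOM4J's. It also shows that the prefix in the expression is arbitrary: only the URI it maps to has to match the document.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PrefixMappingDemo {
    static final String SS = "urn:schemas-microsoft-com:office:spreadsheet";
    static final String XML =
        "<Workbook xmlns='" + SS + "'><Worksheet><Table>"
        + "<Row><Cell><Data>Code-typing rat</Data></Cell></Row>"
        + "<Row><Cell><Data>Sunny</Data></Cell></Row>"
        + "</Table></Worksheet></Workbook>";

    /** Minimal NamespaceContext backed by a prefix -> URI map, like the map handed to dom4j. */
    static NamespaceContext contextOf(Map<String, String> prefixToUri) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("null prefix");
                String uri = prefixToUri.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
                    if (e.getValue().equals(namespaceURI)) return e.getKey();
                }
                return null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                String prefix = getPrefix(namespaceURI);
                return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singletonList(prefix).iterator();
            }
        };
    }

    /** Text of Row[2]/Cell[1]/Data[1], written with whatever prefix the caller picks. */
    static String firstDataOfRow2(String prefix) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // prefixed name tests need a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            Map<String, String> map = new HashMap<>();
            map.put(prefix, SS);
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(contextOf(map));
            String expr = "//" + prefix + ":Row[2]/" + prefix + ":Cell[1]/" + prefix + ":Data[1]";
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstDataOfRow2("Workbook")); // Sunny
        System.out.println(firstDataOfRow2("s"));        // Sunny
    }
}
```

Configuring one XPath object and reusing it for many queries is roughly analogous to method 3, where the mapping is registered once on the DocumentFactory instead of on each expression.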

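Method 4 (the read4() function): the same approach as method 3, but with a different XPath expression (see the program for the details). It mainly tests whether the form of the XPath expression, in particular writing out the complete path, affects the results and the lookup efficiency.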

(All four methods above parse the XML file with DOM4J combined with XPath.)
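Method 4's question, whether spelling out the full path changes the result or the lookup cost, can be explored with a sketch like the one below. To keep it short it uses a namespace-free, synthetic document and the JDK's XPath engine rather than DOM4J; both are assumptions, and the class name is hypothetical. The two expressions select the same node here. A leading // asks the engine to consider descendants at every level, while the full path only follows direct children, so the full path may be cheaper on large documents; actual timings depend on the engine, and a single cold run like this one is only a rough indication.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PathStyleComparison {
    static final String SHORT = "//Row[4]/Cell[1]/Data[1]";
    static final String FULL = "/Workbook/Worksheet/Table/Row[4]/Cell[1]/Data[1]";

    // One XPath object reused for every query (not thread-safe, fine for this demo).
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    /** Builds a namespace-free document with the given number of rows (synthetic data). */
    static Document buildDoc(int rows) {
        StringBuilder sb = new StringBuilder("<Workbook><Worksheet><Table>");
        for (int i = 1; i <= rows; i++) {
            sb.append("<Row><Cell><Data>row ").append(i).append("</Data></Cell></Row>");
        }
        sb.append("</Table></Worksheet></Workbook>");
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(sb.toString())));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** String value of the first node the expression selects. */
    static String eval(Document doc, String expression) {
        try {
            return XPATH.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildDoc(10_000);
        for (String expr : new String[] {SHORT, FULL}) {
            long start = System.nanoTime();
            String result = eval(doc, expr);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println(expr + " -> " + result + " (" + micros + " us)");
        }
    }
}
```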

Method 5 (the read5() function): parse the XML file with DOM combined with XPath, mainly to compare performance.
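Method 5 relies on parsing with setNamespaceAware(false). The sketch below (hypothetical class name, trimmed-down sample) runs the same unprefixed path against a namespace-aware and a namespace-unaware DOM. A namespace-unaware DOM gives its elements no namespace URI, so with the JDK's built-in XPath engine the plain path typically matches there; treat this as implementation behavior worth verifying on your own JDK rather than a guarantee of the XPath specification.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceUnawareDemo {
    // Trimmed-down spreadsheet sample with a default namespace (illustrative data).
    static final String XML =
        "<Workbook xmlns='urn:schemas-microsoft-com:office:spreadsheet'>"
        + "<Worksheet><Table><Row><Cell><Data>Sunny</Data></Cell></Row></Table></Worksheet>"
        + "</Workbook>";

    // The same unprefixed path style that read5() uses.
    static final String PATH = "/Workbook/Worksheet/Table/Row[1]/Cell[1]/Data[1]";

    /** Parses with the given namespace awareness and returns the string value PATH selects. */
    static String eval(boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            // evaluate() returns "" when nothing matches.
            return XPathFactory.newInstance().newXPath().evaluate(PATH, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("namespace-aware:   [" + eval(true) + "]");
        System.out.println("namespace-unaware: [" + eval(false) + "]");
    }
}
```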

Nothing speaks louder than code, so let's get coding!

```java
package XPath;

import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Parsing a namespaced XML file with DOM4J + XPath and with DOM + XPath.
 * Requires dom4j-1.6.1.jar and jaxen on the classpath.
 */
public class TestDom4jXpath {
    public static void main(String[] args) {
        read1();
        read2();
        read3();
        read4(); // same as read3(), but with a different XPath expression
        read5();
    }

    public static void read1() {
        /*
         * Use local-name() and namespace-uri() inside the XPath expression
         */
        try {
            long startTime = System.currentTimeMillis();
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            Document doc = reader.read(in);
            /* String xpath = "//*[local-name()='Workbook' and namespace-uri()='urn:schemas-microsoft-com:office:spreadsheet']"
                    + "/*[local-name()='Worksheet']"
                    + "/*[local-name()='Table']"
                    + "/*[local-name()='Row'][4]"
                    + "/*[local-name()='Cell'][3]"
                    + "/*[local-name()='Data'][1]"; */
            String xpath = "//*[local-name()='Row'][4]/*[local-name()='Cell'][3]/*[local-name()='Data'][1]";
            System.err.println("=====Use local-name() and namespace-uri() in XPath=====");
            System.err.println("XPath: " + xpath);
            // dom4j 1.6.1 returns a raw List; List<?> also compiles against dom4j 2.x
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                System.out.println("show=" + e.getStringValue());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read2() {
        /*
         * Set the XPath namespace mapping with setNamespaceURIs()
         */
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("=====Use setNamespaceURIs() to set the XPath namespace=====");
            System.err.println("XPath: " + xpath);
            XPath x = doc.createXPath(xpath);
            x.setNamespaceURIs(map); // prefix "Workbook" -> spreadsheet namespace URI
            List<?> list = x.selectNodes(doc);
            for (Object o : list) {
                Element e = (Element) o;
                System.out.println("show=" + e.getStringValue());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read3() {
        /*
         * Set the namespace mapping on the DocumentFactory with setXPathNamespaceURIs()
         */
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            // By default this is the shared DocumentFactory.getInstance(), so the mapping applies globally
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("=====Use setXPathNamespaceURIs() to set the DocumentFactory namespace=====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                System.out.println("show=" + e.getStringValue());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read4() {
        /*
         * Same as read3(), but with a different (fully spelled-out) XPath expression
         */
        try {
            long startTime = System.currentTimeMillis();
            Map<String, String> map = new HashMap<>();
            map.put("Workbook", "urn:schemas-microsoft-com:office:spreadsheet");
            SAXReader reader = new SAXReader();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            reader.getDocumentFactory().setXPathNamespaceURIs(map);
            Document doc = reader.read(in);
            String xpath = "//Workbook:Worksheet/Workbook:Table/Workbook:Row[4]/Workbook:Cell[3]/Workbook:Data[1]";
            System.err.println("=====Use setXPathNamespaceURIs() to set the DocumentFactory namespace=====");
            System.err.println("XPath: " + xpath);
            List<?> list = doc.selectNodes(xpath);
            for (Object o : list) {
                Element e = (Element) o;
                System.out.println("show=" + e.getStringValue());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }

    public static void read5() {
        /*
         * DOM + XPath (javax.xml.xpath)
         */
        try {
            long startTime = System.currentTimeMillis();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Namespace-unaware parsing lets the unprefixed element names below match
            dbf.setNamespaceAware(false);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            InputStream in = TestDom4jXpath.class.getClassLoader().getResourceAsStream("XPath/XXX.xml");
            org.w3c.dom.Document doc = builder.parse(in);
            XPathFactory factory = XPathFactory.newInstance();
            javax.xml.xpath.XPath x = factory.newXPath();
            // Select the first Data element of the 3rd Cell of the 4th Row
            String xpath = "//Workbook/Worksheet/Table/Row[4]/Cell[3]/Data[1]";
            System.err.println("=====DOM + XPath=====");
            System.err.println("XPath: " + xpath);
            XPathExpression expr = x.compile(xpath);
            // NODESET (not NODE) is the return type that yields a NodeList
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent(): getNodeValue() is always null for element nodes
                System.out.println("show=" + nodes.item(i).getTextContent());
            }
            long endTime = System.currentTimeMillis();
            System.out.println("Elapsed time: " + (endTime - startTime) + "ms");
        } catch (XPathExpressionException | ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}
```
