Apache POI is an open source library from the Apache Software Foundation. It gives Java programs an API for reading and writing Microsoft Office file formats.
1. JAR packages required to read Word 2003 and Word 2007 files
Reading a 2003-format (.doc) Word file is relatively simple. You only need poi-3.5-beta6-20090622.jar and poi-scratchpad-3.5-beta6-20090622.jar. Reading the 2007 format (.docx) is more troublesome. The trouble is not in writing the code but in the imports: more JAR packages have to be added. There are seven in total, as follows:
1. openxml4j-bin-beta.jar
2. poi-3.5-beta6-20090622.jar
3. poi-ooxml-3.5-beta6-20090622.jar
4. dom4j-1.6.1.jar
5.
6. ooxml-schemas -.0.jar
7. xmlbeans-2.3.0.jar
Of these, items 4-7 are the JAR packages that poi-ooxml-3.5-beta6-20090622.jar depends on (you can find them in the ooxml-lib directory of poi-bin-3.5-beta6-20090622.gz).
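Before running any reading code, it can help to confirm that the JARs listed above are really on the classpath. The following is a minimal sketch using plain reflection. The class name ClasspathCheck and the helper isPresent are hypothetical names chosen for illustration; the two class names it probes are the extractor classes imported in the reading code in section 4.

```java
public class ClasspathCheck {
    // Returns true if the named class can be located on the current classpath.
    static boolean isPresent(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself or something it depends on is missing
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.poi.hwpf.extractor.WordExtractor",    // used for .doc
            "org.apache.poi.xwpf.extractor.XWPFWordExtractor" // used for .docx
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isPresent(name) ? "found" : "MISSING"));
        }
    }
}
```

If a class is reported as MISSING, the corresponding JAR (or one of its dependencies) is probably not on the classpath.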
2. Line breaks
Hard line break: a break stored in the file itself, for example one entered by pressing "Enter" on the keyboard.
Soft line break: the number of characters that fit on one line is limited. When the text exceeds that limit, it automatically wraps onto the next line for display.
For a program, hard line breaks are recognizable and deterministic. Soft line breaks depend on font size and indentation, so they cannot be read reliably from the text.
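Because only hard line breaks are present in the text, a program can split on them directly. The sketch below assumes the extracted text uses one of the usual line terminators (\r\n, \r or \n); the class name HardLineBreaks and the method splitOnHardBreaks are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class HardLineBreaks {
    // Splits text on hard line breaks: \r\n, \r, or \n.
    // Note: String.split drops trailing empty strings, so trailing blank lines are not returned.
    static List<String> splitOnHardBreaks(String text) {
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\\r\\n|\\r|\\n")) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        String sample = "First paragraph\r\nSecond paragraph\nThird paragraph";
        for (String line : splitOnHardBreaks(sample)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

Soft line breaks never show up in this output, since they exist only when the document is laid out for display.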
3. Notes on reading
Two things are worth noting. First, POI does not read the picture information in a Word file when extracting text. Second, for the 2007 format (.docx), if the Word file contains tables, the data in all tables is read out and placed at the end of the returned string.
4. Code for reading Word text content
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.poi.POIXMLDocument;
import org.apache.poi.POIXMLTextExtractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;

// The class name was lost in the original listing; WordTextReader is a placeholder.
public class WordTextReader {
    public static void main(String[] args) {
        try {
            // Word 2003 (.doc): read through the HWPF extractor
            InputStream is = new FileInputStream(new File("2003.doc"));
            WordExtractor ex = new WordExtractor(is);
            String text2003 = ex.getText();
            System.out.println(text2003);

            // Word 2007 (.docx): open the OOXML package, then use the XWPF extractor
            OPCPackage opcPackage = POIXMLDocument.openPackage("2007.docx");
            POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
            String text2007 = extractor.getText();
            System.out.println(text2007);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
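The code above hard-codes one file per format. If a program has to handle either format, one possible approach is to choose the extractor from the file extension. This is only a sketch under that assumption: the class WordFormat, the enum Kind and the method detect are hypothetical names, and an extension is not proof of the actual file content.

```java
import java.util.Locale;

public class WordFormat {
    enum Kind { DOC, DOCX, UNKNOWN }

    // Guesses the Word format from the file name extension (case-insensitive).
    static Kind detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".docx")) {
            return Kind.DOCX;
        }
        if (lower.endsWith(".doc")) {
            return Kind.DOC;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"2003.doc", "2007.docx", "notes.txt"}) {
            System.out.println(name + " -> " + detect(name));
        }
    }
}
```

A caller could then pass DOC files to WordExtractor and DOCX files to XWPFWordExtractor, and reject anything reported as UNKNOWN.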