There are many controls for generating PDF, but not too much for parsing. PDF Toolkit can do it, but the first complex PDF tested reports an error and the Chinese characters are garbled. The version or method of use may be wrong.
I remembered that the pdfBox library under the name of Apache that was called using java was very useful, so I downloaded pdfBox and used Delphi to call pdfBox to parse the pdf text.
Environmental requirements: java running environment
pdfBox application package: pdfbox-app-2.0.6.jar
The DOS command line is used here to parse, and then the parsed results are called.
The first is to execute the DOS command:
procedure CheckResult(b: Boolean);begin if not b then raise Exception.Create(SysErrorMessage(GetLastError));end;function RunDOS(const CommandLine: string): string;var HRead, HWrite: THandle; StartInfo: TStartupInfo; ProceInfo: TProcessInformation; b: Boolean; sa: TSecurityAttributes; inS: THandleStream; sRet: TStrings;begin Result := ''; FillChar(sa, sizeof(sa), 0);//Set to allow inheritance, otherwise the output result cannot be obtained under NT and 2000 sa.nLength := sizeof(sa); sa .bInheritHandle := True; sa.lpSecurityDescriptor := nil; b := CreatePipe(HRead, HWrite, @sa, 0); CheckResult(b); FillChar(StartInfo, SizeOf(StartInfo), 0); StartInfo.cb := SizeOf(StartInfo); StartInfo.wShowWindow := SW_HIDE;//Use the specified handle as the standard input and output file handle, use the specified Display method StartInfo.dwFlags := STARTF_USESTDHANDLES or STARTF_USESHOWWINDOW; StartInfo.hStdError := HWrite; StartInfo.hStdInput := GetStdHandle(STD_INPUT_HANDLE); //HRead; StartInfo.hStdOutput := HWrite; b := CreateProcess(nil, //lpApplicationName: PChar PChar(CommandLine), //lpCommandLine: PChar nil, //lpProcessAttributes: PSecurityAttributes nil, //lpThreadAttributes: PSecurityAttributes True, //bInheritHandles: BOOL CREATE_NEW_CONSOLE, nil, nil, StartInfo, ProceInfo); CheckResult(b); WaitForSingleObject(ProceInfo.hProcess, INFINITE); inS := THandleStream.Create(HRead); if inS.Size > 0 then begin sRet := TStringList.Create; sRet.LoadFromStream(inS); Result := sRet.Text; sRet.Free; end; inS.Free; CloseHandle(HRead); CloseHandle(HWrite);end;
Then call display:
function TfrmPDFTool.GetPDFText(sFile: string): string;var cmd:string; pdfFilePath,pdfFileName,txtFileName:String;begin //java -jar pdfbox-app-2.0.6.jar ExtractText -encoding utf-8 e:// temp//test.pdf e://temp//testiii.txt pdfFilePath:=ExtractFilePath(sFile); pdfFileName:=ExtractFileName(sFile); txtFileName:=FAppPath+'Temp/'+pdfFileName+'.txt'; cmd:='java -jar '+FAppPath+'PDFBox/pdfbox-app-2.0. 6.jar ExtractText ' +' -encoding utf-8 '+sFile +' '+txtFileName; AddLog(cmd); Result:=RunDOS(cmd); AddLog(Result); memTxtFile.Lines.LoadFromFile(txtFileName,TUTF8Encoding.Create); FPDFText:=memTxtFile.Text; AddLog(FPDFText );end;
OK, you're done!
The above example of extracting PDF text with Delphi is all the content shared by the editor. I hope it can give you a reference, and I hope you will support Wulin.com.