Reading PDF documents

I have a source pdf file that I want to burst into several other pdf files.
I can burst it using PdfCopy... The problem is that I need to get the name of the bursted PDF files from the text in the source PDF file.

The source PDF file pages look like the following:

    Account Statement Report
    CLIFTON JONES
    SFA STATION
    PO BOX 13009
    NACOGDOCHES, TX
    75962-0001 USA
    XXXX-XXXX-9999-1234

    Posting Date: 08/07/2007 Thru 08/10/2007

    Posting Date Transaction Date Description Location Country Original Amount

    08/08/2007 08/07/2007 FINANCIAL MANAGEMENT A 999-999-9999, FL UNITED STATES

In the above example the file name that I need is the 8 numbers after the "xxxx-xxxx-". In other words in the above example the file name would be "bsr99991234.pdf"

How do I do this?? My code so far is the following:

    private void buttonOpen_mouseClicked(MouseEvent e) {
        Document document = null;
        //Create a file chooser
        final JFileChooser fc = new JFileChooser();
        //In response to a button click:
        if (e.getSource() == buttonOpen) {
            int returnVal = fc.showOpenDialog(basFrame.this);
            if (returnVal == JFileChooser.APPROVE_OPTION) {
                File file = fc.getSelectedFile();
                log.append("Opening: " + file.getName() + "." + newline);
                String outPath = null;
                try {
                    String inPath = "C:\\Documents and Settings\\thurmanpatri\\Desktop\\";
                    outPath = "C:\\Documents and Settings\\thurmanpatri\\My Documents\\aaapdf\\";
                    PdfReader reader = new PdfReader(inPath + file.getName());
                    int endPage = reader.getNumberOfPages();
                    String fileName = null;
                    for (int currentPage = 1; currentPage <= endPage; currentPage++) {
                        document = new Document(reader.getPageSizeWithRotation(currentPage));
                        fileName = "bsr" + currentPage + ".pdf";
                        File newPdf = new File(outPath, fileName);
                        FileOutputStream fos = new FileOutputStream(newPdf);
                        PdfCopy copy = new PdfCopy(document, fos);
                        document.open();
                        PdfImportedPage page = copy.getImportedPage(reader, currentPage);
                        copy.addPage(page);
                        log.append(document.toString() + newline);
                        document.close();
                        fos.close();
                    }
                } catch (Exception ex) {
                    ex.printStackTrace();
                    log.append("9 -- Failed");
                }
                String vRetCode = doFileRename(outPath, log);
                log.append("0 -- Successfull");
            } else {
                log.append("Open command cancelled by user." + newline);
            }
        }
    }

This will burst the source PDF file into 152 PDF files of one page each. I need to name them properly.

Pat,

Patrick O. Thurman
Stephen F. Austin State University
Information Technology Services
Data Base Administrator
Phone: (936) 468-1074
Fax: (936) 468-1117

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
iText-questions mailing list
iText-questions@...
https://lists.sourceforge.net/lists/listinfo/itext-questions
Buy the iText book: http://itext.ugent.be/itext-in-action/
Re: Reading PDF documents

You need some other program to extract the text.

Paulo

----- Original Message -----
From: "Patrick O. Thurman" <pthurman@...>
To: <itext-questions@...>
Sent: Thursday, August 30, 2007 9:42 PM
Subject: [iText-questions] Reading PDF documents

> I have a source pdf file that I want to burst into several other pdf files.
> [...]
Re: Reading PDF documents

Paulo Soares wrote:
> You need some other program to extract the text.

That's true; however, I have the impression the PDF the OP is talking about is generated in an automated process. If a tool similar to iText is used, it might be possible to use a hack to find out the specific String. You could try something like this:

    byte[] streamBytes = reader.getPageContent(1);
    String contentStream = new String(streamBytes);
    String marker = "XXXX-XXXX-";
    // Skip past the 10-character marker to the start of the account digits.
    int pos = contentStream.indexOf(marker) + marker.length();
    // "9999-1234" is 9 characters long; drop the hyphen to get the 8 digits.
    String name = contentStream.substring(pos, pos + 9).replace("-", "");

Of course: if "XXXX-XXXX-" isn't a String recurring on every page, you'll have a hard time finding the rest of the String. Also, whether or not this hack will work depends heavily on the tool that was used to create the original PDF document. Note that I generally don't advise workarounds like this, because they aren't always watertight.

br,
Bruno
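
The hack above can be made a little more defensive with a regular expression, so that a page without the marker is reported instead of throwing from substring(). What follows is only a sketch under assumptions: StatementNamer and fileNameFor are hypothetical names, not part of iText, and it assumes the account number appears in the page's content stream as one literal run of text in the form XXXX-XXXX-dddd-dddd, which depends on the tool that generated the PDF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of iText. */
final class StatementNamer {
    // Masked prefix followed by two groups of four digits, e.g. XXXX-XXXX-9999-1234.
    // Assumption: the text is stored unsplit and unencoded in the content stream.
    private static final Pattern ACCOUNT =
            Pattern.compile("XXXX-XXXX-(\\d{4})-(\\d{4})");

    private StatementNamer() {}

    /** Returns a name such as "bsr99991234.pdf", or empty if the marker is not found. */
    static Optional<String> fileNameFor(byte[] pageContent) {
        // ISO-8859-1 maps each byte to exactly one char, so decoding loses nothing.
        String content = new String(pageContent, StandardCharsets.ISO_8859_1);
        Matcher m = ACCOUNT.matcher(content);
        if (!m.find()) {
            return Optional.empty();
        }
        // Join the two digit groups without the hyphen: 9999 + 1234 -> 99991234.
        return Optional.of("bsr" + m.group(1) + m.group(2) + ".pdf");
    }

    public static void main(String[] args) {
        // Made-up content stream fragment standing in for reader.getPageContent(n).
        byte[] sample = "BT (75962-0001 USA) Tj (XXXX-XXXX-9999-1234) Tj ET"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(fileNameFor(sample).orElse("no match"));
    }
}
```

In the burst loop from the original question, the line that sets fileName could then become something like StatementNamer.fileNameFor(reader.getPageContent(currentPage)).orElse("bsr" + currentPage + ".pdf"), which falls back to the page-number name whenever the marker is missing on a page.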