Saturday, June 22, 2013

PDF To Text Converter

The PDFToTextConverter program converts PDF files into text files. When the program runs, you can select one or more PDF files to convert, then wait for the conversion to finish. The waiting time depends mainly on the number of PDF files you selected and the size of each file. Each selected file is written to textfrompdf0.txt, textfrompdf1.txt, and so on, in the directory the program is run from. Note that a PDF file that contains no text (for example, one made only of scanned images) cannot be converted.


PDFToTextConverter source code:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class PDFToTextConverter{
    public static void main(String[] args){
        selectPDFFiles();
    }

    //allow pdf files selection for converting
    public static void selectPDFFiles(){
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF","pdf");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if(returnVal == JFileChooser.APPROVE_OPTION) {
            File[] files=chooser.getSelectedFiles();
            System.out.println("Please wait...");
            //each output name is relative, so the text files are created
            //in the directory the program is run from
            for(int i=0;i<files.length;i++){
                convertPDFToText(files[i].getAbsolutePath(),"textfrompdf"+i+".txt");
            }
            System.out.println("Conversion complete");
        }
    }

    //extract the text of every page of src and write it to the file dest
    public static void convertPDFToText(String src,String dest){
        PdfReader pr=null;
        //try-with-resources flushes and closes the writer even if extraction fails
        try(BufferedWriter bw=new BufferedWriter(new FileWriter(dest))){
            //create pdf reader
            pr=new PdfReader(src);
            //get the number of pages in the document
            int pNum=pr.getNumberOfPages();
            //extract text from each page (pages are numbered from 1)
            //and write it to the output text file
            for(int page=1;page<=pNum;page++){
                String text=PdfTextExtractor.getTextFromPage(pr, page);
                bw.write(text);
                bw.newLine();
            }
        }catch(Exception e){
            e.printStackTrace();
        }finally{
            //release the pdf file
            if(pr!=null) pr.close();
        }
    }

}
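
The listing above names its output files textfrompdf0.txt, textfrompdf1.txt, and so on. Because those names are relative, the files are created in the directory the program is started from. As a possible variation (not part of the original program), the output name could instead be derived from each PDF's own name. The sketch below assumes a hypothetical helper called textFileFor and uses only the standard library.

```java
import java.io.File;

class OutputNames {
    // Hypothetical helper, not in the original listing: returns a file that
    // sits next to the given PDF and has the same base name with ".txt".
    static File textFileFor(File pdf) {
        String name = pdf.getName();
        int dot = name.lastIndexOf('.');
        // keep the whole name when there is no extension to strip
        String base = (dot > 0) ? name.substring(0, dot) : name;
        // a null parent is allowed and yields a path relative to the working directory
        return new File(pdf.getParentFile(), base + ".txt");
    }

    public static void main(String[] args) {
        // derived name for a PDF inside a "reports" folder
        System.out.println(textFileFor(new File("reports", "summary.pdf")).getPath());
        // where a relative name such as the one used in the listing ends up
        System.out.println(new File("textfrompdf0.txt").getAbsolutePath());
    }
}
```

With a helper like this, the loop in selectPDFFiles could pass textFileFor(selected).getPath() as the second argument of convertPDFToText, where selected is one of the files returned by the chooser.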

Converting a PDF document to a text file is simple. First, create a PdfReader object (from the iText library) to open the PDF document and read its pages. The com.itextpdf imports in the listing require the iText 5 jar (itextpdf) on the classpath. Once you have the PdfReader object, you can extract the text by using the getTextFromPage(PdfReader reader, int pageNumber) method of the PdfTextExtractor class. This method extracts the text of a single page, so the program calls it once for each page, from 1 up to the value returned by getNumberOfPages(). As each page's text is extracted, the BufferedWriter class writes it out to the destination file.
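
The page-by-page write loop can be tried without iText. The sketch below is only an illustration: a plain list of strings stands in for the text that PdfTextExtractor would return for each page, and the names PageWriter and writePages are hypothetical. It uses try-with-resources so the writer is closed even when a write fails, and it sets UTF-8 explicitly, whereas the FileWriter in the listing uses the platform's default character set.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class PageWriter {
    // Writes each page's text followed by a line break, as the loop in
    // convertPDFToText does. "pages" is a stand-in for the extracted text.
    static void writePages(List<String> pages, Path dest) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(dest, StandardCharsets.UTF_8)) {
            for (String text : pages) {
                bw.write(text);
                bw.newLine();
            }
        } // the writer is flushed and closed here
    }

    public static void main(String[] args) throws IOException {
        Path dest = Files.createTempFile("textfrompdf", ".txt");
        writePages(Arrays.asList("Text of page 1", "Text of page 2"), dest);
        System.out.println("Wrote " + dest.toAbsolutePath());
    }
}
```

In the real program the list would be replaced by calls to PdfTextExtractor.getTextFromPage inside the loop, so that only one page of text is held in memory at a time.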




30 comments:

  1. I have used Aspose.PDF for .NET API to convert my text files to pdf and it has produced very good result exactly what i wanted and you can even convert pdf files to text also even if your files are large in size. Try this API, i hope you will like it also.

    http://www.aspose.com/java/pdf-component.aspx

    Replies
    1. is it possible to read hindi text also...
      as it is.

    2. Is it possible if the file convert from .DCM file to PDF and then to .Txt delimiter

  2. Very useful information for beginners like me.Thank you very much..

  3. Hello, if interested, for pdf conversion to text format, you can also check out this free toolkit with more options available. Just upload your needed pdf doc and convert it in a plain text. If you want to try and see how it works, see here: http://kitpdf.com/pdf_to_text/ . Maybe it's useful for you.

  4. Replies
    1. Thx! It was useful as first approach. it is easy to know convert multi page pdf to single jpg. This Website says that convert multi page pdf to single jpg pages is also possible http://www.rasteredge.com/online/pdf/convert-pdf-to-jpeg/.

  6. thank u for code but i want to know where this file is being stored after conversion.....

  7. Empty document was the result...

  8. Empty document was the final result...

  9. Empty Document, the result was final ...

  10. It is showing error at com at import line

  11. Irrespective of code also provide details of jar so that it become easy for us.

  14. where is the destination text file?

  16. Can you please share a code to convert .msg file to .txt file?
