Java programs: PDF To Text Converter

Saturday, June 22, 2013

PDF To Text Converter

The PDFToTextConverter program converts PDF files into text files. When the program runs, a file chooser lets you select one or more PDF files to convert; then wait while they are processed. How long this takes depends mainly on the number of PDF files you selected and on the size of each file. Note that a PDF that contains no text (for example, a scanned image with no text layer) cannot be converted this way: the output file will contain no text.
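Whether a PDF "has text" can only be judged after extraction. The following is a minimal sketch, assuming the per-page strings have already been extracted (for example with PdfTextExtractor.getTextFromPage). The class TextCheck and the method hasExtractableText are hypothetical names used only for illustration; they are not part of iText.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of iText): decides whether extraction produced any usable text.
public class TextCheck {
    // Returns true if at least one page contains a non-whitespace character.
    // Image-only (scanned) PDFs typically yield only empty or blank page strings.
    public static boolean hasExtractableText(List<String> pages) {
        for (String page : pages) {
            if (page != null && !page.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> scanned = Arrays.asList("", "  \n", "");
        List<String> normal = Arrays.asList("", "Chapter 1\nHello");
        System.out.println(hasExtractableText(scanned)); // false
        System.out.println(hasExtractableText(normal));  // true
    }
}
```

In the converter, such a check could be used to print a warning instead of silently writing an empty text file.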

PDFToTextConverter source code:

import java.io.BufferedWriter;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class PDFToTextConverter {
    public static void main(String[] args) {
        selectPDFFiles();
    }

    // let the user select one or more PDF files and convert each of them
    public static void selectPDFFiles() {
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF", "pdf");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File[] selected = chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for (int i = 0; i < selected.length; i++) {
                // output files are textfrompdf0.txt, textfrompdf1.txt, ... in the working directory
                String dest = "textfrompdf" + i + ".txt";
                convertPDFToText(selected[i].getPath(), dest);
                System.out.println(selected[i].getName() + " -> " + new File(dest).getAbsolutePath());
            }
            System.out.println("Conversion complete");
        }
    }

    public static void convertPDFToText(String src, String dest) {
        PdfReader pr = null;
        // try-with-resources closes the writer even if extraction fails;
        // UTF-8 keeps non-ASCII characters intact regardless of the platform default
        try (BufferedWriter bw = Files.newBufferedWriter(Paths.get(dest), StandardCharsets.UTF_8)) {
            // open the PDF document
            pr = new PdfReader(src);
            // get the number of pages in the document (iText numbers pages from 1)
            int pNum = pr.getNumberOfPages();
            // extract text from each page and write it to the output text file
            for (int page = 1; page <= pNum; page++) {
                String text = PdfTextExtractor.getTextFromPage(pr, page);
                bw.write(text);
                bw.newLine();
            }
        } catch (Exception e) {
            // report the problem and let the caller continue with the next file
            e.printStackTrace();
        } finally {
            if (pr != null) {
                pr.close();
            }
        }
    }
}

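Converting a PDF document to a text file is simple. First, open the document with the PdfReader class from the iText library; its getNumberOfPages() method tells you how many pages there are (iText numbers pages from 1). Once you have the PdfReader object, you can extract the text of each page with the getTextFromPage(PdfReader reader, int pageNumber) method of the PdfTextExtractor class. As the text of each page is extracted, a BufferedWriter writes it to the destination file. Each selected PDF is saved as textfrompdf0.txt, textfrompdf1.txt, and so on, in the program's current working directory. The program needs the iText 5 library (the itextpdf jar, which provides the com.itextpdf packages) on the classpath; without it, the com.itextpdf import lines do not compile.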
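The page loop described above can be sketched without the PDF library, which makes the 1-based page numbering and the one-newline-per-page output easy to see. This is a minimal sketch under stated assumptions: the class PageLoop, the method writePages, and the pageText function are hypothetical, and pageText merely stands in for page -> PdfTextExtractor.getTextFromPage(reader, page).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.function.IntFunction;

// Hypothetical sketch of the conversion loop with the PDF side abstracted away.
public class PageLoop {
    // Writes pages 1..pageCount (1-based, as in iText) to out, one line separator after each page.
    // pageText is an assumed stand-in for page -> PdfTextExtractor.getTextFromPage(reader, page).
    public static void writePages(int pageCount, IntFunction<String> pageText, Writer out)
            throws IOException {
        BufferedWriter bw = new BufferedWriter(out);
        for (int page = 1; page <= pageCount; page++) {
            bw.write(pageText.apply(page));
            bw.newLine();
        }
        // flush (not close): the caller owns the underlying writer
        bw.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        writePages(3, page -> "page " + page, sw);
        System.out.print(sw);
    }
}
```

Note that BufferedWriter.newLine() writes the platform line separator, so the output uses "\r\n" on Windows and "\n" elsewhere.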

PDF To Word Converter, convert PDF To Word

30 comments:

Deliana, October 31, 2013 at 3:02 AM
I have used the Aspose.PDF for .NET API to convert my text files to PDF, and it produced very good results, exactly what I wanted. You can also convert PDF files to text, even large ones. Try this API; I hope you will like it too.

http://www.aspose.com/java/pdf-component.aspx
Java, October 31, 2013 at 4:54 PM
Thank you for sharing.
Arunkumar, March 25, 2014 at 9:07 PM
Very useful information for beginners like me. Thank you very much.
Unknown, September 8, 2014 at 1:27 AM
Hello, if you are interested in converting PDF to text, you can also check out this free toolkit, which has more options. Just upload your PDF document and convert it to plain text. If you want to try it and see how it works, see here: http://kitpdf.com/pdf_to_text/ . Maybe it's useful for you.
malik masis, November 5, 2014 at 12:05 PM
Thank you, good document.
Anonymous, November 15, 2015 at 11:07 PM
.NET PDF To Text Converter: convert a PDF file into a text file
Anonymous, February 22, 2016 at 1:38 AM
Thank you for the code, but I want to know where the file is stored after conversion.
Unknown, December 12, 2016 at 11:11 AM
Empty document, that was the result...
Unknown, December 12, 2016 at 11:12 AM
Empty document, that was the final result...
Unknown, December 12, 2016 at 11:21 AM
Empty document, the result was final...
Unknown, January 10, 2017 at 3:23 AM
Can you please explain the code?
Anonymous, February 19, 2017 at 4:46 AM
It shows an error at "com" in the import lines.
Anonymous, March 24, 2017 at 2:06 AM
Besides the code, please also provide details of the jar so that it becomes easy for us.
Amit Tomar, May 6, 2018 at 1:02 AM
Where is the destination text file?
Anonymous, May 10, 2018 at 3:48 AM
Can you please share code to convert a .msg file to a .txt file?

