PDFToTextConverter source code:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class PDFToTextConverter {
    public static void main(String[] args) {
        selectPDFFiles();
    }

    // let the user select one or more PDF files to convert
    public static void selectPDFFiles() {
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF", "pdf");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File[] files = chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for (int i = 0; i < files.length; i++) {
                // output names are relative, so the files land in the working directory
                convertPDFToText(files[i].toString(), "textfrompdf" + i + ".txt");
            }
            System.out.println("Conversion complete");
        }
    }

    public static void convertPDFToText(String src, String dest) {
        // try-with-resources flushes and closes the writer even if extraction fails
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(dest))) {
            // create pdf reader
            PdfReader pr = new PdfReader(src);
            try {
                // get the number of pages in the document
                int pNum = pr.getNumberOfPages();
                // extract text from each page (page numbers start at 1) and write it out
                for (int page = 1; page <= pNum; page++) {
                    String text = PdfTextExtractor.getTextFromPage(pr, page);
                    bw.write(text);
                    bw.newLine();
                }
            } finally {
                pr.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Converting a PDF document to a text file is simple. First, use the PdfReader class (from the iText library) to open the PDF document and get its page count. Once you have the PdfReader object, you can extract the text by calling the getTextFromPage(PdfReader pdfreader, int page_num) method of the PdfTextExtractor class. Each call extracts the text of one page, and page numbers start at 1. As each page's text is extracted, the BufferedWriter class writes it out to the destination file.
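Because the destination name passed to convertPDFToText ("textfrompdf"+i+".txt") is a relative path, the output ends up in whatever directory the program was started from. The following is only a minimal sketch of one alternative, assuming you would rather write the text file next to the source PDF and in UTF-8. OutputPaths, textPathFor and writeUtf8 are hypothetical names that are not part of the converter above, and the page text is passed in as plain strings so the sketch runs without iText.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class OutputPaths {
    // Hypothetical helper: maps docs/report.pdf to docs/report.txt
    public static Path textPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // keep the whole name if there is no extension to strip
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + ".txt");
    }

    // Hypothetical helper: writes each page's text followed by a line break, encoded as UTF-8
    public static void writeUtf8(Path dest, List<String> pages) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(dest, StandardCharsets.UTF_8)) {
            for (String page : pages) {
                bw.write(page);
                bw.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // stand-in page text; in the converter this would come from getTextFromPage
        Path dest = textPathFor(Paths.get("sample.pdf"));
        writeUtf8(dest, Arrays.asList("page one", "page two"));
        // print the absolute path so it is clear where the file was written
        System.out.println("Wrote " + dest.toAbsolutePath());
    }
}
```

Printing the absolute path answers the question of where the file was stored. Naming the charset explicitly avoids relying on the platform default that FileWriter uses, which on some systems cannot represent non-Latin text.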
I have used the Aspose.PDF for .NET API to convert my text files to PDF, and it produced very good results, exactly what I wanted. You can also convert PDF files to text, even if your files are large. Try this API; I hope you will like it too.
http://www.aspose.com/java/pdf-component.aspx
Is it possible to read Hindi text also...
As it is.
Is it possible to convert a file from .DCM to PDF and then to a delimited .txt file?
Thank you for sharing.
Very useful information for beginners like me. Thank you very much.
Hello, if interested, for PDF conversion to text format you can also check out this free toolkit with more options available. Just upload your PDF document and convert it to plain text. If you want to try it and see how it works, see here: http://kitpdf.com/pdf_to_text/ . Maybe it's useful for you.
So thank you, good document.
Thanks! It was useful as a first approach. It is easy to convert a multi-page PDF to a single JPG. This website says that converting a multi-page PDF to single JPG pages is also possible: http://www.rasteredge.com/online/pdf/convert-pdf-to-jpeg/.
.NET PDF To Text Converter: convert a PDF file into a text file
I read your post and need to thank you for sharing such pleasant lines. Buzz Applications is a combination of multiple services
Thank you for the code, but I want to know where this file is stored after conversion.
You shared a very useful post. Thanks for sharing.
ReplyDeleteMagento Development in Chennai
Good post. Keep sharing such a useful post.
Magento eCommerce Website Development
HAPPY
An empty document was the result...
An empty document was the final result...
Empty document, that was the final result...
Can you please explain the code?
It is showing an error at com on the import line.
Irrespective of the code, please also provide details of the jar so that it becomes easy for us.
NaturalReader Free is an exceptionally helpful program to convert written text (MS Word, web pages, PDF files, emails) into audio files (MP3, WAV or CD) or into spoken speech. text to speech
Thanks for sharing this useful article, keep posting more like this.
ReplyDeleteEcommerce Web Development
Online Shop Builder
Where is the destination text file?
Can you please share code to convert a .msg file to a .txt file?
Simply envision the convenience of listening to your normal reading, technical issues and long reports. This altogether helps lessen the strain on your eyes. from text to speech
First, reading it out loud is an easy-to-use program and it doesn't use the clipboard; it is enough to press some hot keys. text to speech mp3
Hi there, I discovered your web site by way of Google while searching for a related topic. Your web site came up and it looks good. I've bookmarked it in my Google bookmarks. search engine marketing singapore
Hello sir, thanks for giving that type of information... This page contains all the active and recently expired job openings and recruitment notification from India Postal Recruitment 2020...