Java programs: HTML To PDF Converter

Friday, July 5, 2013

HTML To PDF Converter

HTMLToPDFConverter converts multiple HTML files to PDF files and is easy to use. When it starts, it asks on the console which of its two options you want. With the first option, you convert HTML files stored on your local computer: a file open dialog lets you select one or more HTML files, and the program then converts them. The conversion may take a while to complete.

The second option converts HTML pages on the web to PDF files. Type or paste the address of a page into the Address box and click Add to put it on the conversion list. You can add as many pages as you want. When all the addresses you need are on the list, click OK and wait until the conversion finishes.
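Both options end up in the same place: the program opens every entry through java.net.URL, so a web address can be used as typed, while a local file first has to become a file: URL. The sketch below illustrates that step using only the JDK. The class AddressSketch and its methods toAddress and readPage are hypothetical names for illustration, not part of the program, and decoding the page as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper class: shows how both kinds of entries can be read through a URL.
public class AddressSketch {

    // Web (and file:) addresses are returned unchanged; anything else is treated as a local path.
    static String toAddress(String input) {
        if (input.startsWith("http://") || input.startsWith("https://") || input.startsWith("file:")) {
            return input;
        }
        // File.toURI() makes the path absolute, uses forward slashes and escapes spaces as %20
        return new File(input).toURI().toString();
    }

    // Reads all bytes behind an address and decodes them as UTF-8 (an assumption about the page).
    static String readPage(String address) throws IOException {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // A local page whose file name contains a space, written to the temp directory
        File page = File.createTempFile("sample page", ".html");
        page.deleteOnExit();
        Files.write(page.toPath(), "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8));

        String address = toAddress(page.getPath());
        System.out.println(address);           // a file: URL with %20 in place of the space
        System.out.println(readPage(address)); // <html><body>Hello</body></html>
    }
}
```

Building the address with File.toURI() rather than by prefixing "file:///" to the path keeps it valid on Windows (backslashes) and for names containing spaces.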

[Figure: HTML to PDF converter, web pages selection window]

Technically, converting an HTML file to a PDF file takes a few steps:
-After the HTML file is read, it is cleaned with the Jsoup library.
-The cleaned HTML file is converted to an XHTML file with the JTidy library.
-Finally, the XHTML file is converted to a PDF file with the XMLWorker library, an add-on to iText. Each PDF is saved in the current working directory under a name made from the current time in milliseconds.
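The JTidy step is there because XMLWorker is meant to read XHTML, that is, HTML that is also well-formed XML, and many real pages are not (unclosed <br> tags, overlapping elements). As an illustration only, the sketch below uses the JDK's built-in XML parser as a stand-in to test well-formedness; it is not part of the program, and the class name WellFormedCheck is hypothetical. The same fragment is rejected with a bare <br> and accepted once it is written as <br />, which is the form JTidy produces when setXHTML(true) is set.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks whether markup is well-formed XML using only the JDK.
public class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // HTML-style markup: <br> is never closed, so an XML parser rejects it
        System.out.println(isWellFormed("<p>One<br>Two</p>"));   // false
        // XHTML-style markup: the empty element is self-closed
        System.out.println(isWellFormed("<p>One<br />Two</p>")); // true
    }
}
```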

HTMLToPDFConverter source code:

import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;
import javax.swing.*;
import javax.swing.filechooser.FileNameExtensionFilter;
import java.awt.*;
import java.awt.event.*;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
import org.jsoup.select.Elements;
import org.w3c.tidy.Tidy;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class HTMLToPDFConverter{
    public static void main(String[] args){
        System.out.println("........HTML to PDF Converter......");
        Scanner sc=new Scanner(System.in);
        showOptions();
        System.out.println("Enter your choice:");
        //a non-numeric answer is treated as an invalid choice instead of throwing an exception
        int ch=sc.hasNextInt()?sc.nextInt():-1;
        switch(ch){
            case 1: selectLocal(); break;
            case 2: selectWeb(); break;
            case 3: System.exit(0);
            default: System.out.println("Invalid choice");
        }
    }

    //let the user pick one or more local html files and convert each of them
    public static void selectLocal(){
        JFileChooser chooser=new JFileChooser();
        FileNameExtensionFilter filter=new FileNameExtensionFilter("HTML", "html", "htm");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal=chooser.showOpenDialog(null);
        if(returnVal==JFileChooser.APPROVE_OPTION){
            File[] files=chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for(File f: files){
                //toURI() uses forward slashes and escapes spaces, so new URL(...) accepts it
                loadHTML(f.toURI().toString());
                convertHTMLToPDF();
            }
            System.out.println("Conversion complete");
        }
    }

    //open the window in which web addresses are collected (on the event dispatch thread)
    public static void selectWeb(){
        SwingUtilities.invokeLater(new Runnable(){
            public void run(){
                new LinksAdd("Add links");
            }
        });
    }

    //display a window to add links to the list for conversion
    static class LinksAdd extends JFrame implements ActionListener{
        DefaultListModel<String> listmodel;
        JTextField textlink;
        JLabel lblwait;

        LinksAdd(String title){
            Container cont=getContentPane();
            cont.setLayout(new BorderLayout());
            setTitle(title);
            //release the window (and let the program end) when it is closed
            setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
            setPreferredSize(new Dimension(600,300));
            JLabel lbl=new JLabel("Address:");
            textlink=new JTextField(30);
            JButton btadd=new JButton("Add");
            btadd.addActionListener(this);
            JPanel panelnorth=new JPanel();
            panelnorth.add(lbl);
            panelnorth.add(textlink);
            panelnorth.add(btadd);
            listmodel=new DefaultListModel<String>();
            JList<String> linkslist=new JList<String>(listmodel);
            linkslist.setVisibleRowCount(10);
            linkslist.setFixedCellWidth(200);
            JScrollPane pane=new JScrollPane(linkslist);
            JButton btok=new JButton("OK");
            btok.addActionListener(this);
            lblwait=new JLabel();
            JPanel panelsouth=new JPanel();
            panelsouth.add(btok);
            panelsouth.add(lblwait);
            cont.add(panelnorth, BorderLayout.NORTH);
            cont.add(pane, BorderLayout.CENTER);
            cont.add(panelsouth, BorderLayout.SOUTH);
            pack();
            setVisible(true);
        }

        public void actionPerformed(ActionEvent e){
            if(e.getActionCommand().equals("Add")){
                String link=textlink.getText().trim();
                if(!link.isEmpty()){
                    listmodel.addElement(link);
                    textlink.setText("");
                }
            }
            else if(e.getActionCommand().equals("OK")){
                //actionPerformed runs on the event dispatch thread, so the label can be updated here
                lblwait.setText("Please wait...");
                //copy the addresses before the conversion runs on a separate thread
                final Object[] links=listmodel.toArray();
                Thread t=new Thread(){
                    public void run(){
                        for(Object link: links){
                            loadHTML(link.toString());
                            convertHTMLToPDF();
                        }
                        //Swing components must be changed on the event dispatch thread
                        SwingUtilities.invokeLater(new Runnable(){
                            public void run(){
                                LinksAdd.this.dispose();
                            }
                        });
                    }
                };
                t.start();
            }
        }
    }

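    //intermediate files are kept in the system temp directory
    static final File CLEANED_HTML=new File(System.getProperty("java.io.tmpdir"), "cleanedhtml.html");
    static final File TEMP_XHTML=new File(System.getProperty("java.io.tmpdir"), "tempfile.xhtml");

    //read the html page, clean it with Jsoup, and convert the result
    //to an xhtml file by using the parse method of the Tidy class
    public static void loadHTML(String src){
        Tidy tidy=new Tidy();
        tidy.setMakeClean(true);
        tidy.setXHTML(true);
        try(InputStream in=new URL(src).openStream()){
            File cleanedHTMLFile=cleanHTML(in);
            //closing both streams guarantees the xhtml file is complete before it is read
            try(FileInputStream cleaned=new FileInputStream(cleanedHTMLFile);
                FileOutputStream xhtml=new FileOutputStream(TEMP_XHTML)){
                tidy.parse(cleaned, xhtml);
            }
        }catch(Exception e){
            System.out.println("The file can't be read: "+src);
            System.exit(-1);
        }
    }

    //convert the xhtml file to a PDF file by using
    //the parseXHtml method of the XMLWorkerHelper class
    public static void convertHTMLToPDF(){
        try(FileInputStream xhtml=new FileInputStream(TEMP_XHTML)){
            Document doc=new Document();
            //the PDF is named after the current time in milliseconds
            PdfWriter pw=PdfWriter.getInstance(doc, new FileOutputStream(System.currentTimeMillis()+".pdf"));
            doc.open();
            XMLWorkerHelper.getInstance().parseXHtml(pw, doc, xhtml);
            doc.close();
        }catch(Exception e){
            System.out.println("Conversion can't be completed.");
            System.exit(-1);
        }
    }

    //clean the html with Jsoup: keep the head, and replace the body with a version
    //that contains only the tags and attributes allowed by the relaxed whitelist
    public static File cleanHTML(InputStream srcstream){
        try{
            org.jsoup.nodes.Document doc=Jsoup.parse(srcstream, "UTF-8", "");
            Whitelist wlist=Whitelist.relaxed();
            wlist.addAttributes(":all","style","href","class");
            String cleanedBody=Jsoup.clean(doc.toString(),wlist);
            Elements ls=doc.select("body");
            ls.remove();
            cleanedBody="<body>"+cleanedBody+"</body>";
            //quoteReplacement stops '$' and '\' in the page text from being read as regex groups
            String cleanedHTML=doc.toString().replaceFirst("</html>",
                    java.util.regex.Matcher.quoteReplacement(cleanedBody+"</html>"));
            //writeBytes keeps only the low eight bits of each char,
            //so characters outside ISO-8859-1 are not preserved
            try(DataOutputStream dos=new DataOutputStream(new FileOutputStream(CLEANED_HTML))){
                dos.writeBytes(cleanedHTML);
            }
        }catch(IOException ie){
            System.out.println("Unable to process the file");
        }
        return CLEANED_HTML;
    }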

    public static void showOptions(){
        System.out.println("1. html files from local computer");
        System.out.println("2. html files from the web");
        System.out.println("3. Exit");
    }

}


21 comments:

Unknown, August 25, 2013 at 1:40 AM
The program did not create the PDF with images?
Thanks

Java, August 30, 2013 at 12:34 AM
So far, there is still no free library that handles the HTML to PDF conversion task perfectly.

Anonymous, September 10, 2013 at 8:57 PM
This is useful Java code for converting HTML to PDF. I had been looking for it for a long time.

Anonymous, October 22, 2013 at 6:56 AM
Very informative, thanks for sharing. I came across another Java PDF component that converts PDF files to HTML format: Aspose.Pdf for Java. Could you please tell me how your product differs from this one?

Anonymous, November 1, 2013 at 3:54 AM
Thanks for the reply, Dara.
Here are some code examples I found in the Aspose documentation section. They give an overview of how the Aspose team does it:

http://www.aspose.com/docs/display/pdfnet/HTML+to+PDF+conversion
http://www.aspose.com/docs/display/pdfnet/How+to+convert+HTML+to+PDF+using+InLineHTML+approach
http://www.aspose.com/docs/display/pdfnet/Convert+PDF+file+into+HTML+format

Thanks once again,
David

Mark Henry, June 16, 2015 at 12:53 AM
I really like your method of converting HTML to PDF. However, at times you may need to convert JPG to PDF. For this purpose, I would like to refer you to this amazing JPG to PDF converter. You should try it once and experience the smooth operation.

Unknown, February 7, 2016 at 12:47 PM
I am getting an error when using option 1, i.e., fetching HTML files from the local computer. The error is:

Tidy (vers 4th August 2000) Parsing "InputStream"
line 2 column 8 - Warning: inserting missing 'title' element

InputStream: Document content looks like HTML 2.0
1 warnings/errors were found!

Conversion can't be completed.

Can you please help me resolve it?

Unknown, February 15, 2016 at 10:45 AM
Interesting article.

Anonymous, December 13, 2016 at 12:28 AM
Hey, can you tell me how to fix it so it shows images in the PDF?

Divyesh Kanzariya, June 20, 2017 at 3:24 AM
Can you suggest what to do if the HTML contains images? Your solution does not work for me. Is there any solution for that? Let me know.

Thanks for the article.

unknown, March 26, 2018 at 4:49 AM
Due to how long Java has been around, almost any question you can imagine has already been asked, answered, indexed, and democratically perfected through upvotes on the Internet. It is seriously hard to stump a search engine with a Java coding problem.

unknown, March 26, 2018 at 4:54 AM
One thing you need to remember about Spring is that it is not a single entity. Spring contains various modules such as Spring Security, Spring Boot, etc.

Since Spring is the most in-demand framework, you can find many study materials for it online. Check the diagram below, which shows the most popular frameworks used in the market.

cynthiawilliams, November 20, 2018 at 11:07 PM
Informative post, thanks for taking the time to share this page.

Barbara Pantuso, January 12, 2019 at 8:11 PM
I will use this tool for my work. Thanks.

iteducationcentre, July 12, 2024 at 10:27 PM
Thanks for sharing such a useful and helpful post.
