Another option lets you convert HTML pages on the web to PDF files. Type or paste the address of the HTML page into the Address box and click Add to add it to the conversion list. You can add as many HTML pages as you want. After adding all the addresses you want to the list, click OK and wait a moment until the conversion finishes.
Technically, converting an HTML file to a PDF file takes a few steps:
-After the HTML file is read, it is cleaned with the Jsoup library.
-The cleaned HTML file is converted to an XHTML file with the JTidy library.
-Finally, the XHTML file is converted to a PDF file with the XMLWorker library.
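Each step hands its result to the next through an intermediate file, and the listing below uses fixed paths on drive D: for this. As a minimal sketch of a more portable hand-off, assuming temporary files are acceptable, the helper below creates uniquely named files in the system temp directory. The class name IntermediateFiles, the method newTempFile, and the "html2pdf-" prefix are hypothetical and not part of the program.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original program.
final class IntermediateFiles {
    // Creates an empty file in the system temp directory whose name ends with the given suffix.
    static Path newTempFile(String suffix) {
        try {
            Path file = Files.createTempFile("html2pdf-", suffix);
            file.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as IntermediateFiles.newTempFile(".xhtml") could then stand in for a fixed path, provided the same path is passed both to the step that writes the file and to the step that reads it.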
HTMLToPDFConverter source code:
import java.awt.*;
import java.awt.event.*;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;
import javax.swing.*;
import javax.swing.filechooser.FileNameExtensionFilter;
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;
import org.jsoup.select.Elements;
import org.w3c.tidy.Tidy;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
public class HTMLToPDFConverter{
    public static void main(String[] args){
        System.out.println("........HTML to PDF Converter......");
        Scanner sc=new Scanner(System.in);
        showOptions();
        System.out.println("Enter your choice:");
        int ch=sc.nextInt();
        switch(ch){
            case 1: selectLocal(); break;
            case 2: selectWeb(); break;
            case 3: System.exit(0); break;
            default: System.out.println("Invalid choice");
        }
    }
    //let the user pick local html files and convert each one
    public static void selectLocal(){
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("HTML", "html", "htm");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if(returnVal == JFileChooser.APPROVE_OPTION) {
            File[] files=chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for(int i=0;i<files.length;i++){
                //File.toURI() builds a well-formed file: URL (escapes spaces, handles separators)
                String address=files[i].toURI().toString();
                loadHTML(address);
                convertHTMLToPDF();
            }
            System.out.println("Conversion complete");
        }
    }
    public static void selectWeb(){
        new LinksAdd("Add links");
    }
    //display a window to add links to the list for conversion
    static class LinksAdd extends JFrame implements ActionListener{
        DefaultListModel<String> listmodel;
        JTextField textlink;
        JLabel lblwait;
        LinksAdd(String title){
            Container cont=getContentPane();
            cont.setLayout(new BorderLayout());
            setTitle(title);
            setPreferredSize(new Dimension(600,300));
            //north: address box and Add button
            JLabel lbl=new JLabel("Address:");
            textlink=new JTextField(30);
            JButton btadd=new JButton("Add");
            btadd.addActionListener(this);
            JPanel panelnorth=new JPanel();
            panelnorth.add(lbl);
            panelnorth.add(textlink);
            panelnorth.add(btadd);
            //center: list of addresses waiting for conversion
            listmodel=new DefaultListModel<String>();
            JList<String> linkslist=new JList<String>(listmodel);
            linkslist.setVisibleRowCount(10);
            linkslist.setFixedCellWidth(200);
            JScrollPane pane=new JScrollPane(linkslist);
            //south: OK button and status label
            JButton btok=new JButton("OK");
            lblwait=new JLabel();
            JPanel panelsouth=new JPanel();
            panelsouth.add(btok);
            panelsouth.add(lblwait);
            btok.addActionListener(this);
            cont.add(panelnorth, BorderLayout.NORTH);
            cont.add(pane, BorderLayout.CENTER);
            cont.add(panelsouth, BorderLayout.SOUTH);
            pack();
            setVisible(true);
        }
        public void actionPerformed(ActionEvent e){
            if(e.getActionCommand().equals("Add")){
                String link=textlink.getText().trim();
                if(!link.isEmpty()){
                    listmodel.addElement(link);
                    textlink.setText("");
                }
            }
            else if(e.getActionCommand().equals("OK")){
                //update the label here, on the event dispatch thread
                lblwait.setText("Please wait...");
                //copy the addresses so the worker thread does not touch the Swing model
                final Object[] links=listmodel.toArray();
                Thread t=new Thread(){
                    public void run(){
                        for(int i=0;i<links.length;i++){
                            loadHTML(links[i].toString());
                            convertHTMLToPDF();
                        }
                        //close the window back on the event dispatch thread
                        SwingUtilities.invokeLater(new Runnable(){
                            public void run(){ dispose(); }
                        });
                    }
                };
                t.start();
            }
        }
    }
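    //read the original html, clean it, and convert it to an xhtml file
    //by using the parse method of the Tidy class
    public static void loadHTML(String src){
        Tidy tidy=new Tidy();
        tidy.setMakeClean(true);
        tidy.setXHTML(true);
        try{
            URL url=new URL(src);
            File cleanedHTMLFile;
            try(InputStream in=url.openStream()){
                cleanedHTMLFile=cleanHTML(in);
            }
            //try-with-resources closes both streams even if parsing fails
            try(FileInputStream in=new FileInputStream(cleanedHTMLFile);
                FileOutputStream out=new FileOutputStream("d:/tempfile.xhtml")){
                tidy.parse(in,out);
            }
        }catch(Exception e){
            //include the cause so the failure can be diagnosed
            System.out.println("The file can't be read: "+e);
            System.exit(-1);
        }
    }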
    //convert the xhtml file to a PDF file by using
    //the parseXHtml method of the XMLWorkerHelper class
    public static void convertHTMLToPDF(){
        try(FileOutputStream pdfOut=new FileOutputStream(System.currentTimeMillis()+".pdf");
            FileInputStream xhtmlIn=new FileInputStream("d:/tempfile.xhtml")){
            Document doc=new Document();
            PdfWriter pw=PdfWriter.getInstance(doc, pdfOut);
            doc.open();
            XMLWorkerHelper.getInstance().parseXHtml(pw, doc, xhtmlIn);
            doc.close();
        }catch(Exception e){
            //include the cause so the failure can be diagnosed
            System.out.println("Conversion can't be completed: "+e);
            System.exit(-1);
        }
    }
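    //clean the html with Jsoup and save the result to a file
    public static File cleanHTML(InputStream srcstream){
        String htmlfile="d:/cleanedhtml.html";
        try{
            org.jsoup.nodes.Document doc=Jsoup.parse(srcstream, "UTF-8", "");
            Whitelist wlist=Whitelist.relaxed();
            //note: every name after ":all" is treated as an attribute name
            wlist.addAttributes(":all","style","href","ftp","http","https","class");
            String cleanedBody=Jsoup.clean(doc.toString(),wlist);
            //replace the original body with the cleaned one
            Elements ls=doc.select("body");
            ls.remove();
            cleanedBody="<body>"+cleanedBody+"</body>";
            //quoteReplacement keeps '$' and '\' in the page from being read as group references
            String cleanedHTML=doc.toString().replaceFirst("</html>",
                java.util.regex.Matcher.quoteReplacement(cleanedBody+"</html>"));
            //note: writeBytes keeps only the low 8 bits of each char,
            //so characters above U+00FF are not preserved
            try(DataOutputStream dos=new DataOutputStream(new FileOutputStream(htmlfile))){
                dos.writeBytes(cleanedHTML);
                dos.flush();
            }
        }catch(IOException ie){
            System.out.println("Unable to process the file");
        }
        return new File(htmlfile);
    }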
    public static void showOptions(){
        System.out.println("1. html files from local computer");
        System.out.println("2. html files from the web");
        System.out.println("3. exit");
    }
}
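A note on how cleanHTML saves the page: DataOutputStream.writeBytes writes one byte per character and discards the high eight bits, so any character above U+00FF (for example the euro sign) is silently changed. The sketch below only demonstrates the difference against a writer with an explicit UTF-8 encoding; the class EncodingDemo and its method names are hypothetical. Switching the program itself to UTF-8 would presumably also require the later Tidy step to read the file as UTF-8, which is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original program.
final class EncodingDemo {
    // Encodes text the way DataOutputStream.writeBytes does: low byte of each char only.
    static byte[] viaWriteBytes(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Encodes text through a writer with an explicit UTF-8 charset.
    static byte[] viaUtf8Writer(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

For a string containing the euro sign, viaWriteBytes returns exactly one byte per character and the original text cannot be recovered from it, while the bytes from viaUtf8Writer decode back to the same string.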
The program did not create the PDF with images?
Thanks.
At the moment, there is still no free library that handles HTML to PDF conversion perfectly.
This is useful Java code for converting HTML to PDF files. I had been looking for it for a long time.
Very informative, thanks for sharing. I came across another Java PDF component that converts PDF files to HTML format: Aspose.Pdf for Java. Could you please tell me how your product is different from this one?
This program can convert local HTML files to PDF files, and it can also download HTML files from the internet and convert them to PDF files. Sorry, I am not sure what Aspose can do.
Thanks for the reply, Dara.
Here are some code examples I found in the Aspose documentation section. They can give you an overview of how the Aspose team does it. Here are the links:
http://www.aspose.com/docs/display/pdfnet/HTML+to+PDF+conversion
http://www.aspose.com/docs/display/pdfnet/How+to+convert+HTML+to+PDF+using+InLineHTML+approach
http://www.aspose.com/docs/display/pdfnet/Convert+PDF+file+into+HTML+format
Thanks once again
David
ReplyDeleteiam getting error while using method 1.i.e..fetching html files from local computer.and the erroe is
ReplyDeleteTidy (vers 4th August 2000) Parsing "InputStream"
line 2 column 8 - Warning: inserting missing 'title' element
InputStream: Document content looks like HTML 2.0
1 warnings/errors were found!
Conversion can't be completed.
can u pls help in resolving it?
Are you building it as a JAR file?
Interesting article.
Hey, can you tell me how to fix it so that images show up in the PDF?
Can you suggest what to do if the HTML contains images? Your solution does not work for me. Is there any solution for that? Let me know.
Thanks for the article.
I will use this tool for my work. Thanks
Thanks for sharing such a useful and helpful post.
ReplyDeleteJava training in Pune