PDFImageExtractor source code
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import javax.swing.filechooser.FileNameExtensionFilter;
import javax.swing.JFileChooser;
public class PDFImageExtractor {

    public static void main(String[] args) {
        selectPDF();
    }

    // Let the user pick the PDF file to extract images from
    public static void selectPDF() {
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF", "pdf");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(false);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File file = chooser.getSelectedFile();
            System.out.println("Please wait...");
            extractImage(file.toString());
            System.out.println("Extraction complete");
        }
    }

    public static void extractImage(String src) {
        PdfReader pr = null;
        try {
            // create the PDF reader object
            pr = new PdfReader(src);
            int n = pr.getXrefSize(); // number of entries in the cross-reference table
            for (int i = 0; i < n; i++) {
                PdfObject po = pr.getPdfObject(i); // object at index i, or null if there is none
                if (po == null || !po.isStream()) // skip missing objects and non-stream objects
                    continue;
                PRStream pst = (PRStream) po; // cast the object to a stream
                PdfObject type = pst.get(PdfName.SUBTYPE); // subtype of the stream
                // check whether the stream is an image
                if (PdfName.IMAGE.equals(type)) {
                    PdfImageObject pio = new PdfImageObject(pst); // wrap the image stream
                    BufferedImage bi = pio.getBufferedImage(); // null if ImageIO cannot decode the image
                    if (bi == null) {
                        System.err.println("Skipping object " + i + ": image could not be decoded");
                        continue;
                    }
                    // write the buffered image to local disk
                    ImageIO.write(bi, "jpg", new File("image" + i + ".jpg"));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (pr != null)
                pr.close(); // release the underlying file
        }
    }
}
In the example code above, the getPdfObject(int index) method retrieves the object at the specified index in the PDF document. To determine whether that object is an image, cast it to a stream and read its subtype with the stream's get method, passing PdfName.SUBTYPE. The object is an image when the subtype equals PdfName.IMAGE.
Note: When you use this program to extract the images from a PDF document, some images might come out in the wrong order (different from the order you see on the PDF pages). The program visits the objects by index, and that order does not have to match the order of the pages. I also tried PDFBox, but could not solve the problem with it.
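The program above hands each decoded image straight to ImageIO.write(bi, "jpg", ...). Two things can go wrong at that step: ImageIO.write throws an IllegalArgumentException when the image is null, and it returns false when no writer accepts the image, which can happen for JPEG when the image has an alpha channel. The sketch below is one possible way to guard that step using only the JDK. SafeImageWriter, toRgb and writeJpeg are hypothetical names chosen for this example; they are not part of iText.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration; not part of iText or of the program above.
class SafeImageWriter {

    // Returns an image without an alpha channel. Transparent areas are
    // painted white (an assumption; pick another background if needed).
    static BufferedImage toRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src;
        }
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    // Writes the image as JPEG. Returns false when there is no image to
    // write or when ImageIO finds no suitable writer.
    static boolean writeJpeg(BufferedImage image, File target) throws IOException {
        if (image == null) {
            return false;
        }
        return ImageIO.write(toRgb(image), "jpg", target);
    }
}
```

In extractImage, the call to ImageIO.write could then be swapped for SafeImageWriter.writeJpeg(bi, new File("image" + i + ".jpg")), with a message printed whenever it returns false so that skipped images are visible.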
How do I extract images that contain fonts, such as formulas?
"bi" is always null!