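I am writing Java code that uses Apache POI to read an MS Office .doc file and the iText JAR API to create and write a PDF file. I have already managed to read the text and the tables from the .doc file. Now I am looking for a way to read the images contained in the document. To read the images from the .doc file I wrote the code below. Why does this code not work?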

public static void main(String[] args) {
    POIFSFileSystem fs = null;  
    Document document = new Document();
    WordExtractor extractor = null ;
    try {
        fs = new POIFSFileSystem(new FileInputStream("C:\\DATASTORE\\tableandImage.doc"));
        HWPFDocument hdocument=new HWPFDocument(fs);
        extractor = new WordExtractor(hdocument);
        OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/tableandImage.pdf"));
        PdfWriter.getInstance(document, fileOutput);
        document.open();
        Range range=hdocument.getRange();
        String readText=null;
        PdfPTable createTable;
        CharacterRun run;
        PicturesTable picture;

        for(int i=0;i<range.numParagraphs();i++) {
            Paragraph par = range.getParagraph(i);
            readText=par.text();
            if(!par.isInTable()) {
                if(readText.endsWith("\n")) {
                    readText=readText+"\n";
                    document.add(new com.itextpdf.text.Paragraph(readText));
                } if(readText.endsWith("\r")) {
                      readText += "\n";
                      document.add(new com.itextpdf.text.Paragraph(readText));
                  }
                run =range.getCharacterRun(i);
                picture=hdocument.getPicturesTable();
                if(picture.hasPicture(run)) {
                //if(run.isSpecialCharacter()) {  
                    Picture pic=picture.extractPicture(run, true);
                    byte[] picturearray=pic.getContent();
                    com.itextpdf.text.Image image=com.itextpdf.text.Image.getInstance(picturearray);
                    document.add(image);
                }
            } else if (par.isInTable()) { 
                  Table table = range.getTable(par);
                  TableRow tRow1= table.getRow(0);
                  int numColumns=tRow1.numCells();
                  createTable=new PdfPTable(numColumns);
                  for (int rowId=0;rowId<table.numRows();rowId++) {
                      TableRow tRow = table.getRow(rowId);
                      for (int cellId=0;cellId<tRow.numCells();cellId++) {
                          TableCell tCell = tRow.getCell(cellId);
                          PdfPCell c1 = new PdfPCell(new Phrase(tCell.text()));
                          createTable.addCell(c1);
                      }
                  }
                  document.add(createTable);
              } 
        }
    }catch(IOException e) {
        System.out.println("IO Exception");
        e.printStackTrace();
    }
    catch(Exception exep) {
        exep.printStackTrace();
    }finally {  
        document.close();  
    }  
}

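The problems are:
1. The condition if (picture.hasPicture(run)) is never satisfied, even though the document contains a JPEG image.
2. I get the following exception while reading the table:
java.lang.IllegalArgumentException: This paragraph is not the first one in the table
    at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:876)
    at pagecode.ReadDocxOrDocFile.main(ReadDocxOrDocFile.java:113)
Can anybody help me solve this problem? Thank you.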

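1 answer

pxy2qtax1

Regarding your exception:
Your code iterates over all paragraphs and calls isInTable() on each of them. Since a table usually consists of several such paragraphs, getTable() is also called several times for a single table, and it throws for every paragraph that is not the first one in the table.
What the code should do instead is find the first paragraph of a table, then process all the paragraphs inside it (via getRow(m).getCell(n)), and finally continue the outer loop at the first paragraph after the table. Code-wise, this could roughly look like the following (assuming no merged cells, nested tables, or other funny edge cases):

if (par.isInTable()) {
    Table table = range.getTable(par);
    for (int rn=0; rn<table.numRows(); rn++) {
        TableRow row = table.getRow(rn);
        for (int cn=0; cn<row.numCells(); cn++) {
            TableCell cell = row.getCell(cn);
            for (int pn=0; pn<cell.numParagraphs(); pn++) {
                Paragraph cellParagraph = cell.getParagraph(pn);
                // your PDF conversion code goes here
            }
        }
    }
    i += table.numParagraphs()-1; // skip the already processed (table-)paragraphs in the outer loop
}

Regarding the picture problem:
I guess you are trying to get the picture that is anchored in a given paragraph, right? Unfortunately, the predefined POI methods only work if the picture is not embedded in a field (which is actually rather rare). For field-based images (i.e. preview images of embedded OLE objects), you should do something like the following (untested!):

PictureStore pictureStore = new PictureStore(hdocument);
// bla bla ...
for (int cr=0; cr < par.numCharacterRuns(); cr++) {
    CharacterRun characterRun = par.getCharacterRun(cr);
    Field field = hdocument.getFields().getFieldByStartOffset(FieldsDocumentPart.MAIN, characterRun.getStartOffset());
    if (field != null && field.getType() == 0x3A) { // 0x3A is type "EMBED"
        Picture pic = pictureStore.getPicture(field.secondSubrange(characterRun));
    }
}

For a list of the possible values of Field.getType(), see here.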
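
Coming back to the table exception: the important part of the table snippet is the line i += table.numParagraphs()-1. Here is a small POI-free sketch of just that index-skipping idea, in case it helps to see it in isolation. The class TableSkipSketch and its methods are hypothetical names made up for illustration. The boolean array stands in for calling isInTable() on each paragraph, and tableLength() stands in for table.numParagraphs(). Unlike POI, this toy model cannot tell two directly adjacent tables apart.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for the paragraph list of a document: each entry only records
// whether that paragraph is inside a table. Hypothetical names, not POI API.
final class TableSkipSketch {

    /** Counts the consecutive in-table paragraphs starting at index start. */
    static int tableLength(boolean[] inTable, int start) {
        int end = start;
        while (end < inTable.length && inTable[end]) {
            end++;
        }
        return end - start;
    }

    /** Walks all paragraphs once and reports every table exactly once. */
    static List<String> walk(boolean[] inTable) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < inTable.length; i++) {
            if (!inTable[i]) {
                events.add("text@" + i);
                continue;
            }
            // i is the first paragraph of a table at this point, because any
            // earlier table was skipped in full by the statement below.
            int n = tableLength(inTable, i);
            events.add("table@" + i + "+" + n);
            i += n - 1; // the loop's i++ then lands on the first paragraph after the table
        }
        return events;
    }

    public static void main(String[] args) {
        boolean[] doc = {false, true, true, true, false};
        System.out.println(walk(doc)); // prints [text@0, table@1+3, text@4]
    }
}
```

Without the i += n - 1 line, the loop would visit indexes 2 and 3 as well and treat each of them as the start of a table. That is the situation in which Range.getTable() throws "This paragraph is not the first one in the table".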

2021-06-30
