I'm currently working on a Java project that uses Apache POI. In this project I want to convert a .doc file to PDF. The conversion works, but the PDF only contains the text, with no text styles or text colours. My PDF looks plain black and white, while my .doc file is coloured and uses different text styles.

This is my code:

    POIFSFileSystem fs = null;
    Document document = new Document();

    try {
        System.out.println("Starting the test");
        fs = new POIFSFileSystem(new FileInputStream("/document/test2.doc"));

        HWPFDocument doc = new HWPFDocument(fs);
        WordExtractor we = new WordExtractor(doc);

        OutputStream file = new FileOutputStream(new File("/document/test.pdf"));

        PdfWriter writer = PdfWriter.getInstance(document, file);

        Range range = doc.getRange();
        document.open();
        writer.setPageEmpty(true);
        document.newPage();
        writer.setPageEmpty(true);

        String[] paragraphs = we.getParagraphText();
        for (int i = 0; i < paragraphs.length; i++) {

            org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
            // CharacterRun run = pr.getCharacterRun(i);
            // run.setBold(true);
            // run.setCapitalized(true);
            // run.setItalic(true);
            paragraphs[i] = paragraphs[i].replaceAll("\\cM?\r?\n", "");
            System.out.println("Length:" + paragraphs[i].length());
            System.out.println("Paragraph" + i + ": " + paragraphs[i].toString());

            // add the paragraph to the document
            document.add(new Paragraph(paragraphs[i]));
        }

        System.out.println("Document testing completed");
    } catch (Exception e) {
        System.out.println("Exception during test");
        e.printStackTrace();
    } finally {
        // close the document
        document.close();
    }
}

Please help.

Thanks in advance.

If you take a look at Apache Tika, there's an example of reading some style information from an HWPF document. The code in Tika generates HTML based on the HWPF contents, but you should find that something very similar works for your case.

The Tika class is https://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/primary/java/org/apache/tika/parser/microsoft/WordExtractor.java
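To give a flavour of that approach without pulling in Tika, here is a minimal self-contained sketch. It assumes you have already collected, for each paragraph, the text of every character run plus its bold/italic flags. In HWPF that would mean looping `j` from 0 to `pr.numCharacterRuns()` and calling `pr.getCharacterRun(j)`, then `text()`, `isBold()` and `isItalic()` on each run (note that this is a run index, not the paragraph index `i` used in the commented-out lines of the question). The `Run` class and `toHtml` method are hypothetical names for illustration, not Tika's API; the sketch only shows the idea of wrapping each run according to its own formatting.

```java
import java.util.List;

public class RunsToHtml {

    /** Hypothetical holder for the text and formatting of one character run. */
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Escapes the characters that are special in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds one <p> element, wrapping each run in <b>/<i> as its own flags require. */
    static String toHtml(List<Run> paragraph) {
        StringBuilder sb = new StringBuilder("<p>");
        for (Run r : paragraph) {
            if (r.bold) sb.append("<b>");
            if (r.italic) sb.append("<i>");
            sb.append(escape(r.text));
            // close tags in reverse order so the nesting stays valid
            if (r.italic) sb.append("</i>");
            if (r.bold) sb.append("</b>");
        }
        return sb.append("</p>").toString();
    }

    public static void main(String[] args) {
        List<Run> para = List.of(
                new Run("Plain ", false, false),
                new Run("bold", true, false),
                new Run(" & ", false, false),
                new Run("both", true, true));
        // prints <p>Plain <b>bold</b> &amp; <b><i>both</i></b></p>
        System.out.println(toHtml(para));
    }
}
```

For a PDF you would do the same walk, but instead of emitting tags you would add one styled chunk of text per run to the paragraph.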

One thing to note about Word documents is that everything within any one Character Run has the same formatting applied to it. A Paragraph is therefore made up of one or more Character Runs. Some styling is applied to the Paragraph, and other parts are applied to the runs. Depending on which formatting you're interested in, it may therefore be on the paragraph or on the run.
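As a concrete example of run-level formatting, two values you will probably need per run are the colour and the font size. In the HWPF versions I'm aware of, `CharacterRun.getFontSize()` returns the size in half-points and `CharacterRun.getColor()` returns an index into Word's fixed 16-colour palette (0 meaning "auto"), so both need converting before you build a PDF font; check the Javadoc for your POI version. The sketch below does only that conversion in plain Java. The class and method names are hypothetical, and treating "auto" as black is an assumption.

```java
import java.util.Locale;

/**
 * Sketch: converts HWPF-style run attributes into values a PDF library
 * typically wants. Assumes the colour is Word's 0..16 palette index and
 * the size is in half-points.
 */
public class RunFormatting {

    // 0xRRGGBB values indexed by Word palette index.
    private static final int[] ICO_RGB = {
        0x000000, // 0: auto (treated as black here; an assumption)
        0x000000, // 1: black
        0x0000FF, // 2: blue
        0x00FFFF, // 3: cyan
        0x00FF00, // 4: green
        0xFF00FF, // 5: magenta
        0xFF0000, // 6: red
        0xFFFF00, // 7: yellow
        0xFFFFFF, // 8: white
        0x000080, // 9: dark blue
        0x008080, // 10: dark cyan
        0x008000, // 11: dark green
        0x800080, // 12: dark magenta
        0x800000, // 13: dark red
        0x808000, // 14: dark yellow
        0x808080, // 15: dark gray
        0xC0C0C0  // 16: light gray
    };

    /** Maps a palette index to 0xRRGGBB, falling back to black for unknown values. */
    static int icoToRgb(int ico) {
        return (ico >= 0 && ico < ICO_RGB.length) ? ICO_RGB[ico] : 0x000000;
    }

    /** Converts a size in half-points to points. */
    static float halfPointsToPoints(int halfPoints) {
        return halfPoints / 2.0f;
    }

    public static void main(String[] args) {
        int rgb = icoToRgb(6);                  // palette index 6 is red
        float size = halfPointsToPoints(24);    // 24 half-points is 12 pt
        // prints rgb=#FF0000 size=12.0pt
        System.out.printf(Locale.ROOT, "rgb=#%06X size=%.1fpt%n", rgb, size);
    }
}
```

You can split the result into components with `(rgb >> 16) & 0xFF`, `(rgb >> 8) & 0xFF` and `rgb & 0xFF` and pass them, together with the point size and the bold/italic flags, to whatever colour and font classes your PDF library provides.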