public final class WordExtractor extends POIOLE2TextExtractor
Class to extract the text from a Word document.
Constructor and Description |
---|
WordExtractor(DirectoryNode dir) |
WordExtractor(DirectoryNode dir, POIFSFileSystem fs): Deprecated. Use WordExtractor(DirectoryNode) instead |
WordExtractor(HWPFDocument doc): Create a new Word Extractor |
WordExtractor(InputStream is): Create a new Word Extractor |
WordExtractor(POIFSFileSystem fs): Create a new Word Extractor |
Modifier and Type | Method and Description |
---|---|
String[] | getCommentsText() |
String[] | getEndnoteText() |
String | getFooterText(): Deprecated. |
String[] | getFootnoteText() |
String | getHeaderText(): Deprecated. |
String[] | getMainTextboxText() |
String[] | getParagraphText(): Get the text from the Word file, as an array with one String per paragraph |
protected static String[] | getParagraphText(Range r) |
String | getText(): Grab the text, based on the WordToTextConverter. |
String | getTextFromPieces(): Grab the text out of the text pieces. |
static void | main(String[] args): Command line extractor, so people will stop moaning that they can't just run this. |
static String | stripFields(String text): Removes any fields (e.g. macros, page markers, etc.) from the string. |
getDocSummaryInformation, getFileSystem, getMetadataTextExtractor, getRoot, getSummaryInformation
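Since getParagraphText() returns one String per paragraph, callers often need a small post-processing step before using the result. The following is a minimal sketch of such a step, assuming each paragraph string may end in carriage-return or line-feed characters. The class ParagraphCleanupSketch and the method cleanParagraphs are hypothetical names, not part of POI, and the sample array stands in for the value an extractor would return.

```java
import java.util.ArrayList;
import java.util.List;

class ParagraphCleanupSketch {

    // Hypothetical helper (not part of POI): trims trailing CR/LF characters
    // from each paragraph and drops paragraphs that are empty afterwards.
    static List<String> cleanParagraphs(String[] paragraphs) {
        List<String> cleaned = new ArrayList<>();
        for (String p : paragraphs) {
            int end = p.length();
            while (end > 0 && (p.charAt(end - 1) == '\r' || p.charAt(end - 1) == '\n')) {
                end--;
            }
            if (end > 0) {
                cleaned.add(p.substring(0, end));
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for the array from getParagraphText().
        String[] sample = { "First paragraph.\r\n", "\r\n", "Second paragraph.\r\n" };
        for (String p : cleanParagraphs(sample)) {
            System.out.println(p);
        }
    }
}
```

With POI on the classpath, the input array would instead come from calling getParagraphText() on a WordExtractor built with one of the constructors listed above.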
public WordExtractor(InputStream is) throws IOException
is - InputStream containing the word file
Throws: IOException
public WordExtractor(POIFSFileSystem fs) throws IOException
fs - POIFSFileSystem containing the word file
Throws: IOException
@Deprecated public WordExtractor(DirectoryNode dir, POIFSFileSystem fs) throws IOException
Deprecated. Use WordExtractor(DirectoryNode) instead
Throws: IOException
public WordExtractor(DirectoryNode dir) throws IOException
Throws: IOException
public WordExtractor(HWPFDocument doc)
doc - The HWPFDocument to extract from
public static void main(String[] args) throws IOException
Throws: IOException
public String[] getParagraphText()
public String[] getFootnoteText()
public String[] getMainTextboxText()
public String[] getEndnoteText()
public String[] getCommentsText()
@Deprecated public String getHeaderText()
@Deprecated public String getFooterText()
public String getTextFromPieces()
public String getText()
Specified by: getText in class POITextExtractor
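To illustrate what stripFields(String text) is described as doing, here is a minimal sketch of field removal. It is an illustrative re-implementation, not POI's actual code. It assumes the Word binary format's field marker characters: 0x13 begins a field, 0x14 separates the field code from its displayed result, and 0x15 ends the field. The class FieldStripSketch and the method stripFieldsSketch are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class FieldStripSketch {

    static final char FIELD_BEGIN = '\u0013';
    static final char FIELD_SEPARATOR = '\u0014';
    static final char FIELD_END = '\u0015';

    // Hypothetical helper: drops field codes and marker characters,
    // keeping the displayed result text of each field. Handles nesting.
    static String stripFieldsSketch(String text) {
        StringBuilder out = new StringBuilder();
        // One entry per open field: TRUE while inside its code part.
        Deque<Boolean> inCode = new ArrayDeque<>();
        int hidden = 0; // number of open fields currently in their code part
        for (char c : text.toCharArray()) {
            if (c == FIELD_BEGIN) {
                inCode.push(Boolean.TRUE);
                hidden++;
            } else if (c == FIELD_SEPARATOR && !inCode.isEmpty()) {
                // Switch the innermost field from code part to result part.
                if (inCode.pop()) {
                    hidden--;
                }
                inCode.push(Boolean.FALSE);
            } else if (c == FIELD_END && !inCode.isEmpty()) {
                if (inCode.pop()) {
                    hidden--;
                }
            } else if (hidden == 0) {
                // Plain text, result text, or a stray marker outside any field.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Page \u0013 PAGE \u00141\u0015 of 3";
        System.out.println(stripFieldsSketch(raw)); // prints "Page 1 of 3"
    }
}
```

A field with no separator (0x13 ... 0x15) is removed entirely in this sketch, while a field with a separator keeps only the text between 0x14 and 0x15. The library method may differ in edge cases such as unbalanced markers.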
Copyright © 2020. All rights reserved.