Package org.apache.poi.hssf.extractor
Class ExcelExtractor
java.lang.Object
    org.apache.poi.hssf.extractor.ExcelExtractor
All Implemented Interfaces:
Closeable, AutoCloseable, POIOLE2TextExtractor, POITextExtractor, ExcelExtractor
public class ExcelExtractor extends Object implements POIOLE2TextExtractor, ExcelExtractor
A text extractor for Excel files. Returns the textual content of the file, suitable for indexing by something like Lucene, but not really intended for display to the user.
To turn an Excel file into CSV or a similar format, see the XLS2CSVmra example.
See Also:
XLS2CSVmra
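As a minimal sketch of typical use, assuming Apache POI 5.x is on the classpath, the following builds a small in-memory HSSFWorkbook and passes it to the ExcelExtractor(HSSFWorkbook) constructor. The class name BasicExtraction, the sheet name "Data", and the cell values are invented for illustration; they are not part of the library.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

// Hypothetical demo class; only the POI types come from the library.
public class BasicExtraction {

    // Builds a tiny workbook in memory and returns its extracted text.
    static String extractSample() throws IOException {
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet sheet = wb.createSheet("Data");
        HSSFRow row = sheet.createRow(0);
        row.createCell(0).setCellValue("hello");
        row.createCell(1).setCellValue("world");

        // The extractor is Closeable, so try-with-resources applies.
        // With the default settings, closing it also closes the workbook.
        try (ExcelExtractor extractor = new ExcelExtractor(wb)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(extractSample());
    }
}
```

With the default settings the returned text should include the sheet name as well as the cell contents; the exact separators between cells and rows are implementation specific.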
Constructor Summary
ExcelExtractor(HSSFWorkbook wb)
ExcelExtractor(DirectoryNode dir)
ExcelExtractor(POIFSFileSystem fs)
Method Summary
static String _extractHeaderFooter(HeaderFooter hf)
HSSFWorkbook getDocument()
    Return the underlying POIDocument.
HSSFWorkbook getFilesystem()
String getText()
    Retrieves all the text from the document.
boolean isCloseFilesystem()
static void main(String[] args)
    Command line extractor.
void setCloseFilesystem(boolean doCloseFilesystem)
void setFormulasNotResults(boolean formulasNotResults)
    Should we return the formula itself, and not the result it produces? Default is false.
void setIncludeBlankCells(boolean includeBlankCells)
    Should blank cells be output? Default is to only output cells that are present in the file and are non-blank.
void setIncludeCellComments(boolean includeCellComments)
    Should cell comments be included? Default is false.
void setIncludeHeadersFooters(boolean includeHeadersFooters)
    Should headers and footers be included in the output? Default is true.
void setIncludeSheetNames(boolean includeSheetNames)
    Should sheet names be included? Default is true.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.poi.extractor.POIOLE2TextExtractor
getDocSummaryInformation, getMetadataTextExtractor, getRoot, getSummaryInformation
Methods inherited from interface org.apache.poi.extractor.POITextExtractor
close
Constructor Detail
ExcelExtractor
public ExcelExtractor(HSSFWorkbook wb)
ExcelExtractor
public ExcelExtractor(POIFSFileSystem fs) throws IOException
Throws:
IOException
ExcelExtractor
public ExcelExtractor(DirectoryNode dir) throws IOException
Throws:
IOException
Method Detail
main
public static void main(String[] args) throws IOException
Command line extractor.
Parameters:
args - the command line parameters
Throws:
IOException - if the file can't be read or contains errors
setIncludeSheetNames
public void setIncludeSheetNames(boolean includeSheetNames)
Description copied from interface: ExcelExtractor
Should sheet names be included? Default is true.
Specified by:
setIncludeSheetNames in interface ExcelExtractor
Parameters:
includeSheetNames - true if the sheet names should be included
setFormulasNotResults
public void setFormulasNotResults(boolean formulasNotResults)
Description copied from interface: ExcelExtractor
Should we return the formula itself, and not the result it produces? Default is false.
Specified by:
setFormulasNotResults in interface ExcelExtractor
Parameters:
formulasNotResults - true if the formula itself is returned
setIncludeCellComments
public void setIncludeCellComments(boolean includeCellComments)
Description copied from interface: ExcelExtractor
Should cell comments be included? Default is false.
Specified by:
setIncludeCellComments in interface ExcelExtractor
Parameters:
includeCellComments - true if cell comments should be included
setIncludeBlankCells
public void setIncludeBlankCells(boolean includeBlankCells)
Should blank cells be output? Default is to only output cells that are present in the file and are non-blank.
Parameters:
includeBlankCells - true if blank cells should be included
setIncludeHeadersFooters
public void setIncludeHeadersFooters(boolean includeHeadersFooters)
Description copied from interface: ExcelExtractor
Should headers and footers be included in the output? Default is true.
Specified by:
setIncludeHeadersFooters in interface ExcelExtractor
Parameters:
includeHeadersFooters - true if headers and footers should be included
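The setters above are applied to an extractor before getText() is called. The following is a rough sketch, assuming Apache POI 5.x on the classpath; the class name ExtractorOptions, the sheet name "Totals", and the formula are hypothetical. It turns sheet names and headers/footers off and asks for formula text instead of formula results.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

// Hypothetical demo class showing the option setters in combination.
public class ExtractorOptions {

    // Returns the extracted text with formulas shown as written.
    static String extractFormulas() throws IOException {
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet sheet = wb.createSheet("Totals");
        HSSFRow row = sheet.createRow(0);
        row.createCell(0).setCellValue(21.0);
        row.createCell(1).setCellFormula("A1*2");

        try (ExcelExtractor extractor = new ExcelExtractor(wb)) {
            extractor.setIncludeSheetNames(false);      // default is true
            extractor.setFormulasNotResults(true);      // default is false
            extractor.setIncludeHeadersFooters(false);  // default is true
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(extractFormulas());
    }
}
```

Under these settings the output should contain the formula text rather than its computed value, and should omit the sheet name.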
getText
public String getText()
Description copied from interface: POITextExtractor
Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation specific; see the Javadocs for the specific project for details.
Specified by:
getText in interface ExcelExtractor
Specified by:
getText in interface POITextExtractor
Returns:
All the text from the document
_extractHeaderFooter
public static String _extractHeaderFooter(HeaderFooter hf)
getDocument
public HSSFWorkbook getDocument()
Description copied from interface: POIOLE2TextExtractor
Return the underlying POIDocument.
Specified by:
getDocument in interface POIOLE2TextExtractor
Specified by:
getDocument in interface POITextExtractor
Returns:
the underlying POIDocument
setCloseFilesystem
public void setCloseFilesystem(boolean doCloseFilesystem)
Specified by:
setCloseFilesystem in interface POITextExtractor
Parameters:
doCloseFilesystem - true (default), if underlying resources/filesystem should be closed on POITextExtractor.close()
isCloseFilesystem
public boolean isCloseFilesystem()
Specified by:
isCloseFilesystem in interface POITextExtractor
Returns:
true, if resources/filesystem should be closed on POITextExtractor.close()
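When the caller wants to keep ownership of the workbook, the close behaviour can be switched off before the extractor is closed. This is only a sketch, assuming Apache POI 5.x; the class name KeepWorkbookOpen and the method name are hypothetical.

```java
import java.io.IOException;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

// Hypothetical demo class: the caller, not the extractor, closes the workbook.
public class KeepWorkbookOpen {

    // Returns the value of isCloseFilesystem() after disabling it.
    static boolean closeFlagAfterDisable() throws IOException {
        HSSFWorkbook wb = new HSSFWorkbook();
        wb.createSheet("Data");
        try (ExcelExtractor extractor = new ExcelExtractor(wb)) {
            // Ask the extractor not to close the underlying workbook on close().
            extractor.setCloseFilesystem(false);
            return extractor.isCloseFilesystem();
        } finally {
            // The workbook is still the caller's responsibility.
            wb.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(closeFlagAfterDisable());
    }
}
```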
getFilesystem
public HSSFWorkbook getFilesystem()
Specified by:
getFilesystem in interface POITextExtractor
Returns:
The underlying resources/filesystem