Interface ExtractorProvider

    • Method Detail

      • create

        POITextExtractor create(File file,
                                String password)
                         throws IOException
        Creates a text extractor for the given file.
        Parameters:
        file - the file
        password - the password or null if not encrypted
        Returns:
        the extractor
        Throws:
        IOException - if file can't be read or parsed
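A minimal sketch of the calling convention described above: pass null as the password for an unencrypted file, and close the returned extractor when done (POITextExtractor is Closeable). Apache POI is not on the classpath here, so TextExtractor, FileExtractorProvider, PLAIN_TEXT and extractText are hypothetical stand-ins, and the toy provider simply reads the file as UTF-8 text.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class CreateFromFileSketch {

    // Hypothetical stand-in for POITextExtractor: yields text and must be closed.
    public interface TextExtractor extends Closeable {
        String getText();
    }

    // Hypothetical stand-in mirroring ExtractorProvider.create(File, String).
    public interface FileExtractorProvider {
        TextExtractor create(File file, String password) throws IOException;
    }

    // Toy provider: reads the file as UTF-8 text and refuses any password,
    // because it has no decryption support. Not how a real POI provider works.
    public static final FileExtractorProvider PLAIN_TEXT = (file, password) -> {
        if (password != null) {
            throw new IOException("toy provider cannot decrypt " + file);
        }
        // Files.readAllBytes throws an IOException if the file can't be read
        String text = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        return new TextExtractor() {
            @Override
            public String getText() {
                return text;
            }

            @Override
            public void close() {
                // nothing to release for an in-memory string
            }
        };
    };

    // Caller pattern: null password for an unencrypted file, and
    // try-with-resources closes the extractor even if getText() throws.
    public static String extractText(FileExtractorProvider provider, File file) throws IOException {
        try (TextExtractor ext = provider.create(file, null)) {
            return ext.getText();
        }
    }
}
```

With a real provider the same try-with-resources shape applies; only the provider instance changes.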
      • create

        POITextExtractor create(InputStream inputStream,
                                String password)
                         throws IOException
        Creates a text extractor that reads from the given InputStream.
        Parameters:
        inputStream - the stream
        password - the password or null if not encrypted
        Returns:
        the extractor
        Throws:
        IOException - if stream can't be read or parsed
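One way a provider might support both the File and the InputStream overloads is to treat the stream variant as primary and let the File variant open the file and delegate. This is a hypothetical design sketch, not Apache POI's implementation; StreamFirstProvider, TextExtractor and PLAIN_TEXT are illustrative names, and the delegation is only safe because the toy provider consumes the whole stream before returning.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CreateFromStreamSketch {

    // Hypothetical stand-in for POITextExtractor.
    public interface TextExtractor extends Closeable {
        String getText();
    }

    // Hypothetical provider shape: the InputStream overload does the work,
    // the File overload opens the file and delegates to it.
    public interface StreamFirstProvider {
        TextExtractor create(InputStream inputStream, String password) throws IOException;

        default TextExtractor create(File file, String password) throws IOException {
            // The stream is closed as soon as create(InputStream, ...) returns, so this
            // delegation is only valid for providers that consume the stream eagerly.
            try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
                return create(in, password);
            }
        }
    }

    // Toy provider: reads the whole stream as UTF-8 text before returning.
    public static final StreamFirstProvider PLAIN_TEXT = (inputStream, password) -> {
        if (password != null) {
            throw new IOException("toy provider cannot decrypt");
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = inputStream.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        String text = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        return new TextExtractor() {
            @Override
            public String getText() {
                return text;
            }

            @Override
            public void close() {
                // nothing to release
            }
        };
    };
}
```

A provider that parses lazily would instead need to keep the stream open for the extractor's lifetime, so the delegation above would not fit it.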
      • create

        POITextExtractor create(DirectoryNode poifsDir,
                                String password)
                         throws IOException
        Creates a text extractor from a POIFS directory node.
        Parameters:
        poifsDir - the node
        password - the password or null if not encrypted
        Returns:
        the extractor
        Throws:
        IOException - if node can't be parsed
        IllegalStateException - if processing fails for some other reason, e.g. missing JCE Unlimited Strength Jurisdiction Policy files while handling encrypted files.
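Because a missing JCE Unlimited Strength Jurisdiction Policy only surfaces as an IllegalStateException while an encrypted file is being processed, it can help to check the policy up front. The sketch below uses the standard javax.crypto.Cipher.getMaxAllowedKeyLength call; JcePolicyCheck and its methods are hypothetical names, and treating "256-bit AES allowed" as the threshold is an assumption about what encrypted documents may require. Recent JDKs enable the unlimited policy by default, so this mainly matters on older runtimes.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

public class JcePolicyCheck {

    // Largest AES key length (in bits) the installed JCE policy permits.
    // Integer.MAX_VALUE means the policy imposes no limit.
    public static int maxAesKeyLength() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            // AES is required on every Java platform, so this should not happen
            throw new IllegalStateException("AES not available", e);
        }
    }

    // True when 256-bit AES keys are allowed, i.e. the policy is not the restricted one.
    public static boolean hasUnlimitedStrengthPolicy() {
        return maxAesKeyLength() >= 256;
    }
}
```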
      • identifyEmbeddedResources

        default void identifyEmbeddedResources(POIOLE2TextExtractor ext,
                                               List<Entry> dirs,
                                               List<InputStream> nonPOIFS)
                                        throws IOException
        Collects the embedded documents of the file held by ext, if there are any. Embedded documents stored as POIFS directories are added to dirs, and embedded streams that aren't based on POIFS entries are added to nonPOIFS. The method itself returns nothing; a caller can then open one POITextExtractor per collected entry.
        Parameters:
        ext - the extractor holding the directory to start parsing
        dirs - a list to be filled with directory references holding embedded documents
        nonPOIFS - a list to be filled with streams which aren't based on POIFS entries
        Throws:
        IOException - when the format-specific extraction fails because of invalid entries
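Since identifyEmbeddedResources fills caller-supplied lists instead of returning a value, the caller allocates both lists, lets the provider fill them, and then processes every collected entry, typically by opening one extractor per embedded document. The sketch below mirrors that flow with hypothetical stand-ins so it runs without Apache POI: DirRef replaces Entry, a String replaces the POIOLE2TextExtractor argument, and describeEmbedded records a description instead of creating extractors. Closing each stream after reading it is this sketch's choice, not documented POI behavior.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class EmbeddedScanSketch {

    // Hypothetical stand-in for a POIFS directory Entry.
    public static final class DirRef {
        public final String name;

        public DirRef(String name) {
            this.name = name;
        }
    }

    // Hypothetical stand-in mirroring identifyEmbeddedResources; a String
    // takes the place of the POIOLE2TextExtractor argument.
    public interface EmbeddedResourceFinder {
        void identifyEmbeddedResources(String container, List<DirRef> dirs, List<InputStream> nonPOIFS)
                throws IOException;
    }

    // Caller pattern: allocate empty lists, let the finder fill them, then visit
    // every collected entry. A real caller would create one extractor per entry;
    // this sketch only records a description of each.
    public static List<String> describeEmbedded(EmbeddedResourceFinder finder, String container)
            throws IOException {
        List<DirRef> dirs = new ArrayList<>();
        List<InputStream> nonPOIFS = new ArrayList<>();
        finder.identifyEmbeddedResources(container, dirs, nonPOIFS);

        List<String> found = new ArrayList<>();
        for (DirRef dir : dirs) {
            found.add("dir:" + dir.name);
        }
        for (InputStream stream : nonPOIFS) {
            // This sketch closes each stream after counting its bytes.
            try (InputStream in = stream) {
                int size = 0;
                while (in.read() != -1) {
                    size++;
                }
                found.add("stream:" + size + " bytes");
            }
        }
        return found;
    }
}
```

If the finder adds nothing, both lists stay empty and the result is an empty list, matching the "no embedded documents" case.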