Class StringUtil


  • @Internal
    public final class StringUtil
    extends Object
    Collection of string-handling utilities.
    • Field Detail

      • UTF16LE

        public static final Charset UTF16LE
      • UTF8

        public static final Charset UTF8
      • WIN_1252

        public static final Charset WIN_1252
    • Method Detail

      • setMaxRecordLength

        public static void setMaxRecordLength(int length)
        Parameters:
        length - the max record length allowed for StringUtil
      • getMaxRecordLength

        public static int getMaxRecordLength()
        Returns:
        the max record length allowed for StringUtil
      • getFromUnicodeLE

        public static String getFromUnicodeLE(byte[] string,
                                              int offset,
                                              int len)
                                       throws ArrayIndexOutOfBoundsException,
                                              IllegalArgumentException
        Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.

        For example, the bytes { 0x16, 0x00 } decode to the character 0x16.

        Parameters:
        string - the byte array to be converted
        offset - the initial offset into the byte array. It is assumed that string[offset] and string[offset + 1] contain the first 16-bit Unicode character
        len - the length of the final string, in characters
        Returns:
        the converted string, never null.
        Throws:
        ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
        IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
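A minimal JDK-only sketch of this decoding, assuming len counts 16-bit characters so that 2 * len bytes are consumed. The class Utf16LeDecodeSketch and its method are hypothetical names, not part of Apache POI, and the bounds checks only mirror the documented exceptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not Apache POI's actual implementation.
final class Utf16LeDecodeSketch {

    // Decodes len 16-bit little-endian characters starting at offset.
    static String decodeUtf16Le(byte[] data, int offset, int len) {
        // Mirrors the documented ArrayIndexOutOfBoundsException condition.
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        // Each character takes two bytes, so at least 2 * len bytes must remain.
        if (len < 0 || (data.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        return new String(data, offset, len * 2, StandardCharsets.UTF_16LE);
    }
}
```

For the example in the description, decodeUtf16Le(new byte[] {0x16, 0x00}, 0, 1) returns a one-character string holding U+0016.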
      • getFromUnicodeLE

        public static String getFromUnicodeLE(byte[] string)
        Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it.

        For example, the bytes { 0x16, 0x00 } decode to the character 0x16.

        Parameters:
        string - the byte array to be converted
        Returns:
        the converted string, never null
      • getToUnicodeLE

        public static byte[] getToUnicodeLE(String string)
        Converts a String to 16-bit Unicode characters in little-endian format.
        Parameters:
        string - the string
        Returns:
        the byte array of 16-bit unicode characters
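A minimal sketch of the encoding direction using the JDK's UTF-16LE charset; the class name is hypothetical and this is not presented as POI's own code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not Apache POI's actual implementation.
final class Utf16LeEncodeSketch {

    // Each UTF-16 code unit becomes two bytes, low byte first.
    // The UTF_16LE charset does not emit a byte-order mark.
    static byte[] encodeUtf16Le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }
}
```

The result is always twice as long as the input string, which is why callers of the offset-based methods size their buffers accordingly.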
      • getFromCompressedUnicode

        public static String getFromCompressedUnicode(byte[] string,
                                                      int offset,
                                                      int len)
        Reads 8-bit data (in the ISO-8859-1 code page) into a (Unicode) Java String and returns it. (In Excel terms, reads compressed 8-bit Unicode as a string.)
        Parameters:
        string - byte array to read
        offset - the offset in the byte array at which to start reading
        len - the number of bytes to read
        Returns:
        the String decoded from the bytes read (ISO-8859-1)
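A minimal sketch of "compressed" decoding, assuming one byte per character; ISO-8859-1 maps byte values 0x00 to 0xFF directly onto U+0000 to U+00FF. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not Apache POI's actual implementation.
final class CompressedUnicodeSketch {

    // Decodes len bytes starting at offset, one character per byte.
    static String decodeCompressed(byte[] data, int offset, int len) {
        return new String(data, offset, len, StandardCharsets.ISO_8859_1);
    }
}
```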
      • getFromCompressedUTF8

        public static String getFromCompressedUTF8(byte[] string,
                                                   int offset,
                                                   int len)
        Reads 8-bit data (in the UTF-8 code page) into a (Unicode) Java String and returns it. (In Excel terms, reads compressed 8-bit Unicode as a string.)
        Parameters:
        string - byte array to read
        offset - the offset in the byte array at which to start reading
        len - the number of bytes to read
        Returns:
        the String decoded from the bytes read (UTF-8)
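A minimal sketch of the UTF-8 variant, assuming len is a byte count as the parameter description suggests; the class name is hypothetical. Because UTF-8 is variable-width, the resulting string can be shorter than len.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not Apache POI's actual implementation.
final class CompressedUtf8Sketch {

    // len counts bytes: a multi-byte UTF-8 sequence yields fewer chars than len.
    static String decodeUtf8(byte[] data, int offset, int len) {
        return new String(data, offset, len, StandardCharsets.UTF_8);
    }
}
```

Decoding the bytes {(byte) 0xC3, (byte) 0xA9} this way yields the single character U+00E9, whereas ISO-8859-1 decoding of the same two bytes would yield two characters.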
      • readCompressedUnicode

        public static String readCompressedUnicode(LittleEndianInput in,
                                                   int nChars)
        Parameters:
        in - the stream to read from
        nChars - the number of characters (one byte each) to read
        Returns:
        the String decoded from the bytes read (ISO-8859-1)
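A minimal sketch of reading nChars compressed characters from a stream. java.nio.ByteBuffer stands in for POI's LittleEndianInput here (an assumption made to keep the example JDK-only), and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: ByteBuffer stands in for POI's LittleEndianInput.
final class ReadCompressedSketch {

    // Consumes exactly nChars bytes (one per character) and decodes them as ISO-8859-1.
    static String readCompressed(ByteBuffer in, int nChars) {
        byte[] buf = new byte[nChars];
        in.get(buf);
        return new String(buf, StandardCharsets.ISO_8859_1);
    }
}
```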
      • readUnicodeString

        public static String readUnicodeString(LittleEndianInput in)
        The input in is expected to contain:
        1. ushort nChars
        2. byte is16BitFlag
        3. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.

        This structure is also known as an XLUnicodeString.
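A minimal sketch of parsing the three fields listed above. It assumes that bit 0 of is16BitFlag selects two-byte (UTF-16LE) character data and that a clear bit means one ISO-8859-1 byte per character; ByteBuffer stands in for LittleEndianInput, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: ByteBuffer stands in for POI's LittleEndianInput.
final class XLUnicodeStringReader {

    static String read(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        // 1. ushort nChars (little-endian, unsigned)
        int nChars = Short.toUnsignedInt(in.getShort());
        // 2. is16BitFlag, present even when nChars == 0.
        // Assumption: bit 0 set means two bytes per character.
        boolean is16Bit = (in.get() & 0x01) != 0;
        // 3. character data: UTF-16LE if 16-bit, otherwise one ISO-8859-1 byte per char
        byte[] data = new byte[is16Bit ? 2 * nChars : nChars];
        in.get(data);
        return new String(data, is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }
}
```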

      • readUnicodeString

        public static String readUnicodeString(LittleEndianInput in,
                                               int nChars)
        The input in is expected to contain:
        1. byte is16BitFlag
        2. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.
        This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, readUnicodeString(LittleEndianInput) can be used.
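A minimal sketch of this variant, where nChars comes from the caller rather than the stream. As in the other sketches, it assumes bit 0 of is16BitFlag selects UTF-16LE data, uses ByteBuffer in place of LittleEndianInput, and uses a hypothetical class name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: ByteBuffer stands in for POI's LittleEndianInput.
final class FlagAndDataReader {

    static String read(ByteBuffer in, int nChars) {
        // 1. is16BitFlag, present even when nChars == 0.
        // Assumption: bit 0 set means two bytes per character.
        boolean is16Bit = (in.get() & 0x01) != 0;
        // 2. character data; only single bytes are read, so byte order is irrelevant here
        byte[] data = new byte[is16Bit ? 2 * nChars : nChars];
        in.get(data);
        return new String(data, is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }
}
```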
      • writeUnicodeString

        public static void writeUnicodeString(LittleEndianOutput out,
                                              String value)
        The output out receives:
        1. ushort nChars
        2. byte is16BitFlag
        3. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.
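A minimal sketch of producing the three fields above. It assumes 16-bit data is chosen only when some character is above U+00FF and that the flag value is 0x01 for 16-bit and 0x00 otherwise; ByteArrayOutputStream stands in for LittleEndianOutput, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: ByteArrayOutputStream stands in for POI's LittleEndianOutput.
final class XLUnicodeStringWriter {

    static byte[] write(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 1. ushort nChars, low byte first (assumes value.length() <= 0xFFFF)
        int nChars = value.length();
        out.write(nChars & 0xFF);
        out.write((nChars >>> 8) & 0xFF);
        // 2. is16BitFlag, always written, even for an empty string.
        // Assumption: 16-bit data is used only if some char is above U+00FF.
        boolean is16Bit = value.chars().anyMatch(c -> c > 0xFF);
        out.write(is16Bit ? 0x01 : 0x00);
        // 3. character data
        byte[] data = value.getBytes(is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
        out.write(data, 0, data.length);
        return out.toByteArray();
    }
}
```

Writing "Hi" produces the five bytes 02 00 00 48 69, and an empty string still produces the three header bytes 00 00 00.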
      • writeUnicodeStringFlagAndData

        public static void writeUnicodeStringFlagAndData(LittleEndianOutput out,
                                                         String value)
        The output out receives:
        1. byte is16BitFlag
        2. byte[]/char[] characterData
        For this encoding, the is16BitFlag is always present even if nChars==0.
        This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, writeUnicodeString(LittleEndianOutput, String) can be used.
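A minimal sketch of the flag-and-data variant, which omits the leading ushort so the caller can store nChars elsewhere. The flag values and the "above U+00FF" rule are assumptions, ByteArrayOutputStream stands in for LittleEndianOutput, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: ByteArrayOutputStream stands in for POI's LittleEndianOutput.
final class FlagAndDataWriter {

    static byte[] write(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Assumption: 16-bit data is used only if some char is above U+00FF.
        boolean is16Bit = value.chars().anyMatch(c -> c > 0xFF);
        // 1. is16BitFlag, written even for an empty string
        out.write(is16Bit ? 0x01 : 0x00);
        // 2. character data
        byte[] data = value.getBytes(is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
        out.write(data, 0, data.length);
        return out.toByteArray();
    }
}
```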
      • putCompressedUnicode

        public static void putCompressedUnicode(String input,
                                                byte[] output,
                                                int offset)
        Takes a Unicode (Java) string and writes it into the supplied byte array as 8-bit data (in the ISO-8859-1 code page). (In Excel terms, writes compressed 8-bit Unicode.)
        Parameters:
        input - the String containing the data to be written
        output - the byte array to which the data is to be written
        offset - the offset into the byte array at which to start writing
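A minimal sketch of writing compressed (one byte per character) data at an offset; the class name is hypothetical. Note that String.getBytes replaces characters outside ISO-8859-1 with '?', which is the JDK's behavior and not necessarily POI's.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not Apache POI's actual implementation.
final class PutCompressedSketch {

    // Copies the ISO-8859-1 bytes of input into output, starting at offset.
    // Characters outside ISO-8859-1 are replaced with '?' by getBytes.
    static void putCompressed(String input, byte[] output, int offset) {
        byte[] bytes = input.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(bytes, 0, output, offset, bytes.length);
    }
}
```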
      • putUnicodeLE

        public static void putUnicodeLE(String input,
                                        byte[] output,
                                        int offset)
        Takes a Unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array. (In Excel terms, writes uncompressed Unicode.)
        Parameters:
        input - the String containing the unicode data to be written
        output - the byte array to hold the uncompressed Unicode; it needs room for twice the length of the String, starting at offset
        offset - the offset to start writing into the byte array
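A minimal sketch that makes the byte order explicit by splitting each char by hand; the class name is hypothetical and this is not presented as POI's implementation.

```java
// Hypothetical sketch; not Apache POI's actual implementation.
final class PutUnicodeLeSketch {

    // Writes each char as two bytes: low (least significant) byte first, high byte last.
    static void putUtf16Le(String input, byte[] output, int offset) {
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            output[offset + 2 * i] = (byte) c;
            output[offset + 2 * i + 1] = (byte) (c >>> 8);
        }
    }
}
```

For example, writing "A\u20AC" at offset 1 into a five-byte array fills positions 1 to 4 with 41 00 AC 20.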
      • getPreferredEncoding

        public static String getPreferredEncoding()
        Returns:
        the encoding we want to use, currently hardcoded to ISO-8859-1
      • hasMultibyte

        public static boolean hasMultibyte(String value)
        Checks whether the string contains at least one multibyte character.
        Parameters:
        value - string to check
        Returns:
        true if the string contains at least one multibyte character
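A minimal sketch of such a check. It assumes "multibyte" means a char that does not fit in a single ISO-8859-1 byte (above U+00FF) and that null counts as having none; both are assumptions, and the class name is hypothetical.

```java
// Hypothetical sketch; the U+00FF threshold and null handling are assumptions.
final class MultibyteCheckSketch {

    // Returns true as soon as a char that cannot be stored in one byte is found.
    static boolean hasMultibyte(String value) {
        if (value == null) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }
}
```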
      • startsWithIgnoreCase

        public static boolean startsWithIgnoreCase(String haystack,
                                                   String prefix)
        Tests if the string starts with the specified prefix, ignoring case.
      • endsWithIgnoreCase

        public static boolean endsWithIgnoreCase(String haystack,
                                                 String suffix)
        Tests if the string ends with the specified suffix, ignoring case.
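The two case-insensitive checks above can be sketched with the JDK's String.regionMatches(boolean ignoreCase, ...), which compares a region without allocating lowercased copies. The class name is hypothetical, and this is one possible approach rather than POI's confirmed implementation.

```java
// Hypothetical sketch using String.regionMatches with ignoreCase = true.
final class IgnoreCaseSketch {

    static boolean startsWithIgnoreCase(String haystack, String prefix) {
        // regionMatches returns false if prefix is longer than haystack.
        return haystack.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String haystack, String suffix) {
        int start = haystack.length() - suffix.length();
        return start >= 0 && haystack.regionMatches(true, start, suffix, 0, suffix.length());
    }
}
```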
      • isUpperCase

        @Internal
        public static boolean isUpperCase(char c)
      • countMatches

        public static int countMatches(CharSequence haystack,
                                       char needle)
        Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
        Parameters:
        haystack - the CharSequence to check, may be null
        needle - the character to count
        Returns:
        the number of occurrences, 0 if the CharSequence is null
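A minimal sketch of the documented contract (null haystack yields 0); the class name is hypothetical.

```java
// Hypothetical sketch of the documented countMatches contract.
final class CountMatchesSketch {

    // Counts chars equal to needle; a null haystack yields 0.
    static int countMatches(CharSequence haystack, char needle) {
        if (haystack == null) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < haystack.length(); i++) {
            if (haystack.charAt(i) == needle) {
                count++;
            }
        }
        return count;
    }
}
```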
      • getFromUnicodeLE0Terminated

        public static String getFromUnicodeLE0Terminated(byte[] string,
                                                         int offset,
                                                         int len)
                                                  throws ArrayIndexOutOfBoundsException,
                                                         IllegalArgumentException
        Given a byte array of 16-bit Unicode characters in little-endian format (most significant byte last), return a Java String representation of it. Scans the byte array for two consecutive 0 bytes and returns the string preceding them.

        #61881: some programs also write the 0-terminator at the beginning of the string. In that case, check whether the next two bytes contain a valid ASCII character and, if so, correct the _recdata with a '?' character.

        Parameters:
        string - the byte array to be converted
        offset - the initial offset into the byte array. It is assumed that string[offset] and string[offset + 1] contain the first 16-bit Unicode character
        len - the maximum length of the final string
        Returns:
        the converted string, never null.
        Throws:
        ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
        IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
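A simplified sketch of the terminator scan, assuming len is a maximum character count and that the two zero bytes must fall on a character boundary. It deliberately omits the #61881 leading-terminator correction; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified sketch: the #61881 workaround is not reproduced.
final class ZeroTerminatedSketch {

    // Decodes at most maxChars UTF-16LE chars, stopping at the first 0x0000 char.
    static String decode(byte[] data, int offset, int maxChars) {
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        if (maxChars < 0 || (data.length - offset) / 2 < maxChars) {
            throw new IllegalArgumentException("Illegal length " + maxChars);
        }
        int end = offset;
        int limit = offset + 2 * maxChars;
        // Step two bytes at a time so that a zero high byte followed by the zero
        // low byte of the next char is not mistaken for a terminator.
        while (end < limit && (data[end] != 0 || data[end + 1] != 0)) {
            end += 2;
        }
        return new String(data, offset, end - offset, StandardCharsets.UTF_16LE);
    }
}
```

For the bytes 41 00 00 42, both middle bytes are zero but straddle two characters, so the scan returns two characters rather than stopping after "A".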
      • length

        public static int length(CharSequence cs)
        Gets a CharSequence length, or 0 if the CharSequence is null. Copied from commons-lang3.
        Parameters:
        cs - a CharSequence or null
        Returns:
        CharSequence length or 0 if the CharSequence is null.
      • isBlank

        public static boolean isBlank(CharSequence cs)

        Checks if a CharSequence is empty (""), null or whitespace only.

        Whitespace is defined by Character.isWhitespace(char).

         StringUtil.isBlank(null)      = true
         StringUtil.isBlank("")        = true
         StringUtil.isBlank(" ")       = true
         StringUtil.isBlank("bob")     = false
         StringUtil.isBlank("  bob  ") = false
         
        Copied from commons-lang3.
        Parameters:
        cs - the CharSequence to check, may be null
        Returns:
        true if the CharSequence is null, empty or whitespace only
      • isNotBlank

        public static boolean isNotBlank(CharSequence cs)

        Checks if a CharSequence is not empty (""), not null and not whitespace only.

        Whitespace is defined by Character.isWhitespace(char).

         StringUtil.isNotBlank(null)      = false
         StringUtil.isNotBlank("")        = false
         StringUtil.isNotBlank(" ")       = false
         StringUtil.isNotBlank("bob")     = true
         StringUtil.isNotBlank("  bob  ") = true
         
        Copied from commons-lang3.
        Parameters:
        cs - the CharSequence to check, may be null
        Returns:
        true if the CharSequence is not empty and not null and not whitespace only
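The isBlank and isNotBlank contracts above can be sketched directly from their descriptions, using Character.isWhitespace(char) as the whitespace test; the class name is hypothetical.

```java
// Hypothetical sketch of the documented isBlank / isNotBlank contracts.
final class BlankSketch {

    // null, empty, or whitespace-only sequences are blank.
    static boolean isBlank(CharSequence cs) {
        if (cs == null) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    static boolean isNotBlank(CharSequence cs) {
        return !isBlank(cs);
    }
}
```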
      • repeat

        public static String repeat(char ch,
                                    int repeat)

        Returns padding consisting of the specified character repeated a given number of times.

         StringUtil.repeat('e', 0)  = ""
         StringUtil.repeat('e', 3)  = "eee"
         StringUtil.repeat('e', -2) = ""
         

        Note: this method does not support padding with Unicode Supplementary Characters as they require a pair of chars to be represented.

        Copied from commons-lang3.
        Parameters:
        ch - character to repeat
        repeat - number of times to repeat char, negative treated as zero
        Returns:
        String with repeated character
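A minimal sketch matching the documented examples, with negative counts treated as zero; the class name is hypothetical. Since it fills a char[], it shares the stated limitation on supplementary characters.

```java
import java.util.Arrays;

// Hypothetical sketch of the documented repeat contract.
final class RepeatSketch {

    // Builds a string of ch repeated the given number of times; negative counts give "".
    static String repeat(char ch, int repeat) {
        if (repeat <= 0) {
            return "";
        }
        char[] buf = new char[repeat];
        Arrays.fill(buf, ch);
        return new String(buf);
    }
}
```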