Class StringUtil
java.lang.Object
    org.apache.poi.util.StringUtil

Method Summary
Modifier and Type, Method, Description
static int countMatches(CharSequence haystack, char needle)
    Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
static boolean endsWithIgnoreCase(String haystack, String suffix)
    Tests if the string ends with the specified suffix, ignoring case.
static int getEncodedSize(String value)
static String getFromCompressedUnicode(byte[] string, int offset, int len)
    Reads 8-bit data (in the ISO-8859-1 codepage) into a (unicode) Java String and returns it.
static String getFromCompressedUTF8(byte[] string, int offset, int len)
    Reads 8-bit data (in the UTF-8 codepage) into a (unicode) Java String and returns it.
static String getFromUnicodeLE(byte[] string)
    Given a byte array of 16-bit unicode characters in little-endian format (most significant byte last), returns a Java String representation of it.
static String getFromUnicodeLE(byte[] string, int offset, int len)
    Given a byte array of 16-bit unicode characters in little-endian format (most significant byte last), returns a Java String representation of it.
static String getFromUnicodeLE0Terminated(byte[] string, int offset, int len)
    Given a byte array of 16-bit unicode characters in little-endian format (most significant byte last), returns a Java String representation of it.
static int getMaxRecordLength()
static String getPreferredEncoding()
static byte[] getToUnicodeLE(String string)
    Converts a String to 16-bit unicode characters in little-endian format.
static boolean hasMultibyte(String value)
    Checks whether the parameter contains a multibyte character.
static boolean isBlank(CharSequence cs)
    Checks if a CharSequence is empty (""), null or whitespace only.
static boolean isNotBlank(CharSequence cs)
    Checks if a CharSequence is not empty (""), not null and not whitespace only.
static boolean isUpperCase(char c)
static String join(Object[] array)
static String join(Object[] array, String separator)
static String join(String separator, Object... array)
static int length(CharSequence cs)
    Gets a CharSequence length, or 0 if the CharSequence is null.
static String mapMsCodepointString(String string)
    Some strings may contain encoded characters of the unicode private use area.
static void putCompressedUnicode(String input, byte[] output, int offset)
    Takes a unicode (Java) string and writes it as 8-bit data (in the ISO-8859-1 codepage).
static void putCompressedUnicode(String input, LittleEndianOutput out)
static void putUnicodeLE(String input, byte[] output, int offset)
    Takes a unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array.
static void putUnicodeLE(String input, LittleEndianOutput out)
static String readCompressedUnicode(LittleEndianInput in, int nChars)
static String readUnicodeLE(LittleEndianInput in, int nChars)
static String readUnicodeString(LittleEndianInput in)
    InputStream in is expected to contain: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
static String readUnicodeString(LittleEndianInput in, int nChars)
    InputStream in is expected to contain: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
static String repeat(char ch, int repeat)
    Returns padding using the specified delimiter repeated to a given length.
static void setMaxRecordLength(int length)
static boolean startsWithIgnoreCase(String haystack, String prefix)
    Tests if the string starts with the specified prefix, ignoring case.
static String toLowerCase(char c)
static String toUpperCase(char c)
static void writeUnicodeString(LittleEndianOutput out, String value)
    OutputStream out will get: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
static void writeUnicodeStringFlagAndData(LittleEndianOutput out, String value)
    OutputStream out will get: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0.
Method Detail
-
setMaxRecordLength
public static void setMaxRecordLength(int length)
- Parameters:
  length - the max record length allowed for StringUtil
-
getMaxRecordLength
public static int getMaxRecordLength()
- Returns:
  the max record length allowed for StringUtil
-
getFromUnicodeLE
public static String getFromUnicodeLE(byte[] string, int offset, int len) throws ArrayIndexOutOfBoundsException, IllegalArgumentException
Given a byte array of 16-bit unicode characters in little-endian format (most significant byte last), return a Java String representation of it. For example, { 0x16, 0x00 } yields the character 0x16.
- Parameters:
  string - the byte array to be converted
  offset - the initial offset into the byte array. It is assumed that string[offset] and string[offset + 1] contain the first 16-bit unicode character
  len - the length of the final string
- Returns:
  the converted string, never null
- Throws:
  ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
  IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
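The decoding and the two documented failure modes can be illustrated with the JDK alone. This is a minimal sketch, not the POI implementation: the class `Utf16LeDecodeSketch` and its `decode` method are hypothetical names, and it assumes that len counts 16-bit characters (so len * 2 bytes are consumed).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in that mirrors the documented contract; not POI code.
final class Utf16LeDecodeSketch {

    /** Decodes len 16-bit characters starting at offset, low byte first. */
    static String decode(byte[] data, int offset, int len) {
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        // Assumption: len is a character count, so two bytes are needed per character.
        if (len < 0 || (data.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        return new String(data, offset, len * 2, StandardCharsets.UTF_16LE);
    }
}
```

With the bytes { 0x41, 0x00, 0x42, 0x00 }, calling `decode(bytes, 0, 2)` gives "AB", while asking for three characters raises IllegalArgumentException.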
-
getFromUnicodeLE
public static String getFromUnicodeLE(byte[] string)
Given a byte array of 16-bit unicode characters in little-endian format (most significant byte last), return a Java String representation of it. For example, { 0x16, 0x00 } yields the character 0x16.
- Parameters:
  string - the byte array to be converted
- Returns:
  the converted string, never null
-
getToUnicodeLE
public static byte[] getToUnicodeLE(String string)
Convert a String to 16-bit unicode characters in little-endian format.
- Parameters:
  string - the string
- Returns:
  the byte array of 16-bit unicode characters
-
getFromCompressedUnicode
public static String getFromCompressedUnicode(byte[] string, int offset, int len)
Read 8-bit data (in the ISO-8859-1 codepage) into a (unicode) Java String and return it. (In Excel terms, read compressed 8-bit unicode as a string.)
- Parameters:
  string - the byte array to read
  offset - the offset at which to start reading the byte array
  len - the length of data to read from the byte array
- Returns:
  the String generated by reading the byte array as ISO-8859-1
-
getFromCompressedUTF8
public static String getFromCompressedUTF8(byte[] string, int offset, int len)
Read 8-bit data (in the UTF-8 codepage) into a (unicode) Java String and return it. (In Excel terms, read compressed 8-bit unicode as a string.)
- Parameters:
  string - the byte array to read
  offset - the offset at which to start reading the byte array
  len - the length of data to read from the byte array
- Returns:
  the String generated by reading the byte array as UTF-8
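The difference between the ISO-8859-1 and UTF-8 variants shows up as soon as a byte above 0x7F is involved. The following sketch uses only JDK calls; `CompressedDecodeSketch` and its two methods are hypothetical names used for illustration and do not reproduce any bounds handling POI may perform.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the two codepages; not POI code.
final class CompressedDecodeSketch {

    /** Every byte becomes exactly one char (ISO-8859-1). */
    static String fromLatin1(byte[] data, int offset, int len) {
        return new String(data, offset, len, StandardCharsets.ISO_8859_1);
    }

    /** Multi-byte sequences are combined into single code points (UTF-8). */
    static String fromUtf8(byte[] data, int offset, int len) {
        return new String(data, offset, len, StandardCharsets.UTF_8);
    }
}
```

For the two bytes 0xC3 0xA9, the ISO-8859-1 reading yields two characters (U+00C3, U+00A9), whereas the UTF-8 reading yields the single character U+00E9.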
-
readCompressedUnicode
public static String readCompressedUnicode(LittleEndianInput in, int nChars)
- Parameters:
  in - the stream to read from
  nChars - the number of chars to read
- Returns:
  the ISO-8859-1 decoded result
-
readUnicodeString
public static String readUnicodeString(LittleEndianInput in)
InputStream in is expected to contain:
- ushort nChars
- byte is16BitFlag
- byte[]/char[] characterData
This structure is also known as an XLUnicodeString.
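The three-field layout above can be parsed with a little-endian ByteBuffer. This is a sketch under stated assumptions rather than the POI code: `XlUnicodeStringReadSketch` is a hypothetical name, a ByteBuffer stands in for LittleEndianInput, and only the lowest bit of is16BitFlag is assumed to select between 8-bit (ISO-8859-1) and 16-bit (UTF-16LE) character data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical parser for the layout: ushort nChars, byte is16BitFlag, characterData.
final class XlUnicodeStringReadSketch {

    static String read(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        int nChars = in.getShort() & 0xFFFF;          // ushort nChars
        boolean is16Bit = (in.get() & 0x01) != 0;     // byte is16BitFlag (assumed: bit 0)
        byte[] raw = new byte[is16Bit ? nChars * 2 : nChars];
        in.get(raw);                                  // characterData
        return new String(raw,
                is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }
}
```

Note that the flag byte is consumed even when nChars is 0, matching the summary's remark that the flag is always present.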
-
readUnicodeString
public static String readUnicodeString(LittleEndianInput in, int nChars)
InputStream in is expected to contain:
- byte is16BitFlag
- byte[]/char[] characterData
This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, readUnicodeString(LittleEndianInput) can be used.
-
writeUnicodeString
public static void writeUnicodeString(LittleEndianOutput out, String value)
OutputStream out will get:
- ushort nChars
- byte is16BitFlag
- byte[]/char[] characterData
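The output layout above can be produced with a ByteArrayOutputStream. The sketch below is illustrative only: `XlUnicodeStringWriteSketch` is a hypothetical name, and the rule used to choose the flag (16-bit as soon as any char is above 0xFF) is an assumption about how "multibyte" is decided, not something this page states.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical writer for the layout: ushort nChars, byte is16BitFlag, characterData.
final class XlUnicodeStringWriteSketch {

    static byte[] write(String value) {
        // Assumption: any char above 0xFF forces the 16-bit form.
        boolean is16Bit = value.chars().anyMatch(c -> c > 0xFF);
        int nChars = value.length();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(nChars & 0xFF);            // ushort nChars, low byte first
        out.write((nChars >>> 8) & 0xFF);
        out.write(is16Bit ? 0x01 : 0x00);    // byte is16BitFlag, written even for ""
        byte[] data = value.getBytes(
                is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
        out.write(data, 0, data.length);     // characterData
        return out.toByteArray();
    }
}
```

An empty string therefore still produces three bytes: the two-byte count and the flag.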
-
writeUnicodeStringFlagAndData
public static void writeUnicodeStringFlagAndData(LittleEndianOutput out, String value)
OutputStream out will get:
- byte is16BitFlag
- byte[]/char[] characterData
This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, writeUnicodeString(LittleEndianOutput, String) can be used.
-
getEncodedSize
public static int getEncodedSize(String value)
- Returns:
  the number of bytes that would be written by writeUnicodeString(LittleEndianOutput, String)
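Given the layout written by writeUnicodeString (a ushort count, a flag byte, then the character data), the size works out to 2 + 1 + nChars times the bytes per character. A small sketch of that arithmetic follows; `EncodedSizeSketch` is a hypothetical name, and the "char above 0xFF means two bytes per character" rule is an assumption.

```java
// Hypothetical size calculation for: ushort nChars, byte is16BitFlag, characterData.
final class EncodedSizeSketch {

    static int encodedSize(String value) {
        // Assumption: any char above 0xFF forces two bytes per character.
        boolean is16Bit = value.chars().anyMatch(c -> c > 0xFF);
        int header = 2 + 1; // ushort nChars + byte is16BitFlag
        return header + value.length() * (is16Bit ? 2 : 1);
    }
}
```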
-
putCompressedUnicode
public static void putCompressedUnicode(String input, byte[] output, int offset)
Takes a unicode (Java) string and writes it as 8-bit data (in the ISO-8859-1 codepage). (In Excel terms, write compressed 8-bit unicode.)
- Parameters:
  input - the String containing the data to be written
  output - the byte array to which the data is to be written
  offset - the offset into the byte array at which writing starts
-
putCompressedUnicode
public static void putCompressedUnicode(String input, LittleEndianOutput out)
-
putUnicodeLE
public static void putUnicodeLE(String input, byte[] output, int offset)
Takes a unicode string and writes it as little-endian (most significant byte last) bytes into the supplied byte array. (In Excel terms, write uncompressed unicode.)
- Parameters:
  input - the String containing the unicode data to be written
  output - the byte array to hold the uncompressed unicode; should be twice the length of the String
  offset - the offset at which to start writing into the byte array
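The sizing note for output (two bytes per character, plus the starting offset) is easy to see in a JDK-only sketch. `PutUnicodeLeSketch` and `put` are hypothetical names; the sketch only mirrors the documented behaviour of writing little-endian 16-bit characters into a caller-supplied array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of writing UTF-16LE data into an existing buffer.
final class PutUnicodeLeSketch {

    static void put(String input, byte[] output, int offset) {
        byte[] encoded = input.getBytes(StandardCharsets.UTF_16LE); // 2 bytes per char
        // Throws ArrayIndexOutOfBoundsException if output is too small.
        System.arraycopy(encoded, 0, output, offset, encoded.length);
    }
}
```

Writing "AB" at offset 2 into a six-byte array leaves the first two bytes untouched and fills the rest with 0x41, 0x00, 0x42, 0x00.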
-
putUnicodeLE
public static void putUnicodeLE(String input, LittleEndianOutput out)
-
readUnicodeLE
public static String readUnicodeLE(LittleEndianInput in, int nChars)
-
getPreferredEncoding
public static String getPreferredEncoding()
- Returns:
  the encoding we want to use, currently hardcoded to ISO-8859-1
-
hasMultibyte
public static boolean hasMultibyte(String value)
Checks whether the parameter contains a multibyte character.
- Parameters:
  value - the string to check
- Returns:
  true if the string has at least one multibyte character
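One plausible reading of "multibyte" here is "does not fit into a single ISO-8859-1 byte". The sketch below encodes that reading; it is an assumption, not a statement of how POI decides, and `MultibyteCheckSketch` is a hypothetical name.

```java
// Hypothetical check: a char above 0xFF cannot be stored in one ISO-8859-1 byte.
final class MultibyteCheckSketch {

    static boolean hasMultibyte(String value) {
        if (value == null) {
            return false; // assumption: null is treated as having no such characters
        }
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return true;
            }
        }
        return false;
    }
}
```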
-
startsWithIgnoreCase
public static boolean startsWithIgnoreCase(String haystack, String prefix)
Tests if the string starts with the specified prefix, ignoring case consideration.
-
endsWithIgnoreCase
public static boolean endsWithIgnoreCase(String haystack, String suffix)
Tests if the string ends with the specified suffix, ignoring case consideration.
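Both case-insensitive tests can be expressed with String.regionMatches, which avoids allocating lower-cased copies. This is a sketch with hypothetical names (`IgnoreCaseAffixSketch`), not necessarily how StringUtil implements them.

```java
// Hypothetical prefix/suffix tests built on String.regionMatches(ignoreCase, ...).
final class IgnoreCaseAffixSketch {

    static boolean startsWithIgnoreCase(String haystack, String prefix) {
        return haystack.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    static boolean endsWithIgnoreCase(String haystack, String suffix) {
        int start = haystack.length() - suffix.length();
        // A negative start makes regionMatches return false, so no extra check is needed.
        return haystack.regionMatches(true, start, suffix, 0, suffix.length());
    }
}
```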
-
isUpperCase
@Internal public static boolean isUpperCase(char c)
-
mapMsCodepointString
public static String mapMsCodepointString(String string)
Some strings may contain encoded characters of the unicode private use area. Currently the characters of the symbol fonts are mapped to the corresponding characters in the normal unicode range.
- Parameters:
  string - the original string
- Returns:
  the string with mapped characters
- See Also:
  Private Use Area (symbol), Symbol font - Unicode alternatives for Greek and special characters in HTML
-
countMatches
public static int countMatches(CharSequence haystack, char needle)
Counts the number of occurrences of needle in haystack. Has the same signature as org.apache.commons.lang3.StringUtils#countMatches.
- Parameters:
  haystack - the CharSequence to check, may be null
  needle - the character to count
- Returns:
  the number of occurrences, 0 if the CharSequence is null
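The null-tolerant counting contract described above can be sketched in a few lines. `CountMatchesSketch` is a hypothetical name used for illustration only.

```java
// Hypothetical null-safe character counter matching the documented contract.
final class CountMatchesSketch {

    static int countMatches(CharSequence haystack, char needle) {
        if (haystack == null) {
            return 0; // documented: 0 if the CharSequence is null
        }
        int count = 0;
        for (int i = 0; i < haystack.length(); i++) {
            if (haystack.charAt(i) == needle) {
                count++;
            }
        }
        return count;
    }
}
```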
-
getFromUnicodeLE0Terminated
public static String getFromUnicodeLE0Terminated(byte[] string, int offset, int len) throws ArrayIndexOutOfBoundsException, IllegalArgumentException
Given a byte array of 16-bit unicode characters in little-endian format (most significant byte last), return a Java String representation of it. Scans the byte array for two contiguous 0 bytes and returns the string before them.
#61881: there seem to be programs out there which write the 0-termination also at the beginning of the string. Check if the next two bytes contain a valid ASCII char and correct the _recdata with a '?' char.
- Parameters:
  string - the byte array to be converted
  offset - the initial offset into the byte array. It is assumed that string[offset] and string[offset + 1] contain the first 16-bit unicode character
  len - the maximum length of the final string
- Returns:
  the converted string, never null
- Throws:
  ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
  IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
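The terminator scan can be sketched as a loop over 16-bit units that stops at the first unit whose two bytes are both zero. This is an illustration under assumptions, not the POI code: `ZeroTerminatedSketch` is a hypothetical name, the scan is assumed to be aligned to 16-bit units from offset, and the #61881 workaround for a leading terminator is deliberately left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical scan for a 0x0000 terminator in UTF-16LE data; omits the #61881 workaround.
final class ZeroTerminatedSketch {

    static String decode(byte[] data, int offset, int maxChars) {
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        if (maxChars < 0 || (data.length - offset) / 2 < maxChars) {
            throw new IllegalArgumentException("Illegal length " + maxChars);
        }
        int n = 0;
        // Advance one 16-bit unit at a time until both bytes of a unit are zero.
        while (n < maxChars
                && (data[offset + 2 * n] != 0 || data[offset + 2 * n + 1] != 0)) {
            n++;
        }
        return new String(data, offset, n * 2, StandardCharsets.UTF_16LE);
    }
}
```

If no terminator is found within maxChars units, the whole range is decoded.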
-
length
public static int length(CharSequence cs)
Gets a CharSequence length, or 0 if the CharSequence is null. Copied from commons-lang3.
- Parameters:
  cs - a CharSequence or null
- Returns:
  the CharSequence length, or 0 if the CharSequence is null
-
isBlank
public static boolean isBlank(CharSequence cs)
Checks if a CharSequence is empty (""), null or whitespace only.
Whitespace is defined by Character.isWhitespace(char).
StringUtil.isBlank(null) = true
StringUtil.isBlank("") = true
StringUtil.isBlank(" ") = true
StringUtil.isBlank("bob") = false
StringUtil.isBlank(" bob ") = false
Copied from commons-lang3.
- Parameters:
  cs - the CharSequence to check, may be null
- Returns:
  true if the CharSequence is null, empty or whitespace only
-
isNotBlank
public static boolean isNotBlank(CharSequence cs)
Checks if a CharSequence is not empty (""), not null and not whitespace only.
Whitespace is defined by Character.isWhitespace(char).
StringUtil.isNotBlank(null) = false
StringUtil.isNotBlank("") = false
StringUtil.isNotBlank(" ") = false
StringUtil.isNotBlank("bob") = true
StringUtil.isNotBlank(" bob ") = true
Copied from commons-lang3.
- Parameters:
  cs - the CharSequence to check, may be null
- Returns:
  true if the CharSequence is not empty and not null and not whitespace only
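The blank checks reduce to a single loop over Character.isWhitespace(char), with isNotBlank as the negation. A minimal sketch, using the hypothetical class name `BlankCheckSketch`:

```java
// Hypothetical blank checks following the documented truth table.
final class BlankCheckSketch {

    static boolean isBlank(CharSequence cs) {
        if (cs == null) {
            return true;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isWhitespace(cs.charAt(i))) {
                return false; // found a non-whitespace character
            }
        }
        return true; // empty or whitespace only
    }

    static boolean isNotBlank(CharSequence cs) {
        return !isBlank(cs);
    }
}
```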
-
repeat
public static String repeat(char ch, int repeat)
Returns padding using the specified delimiter repeated to a given length.
StringUtil.repeat('e', 0) = ""
StringUtil.repeat('e', 3) = "eee"
StringUtil.repeat('e', -2) = ""
Note: this method does not support padding with Unicode Supplementary Characters as they require a pair of chars to be represented.
Copied from commons-lang3.
- Parameters:
  ch - the character to repeat
  repeat - the number of times to repeat the char; negative values are treated as zero
- Returns:
  a String with the repeated character
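The documented examples, including the treatment of negative counts, can be reproduced with a filled char array. `RepeatSketch` is a hypothetical name; the sketch is not the commons-lang3 or POI source.

```java
import java.util.Arrays;

// Hypothetical repeat helper: zero or negative counts give the empty string.
final class RepeatSketch {

    static String repeat(char ch, int repeat) {
        if (repeat <= 0) {
            return "";
        }
        char[] buf = new char[repeat];
        Arrays.fill(buf, ch);
        return new String(buf);
    }
}
```

Because the argument is a single char, a supplementary character (which needs a surrogate pair) cannot be passed in, which is the limitation noted above.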