Saturday, May 7, 2016

Trim does not remove all whitespace in Java

Java's String.trim() removes only leading and trailing characters with code points up to U+0020 (the ASCII space and the control characters below it), and it ignores Unicode whitespace such as the no-break space (U+00A0). This behavior is kept for backward compatibility, and there is a long and detailed explanation of the problem. It can easily be fixed with a regular expression that removes all official Unicode whitespace:


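import java.util.regex.Matcher;
import java.util.regex.Pattern;

// UNICODE_CHARACTER_CLASS makes \s match every Unicode whitespace character.
// DOTALL lets (.*?) span line breaks; without it, multi-line input does not match and is returned untrimmed.
Pattern TRIM_PATTERN = Pattern.compile("^\\s*(.*?)\\s*$", Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL);
Matcher matcher = TRIM_PATTERN.matcher(input);
if (matcher.matches()) {
    // Group 1 is the text between the leading and trailing whitespace.
    return matcher.group(1);
}
return input;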
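
To see the difference in practice, here is a minimal, self-contained sketch that compares String.trim() with the regex approach. The class name UnicodeTrimDemo and the method name unicodeTrim are made up for this example, and the DOTALL flag is my assumption so that text containing line breaks is handled too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
final class UnicodeTrimDemo {
    // \s matches Unicode whitespace thanks to UNICODE_CHARACTER_CLASS;
    // DOTALL (an assumption of this sketch) lets the middle group span line breaks.
    private static final Pattern TRIM_PATTERN = Pattern.compile(
            "^\\s*(.*?)\\s*$", Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL);

    static String unicodeTrim(String input) {
        Matcher matcher = TRIM_PATTERN.matcher(input);
        // Group 1 holds everything between the leading and trailing whitespace.
        return matcher.matches() ? matcher.group(1) : input;
    }

    public static void main(String[] args) {
        // U+00A0 NO-BREAK SPACE and U+2003 EM SPACE are both above U+0020.
        String input = "\u00A0\u2003hello world\u2003\u00A0";
        System.out.println("[" + input.trim() + "]");        // padding is still there
        System.out.println("[" + unicodeTrim(input) + "]");  // padding is removed
    }
}
```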

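For more extreme cases you may also want to strip invisible characters that are not classified as whitespace, such as the zero-width space (U+200B), the word joiner (U+2060), the byte order mark (U+FEFF) and the soft hyphen (U+00AD). This pattern covers them as well: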

"^[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*(.*?)[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*$"
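
Here is a rough sketch of how the extended pattern could be wrapped in a helper. The names ExtendedTrim, INVISIBLE and trim are hypothetical, and the null check and the DOTALL flag are my additions rather than part of the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class ExtendedTrim {
    // Unicode whitespace plus the invisible characters from the pattern above:
    // U+2060, U+200D, U+200C, U+200B, U+180E, U+FEFF and U+00AD.
    private static final String INVISIBLE =
            "[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*";

    private static final Pattern PATTERN = Pattern.compile(
            "^" + INVISIBLE + "(.*?)" + INVISIBLE + "$",
            Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL);

    static String trim(String input) {
        if (input == null) {
            return null; // assumption: pass null through instead of throwing
        }
        Matcher matcher = PATTERN.matcher(input);
        // Group 1 is the text without the leading and trailing run of
        // whitespace and invisible characters.
        return matcher.matches() ? matcher.group(1) : input;
    }

    public static void main(String[] args) {
        // A byte order mark and a zero-width space in front, a soft hyphen
        // and a word joiner at the end.
        String input = "\uFEFF\u200B text\u00AD\u2060";
        System.out.println(trim(input).length()); // length of the trimmed text
    }
}
```

Only the leading and trailing runs are removed, so an invisible character inside the text, for example a zero-width joiner in the middle of a word, is left alone.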