Java's String.trim() removes only characters up to U+0020 (the ASCII space and control characters) and ignores Unicode whitespace such as the no-break space. This is a backward-compatibility matter, and there is a long, detailed explanation of the problem.
It can easily be fixed with a regular expression that removes all official Unicode whitespace:
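import java.util.regex.Matcher;
import java.util.regex.Pattern;

// UNICODE_CHARACTER_CLASS makes \s match Unicode whitespace; DOTALL lets (.*?) span line breaks.
Pattern TRIM_PATTERN = Pattern.compile("^\\s*(.*?)\\s*$", Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL);
Matcher matcher = TRIM_PATTERN.matcher(input);
if (matcher.matches()) {
    return matcher.group(1);
}
// Fallback in case the pattern does not match.
return input;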
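To see the difference from String.trim(), a small check along these lines may help. It is only a sketch: the class name UnicodeTrimDemo, the method name unicodeTrim, and the sample string are made up for illustration, and Pattern.DOTALL is included on the assumption that the input may contain line breaks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeTrimDemo {
    // \s matches Unicode whitespace here; DOTALL lets (.*?) cross line breaks.
    private static final Pattern TRIM_PATTERN = Pattern.compile(
            "^\\s*(.*?)\\s*$", Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL);

    // Hypothetical helper: returns the input without leading/trailing Unicode whitespace.
    static String unicodeTrim(String input) {
        Matcher matcher = TRIM_PATTERN.matcher(input);
        return matcher.matches() ? matcher.group(1) : input;
    }

    public static void main(String[] args) {
        // U+00A0 (no-break space) in front, U+2003 (em space) at the end.
        String input = "\u00A0 hello world \u2003";
        System.out.println("[" + input.trim() + "]");        // Unicode spaces survive
        System.out.println("[" + unicodeTrim(input) + "]");  // prints [hello world]
    }
}
```

String.trim() leaves the sample untouched because both outer characters are above U+0020, while the regex version strips them.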
But for more extreme cases you may also want to use this pattern:
"^[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*(.*?)[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*$"
Besides whitespace, it also removes other invisible symbols such as zero-width spaces, the byte order mark, and the soft hyphen.
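The extended pattern can be plugged in the same way. The following is a minimal sketch, assuming the same flags as before; the names InvisibleTrimDemo, INVISIBLE, and trimInvisible are hypothetical, and the character class is pulled into a constant only to avoid repeating it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvisibleTrimDemo {
    // Whitespace plus the invisible characters from the extended pattern.
    private static final String INVISIBLE =
            "[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]";

    private static final Pattern TRIM_PATTERN = Pattern.compile(
            "^" + INVISIBLE + "*(.*?)" + INVISIBLE + "*$",
            Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL);

    // Hypothetical helper: strips whitespace and invisible characters from both ends only.
    static String trimInvisible(String input) {
        Matcher matcher = TRIM_PATTERN.matcher(input);
        return matcher.matches() ? matcher.group(1) : input;
    }

    public static void main(String[] args) {
        // U+FEFF (byte order mark) in front, U+200B (zero width space) at the end.
        String input = "\uFEFF\u200B value \u200B";
        System.out.println(trimInvisible(input).length()); // prints 5
    }
}
```

Only the ends are affected: an invisible character in the middle of the text, for example a soft hyphen inside a word, stays where it is.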