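Java's String.trim() removes only leading and trailing characters with code points up to U+0020 (ASCII whitespace and control characters) and ignores Unicode whitespace such as the no-break space (U+00A0). This behavior is kept for backward compatibility, and there is a big and detailed explanation of this problem.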
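To make the problem concrete, here is a minimal sketch of what trim() does with a string padded by no-break spaces. The class name TrimDemo and the sample string are made up for illustration.

```java
public class TrimDemo {
    // U+00A0 (no-break space) is Unicode whitespace but lies above U+0020,
    // so String.trim() does not treat it as trimmable.
    static final String PADDED = "\u00A0hello\u00A0";

    public static void main(String[] args) {
        // trim() leaves both no-break spaces in place: 5 letters + 2 spaces.
        System.out.println(PADDED.trim().length());      // prints 7
        // Plain ASCII spaces are removed as expected.
        System.out.println("  hello  ".trim().length()); // prints 5
    }
}
```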
It can be easily fixed by using a regular expression that removes all official Unicode whitespace:
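import java.util.regex.Matcher;
import java.util.regex.Pattern;

// UNICODE_CHARACTER_CLASS makes \s match every Unicode whitespace character.
// DOTALL lets '.' match line terminators, so multi-line input is trimmed too.
static final Pattern TRIM_PATTERN =
        Pattern.compile("^\\s*(.*?)\\s*$", Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL);

// The method name is illustrative; group 1 is the text between the whitespace runs.
static String unicodeTrim(String input) {
    Matcher matcher = TRIM_PATTERN.matcher(input);
    return matcher.matches() ? matcher.group(1) : input;
}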
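For more extreme cases, you may also want to use this pattern, which additionally strips invisible characters that are not classified as whitespace, such as the zero-width space (U+200B), the byte order mark (U+FEFF), and the soft hyphen (U+00AD):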
"^[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*(.*?)[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*$"
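As a rough sketch, the extended pattern could be wrapped in a helper like the one below. The class name ExtendedTrim, the method name extendedTrim, the null check, and the Pattern.DOTALL flag (added so that multi-line input also matches) are assumptions for this example and not part of the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtendedTrim {
    // Unicode whitespace (\s under UNICODE_CHARACTER_CLASS) plus the
    // invisible characters listed in the pattern above.
    private static final String STRIP =
            "[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*";

    // DOTALL is an assumption added here so that '.' also matches line
    // terminators inside the captured text.
    private static final Pattern EXTENDED_TRIM = Pattern.compile(
            "^" + STRIP + "(.*?)" + STRIP + "$",
            Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL);

    // Hypothetical helper: returns the input without leading and trailing
    // whitespace or invisible characters; null is passed through unchanged.
    public static String extendedTrim(String input) {
        if (input == null) {
            return null;
        }
        Matcher matcher = EXTENDED_TRIM.matcher(input);
        return matcher.matches() ? matcher.group(1) : input;
    }

    public static void main(String[] args) {
        String raw = "\uFEFF\u200B hello world\u00A0\u2060";
        System.out.println("[" + extendedTrim(raw) + "]"); // prints "[hello world]"
    }
}
```

Whitespace and invisible characters inside the text are left alone; only the leading and trailing runs are removed.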