Simple pattern matching is too naive, because malicious code can be hidden behind unusual Unicode characters and encodings (see http://ha.ckers.org/xss.html for how tricky it gets). In practice, you have to parse the HTML and decide what is allowed and what is not. Fortunately, the JDK already ships with a decent HTML parser. It is intended for use in Swing, but it is abstract enough to be used for validation too. All you need to do is extend javax.swing.text.html.parser.Parser, like this:
import javax.swing.text.html.parser.*
import static javax.swing.text.html.HTML.Tag.*
import static javax.swing.text.html.HTML.Attribute.*

class RteParser extends Parser {
    boolean hasErrors = false

    public RteParser() {
        super(DTD.getDTD('html'))
    }

    // Application-specific checks go here; set hasErrors = true on a violation
    void validateTag(tag) {
        ...
    }

    void handleStartTag(TagElement tag) {
        validateTag(tag)
        this.flushAttributes() // clear the attributes collected for this tag
    }

    void handleEndTag(TagElement tag) {
        validateTag(tag)
        this.flushAttributes()
    }

    void handleEmptyTag(TagElement tag) {
        validateTag(tag)
        this.flushAttributes()
    }

    public static boolean validate(String value) {
        RteParser parser = new RteParser()
        // Wrap the fragment in <html> so the parser receives a complete document
        StringReader reader = new StringReader("<html>${value}</html>")
        parser.parse(reader)
        return parser.isValid()
    }

    public boolean isValid() {
        return !hasErrors
    }
}
All validation is done by calling the static validate method. All the magic happens in validateTag. This method is application-specific: it is the place where you check every tag and attribute against a blacklist or a set of validation patterns.
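The body of validateTag is left open above. As a rough illustration only, here is one way it might be filled in, written in plain Java rather than Groovy. The class name AllowListRteParser, the set of permitted tags, and the two attribute rules are all assumptions made for this sketch and do not come from the original code. It also checks tags against an allow-list, which is a stricter swap-in for the blacklist mentioned above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.Locale;
import java.util.Set;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.Parser;
import javax.swing.text.html.parser.TagElement;

// Hypothetical variant of RteParser with validateTag filled in.
class AllowListRteParser extends Parser {
    // Assumed policy for this sketch: the only tags a rich-text field may contain.
    private static final Set<String> ALLOWED_TAGS = Set.of(
            "html", "head", "body", "p", "br", "b", "i", "em", "strong",
            "ul", "ol", "li", "a");

    private boolean hasErrors = false;

    AllowListRteParser() throws IOException {
        super(DTD.getDTD("html"));
    }

    private void validateTag(TagElement tag) {
        // Reject any tag whose name is not on the allow-list.
        String name = tag.getElement().getName().toLowerCase(Locale.ROOT);
        if (!ALLOWED_TAGS.contains(name)) {
            hasErrors = true;
        }
        // Inspect the attributes the parser collected for the current tag.
        SimpleAttributeSet attrs = getAttributes();
        Enumeration<?> keys = attrs.getAttributeNames();
        while (keys.hasMoreElements()) {
            Object key = keys.nextElement();
            String attrName = key.toString().toLowerCase(Locale.ROOT);
            String attrValue = String.valueOf(attrs.getAttribute(key))
                    .trim().toLowerCase(Locale.ROOT);
            // Assumed rules: no event handlers (onclick, onload, ...) and no
            // javascript: URLs. This is a deliberately naive check.
            if (attrName.startsWith("on") || attrValue.startsWith("javascript:")) {
                hasErrors = true;
            }
        }
    }

    @Override
    protected void handleStartTag(TagElement tag) {
        validateTag(tag);
        flushAttributes();
    }

    @Override
    protected void handleEndTag(TagElement tag) {
        validateTag(tag);
        flushAttributes();
    }

    @Override
    protected void handleEmptyTag(TagElement tag) {
        validateTag(tag);
        flushAttributes();
    }

    boolean isValid() {
        return !hasErrors;
    }

    static boolean validate(String value) {
        try {
            AllowListRteParser parser = new AllowListRteParser();
            // Same wrapping as in RteParser.validate.
            parser.parse(new StringReader("<html>" + value + "</html>"));
            return parser.isValid();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumed rules, AllowListRteParser.validate("<p>Hello, world</p>") accepts the input, while input containing a script tag, an onclick attribute, or a javascript: link is rejected. The same structure should carry over to the Groovy class by moving the body of validateTag across.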