Saturday, July 9, 2016

URL canonicalization and normalization in Java

Recently I had to integrate with Google Safe Browsing in Java, and one part of the task was URL normalisation. Basically it is like JSoup, but for URLs: you have to remove redundant parts, decode, re-encode, and so on. It seems trivial, since even java.net.URI has a normalize() method, but it really was not: nothing I tried worked, and the results were not even remotely compliant.
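To illustrate why java.net.URI alone falls short, here is a minimal sketch (the class name, method name and sample URL are made up for this example). URI.normalize() only resolves "." and ".." path segments; it leaves percent-escapes and the fragment exactly as they were, while canonicalization for Safe Browsing also expects work on those parts.

```java
import java.net.URI;

class UriNormalizeDemo {
    // Thin wrapper around the JDK's built-in normalization (hypothetical helper).
    static URI jdkNormalize(String url) {
        return URI.create(url).normalize();
    }

    public static void main(String[] args) {
        URI n = jdkNormalize("http://www.GOOgle.com/a/../b/./%7Efoo#frag");
        // Dot segments are resolved, but %7E is not decoded.
        System.out.println(n.getRawPath());
        // The host is printed as the JDK stored it.
        System.out.println(n.getHost());
        // The fragment is still there.
        System.out.println(n.getFragment());
    }
}
```

So normalize() is only one small step; decoding, re-encoding and the host rules still have to come from somewhere else.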

After searching and trying everything suggested on Stack Overflow, I finally found a working solution: URL-Detector from LinkedIn. The library itself looks raw, and it is not even in a public Maven repository as of now, but it successfully passes all of Google's tests after replacing the port and using the URL without its fragment.
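The two adjustments can be sketched with plain java.net.URI, without touching URL-Detector's own API. This is only an illustration and not the code I actually used: it assumes "replacing the port" means dropping it, and the class and method names are invented for the example.

```java
import java.net.URI;

class PreCanonicalize {
    // Rebuilds the URL from its raw (still encoded) parts,
    // leaving out the port and the fragment.
    static String stripPortAndFragment(String url) {
        URI u = URI.create(url);
        if (u.getScheme() == null || u.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL with a host: " + url);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(u.getScheme()).append("://");
        if (u.getRawUserInfo() != null) {
            sb.append(u.getRawUserInfo()).append('@');
        }
        sb.append(u.getHost());
        String path = u.getRawPath();
        // An empty path becomes "/" so the result always has one.
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripPortAndFragment("http://www.gotaport.com:1234/a?b=1#frag"));
    }
}
```

Working on the raw components avoids decoding or re-encoding anything at this stage, so the remaining steps stay with whatever library does the real canonicalization.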
