Dmitrijs Artjomenko blog

Saturday, August 20, 2016

Converting byte array to string without losing data

Recently I had to deal with a poor Redis interface that didn't allow storing bytes, and it was not possible to update or change it. Redis itself handles bytes perfectly for both keys and data, but there was no way to pass them through. The standard solution for this task is Base64, but in my case the overhead was too big, as I had to store hundreds of millions of records. UTF encodings are no good either, as they corrupt the data: byte sequences that are not valid in the encoding get replaced during decoding. Finally, I found that the right encoding for this task is ISO-8859-1, which maps every byte value to exactly one character, and it works just fine.


new String([10,20,-30,126] as byte[], "ISO-8859-1").getBytes("ISO-8859-1");
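
The Groovy one-liner above can be spelled out in plain Java. The following is a minimal sketch, not the code from the original project: the class and method names are made up for illustration. It round-trips the same four bytes through ISO-8859-1 and, for comparison, through UTF-8 and Base64.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, named here only for the example.
final class Latin1RoundTrip {

    // ISO-8859-1 maps every byte value to exactly one char (U+0000..U+00FF).
    static String encode(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // The reverse mapping restores the original bytes.
    static byte[] decode(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // For comparison: decoding arbitrary bytes as UTF-8 and encoding them again.
    static byte[] utf8RoundTrip(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = {10, 20, -30, 126};

        String asText = encode(original);
        System.out.println("ISO-8859-1 lossless: " + Arrays.equals(original, decode(asText)));
        System.out.println("ISO-8859-1 chars:    " + asText.length());

        // The byte -30 (0xE2) starts a multi-byte UTF-8 sequence that is never
        // completed here, so the decoder substitutes a replacement character.
        System.out.println("UTF-8 lossless:      " + Arrays.equals(original, utf8RoundTrip(original)));

        // Base64 is lossless too, but needs 4 characters for every 3 bytes.
        System.out.println("Base64 chars:        " + Base64.getEncoder().encodeToString(original).length());
    }
}
```

The string produced this way is only a container for bytes: it should be decoded with the same charset and not treated as readable text.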

Saturday, July 9, 2016

URL canonicalization and normalization in Java

Recently I had to implement an integration with Google Safe Browsing in Java, and one part of the task is URL normalisation; basically it is like JSoup, but for URLs. You have to remove redundant parts, decode, encode, etc. It seems trivial, and even java.net.URI has normalisation, but it really was not: nothing worked, and the results were not even remotely compliant.

After searching and trying everything suggested on Stack Overflow, I finally found a working solution: URL-Detector from LinkedIn. The library itself looks raw and is not even in a public Maven repository as of now, but it successfully passes all the Google tests after replacing the port and using the URL without the fragment.
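
To illustrate why java.net.URI alone falls short, here is a small sketch of what its normalize() method actually does. The class name and the sample URLs are made up for the example, and it does not use URL-Detector. normalize() only cleans up "." and ".." path segments, and leaves the scheme, host case and percent-escapes untouched.

```java
import java.net.URI;

// Hypothetical demo class; the URLs below are illustrative only.
final class UriNormalizeDemo {

    // Applies the JDK's built-in normalisation and returns the result as text.
    static String normalize(String url) {
        return URI.create(url).normalize().toString();
    }

    public static void main(String[] args) {
        // Dot segments in the path are resolved.
        System.out.println(normalize("http://example.com/a/./b/../c"));

        // Scheme case, host case and percent-escapes are returned unchanged.
        System.out.println(normalize("HTTP://EXAMPLE.com/%7Euser/"));
    }
}
```

The first call prints http://example.com/a/c, while the second prints its input unchanged, so the decoding and re-encoding steps still have to come from somewhere else.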

Saturday, June 25, 2016

Quickly create a Groovy script in IntelliJ IDEA

It is sometimes frustrating when you need to quickly check a database, a non-trivial HTTP call, or AWS with Groovy. It is fine and great if the standard library is enough for your case, but once you need new libraries, it is not that quick. Grab is amazing, but you need to specify dependency ids and recall the exact API and settings. At some point I had a bunch of typical scripts with typical settings, but you have to be in the same project to find them and remember not to commit them, and they tend to grow big and take on lives of their own. That is bad for hacking and not convenient for the majority of one-off cases.

That is when I started using IntelliJ Live Templates: they are available in every project, they are easily configurable, you can discard or save the results as needed, and they are accessible from regular files, scratches, and even the Groovy console.

My typical template consists of everything needed for a single use case: grabs, imports, initialisation and a bunch of examples. It is always easier to cut unneeded stuff than to recall all the API specifics. For me, it is more like tldr output than what a typical default template looks like.

Here is an example of one template I use for AWS S3:


// Independent snippets: keep what you need and delete the rest.
@Grab('com.amazonaws:aws-java-sdk:1.9.31')
import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.ObjectMetadata

// 'access', 'secret', "bucket" and "name" are placeholders to replace
def client = new AmazonS3Client(new BasicAWSCredentials('access', 'secret'))

// List the keys under a prefix
println client.listObjects("bucket", "name").getObjectSummaries().collect{it.key}

// Upload a small text object
def meta = new ObjectMetadata()
meta.setContentType("text/plain")
client.putObject("bucket", "name", new ByteArrayInputStream("1".bytes), meta)

// Delete an object
client.deleteObject("bucket", "name")

// Read an object's content as a string
println new String(client.getObject("bucket", "name").objectContent.bytes)

// List the versions stored under a prefix
client.listVersions("bucket", "name").versionSummaries.each {
    println "${it.deleteMarker}   ${it.key}   ${it.size}"
}

// Delete one specific version ("123" is a placeholder version id)
client.deleteVersion("bucket", "name", "123")

And of course, you can have multiple templates preconfigured for different environments and use cases.

The biggest disadvantage compared with regular files is that templates are stored somewhere deep inside IntelliJ and are not easily transferable, although they can be exported together with other settings. Another disadvantage is that their content is not searchable, so you have to pack all the keywords into the abbreviation or description, and even then they are not part of the universal search.

Saturday, May 7, 2016

Trim does not remove all whitespace in Java

Java's trim removes only ASCII whitespace and control characters, and ignores Unicode whitespace. This is a backward compatibility thing, and there is a big and detailed explanation of this problem. It can be easily fixed by using a regular expression that removes all official Unicode whitespace:


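// UNICODE_CHARACTER_CLASS makes \s match every Unicode whitespace character;
// DOTALL lets the captured group span line breaks inside the input.
Pattern TRIM_PATTERN = Pattern.compile("^\\s*(.*?)\\s*$", Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL);
Matcher matcher = TRIM_PATTERN.matcher(input);
if (matcher.matches()) {
    return matcher.group(1);
}
return input;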

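For more extreme cases you may also want to use this pattern, which additionally strips invisible characters that are not classified as whitespace, such as zero-width spaces and joiners, the byte order mark and the soft hyphen: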

"^[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*(.*?)[\\s\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD]*$"

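Both patterns can be compared with String.trim() in a small self-contained sketch. The class and method names are made up for illustration, and the DOTALL flag is an addition here so that inputs containing line breaks are handled as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical utility class wrapping the two patterns from this post.
final class UnicodeTrim {

    // Invisible characters that are not classified as Unicode whitespace.
    private static final String EXTRA = "\\u2060\\u200D\\u200C\\u200B\\u180E\\uFEFF\\u00AD";

    private static final int FLAGS = Pattern.UNICODE_CHARACTER_CLASS | Pattern.DOTALL;

    private static final Pattern WHITESPACE =
            Pattern.compile("^\\s*(.*?)\\s*$", FLAGS);

    private static final Pattern EXTENDED =
            Pattern.compile("^[\\s" + EXTRA + "]*(.*?)[\\s" + EXTRA + "]*$", FLAGS);

    // Removes leading and trailing Unicode whitespace.
    static String trimWhitespace(String input) {
        return apply(WHITESPACE, input);
    }

    // Also removes leading and trailing zero-width and similar characters.
    static String trimExtended(String input) {
        return apply(EXTENDED, input);
    }

    private static String apply(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.matches() ? matcher.group(1) : input;
    }

    public static void main(String[] args) {
        String spaced = "\u2003hello\u00A0";    // EM SPACE + text + NO-BREAK SPACE
        String invisible = "\u200Bhello\uFEFF"; // ZERO WIDTH SPACE + text + BOM

        System.out.println(spaced.trim().length());              // 7: trim() removes nothing
        System.out.println(trimWhitespace(spaced).length());     // 5
        System.out.println(trimWhitespace(invisible).length());  // 7: not whitespace
        System.out.println(trimExtended(invisible).length());    // 5
    }
}
```
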
Saturday, April 9, 2016

SSH key forwarding is not working on Mac OS

If you need to use SSH key forwarding (in git, for example) from a Mac, it might not work. This is because the Mac does not load SSH keys into the agent automatically.
It can easily be done manually by running the command:

ssh-add

There is a good tutorial on configuration and troubleshooting at https://developer.github.com/guides/using-ssh-agent-forwarding/

Saturday, March 5, 2016

XML parsing in Java

JSON is much more popular now, but occasionally you still come across an XML API. I had such an experience recently, and I have to say that in 2016 it is much easier than it was some 10 years ago. I mostly use Jackson for JSON, so for me the best way to handle XML is the XmlMapper from Jackson's XML data format module. After that it is plain Jackson.
Here is an example, followed by the Gradle dependencies it needs:


XmlMapper mapper = new XmlMapper();
List rates = mapper.readValue(ratesString, List.class);


compile 'com.fasterxml.jackson.dataformat:jackson-dataformat-xml:2.6.3'
compile 'org.codehaus.woodstox:woodstox-core-asl:4.4.1'

Friday, February 12, 2016

Using property files with map values in Spring configuration beans

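Spring and Spring Boot can map property files to configuration beans automatically. What is less known is that they can easily wire Map objects too: the ${my.map} placeholder is replaced with the property's text, which the surrounding #{...} SpEL expression then evaluates as an inline map: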


@Configuration
@Component
@PropertySource("classpath:config.properties")
public class Config {
    @Value("#{${my.map}}")
    private Map map;
...


my.map={\
'key1' : 'val1', \
'key2' : 'val2', \
...