At JCrete the last couple of years I’ve had the pleasure to be socializing with, among others, HFT people like Peter Lawrey and Martin Thompson.
Those HFT dudes really makes me thinking when I’m implementing stuff. Thinking about how much garbage I create.
In High Frequency Trading systems you cannot afford garbage collection and therefore they are doing lots of tricks to get around generating garbage.
With the stuff I’m doing a gc pause is generally not critical, so generating garbage is no big deal. Writing easy to read- and maintain code is more important.
But then again, it does not harm to think about garbage.
So, here I was, implementing the good-old escapeXml
function to be used in some internal HTML generating library.
I more or less copied the first attempt from the ‘net:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | /** * XML escape input string * * @param input * @return */ public static String escapeXml_1(String input) { if (input == null || input.isEmpty()) { return input; } StringBuilder sb = null; final char[] charr = input.toCharArray(); for (int i = 0; i < charr.length; i++) { final String repl; final char ch = charr[i]; switch (ch) { case '&': repl = "&"; break; case '<': repl = "<"; break; case '>': repl = ">"; break; case '"': repl = """; break; case '\'': repl = "'"; break; default: repl = null; break; } if (repl != null) { if (sb == null) { sb = new StringBuilder(xmllen(charr)); if (i > 0) { sb.append(charr, 0, i); } } sb.append(repl); } else if (sb != null) { sb.append(ch); } } if (sb != null) { return sb.toString(); } return input; } private static int xmllen(char[] charr) { int len = charr.length; for (int i = 0; i < charr.length; i++) { final char ch = charr[i]; switch (ch) { case '&': len += 4; break; case '<': len += 3; break; case '>': len += 3; break; case '"': len += 5; break; case '\'': len += 5; break; } } return len; } |
Does it generate garbage? Yes, it does, in lines 12, 26 and 37. In line 12 a character array is generated from the input string. In line 26 a StringBuilder
and again in line 37 the StringBuilder
generates a String
object, which again creates a new character-array. Lots of garbage actually.
So, can we reduce some of this garbage, still have maintainable code and not compromise performance too much?
Well, we can get rid of the garbage in line 12 by not doing a toCharArray
and just charAt
into the string.
Here it goes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | /** * XML escape input string * * @param input * @return */ public static String escapeXml_2(String input) { if (input == null || input.isEmpty()) { return input; } StringBuilder sb = null; for (int i = 0; i < input.length(); i++) { final String repl; final char ch = input.charAt(i); switch (ch) { case '&': repl = "&"; break; case '<': repl = "<"; break; case '>': repl = ">"; break; case '"': repl = """; break; case '\'': repl = "'"; break; default: repl = null; break; } if (repl != null) { if (sb == null) { sb = new StringBuilder(xmllen(input)); if (i > 0) { sb.append(input, 0, i); } } sb.append(repl); } else if (sb != null) { sb.append(ch); } } if (sb != null) { return sb.toString(); } return input; } private static int xmllen(String str) { int len = str.length(); for (int i = 0; i < str.length(); i++) { final char ch = str.charAt(i); switch (ch) { case '&': len += 4; break; case '<': len += 3; break; case '>': len += 3; break; case '"': len += 5; break; case '\'': len += 5; break; } } return len; } |
Just a tiny bit less garbage. How about performance? Well, my simple test-harness show that version 2 is about 9% slower than 1, which is not ideal.
So I remembered a session where we discussed sun.misc.Unsafe
and somewhere in the discussion Peter mentioned that the String
has a secret constructor that can be used for generating strings without creating new character arrays. And there was (at least) 3 different ways to get access to that ctor.
I was thinking that I just might go for the simplest one – not trying anything fancy with Unsafe
and stuff. This way I can get rid of the StringBuilder
and the inherent garbage created in lines 26 (25) and 37 (36).
Here it goes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 | /** * XML escape input string * * @param input * @return */ public static String escapeXml_3(String input) { if (input == null || input.isEmpty()) { return input; } char[] out = null; int x = 0; final char[] charr = input.toCharArray(); for (int i = 0; i < charr.length; i++) { final String repl; final char ch = charr[i]; switch (ch) { case '&': repl = "&"; break; case '<': repl = "<"; break; case '>': repl = ">"; break; case '"': repl = """; break; case '\'': repl = "'"; break; default: repl = null; break; } if (repl != null) { if (out == null) { out = new char[xmllen(charr)]; if (i > 0) { System.arraycopy(charr, 0, out, 0, i); } x = i; } final int len = repl.length(); repl.getChars(0, len, out, x); x += len; } else if (out != null) { out[x++] = ch; } } if (out != null) { return charArrayToString(out); } return input; } private static final Constructor<String> SECRET_STRING_CTOR; static { Constructor<String> secret = null; try { Constructor<String> c = String.class.getDeclaredConstructor(char[].class, boolean.class); c.setAccessible(true); String s = c.newInstance(new char[]{'c'}, true); if ("c".equals(s)) { secret = c; } } catch (Exception e) { } SECRET_STRING_CTOR = secret; } private static final String charArrayToString(char[] chars) { if (SECRET_STRING_CTOR != null) { try { return SECRET_STRING_CTOR.newInstance(chars, true); } catch (Exception ignored) { } } return new String(chars); } |
In the static initalizer (lines 48+) we try to get access to the package-protected (char[], boolean)
constructor, and if it succeeds, we use that for creating new strings from character arrays in the charArrayToString
function. (Doing this actually might generate a bit of garbage, since the newInstance
might involve generating an Object
array to hold the arguments)
If we do not get access to that ctor for whatever reason, we just go with the normal public (char[])
ctor.
This seemed to work much better, in my test-harness this version 3 is about 25% faster than version 1. Awesome!
Perhaps we could combine the efforts and use the charAt
method and not generate any superflous garbage?
Here it goes:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | /** * XML escape input string * * @param input * @return */ public static String escapeXml_4(String input) { if (input == null || input.isEmpty()) { return input; } char[] out = null; int x = 0; final int ilen = input.length(); for (int i = 0; i < ilen; i++) { final String repl; final char ch = input.charAt(i); switch (ch) { case '&': repl = "&"; break; case '<': repl = "<"; break; case '>': repl = ">"; break; case '"': repl = """; break; case '\'': repl = "'"; break; default: repl = null; break; } if (repl != null) { if (out == null) { out = new char[xmllen(input)]; if (i > 0) { input.getChars(0, i, out, 0); } x = i; } final int len = repl.length(); repl.getChars(0, len, out, x); x += len; } else if (out != null) { out[x++] = ch; } } if (out != null) { return charArrayToString(out); } return input; } |
Nice and simple. Still readable 🙂 And performance is still some 19% better than the first version. This is going into my codebase!
If there are any manager-material-kind-of people reading this: Please do remember to send your employees on conferences…
Happy Easter.
I encourage you to keep writing. It is nice knowing that coding runs through our bloodline. As an Udby in the USA, I have tried to learn coding before but I know my father before me has experienced more success. I wish you the best of luck. While your post is 6 years old, I hope this finds you in good health and success.
Andreas Udby