Boxing impact on string concatenation

·

4 min read

Recently I went to a session given by Maarten Balliauw about memory management. In that talk, he mentioned the effect boxing and unboxing has on performance. He also talked about how a lot of strings can affect memory management. This got me thinking on the impact of boxing and unboxing when I format strings. What kind of impact does it have?

Boxing is when a value type is wrapped inside a reference type. An example of this is when an integer gets assigned to an object. How many times would that happen when you format a string? I think it happens a lot. I can't rememer how many times I've written string.Format("explanation: {0}", 0) without giving it much thought. The 0 would be in a variable such as age or percentage. I would just give it to the formatting function and be done with it.

What happens is that the integer value type is boxed into an object reference type so it can be passed to the Format function. To insert the integer into the text, unboxing needs to happen first. This is the process of converting the object back to its original value type. This takes quite a bit of processing time. When the unboxing is done, the value can be represented as a string and inserted into the correct position in the text.

If I call the ToString method myself, I convert the value type to a reference type (yes, string is a reference type). A reference type can easily be assigned to another reference type if it's in the hierarchy of the most specific type. Since every reference type in C# inherits from object, this can happen very efficiently. Thus saving the processor a lot of work.

Now, I want to know the difference a ToString() call would make on the performance of a string format with a value type. So I set up a number of tests that all follow the same pattern. Per string concatenation, I have two tests that do a single string format: one with a call to ToString() and one without. Then I have two similar tests but they repeat the operation a million times.

[Fact]
public void Single_Append_without_ToString()
{
    var i = 1;
    var format = new StringBuilder("Number: ").Append(i.ToString()).ToString();
}
[Fact]
public void Million_Append_without_ToString()
{
    for (var i = 0; i < 1_000_000; i++)
    {
        var format = new StringBuilder("Number: ").Append(i.ToString()).ToString();
    }
}

In the tests that box the value, I remove the call to ``ToString```, so I won't bore you with that code. If you do want to check out my full setup, head over to my GitHub repo to see all code. I even put all the test results in a convenient Excel spreadsheet. Mainly so I could easily compare the results myself.

The string concatenations I tested are the static Format method, interpolation, the Append function of StringBuilder and a good old-fashioned string summation ("append " + "text").

Before I delve into the results, I have to display them as ticks.aspx). When I printed out the elapsed milliseconds, I didn't have enough granularity so some tests both showed 1ms had elapsed and I couldn't compare the two properly.

So, on to the results. The first thing that popped out is that adding the ToString method does make an impact. In the single concatenation, the biggest difference goes to the Format concatenation. With ToString it takes 1400 ticks and with boxing (not using ToString) it takes a whopping 215 000 ticks. That is over 15 000% increase in time. Let that sink in for a minute.

The other single concatenations aren't that big apart, but the effect is noticeable. For example, the next biggest difference is interpolation where the boxing takes almost 1600 ticks and using ToString takes about 800 ticks. Boxing doubles the time it takes. The StringBuilder difference is about 80 ticks apart and the summation a good 10 ticks. In every situation, the boxing loses to the manual ToString.

That does not mean the difference when iterating a million times cannot have an impact. Again, all the methods that use ToString are faster than their boxing counterpart. The slowest ToString method is the interpolation, which takes almost 2 352 000 ticks. The fastest boxing method is the StringBuilder which takes approximately 2 418 000 ticks.

The fastest processing a million is the summation where ToString is used. That takes just over 1 370 000 ticks to complete. Second place goes to the StringBuilder with just under 2 000 000 ticks.

Another remarkable change is between the fastest single concatenation that uses ToString, which is the interpolation method. But when iterating over a million times, it becomes the slowest concatenation method, loosing to all other concatenation methods that use ToString.

After performing these tests, I was quite surprised how much a performance impact boxing has on both single and batch processing. This is something I (and you too) should keep in mind when writing software. Especially since string.Format() is so widely used and I rarely see parameters being cast to string manually. Resharper even discourages the use of ToString in concatenation. For reference types, I don't think there is a lot of speed to gain, but value types can improve performance drastically. I thought the compiler could and would optimise a lot of this, but unfortunately, it's not that smart yet.