String hackery, or how to initialize your own strings in a single allocation.

I just finished a fun couple of days trying to solve a puzzle that I ran across at work.

We're doing a huge amount of text processing in an application I work on, and the memory churn and string copies were killing performance. The problem boils down to this:

In a single batch of processing, we collect maybe 500k string objects from a bunch of content production methods and then concatenate those strings into a single string to send off for further processing.

I did some digging and found out that we were copying the string data an extra 2 or 3 times before it finally ended up in our final concatenated string. Since the sum total of the data was 200-500 MB worth of character data, we were shuffling around at least an extra 400-1000 MB of RAM that we didn't need to.

Well, I found a way to take a provided collection of strings and concatenate them into a single string by performing exactly one allocation and one copy pass over the strings. I didn't know it was even possible until I got some help from the folks who responded to my Stack Overflow post.

So I thought you guys might be interested in seeing how to do it.

The code only compiles under .NET 4.6 because of the use of the new Buffer.MemoryCopy method; you can replace that with a loop (which would be slow) or use other tricks to access Buffer.memcpy if you want.
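For anyone stuck on an older framework, the loop-based replacement mentioned above might look roughly like the sketch below. It follows the same one-allocation, one-pass shape as the Buffer.MemoryCopy version, but copies character by character. The class name LoopConcat and the checked length sum are my own additions for illustration, not part of the original solution, and the project still has to be compiled with unsafe code enabled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant for frameworks without Buffer.MemoryCopy.
// Same idea as the post's approach: one allocation, one copy pass.
public static class LoopConcat
{
    public static unsafe string Concat(IList<string> list)
    {
        // First pass: total length in chars.
        // 'checked' is an added guard (an assumption, not in the original)
        // so an overflowing total throws instead of wrapping around.
        int totalChars = 0;
        for (int i = 0; i < list.Count; i++)
        {
            totalChars = checked(totalChars + list[i].Length);
        }

        // The single allocation: a string of the final size, filled with '\0'.
        string destination = new string('\0', totalChars);

        fixed (char* destStart = destination)
        {
            char* dest = destStart; // write cursor into the destination
            for (int i = 0; i < list.Count; i++)
            {
                string source = list[i];
                fixed (char* src = source)
                {
                    // Plain loop in place of Buffer.MemoryCopy.
                    for (int c = 0; c < source.Length; c++)
                    {
                        dest[c] = src[c];
                    }
                }
                dest += source.Length;
            }
        }

        return destination;
    }
}
```

Calling LoopConcat.Concat(new List<string> { "ab", "", "cd" }) should give the same result as string.Concat on the same list; expect it to be slower than the Buffer.MemoryCopy version on large inputs.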

But here's the code:

    using System;
    using System.Collections.Generic;

    public static unsafe class FastConcat
    {
        public static string Concat( IList<string> list )
        {
            string destinationString;
            int destLengthChars = 0;

            // First pass: add up the total length, in chars.
            for( int i = 0; i < list.Count; i++ )
            {
                destLengthChars += list[i].Length;
            }

            // The one and only allocation.
            destinationString = new string( '\0', destLengthChars );

            unsafe
            {
                fixed( char* origDestPtr = destinationString )
                {
                    char* destPtr = origDestPtr; // a pointer we can modify.
                    string source;
                    for( int i = 0; i < list.Count; i++ )
                    {
                        source = list[i];
                        fixed( char* sourcePtr = source )
                        {
                            Buffer.MemoryCopy(
                                sourcePtr,
                                destPtr,
                                long.MaxValue, // yep, I lie so I don't have to keep track of the math
                                source.Length * sizeof( char )
                            );
                        }
                        destPtr += source.Length;
                    }
                }
            }

            return destinationString;
        }
    }
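Since this trick writes straight into a string's memory, it's worth sanity-checking the output against the framework's own string.Concat. Here's a minimal sketch of such a check; ConcatCheck and Matches are hypothetical names I made up for illustration, and the test inputs are arbitrary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original solution.
// Compares any concat implementation against string.Concat on a few inputs.
public static class ConcatCheck
{
    public static bool Matches(Func<IList<string>, string> concat)
    {
        var cases = new List<IList<string>>
        {
            new List<string>(),                               // no strings at all
            new List<string> { "" },                          // a single empty string
            new List<string> { "a", "", "bc" },               // empty string in the middle
            new List<string> { "Hello", ", ", "world", "!" }, // ordinary case
        };

        foreach (IList<string> input in cases)
        {
            string expected = string.Concat(input);
            string actual = concat(input);
            if (!string.Equals(expected, actual, StringComparison.Ordinal))
            {
                return false;
            }
        }

        return true;
    }
}
```

With the FastConcat class above in the same project, ConcatCheck.Matches(FastConcat.Concat) should return true. Note that neither version handles null entries in the list; list[i].Length will throw on those.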

If you're curious, here's the Stack Overflow question I've been working on over the last few days while trying to come up with this: http://ift.tt/1hIGmG9

And my answer to the post with the solution and some analysis: http://ift.tt/1hIGkOO

by antiduh via /r/csharp
