Who Loves Interns?

The topic at hand is interning. More specifically, string interning.

“What is string interning?” you ask. Good question. As you may or may not know, strings in .NET are immutable reference types. This means that they are read-only, and a string variable holds a reference to the string’s location on the managed heap. Typically, every time your code creates a string at runtime (by concatenating, parsing, reading input, and so on), a new string object is allocated, even if an identical string already exists. What this means is that you can create the same string N times and have it take up the string’s memory N times. This sucks when dealing with repeating string data.
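To see the duplication for yourself, here is a minimal sketch (assuming C# 9 or later for top-level statements) that builds the same text twice at runtime and compares the results both by value and by reference. The variable names are illustrative only.

```csharp
using System;

// Two strings with identical contents, each built at runtime.
char[] letters = { 'D', 'a', 'v', 'i', 'd' };

// new string(char[]) allocates a fresh string object on every call.
string first = new string(letters);
string second = new string(letters);

// == on strings compares contents; ReferenceEquals compares object identity.
bool sameContents = first == second;
bool sameObject = object.ReferenceEquals(first, second);

Console.WriteLine($"Same contents: {sameContents}"); // Same contents: True
Console.WriteLine($"Same object:   {sameObject}");   // Same object:   False
```

Equal contents, two separate heap objects: that is exactly the N-copies problem described above.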

Enter string interning. String interning is the process by which you hand your strings over to the .NET runtime’s intern pool, which keeps a single copy of each unique value, thereby conserving memory: N identical strings become N references to the same single object. From MSDN:

“The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system. For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable. The Intern method uses the intern pool to search for a string equal to the value of str. If such a string exists, its reference in the intern pool is returned. If the string does not exist, a reference to str is added to the intern pool, then that reference is returned.”
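The behaviour the quote describes can be observed directly. The sketch below, assuming C# 9 or later for top-level statements, follows the pattern of the example in the `String.Intern` documentation: a literal is already pooled, an equal string assembled with a `StringBuilder` is not, and `string.Intern` hands back the pooled instance. It also uses `string.IsInterned`, which returns the pooled reference, or `null` if the value is not in the pool, without adding anything.

```csharp
using System;
using System.Text;

// A string literal: the runtime keeps a single pooled instance of it.
string literal = "Hello! This Is A String!";

// An equal string assembled at runtime is a separate, unpooled object.
string built = new StringBuilder().Append("Hello! ").Append("This Is A String!").ToString();

// string.Intern looks up an equal string in the pool and returns that reference.
string pooled = string.Intern(built);

bool builtIsLiteral = object.ReferenceEquals(built, literal);   // False
bool pooledIsLiteral = object.ReferenceEquals(pooled, literal); // True
Console.WriteLine($"built is the literal:  {builtIsLiteral}");
Console.WriteLine($"pooled is the literal: {pooledIsLiteral}");

// string.IsInterned only looks: it returns the pooled reference, or null.
string unique = Guid.NewGuid().ToString();
bool pooledBefore = string.IsInterned(unique) != null;          // False: not pooled yet
string.Intern(unique);                                          // adds this very object to the pool
bool pooledAfter = object.ReferenceEquals(string.IsInterned(unique), unique); // True
Console.WriteLine($"unique pooled before Intern: {pooledBefore}, after: {pooledAfter}");
```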

Up until this point, you may not have been aware of this pool or of string interning at all. This is understandable, because .NET does a bloody good job of interning strings for you. In fact, every string literal is interned automatically: the compiler emits literals into your assembly, and the runtime places them in the intern pool. Almost any late-bound string (one built at runtime), however, is not. A quick code example – let us define a program that simply stores the same string 10,000,000 times in a Dictionary:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that stores our strings, keyed by an int, so that
    // they stay reachable and are not garbage collected
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        Console.WriteLine("Storing a literal string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Assign the same literal on every iteration; literals are interned,
            // so blah refers to the same string object each time
            string blah = "Hello! This Is A String!";
            // Add it to the Dictionary at key i
            _dict.Add(i, blah);
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

Running this program results in the following output:

/img/repeated-literal-string.png

A Repeated Literal String

It uses a relatively small amount of memory (269 megs) and takes very little time. Because the string we’re assigning is a literal, it is interned automatically, so every Dictionary entry refers to the same single string object and nearly all of that memory is the Dictionary itself. Now, let us make a slight change to the application by building a late-bound string, which won’t be interned automatically:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that stores our strings, keyed by an int, so that
    // they stay reachable and are not garbage collected
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        // A non-const local, so the compiler cannot fold the concatenation
        // below into a single literal; the result is built at runtime
        string dynamicString = "Some Other Large String";

        Console.WriteLine("Storing a non-constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Build an equal string on every iteration; since dynamicString is
            // not a compile-time constant, each concatenation allocates a new string
            string blah = "Hello! This Is A String!" + dynamicString;
            // Add it to the Dictionary at key i
            _dict.Add(i, blah);
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

Running THIS code yields crappier results:

/img/repeated-late-bound-string.png

Repeated Late Bound String

Note that we use more than five times as much memory (1.379 gigs) as we did before, because each of the 10,000,000 iterations now allocates its own copy of the 47-character string! And our application got considerably slower. Sadly, we can’t do much about the slower part, because concatenating strings takes time and effort. However, we can intern the string to bring memory usage back to something realistic while adding minimal computational cost. We do this with the string.Intern(string) method:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that stores our strings, keyed by an int, so that
    // they stay reachable and are not garbage collected
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        // A non-const local, so the compiler cannot fold the concatenation
        // below into a single literal; the result is built at runtime
        string dynamicString = "Some Other Large String";

        Console.WriteLine("Storing a non-constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Build an equal string on every iteration (a new allocation each time)
            string blah = "Hello! This Is A String!" + dynamicString;
            // Intern blah to save memory: string.Intern returns the single pooled
            // instance, and that reference is what we store at key i
            _dict.Add(i, string.Intern(blah));
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

And the results:

/img/repeated-late-bound-string-interned.png

Repeated Late Bound String (Interned)

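Note that by interning the late-bound string, we reduced memory usage considerably (272 megs, within a few megs of the literal version, because the Dictionary once again holds 10,000,000 references to one string) while adding only a minimal amount of additional computation (600 milliseconds). Each iteration still allocates the concatenated string, but that copy becomes garbage as soon as string.Intern returns the pooled one. Roughly 80% less memory for roughly 20% more computation is generally a good trade.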
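If you want to confirm that the savings really come from sharing one object, rather than relying on process memory counters alone, a minimal sketch like the one below counts distinct string objects in two dictionaries. It assumes .NET 5 or later (for `ReferenceEqualityComparer` and top-level statements), uses 1,000 iterations instead of 10,000,000 so it runs instantly, and the `DistinctObjects` helper is hypothetical, written just for this check.

```csharp
using System;
using System.Collections.Generic;

// Far fewer iterations than the article's 10,000,000, so the check runs instantly.
const int Count = 1_000;
string dynamicString = "Some Other Large String";

var plain = new Dictionary<int, string>();
var interned = new Dictionary<int, string>();

for (int i = 0; i < Count; i++)
{
    // Each concatenation allocates a brand-new string object.
    string blah = "Hello! This Is A String!" + dynamicString;
    plain.Add(i, blah);
    // string.Intern returns the single pooled instance for this value.
    interned.Add(i, string.Intern(blah));
}

// Hypothetical helper: counts distinct objects by reference, not by contents.
static int DistinctObjects(IEnumerable<string> values) =>
    new HashSet<object>(values, ReferenceEqualityComparer.Instance).Count;

int plainObjects = DistinctObjects(plain.Values);
int internedObjects = DistinctObjects(interned.Values);

Console.WriteLine($"Without interning: {plainObjects} string objects");   // 1000
Console.WriteLine($"With interning:    {internedObjects} string object(s)"); // 1
```

Both dictionaries hold the same number of references; only the number of string objects behind them differs, which is where the memory goes.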

Now the icing on the cake. Compile and run your code in Release mode to allow for even more compiler and JIT optimizations:

/img/repeated-late-bound-string-interned-release-mode.png

Repeated Late Bound String (Interned) - Release Mode

Even less memory and even lower interning cost! It’s a win-win, and a friendly reminder to always ship your code built in Release mode. 🙂

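So, who loves interns? You do. Remember: when you must use the same string repeatedly in an application (perhaps when storing a huge collection of User objects with a FirstName property, where many of your users are “David”, “John”, “Tim”, etc.), intern the string. Why create N copies of the exact same string, all using up your precious managed memory? Just keep in mind that interned strings are generally not released until the runtime shuts down, so reserve interning for values drawn from a small, frequently repeated set rather than for unique data.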
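Here is a minimal sketch of that advice, assuming C# 9 or later. The comma-separated input is hypothetical stand-in data for names read from a file or database, and a `(int Id, string FirstName)` tuple stands in for the User class mentioned above; the point is simply to intern each name once, at the place where it enters your long-lived data.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input: in a real application these names would come from a file,
// a database, or a network request, so each parsed value is a new string object.
string csv = "David,John,Tim,David,John,David";

// A tuple stands in for a User class with a FirstName property.
var users = new List<(int Id, string FirstName)>();
int nextId = 1;

foreach (string name in csv.Split(','))
{
    // Intern as the value enters long-lived data, so all equal names
    // end up referring to one pooled instance.
    users.Add((nextId++, string.Intern(name)));
}

// Users 1, 4 and 6 are all named David; they now share one string object.
bool davidsShared = object.ReferenceEquals(users[0].FirstName, users[3].FirstName)
                 && object.ReferenceEquals(users[0].FirstName, users[5].FirstName);

Console.WriteLine($"{users.Count} users, Davids share one string: {davidsShared}");
// 6 users, Davids share one string: True
```

Interning at this boundary keeps the rest of the code unaware of it; nothing downstream needs to call `string.Intern` again.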


See also