Archive | May 2012

Make Your Debugging Life Easier

Sorry for the delay in posts; May has been a very busy month.

In order to accurately debug or profile an external assembly or library (that is, one you’re not compiling yourself), you need the PDB file that accompanies each DLL. These files give your debugger or profiler information about the compiled assembly, such as function names, line numbers, and other related metadata.

One thing that sucks is debugging and profiling Microsoft’s own .NET assemblies and native runtime libraries. When you’re debugging an exception with no line number, or profiling an application without knowing which method a sample refers to, it’s easy to become very frustrated, very quickly. The latter scenario happened to me just this week. I was performance profiling my application at work and found that a large portion of the application’s time (~43%) was spent in “clr.dll”, with the functions inside it displayed only as memory addresses:

Performance Profiling - No Symbols


This was not exactly a useful indication of what REALLY happened. What am I supposed to do with the knowledge that over 40% of my application’s time is spent in a Microsoft assembly called “clr.dll”? Not much, which is a little concerning. I needed to know what was really happening!

Fortunately, there’s a solution for this very issue. A little-known feature implemented in Visual Studio 2010 is the ability to connect to Microsoft’s Symbol Servers and obtain most of the debugging symbols for their assemblies and libraries!

Just go to Tools –> Options –> (expand) Debugging –> Symbols

Here, select/check the “Microsoft Symbol Servers” as a source for Symbols. Now, getting the symbols from Microsoft every time you debug or profile is going to be slow and painful (and it’ll even give you a pop-up saying as much once you check the Microsoft Symbol Servers), so be sure that you specify a directory for the “Cache symbols in this directory” input – it will keep a local copy of the PDBs and simply check for updates every so often. As a result, you get your regular debugging/profiling AND you can see the function names of the Microsoft assemblies!

Using this feature, I was able to re-evaluate my latest performance tests and see that the anonymous “clr.dll” entry was actually “TransparentProxyStub_CrossContext”, a stub inside the CLR that calls made through WCF client proxies pass through:

Performance Profiling - Symbols


A little Google-Fu and a discussion with a co-worker who is well-versed in WCF told me that my application was actually spending its time waiting for a reply from a WCF request. Since this was expected behaviour (the application calls out to a service for every request), it put my performance profiling mind at ease.

Take advantage of Microsoft’s PDBs, especially when the price is right – free. You’d be amazed how useful they are in your day-to-day debugging and profiling.

Who Loves Interns?

The topic at hand is interning. More specifically, string interning.

“What is string interning?” you ask? Good question. As you may or may not know, strings are immutable reference types. This means that they are read-only, and a variable holds a reference (a pointer) to the string’s location on the heap. Typically, a new string object is created and stored within your application’s memory each time that you build a string at runtime, even if an identical string already exists. What this means is that you can build the same string N times and have it take up the string’s memory N times. This sucks when dealing with repeating string data.

Enter string interning. String interning is the process of swapping your string reference for the one held in the .NET runtime’s string intern pool, thereby conserving memory because N identical strings all point to the same single object. From MSDN:

“The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system. For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable. The Intern method uses the intern pool to search for a string equal to the value of str. If such a string exists, its reference in the intern pool is returned. If the string does not exist, a reference to str is added to the intern pool, then that reference is returned.”
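The behaviour in that quote can be seen with a few reference comparisons. This is only a minimal sketch: the InternDemo class and its BuildAtRuntime helper are names made up for illustration, and the helper just stands in for any string that arrives at runtime.

```csharp
using System;

static class InternDemo
{
    // Hypothetical helper: copies the characters into a brand-new string
    // object, the way a string read from a file or database would arrive.
    public static string BuildAtRuntime(string text)
    {
        return new string(text.ToCharArray());
    }

    static void Main()
    {
        string literalA = "Hello";
        string literalB = "Hello";
        string built = BuildAtRuntime("Hello");

        // Identical literals: both variables refer to the same pooled object.
        Console.WriteLine(object.ReferenceEquals(literalA, literalB));

        // Equal by value, yet a separate object on the heap.
        Console.WriteLine(built == literalA);
        Console.WriteLine(object.ReferenceEquals(built, literalA));

        // string.Intern hands back the pooled reference for an equal value.
        Console.WriteLine(object.ReferenceEquals(string.Intern(built), literalA));
    }
}
```

On an ordinary JIT-compiled build this should print True, True, False, True: the value comparison succeeds either way, and only the interned reference is the very same object as the literal.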

Up until this point, you may not have been aware of this pool or the string interning behaviour at all. This is understandable because the .NET compiler does a bloody good job of interning strings for you. In fact, any literal string will be automatically interned. Almost any string that is built at runtime (late-bound), however, is not. A quick code example: let us define a program that simply stores the same string 10,000,000 times in a Dictionary:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that holds our strings, keyed by an int, so that they
    // stay referenced and are not garbage collected
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        Console.WriteLine("Storing a constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Define the same string repeatedly
            string blah = "Hello! This Is A String!";
            // Add it to the Dictionary at the index of i
            _dict.Add(i, blah);
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

Running this program results in the following output:

A Repeated Literal String


It uses a relatively small amount of memory (269 megs, most of it the Dictionary’s own 10,000,000 entries) and takes very little time, since the string is a literal and is thus interned for us automatically: every entry points at the same single object. Now, let us make a slight change to the application by building the string at runtime, so that it won’t be interned automatically:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that holds our strings, keyed by an int, so that they
    // stay referenced and are not garbage collected
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        // Define a string which will be concatenated with another string later
        string dynamicString = "Some Other Large String";

        Console.WriteLine("Storing a non-constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Build the same string at runtime on every iteration
            string blah = "Hello! This Is A String!" + dynamicString;
            // Add it to the Dictionary at the index of i
            _dict.Add(i, blah);
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

Running THIS code yields crappier results:

Repeated Late Bound String


Note that we use roughly five times as much memory (1.379 gigs) as we did before, and that our application got considerably slower! Sadly, we can’t do much about the slower part because concatenating strings takes time and effort. However, we can intern the string to return our memory usage back to something realistic while adding minimal computational cost. We do this with the string.Intern(string) method:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that holds our strings, keyed by an int, so that they
    // stay referenced and are not garbage collected
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        // Define a string which will be concatenated with another string later
        string dynamicString = "Some Other Large String";

        Console.WriteLine("Storing a non-constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Build the same string at runtime on every iteration
            string blah = "Hello! This Is A String!" + dynamicString;
            // Add it to the Dictionary at the index of i
            // Intern string "blah" to save memory!
            _dict.Add(i, string.Intern(blah));
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

And the results:

Repeated Late Bound String (Interned)


Note that by interning the late-bound string, we reduced memory usage considerably (272 megs, only a few megs more than the literal-string version) while adding only a minimal amount of additional computation (600 milliseconds). The concatenated string is still created on every iteration, but it becomes garbage as soon as it is swapped for the pooled copy. 80% less memory used for the cost of an additional 20% computation is generally a good trade.
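The programs above report the paged memory of the whole process. If you would rather look at the managed heap alone, a rough sketch along the following lines may help. The HeapMeasure class, its MeasureGrowth method, and the count of 100,000 are all made up for illustration, and GC.GetTotalMemory only gives an approximation, so treat the numbers as indicative.

```csharp
using System;

static class HeapMeasure
{
    // Hypothetical helper: returns the approximate managed-heap growth, in
    // bytes, caused by holding `count` copies of a string built at runtime.
    public static long MeasureGrowth(int count, bool intern)
    {
        // A value that only exists at runtime, so the concatenation below
        // cannot be turned into a literal at compile time.
        string suffix = DateTime.Now.Year.ToString();

        long before = GC.GetTotalMemory(true); // true = collect garbage first
        var held = new string[count];
        for (int i = 0; i < count; i++)
        {
            string built = "Hello! This Is A String!" + suffix;
            held[i] = intern ? string.Intern(built) : built;
        }
        long after = GC.GetTotalMemory(true);

        // Keep the array (and the strings it references) alive until the
        // second measurement has been taken.
        GC.KeepAlive(held);
        return after - before;
    }

    static void Main()
    {
        Console.WriteLine("Not interned: " + MeasureGrowth(100000, false).ToString("n0") + " bytes");
        Console.WriteLine("Interned:     " + MeasureGrowth(100000, true).ToString("n0") + " bytes");
    }
}
```

With interning, the growth should be little more than the array of references itself; without it, every element carries its own copy of the characters.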

Now the icing on the cake. Compile and run your code in Release mode to allow for even more compiler optimizations:

Repeated Late Bound String (Interned) - Release Mode App


Even less memory and an even lower interning cost! It’s win-win, and just a friendly reminder to always release your code in Release mode. 🙂

So, who loves interns? You do. Remember: when you must use the same string repeatedly in an application (perhaps when storing a huge collection of User objects with a FirstName property, where many of your users are “David”, “John”, “Tim”, etc.), intern the string. Why create N copies of the exact same object, all using up your precious managed memory?
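For the User scenario, one possible shape is to intern the name as it enters the object. This is a sketch under assumptions: the User class below is hypothetical, and the names are copied through a char array only to simulate strings arriving at runtime as separate objects.

```csharp
using System;

// Hypothetical User type, named after the example in the text.
class User
{
    public string FirstName { get; private set; }

    public User(string firstName)
    {
        // Intern on the way in, so equal names share one string object.
        FirstName = string.Intern(firstName);
    }
}

static class UserDemo
{
    static void Main()
    {
        // Two separate "David" objects, as if read from a database.
        var first = new User(new string("David".ToCharArray()));
        var second = new User(new string("David".ToCharArray()));

        // Both users now reference the same pooled string.
        Console.WriteLine(object.ReferenceEquals(first.FirstName, second.FirstName));
    }
}
```

One design point to weigh: interned strings are unlikely to be released until the runtime shuts down, so this suits a modest set of values that repeat often (first names) better than data that is mostly unique.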