Tag Archive | c#

Determine MIME Type from File Name

I recently had a need, in an ASP.NET MVC3 application, to read raw HTML, CSS, JS, and image files from disk and return them to the user… A sort of “pass-through” if you will. Normally I’d have simply routed to a custom HTTP handler per file type or just allowed MVC3 to map existing files to supply its own .NET HTTP handlers and do all of this work for me, but in this case I needed the mapped “directory” to switch behind the scenes based on Session settings… So I ultimately had to feed these files through a Controller and Action Method to gain access to the Session.

One problem that came up was determining the MIME type of the content being read from disk. This is done for you by the HTTP handlers provided in the .NET framework, but when you’re serving files through MVC Controllers, the default HTTP handlers are not used, and thus you’re left to figure out the MIME types yourself.

So, I began to investigate, using ILSpy, how the native/default ASP.NET HTTP handlers determine the MIME types. I came upon a class in the System.Web namespace called System.Web.MimeMapping – this class keeps a private, sealed dictionary of type MimeMappingDictionaryClassic (which extends a private abstract class called MimeMappingDictionaryBase) which holds all known extensions and their associated MIME types… A sample of the decompiled code which populates it is below:

protected override void PopulateMappings()
{
    base.AddMapping(".323", "text/h323");
    base.AddMapping(".aaf", "application/octet-stream");
    base.AddMapping(".aca", "application/octet-stream");
    base.AddMapping(".accdb", "application/msaccess");
    base.AddMapping(".accde", "application/msaccess");
    base.AddMapping(".accdt", "application/msaccess");
    base.AddMapping(".acx", "application/internet-property-stream");
    base.AddMapping(".afm", "application/octet-stream");
    base.AddMapping(".ai", "application/postscript");
    base.AddMapping(".aif", "audio/x-aiff");
    base.AddMapping(".aifc", "audio/aiff");
    base.AddMapping(".aiff", "audio/aiff");

And so on… In total, there are 342 lines of known mappings!

Ultimately, my goal was to get a hold of this functionality in the easiest, most flexible way possible.

In .NET 4.5, MimeMapping exposes a public static method called GetMimeMapping which takes in a file name (or extension) and returns the appropriate MIME type from the aforementioned dictionary. Unfortunately my project is on .NET 4.0 and in that version of the framework this method is internal, not public (why, Microsoft, why?!) and thus was not available to me. So, I felt that I was left with 3 options:

1. Upgrade to .NET 4.5 (not possible at this time due to corporate politics and so on)

2. Copy and paste the entire list of mappings into a dictionary of my own and reference it (yuck!)

3. REFLECTION TO THE RESCUE!

So, with a short bit of code, you too can steal the functionality of the GetMimeMapping method, even if it isn’t public!

First, set up the reflection and cache the MethodInfo in an assembly that references System.Web. Below is a custom static class I built which wraps the reflective method:

/// <summary>
/// Exposes the Mime Mapping method that Microsoft hid from us.
/// </summary>
public static class MimeMappingStealer
{
    // The get mime mapping method info
    private static readonly MethodInfo _getMimeMappingMethod = null;

    /// <summary>
    /// Static constructor sets up reflection.
    /// </summary>
    static MimeMappingStealer()
    {
        // Load hidden mime mapping class and method from System.Web
        var assembly = Assembly.GetAssembly(typeof(HttpApplication));
        Type mimeMappingType = assembly.GetType("System.Web.MimeMapping");
        _getMimeMappingMethod = mimeMappingType.GetMethod("GetMimeMapping", 
            BindingFlags.Instance | BindingFlags.Static | BindingFlags.Public |
            BindingFlags.NonPublic | BindingFlags.FlattenHierarchy);
    }

    /// <summary>
    /// Exposes the hidden Mime mapping method.
    /// </summary>
    /// <param name="fileName">The file name.</param>
    /// <returns>The mime mapping.</returns>
    public static string GetMimeMapping(string fileName)
    {
        return (string)_getMimeMappingMethod.Invoke(null /*static method*/, new[] { fileName });
    }
}

Now, a quick test via a console application to ensure that it works:

static void Main(string[] args)
{
    var fileName1 = "whatever.js";
    var fileName2 = "somefile.css";
    var fileName3 = "myfile.html";

    Console.WriteLine("Output for " + fileName1 + " = " + MimeMappingStealer.GetMimeMapping(fileName1));
    Console.WriteLine("Output for " + fileName2 + " = " + MimeMappingStealer.GetMimeMapping(fileName2));
    Console.WriteLine("Output for " + fileName3 + " = " + MimeMappingStealer.GetMimeMapping(fileName3));

    Console.ReadKey();
}

And running the console application results in success!

GetMimeMapping Works

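To tie this back to the original problem, a hypothetical pass-through Controller might use the class like so. Note that PassThroughController and the Session-based path lookup are placeholders for illustration, not code from my actual project:

```csharp
// Hypothetical pass-through Controller; the Session-based directory
// lookup is a placeholder for the real switching logic
public class PassThroughController : Controller
{
    public ActionResult GetFile(string fileName)
    {
        // Session is available here, unlike in a plain HTTP handler
        var directory = (string)Session["ContentDirectory"];
        var fullPath = Path.Combine(directory, fileName);

        // The stolen method supplies the correct Content-Type header
        var mimeType = MimeMappingStealer.GetMimeMapping(fileName);

        return File(fullPath, mimeType);
    }
}
```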

Static vs Instance string.Equals Benchmark

A friend of mine commented on my last post asking about how much faster the static string.Equals method is than the instance string.Equals method. To satiate both of our curiosities, I have created this benchmarking application:

static void Main(string[] args)
{
    var stopwatch = new Stopwatch();

    string a = "hello";
    string b = "hi";

    stopwatch.Start();
    for (int i = 0; i < 10000000; i++)
    {
        a.Equals(b);
    }
    stopwatch.Stop();

    Console.WriteLine("Instance string.Equals over 10,000,000 iterations: " + stopwatch.ElapsedMilliseconds + " ms");

    stopwatch.Reset();

    stopwatch.Start();
    for (int i = 0; i < 10000000; i++)
    {
        string.Equals(a, b);
    }
    stopwatch.Stop();

    Console.WriteLine("Static string.Equals over 10,000,000 iterations: " + stopwatch.ElapsedMilliseconds + " ms");

    Console.ReadKey();
}

The results of 5 runs, where “I” is the instance method and “S” is the static method, and the times are in milliseconds:

I: 113
S: 100

I: 144
S: 96

I: 126
S: 89

I: 126
S: 94

I: 128
S: 97

And there you have it. Static string.Equals is reliably slightly faster… But unless you’re doing millions of comparisons, it probably doesn’t really matter much. It does, however, prevent the NullReferenceException mentioned in the last post when the string instance is null.

Static vs Instance string.Equals

As you may or may not know, static methods are usually faster than instance methods. This alone should be a good enough reason to use the static string.Equals method in .NET, but if that doesn’t do it for you, allow me to present a simple example.

string a = "hello";
string b = "hi";
bool result = a.Equals(b);

What is the expected result of these lines? A boolean value of false, of course. And it’d be true if the strings were identical. It’s also false if b is null. But what if a is null?

string a = null;
string b = "hi";
bool result = a.Equals(b);

The above code throws a NullReferenceException, since we are attempting to use the instance string.Equals method on an instance that is null. Effectively, we’re calling null.Equals, which can only ever throw a NullReferenceException.

The static method is best to use in situations where either string (or both) can be null. Re-writing our code as:

string a = null;
string b = "hi";
bool result = string.Equals(a, b);

Allows the code to behave identically to the original version, but without ever throwing a NullReferenceException for any combination of string inputs.
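A quick sketch of the static overload’s behaviour for every null combination:

```csharp
static void Main(string[] args)
{
    // The static overload is null-safe for both arguments
    Console.WriteLine(string.Equals(null, "hi"));   // False
    Console.WriteLine(string.Equals("hi", null));   // False
    Console.WriteLine(string.Equals(null, null));   // True
    Console.WriteLine(string.Equals("hi", "hi"));   // True

    Console.ReadKey();
}
```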

TPL and Error Handling – Continuation Tasks

Two of my colleagues (one from work and one from a user group) kindly pointed out to me that in my last post I omitted Continuation Tasks as a means of Error Handling for the TPL. As such, I will expand upon my last post with an example of handling errors via a Continuation Task.

Continuing where we left off last, the following code will utilize a Task Continuation to handle errors within Tasks.

static void Main(string[] args)
{
    for (int i = 0; i < 5; i++)
    {
        // Initialize a Task which throws exception
        var task = new Task(() =>
        {
            throw new Exception("It broke!");
        });
        // Configure the Continuation to only fire on error
        task.ContinueWith(HandleError, TaskContinuationOptions.OnlyOnFaulted);

        // Start the Task
        task.Start();
    }

    Console.WriteLine("End of method reached");
    Console.ReadKey();
}

private static void HandleError(Task task)
{
    // The Task has an AggregateException
    var agex = task.Exception;
    if (agex != null)
    {
        // Output all actual Exceptions
        foreach (var ex in agex.InnerExceptions)
        {
            Console.WriteLine(ex.Message);
        }
    }
}

And the result:

The Output


By using a Continuation Task which wraps our private method, HandleError (a method whose signature matches Action&lt;Task&gt;), we can handle errors in a more elegant, less inline way. This allows you to centralize your Task error-handling logic, for cases where you would always want to log to a file or database, for example. Note that there is some overhead in the complexity of this architecture: since we receive an AggregateException, we must loop through its InnerExceptions to analyze the individual errors.

Sorry that I missed this in my first post! 🙂

TPL and Error Handling

As of .NET 4.0, the TPL, or Task Parallel Library, is king when it comes to parallelization. It allows for smooth, easy multi-threading in any application. There is a slight learning curve, however, and a major part of it is understanding how Exceptions bubble up while using the TPL.

Let’s partake in a simple example. This code will create and run a task that throws an Exception, and then attempt to catch it:

static void Main(string[] args)
{
    try
    {
        // Initialize a Task which throws exception
        var task = new Task(() =>
        {
            throw new Exception("Task broke!");
        });
        // Start the Task
        task.Start();
    }
    // Attempt to catch the Task Exception
    catch (Exception ex)
    {
        Console.WriteLine("Caught the TPL exception");
    }

    Console.WriteLine("End of method reached");
    Console.ReadKey();
}

When we run it, we get this result:

First Output


The Exception was not caught. This is because the Task is run on a different thread, which has its own Stack memory and execution path, which consists of the code inside of the Task and not much else. As a result, when the Task is run, the flow of control returns to the next line in our application, unaware of the Task or its outcome. Once the Exception in the Task is thrown, it is thrown in a different scope, effectively outside of our Main method. It’s almost as if we ran an entirely different application, but attempted to catch its Exception in this application. This just won’t work.

There is a way to work around this, however. One interesting thing about the TPL is the Task.WaitAll method, which blocks the calling thread until the given Tasks have completed, returning the flow of control to your calling method. By adding this method to our code, we can make our main thread wait until the Task completes, which also enables it to catch the Task’s exception:

static void Main(string[] args)
{
    try
    {
        // Initialize a Task which throws exception
        var task = new Task(() =>
        {
            throw new Exception("Task broke!");
        });
        // Start the Task
        task.Start();
        // Wait for the Task to complete, thus keeping control of flow
        Task.WaitAll(task);
    }
    // Attempt to catch the Task Exception
    catch (Exception ex)
    {
        Console.WriteLine("Caught the TPL exception");
    }

    Console.WriteLine("End of method reached");
    Console.ReadKey();
}

The output is as follows:

Second Output


This time we were able to catch the Exception. Of note, the Exception which is thrown by Tasks is typically the AggregateException which allows you to catch multiple, often asynchronous Exceptions as one Exception (which is easier for a single thread to handle). A quick demo of this functionality (let’s have 5 threads each throw an Exception):

static void Main(string[] args)
{
    try
    {
        // A List of Tasks
        var taskList = new List<Task>(5);

        for (int i = 0; i < 5; i++)
        {
            // Initialize a Task which throws exception
            var task = new Task(() =>
            {
                throw new Exception("It broke!");
            });
            // Start the Task
            task.Start();
            // Add to List
            taskList.Add(task);
        }
        // Wait for all Tasks to complete, thus keeping control of flow
        Task.WaitAll(taskList.ToArray());
    }
    // Attempt to catch the Task Exception
    catch (AggregateException agex)
    {
        Console.WriteLine("Caught the TPL exception(s)");
        // Output all actual Exceptions
        foreach (var ex in agex.InnerExceptions)
        {
            Console.WriteLine(ex.Message);
        }
    }

    Console.WriteLine("End of method reached");
    Console.ReadKey();
}

And the output:

Third Output


So as you can see, there are ways to handle Tasks throwing Exceptions, if you have your calling thread pause and wait for the Tasks to complete. This, of course, is not always practical so your other option is to handle the exception within the Task’s scope – just like you would with regular, single-threaded code. In effect you treat the code within the Task as isolated, single-threaded code, and catch and handle Exceptions accordingly. A slight mod to our application will show you this:

static void Main(string[] args)
{
    for (int i = 0; i < 5; i++)
    {
        // Initialize a Task which throws exception
        var task = new Task(() =>
        {
            try
            {
                throw new Exception("It broke!");
            }
            catch (Exception ex)
            {
                // Output the Exception
                Console.WriteLine(ex.Message);
            }
        });
        // Start the Task
        task.Start();
        // Note: no longer waiting for Task to finish
    }

    Console.WriteLine("End of method reached");
    Console.ReadKey();
}

And the result of this change:

Fourth Output


Note the interesting results of this. Our main thread did not wait for the Tasks to complete, so the Tasks’ results were never synchronized back into the calling thread’s flow of control. As a result, the “End of method reached” text actually appeared before the Tasks’ output, because it simply happened sooner.

Note also that only 4 of the 5 Exceptions were written to the Console. This is due to a race condition in our parallelized application: the Console is a single, static resource, so our 5 spawned threads plus 1 main thread all race to use it. In this particular run, the Console.ReadKey method executed after the first 4 Tasks had written their output, but before the 5th Task wrote its output. This does not mean the 5th Exception was not handled; it simply means that, in a way that is classic to multi-threaded applications, we encountered a race condition. I ran this application another 10 or 20 times and saw many variations: sometimes 3 Exceptions were output, sometimes 4, and rarely all 5. This is a great example of a race condition, and one which you could address using the above strategy of calling Task.WaitAll prior to the final Console.ReadKey statement and terminating your application.
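The two strategies can also be combined: handle each Exception inside its own Task, and still use Task.WaitAll so that every message is guaranteed to print before the application moves on. A minimal sketch of this combination:

```csharp
static void Main(string[] args)
{
    var tasks = new List<Task>(5);

    for (int i = 0; i < 5; i++)
    {
        // Each Task handles its own Exception, as before
        var task = new Task(() =>
        {
            try
            {
                throw new Exception("It broke!");
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        });
        task.Start();
        tasks.Add(task);
    }

    // Waiting here guarantees all 5 messages are written
    // before the final output and ReadKey
    Task.WaitAll(tasks.ToArray());

    Console.WriteLine("End of method reached");
    Console.ReadKey();
}
```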

It’s up to you individually to decide which style of TPL error handling makes the most sense in each application. It depends on the purpose of each thread and the implications of having your application wait for threads to complete, considered on a per-application basis.

One final note is that other parallel operations, such as Parallel.ForEach and Parallel LINQ (PLINQ) use AggregateException as well to catch their thrown Exceptions, and that the AggregateException offers a Flatten method for re-throwing nested AggregateExceptions as a one-level deep single AggregateException to simplify handling of them.
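The Flatten method is quick to demonstrate; the nested AggregateException here is constructed by hand purely for illustration:

```csharp
static void Main(string[] args)
{
    // A nested AggregateException, constructed by hand for illustration
    var nested = new AggregateException(new Exception("inner"));
    var outer = new AggregateException(nested, new Exception("outer"));

    // Flatten unwraps any nested AggregateExceptions into a
    // single, one-level-deep AggregateException
    foreach (var ex in outer.Flatten().InnerExceptions)
    {
        Console.WriteLine(ex.Message);
    }

    Console.ReadKey();
}
```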

Compiler Tricks – Inferred Types

The .NET compiler is a terrific thing… After all, it turns your C# into an executable program!

One nice feature of the .NET compiler, which is becoming better each release, is inferred typing. I’d like to lay out a few short examples that might help you develop your programming standards and practices.

1. Inferring a type when creating an array.

// Create and initialize an array
var myArray = new int[] { 1, 2, 3 };

Becomes:

// Create and initialize an array
var myArray = new [] { 1, 2, 3 };

The compiler knows that your array members are integers, and thus infers the array type as int[].

2. Inferred types and matching Generic method parameters. Given a Generic method:

/// <summary>
/// A generic method that takes a thing of Type T as input.
/// </summary>
/// <typeparam name="T">The Type.</typeparam>
/// <param name="thing">The thing of Type T.</param>
private void MyGenericMethod<T>(T thing)
{
    // Do some stuff
}

This:

// Generic Type Parameters
MyGenericMethod<int>(1);

Becomes:

// Generic Type Parameters
MyGenericMethod(1);

No need to include the Type parameter in the method call: because the Generic Type parameter is the same Type as the method’s argument, the compiler figures it out.
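As a side note, inference only works from the method’s arguments. A generic method whose type parameter appears only in the return type (the hypothetical Create&lt;T&gt; below) still requires an explicit type argument:

```csharp
// Inference works from the arguments...
private static T First<T>(T[] items)
{
    return items[0];
}

// ...but not from the return type alone; this hypothetical
// factory always needs an explicit type argument
private static T Create<T>() where T : new()
{
    return new T();
}

static void Main(string[] args)
{
    var first = First(new[] { 1, 2, 3 });  // T inferred as int
    var created = Create<DateTime>();      // T must be written out

    Console.WriteLine(first + " " + created);
    Console.ReadKey();
}
```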

3. Inferred types and the var keyword. A pretty classic example.

int myInteger = 3;

Becomes:

var myInteger = 3;

The compiler knows to assign the type int to myInteger based on the evaluation of the assignment expression. Note that this can be annoying when trying to use interfaces:

var myList = new List<int>();

Infers the type of myList as List&lt;int&gt;, so if you want IList&lt;int&gt; you need to declare it the usual way:

IList<int> myList = new List<int>();

A short but sweet post.

Custom Output Caching with MVC3 and .NET 4.0 – Done Properly!

I came across a need at work today to re-implement some of the Output Caching for our MVC3 application which runs under .NET 4.0. I wanted to use standard Output Caching (via the OutputCacheAttribute class, why re-invent the well-working wheel?) but due to some of our requirements I needed more control over how my objects were cached. More specifically, I needed to cache them with a custom Cache Dependency. With a little bit of Google-Fu, I was delighted to learn of the Output Cache Provider functionality introduced in ASP.NET 4. I implemented a custom OutputCacheProvider, registered it in my Web.config file as the Default Cache Provider, and I was well on my way.

…Or so I thought. You see, in our application we are caching both regular (Parent) and partial (Child) Controller Action Method calls. That is to say, we’re caching regular calls to Controller Action Methods, as well as the outputs of Child Action Method calls which are invoked from uncached Parent Action Method calls. While testing, some strange behaviour showed me that my Child Action Method calls were not being cached by my shiny new custom Output Cache Provider. They were instead being cached by the Default Output Cache Provider, which I have no control over. I confirmed this by debugging and seeing that my Child Action Method calls were not hitting my Custom Output Cache Provider methods… What gives?

I did some more Googling and learned very little, but I happened to come across one little tidbit of somewhat vague information. I also came across a few .NET bloggers who had solved this problem… how shall I say… VERY poorly. So, I’d like to tell you how to do it correctly.

In the .NET 4.0 edition of the MVC3 assemblies, the OutputCacheAttribute contains a static property called ChildActionCache which is of type ObjectCache. As you can see from the MSDN page (at least at the time of writing this), they aren’t exactly detailing what it is for or how it really works – or why you can’t just use the bloody OutputCacheAttribute for Child Action Method calls. So what is going on?

Well, after a little investigation, I discovered the reasoning behind the madness. Basically, from a high-level view, the OutputCacheAttribute works in conjunction with a Caching HTTP Module (the OutputCacheModule class). Each HTTP Request is passed to the OutputCacheModule early in the ASP.NET pipeline, long before it reaches your MVC application (this is separate from, and in addition to, the kernel-mode caching IIS can do in http.sys). If the HTTP Module can pull a cached Response for that particular Request out of the Cache, it will short-circuit your application and simply render the response to the user, stopping further execution. When this happens, your application never even sees the request. Neat, huh? If it can’t find the request, it exits and lets your application do its thing… And whenever you’ve placed OutputCache on your Action Method, the attribute caches the response in the same format that the HTTP Module looks for. This allows MUCH less work to be done by your application in caching things. Cool, right?

You may now see why you cannot cache Child Action Method calls using the regular old OutputCacheAttribute… Your MVC application needs to execute a Parent Action Method from which the Child Action Methods are executed. If your Child Action Method were cached in the same way as a Parent Action Method, the HttpModule would always miss the Cache, since the Child Request originates from the Parent and has a completely different method signature, parameters, etc., upon which the Cache Key is derived. How can you cache your Child Action Methods ahead of your MVC application when your MVC application needs to execute in order to generate the Child Action Method Requests? And so, OutputCacheAttribute only works in the traditional manner for Parent Action Method calls.

So, how do you “fix” this and handle the Caching of Custom Child Action Methods in the same way as Parent Action Methods? First, create a custom class that inherits from the MemoryCache object. In this class you’re going to override 2 methods:

/// <summary>
/// A Custom MemoryCache Class.
/// </summary>
public class CustomMemoryCache : MemoryCache
{
    public CustomMemoryCache(string name)
        : base(name)
    {

    }
    public override bool Add(string key, object value, DateTimeOffset absoluteExpiration, string regionName = null)
    {
        // Do your custom caching here, in my example I'll use standard Http Caching
        HttpContext.Current.Cache.Add(key, value, null, absoluteExpiration.DateTime, 
            System.Web.Caching.Cache.NoSlidingExpiration, System.Web.Caching.CacheItemPriority.Normal, null);

        return true;
    }

    public override object Get(string key, string regionName = null)
    {
        // Do your custom caching here, in my example I'll use standard Http Caching
        return HttpContext.Current.Cache.Get(key);
    }
}

You’re going to build your custom Output Cache Provider (mentioned at the beginning of this post) similarly, inheriting from the abstract class OutputCacheProvider. Here’s an example one:

/// <summary>
/// A Custom OutputCacheProvider Class.
/// </summary>
public class CustomOutputCacheProvider : OutputCacheProvider
{
    public override object Add(string key, object entry, DateTime utcExpiry)
    {
        // Do the same custom caching as you did in your 
        // CustomMemoryCache object
        var result = HttpContext.Current.Cache.Get(key);

        if (result != null)
        {
            return result;
        }

        HttpContext.Current.Cache.Add(key, entry, null, utcExpiry,
            System.Web.Caching.Cache.NoSlidingExpiration, System.Web.Caching.CacheItemPriority.Normal, null);

        return entry;
    }

    public override object Get(string key)
    {
        return HttpContext.Current.Cache.Get(key);
    }

    public override void Remove(string key)
    {
        HttpContext.Current.Cache.Remove(key);
    }

    public override void Set(string key, object entry, DateTime utcExpiry)
    {
        HttpContext.Current.Cache.Add(key, entry, null, utcExpiry,
            System.Web.Caching.Cache.NoSlidingExpiration, System.Web.Caching.CacheItemPriority.Normal, null);
    }
}

You’ll notice that Add and Set are similar, but Add first checks for and returns the existing object from Cache if one exists, before attempting any Caching. This is the expected behaviour of the Add method according to MSDN, and thus you should code it as above.

Now your Web.config needs a few simple lines added in order to be configured to use your CustomOutputCacheProvider:

<system.web>
  <caching>
    <outputCache defaultProvider="CustomProvider">
      <providers>
        <clear/>
        <add name="CustomProvider" type="MyMvcApp.Caching.CustomOutputCacheProvider, MyMvcApp"/>
      </providers>
    </outputCache>
  </caching>
</system.web>

The defaultProvider segment above allows you to set the Named Output Cache Provider that should be used by default for all Output Caching.
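For completeness, a hypothetical Parent Action Method decorated for Output Caching might look like this (the Duration and VaryByParam values are just examples, and LoadModel is a placeholder for your real data access):

```csharp
// Hypothetical Parent Action Method; with the configuration above,
// its output is stored through CustomOutputCacheProvider
[OutputCache(Duration = 60, VaryByParam = "id")]
public ActionResult Details(int id)
{
    // LoadModel is a placeholder for your real data access
    return View(LoadModel(id));
}
```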

With the classes and configuration in place, you’ve now configured all Parent Action Methods which are decorated with [OutputCache] to use your new Custom Output Cache Provider! But we still need to configure Child Action Methods to do the same Caching as Parent Action Methods. This is where your custom MemoryCache object comes into play. Modify your Global.asax to wire your CustomMemoryCache into the OutputCacheAttribute:

protected void Application_Start()
{
    // Register Custom Memory Cache for Child Action Method Caching
    OutputCacheAttribute.ChildActionCache = new CustomMemoryCache("My Cache");
}

You’ll note that I named my MemoryCache “My Cache” above. As an FYI, MemoryCache objects (your CustomMemoryCache included) can be configured by Name using your Web.config or App.config file, which is very useful. The name isn’t optional, but it has no effect unless it matches a Named Cache entry in your configuration file; if it does match, the cache will adhere to the rules of that Named Cache. If, on the other hand, your MemoryCache object doesn’t use Runtime Caching and instead writes to a database or other external source such as AppFabric, the Named Cache will have no effect, since it applies only to in-process Runtime Caching.
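For reference, a Named Cache entry matching the name above might look something like this in Web.config (the limits shown are arbitrary examples):

```xml
<system.runtime.caching>
  <memoryCache>
    <namedCaches>
      <!-- Must match the name passed to the MemoryCache constructor -->
      <add name="My Cache"
           cacheMemoryLimitMegabytes="50"
           physicalMemoryLimitPercentage="25"
           pollingInterval="00:02:00"/>
    </namedCaches>
  </memoryCache>
</system.runtime.caching>
```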

And that’s it! You’ve got a fully custom Output Caching solution in .NET 4.0 for your MVC3 application that correctly leverages standard Microsoft hooks and components! Thanks for reading this long post – comments and criticisms welcomed as always.

LINQ and Deferred Execution

As of .NET 3.5, LINQ (and the often-related Lambda Expressions) has been available for our use and abuse. LINQ stands for Language INtegrated Query, and is a method of querying OO data in a more or less relational sense, not unlike databases. And just like databases, it comes with a cost.

To offset this cost, LINQ uses Deferred Execution, meaning the query you write is not actually executed until its results are needed – typically when you enumerate them.

An example. Let’s create an array of 1,000,000 integers, all of random values between 1 and 10000, and sort them in an ascending fashion using LINQ:

static void Main(string[] args)
{
    // Anytime that you know the size of your list, specify 
    // it in the constructor. This enables much more efficient 
    // processor and memory usage
    var myIntegers = new List<int>(1000000);

    // Initialize RNG
    var random = new Random();

    // Populate the list with random numbers
    for (int i = 0; i < 1000000; i++)
    {
        myIntegers.Add(random.Next(1, 10000));
    }

    var stopwatch = new Stopwatch();
    stopwatch.Start();

    // LINQ time, let's sort them
    var result = myIntegers.OrderBy(i => i);

    stopwatch.Stop();
    Console.WriteLine("LINQ OrderBy Time: " + stopwatch.ElapsedMilliseconds + " ms");

    Console.ReadKey();
}

And the output:

LINQ Order By Result


Note how little time it took to order the results: only 0 ms! Seems a bit fishy that we sorted 1,000,000 integers in less than 1 millisecond, doesn’t it? That’s because our application didn’t actually sort them at all. Via the power of Deferred Execution, the LINQ query is not executed until it has to be. Since we never did anything with result, the sort never actually happened.

What did happen, however, was the construction of a query object representing the pending work. For LINQ to Objects (queries against IEnumerable&lt;T&gt;, as in our example), that object is a chain of deferred iterators; for IQueryable&lt;T&gt; providers, it is an Expression Tree – a .NET structure used for “meta programming”, describing the query as data rather than executing it. Either way, LINQ builds up this query object – which it can build upon as you tack on additional operators – and doesn’t execute it until it needs to be executed. This is what enables all of the neat things that LINQ supports, like joins, selecting based on a particular class property or value, and so on. Constructing this query object is all that has actually happened so far in our application, and it took approximately 0 ms – cheap!
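One practical consequence of this deferral is worth a quick sketch: because the query is only a stored recipe, changes made to the source collection after the query is defined still affect its results:

```csharp
static void Main(string[] args)
{
    var numbers = new List<int> { 1, 2, 3 };

    // Nothing is filtered yet; the query is just a stored recipe
    var query = numbers.Where(n => n > 1);

    // Mutate the source AFTER defining the query
    numbers.Add(4);

    // The query executes now, against the list's current contents
    Console.WriteLine(query.Count());  // 3: the values 2, 3, and the late-added 4

    Console.ReadKey();
}
```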

Recall now that LINQ can be performed on 2 types of objects: IEnumerable&lt;T&gt; and IQueryable&lt;T&gt;. Since IEnumerable&lt;T&gt; exposes only one method, GetEnumerator, it is safe to say that LINQ queries against IEnumerable&lt;T&gt; objects only actually execute when enumeration occurs. To prove this, let’s alter the code slightly by appending a ToList() call to our OrderBy() call. ToList() forces an enumeration of the collection to gather the results, so our OrderBy will actually do the work of ordering things:

static void Main(string[] args)
{
    // Anytime that you know the size of your list, specify 
    // it in the constructor. This enables much more efficient 
    // processor and memory usage
    var myIntegers = new List<int>(1000000);

    // Initialize RNG
    var random = new Random();

    // Populate the list with random numbers
    for (int i = 0; i < 1000000; i++)
    {
        myIntegers.Add(random.Next(1, 10000));
    }

    var stopwatch = new Stopwatch();
    stopwatch.Start();

    // LINQ time, let's sort them
    // This time we force the enumeration with ToList()
    var result = myIntegers.OrderBy(i => i).ToList();

    stopwatch.Stop();
    Console.WriteLine("LINQ OrderBy Time: " + stopwatch.ElapsedMilliseconds + " ms");

    Console.ReadKey();
}

And the result:

LINQ Order By Then Any Result


A whopping 356 ms – seems a little more realistic for performing a sort than 0 ms!

Why is LINQ done this way? Why not execute each query immediately as the LINQ method is called? The answer is efficiency. I’d love to get further into the nitty gritty details of why, but Charlie Calvert explains it better than I probably would. I will, however, cover Expression Trees in a future post – they’re a ton of fun. 🙂

Make Your Debugging Life Easier

Sorry for the delay in posts, May has been a very busy month.

In order to accurately debug or profile an external assembly or library (i.e. one you’re not directly compiling), you need the associated PDB files to accompany each of the DLLs. These files give the debugger information about the compiled assembly, so that your debugger or profiler can become aware of function names, line numbers, and other related metadata.

One thing that sucks is debugging and profiling native Microsoft .NET assemblies. When debugging an exception where you have no line number, or performance profiling an application and not knowing what method is being referred to, it’s easy to become very frustrated, very quickly. The latter scenario happened to me just this week. I was performance profiling my application at work and found that a large portion of the application’s time (~43%) was spent in a method called “clr.dll” (displayed here as memory addresses):

Performance Profiling - No Symbols

This was not exactly a useful indication of what REALLY happened. What am I supposed to do with the knowledge that over 40% of my application’s time is spent in a Microsoft assembly called “clr.dll”? Not much, which is a little concerning. I needed to know what was really happening!

Fortunately, there’s a solution for this very issue. A little-known feature implemented in Visual Studio 2010 is the ability to connect to Microsoft’s Symbol Servers and obtain most of the debugging symbols for their assemblies and libraries!

Just go to Tools –> Options –> (expand) Debugging –> Symbols

Here, check “Microsoft Symbol Servers” as a source for symbols. Downloading the symbols from Microsoft on every debug or profiling session would be slow and painful (Visual Studio even warns you with a pop-up once you check the Microsoft Symbol Servers), so be sure to specify a directory in the “Cache symbols in this directory” input – Visual Studio will keep a local copy of the PDBs and simply check for updates every so often. As a result, you get your regular debugging/profiling AND you can see the function names inside the Microsoft assemblies!

Using this feature, I was able to re-evaluate my latest performance tests and see that the supposed “clr.dll” method was actually “TransparentProxyStub_CrossContext” – a method buried deep within the WCF framework:

Performance Profiling - Symbols

A little Google-Fu and a discussion with a co-worker who is well-versed in WCF told me that my application was actually spending its time waiting for a reply from a WCF request. Since this was expected behaviour (the application calls out to a service for every request), it put my performance profiling mind at ease.

Take advantage of Microsoft’s PDBs, especially when the price is right – free. You’d be amazed how useful they are in your day-to-day debugging and profiling.

Who Loves Interns?

The topic at hand is interning. More specifically, string interning.

“What is string interning?” you ask? Good question. As you may or may not know, strings are immutable reference types: they are read-only, and a variable holds a reference to the string’s location on the heap. Typically, a new string instance is created and stored within your application’s memory each time you construct a string at runtime – even if an identical string already exists. This means you can build the same string N times and pay for its memory N times, which sucks when dealing with repeating string data.

Enter string interning. String interning is the process by which you defer your string reference to the .NET runtime’s string intern pool, thereby conserving memory as N identical strings point to the same single reference. From MSDN:

“The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system. For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable. The Intern method uses the intern pool to search for a string equal to the value of str. If such a string exists, its reference in the intern pool is returned. If the string does not exist, a reference to str is added to the intern pool, then that reference is returned.”
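
The intern pool is easy to observe with `ReferenceEquals`. A minimal sketch (relying on the compiler’s usual behaviour of interning literals automatically):

```csharp
using System;

class InternPoolDemo
{
    static void Main()
    {
        // Two identical literals: the compiler interns both, so they
        // point at the same object in the intern pool
        string a = "Hello!";
        string b = "Hello!";
        Console.WriteLine(ReferenceEquals(a, b)); // True

        // A string built at runtime is not interned automatically
        string suffix = "!";
        string c = "Hello" + suffix;
        Console.WriteLine(ReferenceEquals(a, c)); // False

        // string.Intern returns the single pooled reference instead
        string d = string.Intern(c);
        Console.WriteLine(ReferenceEquals(a, d)); // True
    }
}
```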

Up until this point, you may not have been aware of this pool or the string interning behaviour at all. This is understandable because the .NET compiler does a bloody good job of interning strings for you. In fact, any literal string will be automatically interned by the compiler. Almost any late-bound (at runtime) string, however, is not. A quick code example – let us define a program that simply creates 10,000,000 of the same string and assigns it to a Dictionary:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that will be used to store our strings at an int 
    // (so that they are not de-referenced and GC'd)
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        Console.WriteLine("Storing a literal string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Define the same string repeatedly
            string blah = "Hello! This Is A String!";
            // Add it to the Dictionary at the index of i
            _dict.Add(i, blah);
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

Running this program results in the following output:

A Repeated Literal String

It uses a relatively small amount of memory (269 megs) and takes very little time, since the compiler detects that the string which we’re creating is a literal and thus interns it for us automatically. Now, let us make a slight change to the application by creating a late-bound string which won’t be interned automatically:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that will be used to store our strings at an int 
    // (so that they are not de-referenced and GC'd)
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        // Define a string which will be concatenated with another string later
        string dynamicString = "Some Other Large String";

        Console.WriteLine("Storing a non-constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Define the same string repeatedly
            string blah = "Hello! This Is A String!" + dynamicString;
            // Add it to the Dictionary at the index of i
            _dict.Add(i, blah);
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

Running THIS code yields crappier results:

Repeated Late Bound String

Note that we use nearly five times as much memory (1.379 gigs) as we did before, and that our application got considerably slower! Sadly, we can’t do much about the slower part because concatenating strings takes time and effort. However, we can intern the string to return our memory usage back to something realistic while adding minimal computational cost. We do this with the string.Intern(string) method:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that will be used to store our strings at an int 
    // (so that they are not de-referenced and GC'd)
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        // Define a string which will be concatenated with another string later
        string dynamicString = "Some Other Large String";

        Console.WriteLine("Storing a non-constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Define the same string repeatedly
            string blah = "Hello! This Is A String!" + dynamicString;
            // Add it to the Dictionary at the index of i
            // Intern string "blah" to save memory!
            _dict.Add(i, string.Intern(blah));
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

And the results:

Repeated Late Bound String (Interned)

Note that by interning the late-bound string, we reduced memory usage considerably (272 megs, with only an additional 4 megs used for pointers) while adding only a minimal amount of additional computation (600 milliseconds)… 80% less memory used for the cost of an additional 20% computation is generally a good trade.

Now the icing on the cake. Compile and run your code in Release mode to allow for even more compiler optimizations:

Repeated Late Bound String (Interned) - Release Mode App

Even less memory and even less interning costs! It’s win-win and just a friendly reminder to always release your code in Release mode. 🙂

So, who loves interns? You do. Remember: when you must use the same string repeatedly in an application (perhaps when storing a huge collection of User objects with a FirstName property, where many of your users are “David”, “John”, “Tim”, etc.), intern the string. Why create N copies of the exact same object, all using up your precious managed memory?
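To close out the User example, here’s a small sketch (the User type and the name data are hypothetical, standing in for rows loaded from a database or file; new string(char[]) is used to guarantee fresh, un-interned instances, as runtime-loaded strings would be):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical User type, just for illustration
class User
{
    public string FirstName { get; set; }
}

class Program
{
    static void Main()
    {
        var users = new List<User>();

        // Simulate names arriving at runtime; new string(char[])
        // guarantees a fresh, un-interned instance each time
        string[] rawNames = { "David", "John", "David", "Tim", "David" };

        foreach (var raw in rawNames)
        {
            string runtimeName = new string(raw.ToCharArray());
            // Intern so every "David" shares one pooled instance
            users.Add(new User { FirstName = string.Intern(runtimeName) });
        }

        Console.WriteLine(ReferenceEquals(users[0].FirstName, users[2].FirstName)); // True
    }
}
```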