
When to Compile your Regex

In .NET, it is a common misconception that a compiled Regex always runs faster than an uncompiled one once the one-time compilation cost, incurred at instantiation, has been paid. This is not always true.

Recently I wrote a URL rewriter, similar to Helicon but a little more powerful and flexible. It allows you to define rules in a config file, and then rewrite a captured input string to an output string, replacing parts of the URL via Regex capture groups along the way. An example of the configuration looks like this:

<add original="^/(.*)/customer(.*)$" rewritten="/$1/cust$2" redirect="false" ignorecase="true" />

What the above says is: if my URL is anything, followed by “/customer”, followed optionally by anything else, rewrite it to the “/cust” version of the URL, carrying the captured groups from the original input over to the output.
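For illustration, here is roughly what applying that rule looks like in code (a minimal sketch with a made-up input URL, not the rewriter itself):

using System;
using System.Text.RegularExpressions;

class RewriteRuleExample
{
    static void Main()
    {
        // The rule from the config above: capture everything before "/customer"
        // and everything after it, then reassemble the URL around "/cust"
        var rule = new Regex("^/(.*)/customer(.*)$", RegexOptions.IgnoreCase);

        // "/store/customer/1234" is just an illustrative input
        string rewritten = rule.Replace("/store/customer/1234", "/$1/cust$2");

        Console.WriteLine(rewritten); // prints "/store/cust/1234"
    }
}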

Now, considering myself to be an efficient developer, I have always pre-compiled my Regexes by adding the RegexOptions.Compiled option to the constructor. My rewriter code that parsed my XML file and created the Regexes looked like this:

Regex rewriteRuleRegex = null;

// Determine if we ignore case
if (rewriteConfiguration.IgnoreCase)
{
    rewriteRuleRegex = new Regex(rewriteConfiguration.Original, RegexOptions.Compiled | RegexOptions.IgnoreCase);
}
else
{
    rewriteRuleRegex = new Regex(rewriteConfiguration.Original, RegexOptions.Compiled);
}

I had 144 rules in my file, and decided to performance test the URL rewriter against approximately 100 unique requests. The result was that it sucked, REALLY, REALLY badly, with 87% of the request lifecycle being spent on Regex.Replace!

Precompiled Sucks

That’s INSANE! 87% is COMPLETELY unacceptable! After all I’d heard about the .NET Regex engine being efficient, this was freaking terrible! I spent a day pulling my hair out and reading a ton of articles, then came across one from Jeff Atwood about when to precompile and when not to. So, I took the RegexOptions.Compiled flag off of my code:

Regex rewriteRuleRegex = null;

// Determine if we ignore case
if (rewriteConfiguration.IgnoreCase)
{
    rewriteRuleRegex = new Regex(rewriteConfiguration.Original, RegexOptions.IgnoreCase);
}
else
{
    rewriteRuleRegex = new Regex(rewriteConfiguration.Original);
}

And re-ran my tests:

Precompiled Sucks

MUCH better.

So what I learned is that if you CONTROL the input (AKA if it’s a small set of input that you know in advance), precompiling is good. If you don’t control the input (such as user generated URLs that are sent to your application), DO NOT PRECOMPILE. EVER.

One More Thing About List Binary Search

I wanted to point people to this link at DotNetPerls:

http://www.dotnetperls.com/binarysearch

They do an excellent, quick demonstration of List<T>.BinarySearch and show a graph that really drives home how much faster it is for large lists than a regular traversal!
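The gist of their comparison looks something like this (a minimal sketch, not their benchmark code – note that BinarySearch requires the List to already be sorted):

using System;
using System.Collections.Generic;
using System.Linq;

class BinarySearchDemo
{
    static void Main()
    {
        // A large List that is already sorted
        List<int> numbers = Enumerable.Range(0, 1000000).ToList();

        // O(n): walks the List from the front
        bool foundLinear = numbers.Contains(999999);

        // O(log n): halves the search space on every step. Returns the index
        // if found, or the one's complement of the insertion point if not.
        int index = numbers.BinarySearch(999999);
        bool foundBinary = index >= 0;

        Console.WriteLine(foundLinear + " " + foundBinary + " at index " + index);
    }
}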

Make Mostly Read, Seldom-Written Lists Much More Efficient

One of the many things that I do at work is run a full-blown Search Engine which I also developed from scratch. This Search Engine feeds all product related information to our websites. A search index consists of a pre-computed collection of products, their properties, a list of words that are correctly spelled, and some pre-computed faceted/guided navigation. A search index, until this week, took up approximately 10.7 gigs of memory. This was becoming too large as we added new products every single day.

As of writing this, it now takes only 4.8 gigs of memory and is only slightly (1-3%) less performant than before. How did I do it? Believe it or not, a very simple data structure and algorithm change.

In the Search Engine, a product’s properties are a key-value pairing of strings… Things like “isInStock” “1” or “color” “red” etc. We store the properties in a collection, per product. The collection was originally:

Dictionary<string, HashSet<string>> _entityProperties;

The key of the Dictionary was the property name, and the HashSet of strings held the values for that property name (property names are not a “unique” key – a product could have multiple colors, for example). I initially chose this data structure because we have a heavy need for DIRECT lookups of property names and values. Methods like HasProperties(string propertyName) and HasProperty(string propertyName, string propertyValue) are essential to the search engine’s function, and need to be performant. Thus, I figured that a Dictionary and HashSet would be best, since both offer O(1) lookup times and the index is read 10,000x more often than it is written to. O(1 + 1) is pretty good when it comes to complexity.
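In other words, the original structure supported lookups roughly like this (a simplified sketch assuming case-insensitive names and values, not the actual engine code):

using System;
using System.Collections.Generic;

public class EntityPropertyStore
{
    // Property name -> set of values for that name
    private readonly Dictionary<string, HashSet<string>> _entityProperties =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    public void Add(string propertyName, string propertyValue)
    {
        HashSet<string> values;
        if (!_entityProperties.TryGetValue(propertyName, out values))
        {
            values = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
            _entityProperties.Add(propertyName, values);
        }

        // HashSet silently ignores duplicate values
        values.Add(propertyValue);
    }

    public bool HasProperties(string propertyName)
    {
        // O(1) key lookup
        return _entityProperties.ContainsKey(propertyName);
    }

    public bool HasProperty(string propertyName, string propertyValue)
    {
        // O(1) key lookup followed by an O(1) set lookup
        HashSet<string> values;
        return _entityProperties.TryGetValue(propertyName, out values)
            && values.Contains(propertyValue);
    }
}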

It turns out that there was a simpler, better data structure for my goals which also satisfies the performance requirements of the aforementioned methods.

As you may or may not know (and I learned the hard way), a HashSet<T> is actually not very efficient when you have only a few items in it. A List<T> is actually more performant for small collections (4 or fewer objects with simple GetHashCode() methods, such as strings, in my testing). This is true even though your average lookup/read goes from O(1) to O(n) (roughly n/2 comparisons on average), since you must traverse the List to find your desired object. The reason the List is faster is that there is no hash code computation, and a List<T> is basically an elastic array, so it takes less memory and has less overhead than a HashSet<T> with the same number of objects in it. Since my product properties typically consist of only 2 or 3 values for a given property name, I changed my data structure to this:

Dictionary<string, List<string>> _entityProperties;

This shaved approximately 10% off of the memory footprint and brought my memory usage down to 9.6 gigs. The performance was basically identical in all performance tests. This was better than my HashSet, but still not great. I wanted to do better. I was sure that somehow I could do better.

I spent the good part of this week trying – and failing – to design a more efficient data structure than the above. I tried a string Trie with nodes that pointed to another Trie, I tried SortedList<TKey, TValue> instead of the above, and everything else that I could think of. Yet no matter what I did, the memory stayed the same and the performance was the same or worse. It sucked. I was still sure that somehow I could do better.

Finally, Wednesday morning, I had a thought in the shower (where I do my best thinking): two dimensional Arrays suck. They are well documented to, in general, have worse memory usage metrics than a one dimensional array (a quick Google will fill you in). A Dictionary of Lists is certainly a two dimensional jagged Array of sorts, and it wasn’t doing what I wanted in terms of memory. So, I took another approach and changed my data structure wildly – I flattened it out and made it one dimensional:

List<KeyValuePair<string, string>> _entityProperties;

Seems insane, right? I go from a Dictionary with an O(1) key lookup to a linear List of all keys and values stored together. And yet, it did the trick for my memory: it went from 9.6 gigs to 4.8 gigs. Half of the amount of memory used. I was stoked.

I saved this memory by both interning strings and taking advantage of the KeyValuePair being a struct. Structs are a lot more efficient than reference types when the object is small, and a KeyValuePair is indeed quite small.
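The interning part is worth spelling out. Here is a minimal sketch of where it fits in as entries are built (PropertyEntry.Create is just an illustrative helper – the simplified Add method below leaves interning out):

using System.Collections.Generic;

public static class PropertyEntry
{
    // Interning the key and value means repeated property names and values
    // all point at a single string instance in the runtime's intern pool,
    // and KeyValuePair<string, string> is a struct, so each entry adds
    // little overhead beyond its two string references.
    public static KeyValuePair<string, string> Create(string propertyName, string propertyValue)
    {
        return new KeyValuePair<string, string>(
            string.Intern(propertyName),
            string.Intern(propertyValue));
    }
}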

A new problem needed solving, however. Each product has around 60-100 properties associated with it, and I needed them to be accessed efficiently and quickly with near-direct lookups. Traversing the [now giant] List was not acceptable in terms of performance.

As it stood, I went from an O(1 + 1) data structure (key and value lookup costs for Dictionary and HashSet) to an O(1 + 1/2n) data structure (Dictionary and List), and finally to an O(n) data structure (List). And to top it all off, the n in the 1/2n was 3 or 4, whereas the n in the flat List of KeyValuePairs was between 60 and 100. Truly I was getting worse performance with each improvement – at least theoretically. Still, the allure of the memory savings was too great to ignore and I wanted to use this data structure.

It then hit me: why not use BinarySearch on the List<T> to look up items quickly and keep the List sorted while I add AND be able to check for duplicates before adding? It was certainly worth a shot, since binary search is an O(log n) algorithm which is an improvement over the List’s O(n) traversal. So, I modified my Add(string propertyName, string propertyValue) method to keep the List sorted as things were added to it. This is surprisingly easy to do.

*note that from here on out I’m simplifying my code greatly to basic concepts from what actually exists in order to avoid giving away trade secrets or violating my NDA and being fired*

public void Add(string propertyName, string propertyValue)
{   
    // TODO: null checks, etc.

    var keyValuePair = new KeyValuePair<string, string>(propertyName, propertyValue);

    // Add property name and value
    // First find the identical item if it exists
    var result = _entityProperties.BinarySearch(keyValuePair, _entityPropertiesComparer);
    // If result is >= 0, already exists, done
    if (result >= 0)
    {
        return;
    }

    // If it does not exist, a one's complement of the returned int tells us WHERE to insert to maintain the sort order
    _entityProperties.Insert(~result, keyValuePair);
}

The secret here is two-fold:

One: I created a custom KeyValuePair<string, string> comparer class that implements IComparer<KeyValuePair<string, string>> and does a case-insensitive string compare of first the key strings, then the value strings (a minimal sketch of such a comparer follows below). This IComparer is required by the List’s BinarySearch method to determine the ordering of objects in the List.

Two: the BinarySearch method returns a very useful value: a simple integer. If the int is >= 0, it means that the item was found at that index. If it is negative, it means that the item was not found, and the one’s complement of the value is the index at which the item should be inserted in order to keep the List sorted. A super useful return type, indeed.

This allows you to add your elements to a List while preserving an order, at the cost of your add being an O(log n) search (plus the cost of shifting later elements when inserting mid-List) instead of a List’s usual O(1) append. However, if you don’t add things nearly as often as you read the List (we only adjust the index once a day, for example, but read it thousands of times per hour), this can be worthwhile. Alternatively, if the order of elements does not matter until you’re done adding everything, you could add everything with O(1) appends and then do a single O(n log n) Sort at the end. In my case, the order mattered as I added to the List because I did not ever want to add duplicates (same property name and value). The HashSet handled this for me – Lists do not.
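For reference, such a comparer could look something like this (a minimal sketch – the real class is part of the engine and is not shown here):

using System;
using System.Collections.Generic;

public class EntityPropertyComparer : IComparer<KeyValuePair<string, string>>
{
    // Order by key first, then by value, both case-insensitively, so that
    // BinarySearch can locate any (name, value) pair and all values for a
    // given name sit next to each other in the sorted List
    public int Compare(KeyValuePair<string, string> x, KeyValuePair<string, string> y)
    {
        int keyComparison = string.Compare(x.Key, y.Key, StringComparison.OrdinalIgnoreCase);
        if (keyComparison != 0)
        {
            return keyComparison;
        }

        return string.Compare(x.Value, y.Value, StringComparison.OrdinalIgnoreCase);
    }
}

An instance of a class like this is what the _entityPropertiesComparer field in these snippets refers to. Note that string.Compare treats null as smaller than any non-null string, which is exactly what the “null is the smallest value” trick in the lookup methods below relies on.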

So, now my add costs O(log n) instead of O(n), but the payoff is now my lookups cost O(log n) instead of O(n) as well. I adjusted my earlier mentioned HasProperty and HasProperties methods accordingly:

public List<string> GetSpecificPropertyValues(string propertyName)
{
    // TODO: null checks, etc.

    List<string> result = new List<string>();

    // Binary search the property name - null is the smallest value of string for comparison
    var keyValuePair = new KeyValuePair<string, string>(propertyName, null);
    // One's complement the start index
    var startIndex = ~_entityProperties.BinarySearch(keyValuePair, _entityPropertiesComparer);

    for (int i = startIndex; i < _entityProperties.Count; i++)
    {
        // Leave the loop when the property name no longer matches
        if (!string.Equals(propertyName, _entityProperties[i].Key, StringComparison.OrdinalIgnoreCase))
        {
            // Leave the loop
            break;
        }
                    
        result.Add(_entityProperties[i].Value);
    }

    return result;
}

public bool HasProperty(string propertyName, string propertyValue)
{
    // TODO: null checks, etc.

    // Binary search the property name
    var keyValuePair = new KeyValuePair<string, string>(propertyName, propertyValue);
    var startIndex = _entityProperties.BinarySearch(keyValuePair, _entityPropertiesComparer);
    return startIndex >= 0;
}

public bool HasProperties(string propertyName)
{
    // TODO: null checks, etc.

    // Binary search the property name
    var keyValuePair = new KeyValuePair<string, string>(propertyName, null);
    // One's complement the start index
    var startIndex = ~_entityProperties.BinarySearch(keyValuePair, _entityPropertiesComparer);
    if (startIndex >= _entityProperties.Count)
    {
        return false;
    }

    // Check that the next element matches the property name
    return string.Equals(propertyName, _entityProperties[startIndex].Key, StringComparison.OrdinalIgnoreCase);
}

Suddenly, I have the same “direct lookup” methods available as I did with my Dictionary and HashSet/List structure, but in a flat List with O(log n) complexity.

This yielded 50% less memory usage with only 1-3% slower lookups. A very acceptable trade for the Search Engine.

If you have a List<T> with a lot of objects in it, and performance is key to your application, consider using BinarySearch and/or Sort to access it in a much more efficient way. As long as you can create an IComparer<T> where T is your List objects type, you’ll have a more efficient List.

Who Loves Interns?

The topic at hand is interning. More specifically, string interning.

“What is string interning?” you ask? Good question. As you may or may not know, strings in .NET are immutable reference types. This means that they are read-only, and a variable holds a reference to the string’s location on the heap. Typically, a new string object is created and stored in your application’s memory each time you create a string value – even if an identical string already exists. What this means is that you can build the same string N times and have it take up the string’s memory N times. This sucks when dealing with repeating string data.

Enter string interning. String interning is the process by which you defer your string reference to the .NET runtime’s string intern pool, thereby conserving memory as N identical strings point to the same single reference. From MSDN:

“The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system. For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable. The Intern method uses the intern pool to search for a string equal to the value of str. If such a string exists, its reference in the intern pool is returned. If the string does not exist, a reference to str is added to the intern pool, then that reference is returned.”
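A quick way to see the pool in action is object.ReferenceEquals (a minimal sketch):

using System;

class InternPoolDemo
{
    static void Main()
    {
        string a = "Hello!";
        string b = "Hello!";
        // Two identical literals end up as the same object in the intern pool
        Console.WriteLine(object.ReferenceEquals(a, b)); // True

        string suffix = "lo!";
        string c = "Hel" + suffix;
        // A string built at runtime is a brand new object...
        Console.WriteLine(object.ReferenceEquals(a, c)); // False

        // ...until you intern it, at which point you get the pooled reference back
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(c))); // True
    }
}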

Up until this point, you may not have been aware of this pool or the string interning behaviour at all. This is understandable, because the .NET compiler does a bloody good job of interning strings for you. In fact, any literal string will be automatically interned by the compiler. Almost any late-bound (runtime-created) string, however, is not. A quick code example – let us define a program that simply creates 10,000,000 of the same string and adds each one to a Dictionary:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that will be used to store our strings at an int 
    // (so that they are not de-referenced and GC'd)
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        Console.WriteLine("Storing a non-constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Define the same string repeatedly
            string blah = "Hello! This Is A String!";
            // Add it to the Dictionary at the index of i
            _dict.Add(i, blah);
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

Running this program results in the following output:

A Repeated Literal String

It uses a relatively small amount of memory (269 megs) and takes very little time, since the compiler detects that the string which we’re creating is a literal and thus interns it for us automatically. Now, let us make a slight change to the application by creating a late-bound string which won’t be interned automatically:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that will be used to store our strings at an int 
    // (so that they are not de-referenced and GC'd)
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        // Define a string which will be concatenated with another string later
        string dynamicString = "Some Other Large String";

        Console.WriteLine("Storing a non-constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Define the same string repeatedly
            string blah = "Hello! This Is A String!" + dynamicString;
            // Add it to the Dictionary at the index of i
            _dict.Add(i, blah);
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

Running THIS code yields crappier results:

Repeated Late Bound String

Note that we use nearly five times as much memory (1.379 gigs) as we did before! And that our application got considerably slower! Sadly, we can’t do much about the slower part because concatenating strings takes time and effort. However, we can intern the string to return our memory usage back to something realistic while adding minimal computational cost. We do this with the string.Intern(string) method:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // The Dictionary that will be used to store our strings at an int 
    // (so that they are not de-referenced and GC'd)
    private static readonly IDictionary<int, string> _dict = new Dictionary<int, string>();

    static void Main(string[] args)
    {
        // Define a string which will be concatenated with another string later
        string dynamicString = "Some Other Large String";

        Console.WriteLine("Storing a non-constant string 10000000 times");

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Loop 10,000,000 times
        for (int i = 0; i < 10000000; i++)
        {
            // Define the same string repeatedly
            string blah = "Hello! This Is A String!" + dynamicString;
            // Add it to the Dictionary at the index of i
            // Intern string "blah" to save memory!
            _dict.Add(i, string.Intern(blah));
        }

        stopwatch.Stop();

        Console.WriteLine("Memory Used: " + Process.GetCurrentProcess().PagedMemorySize64.ToString("n") + " bytes");
        Console.WriteLine("Elapsed milliseconds: " + stopwatch.ElapsedMilliseconds);

        Console.WriteLine("Press any key");
        Console.ReadKey();
    }
}

And the results:

Repeated Late Bound String (Interned)

Note that by interning the late-bound string, we reduced memory usage considerably (272 megs, with only an additional 4 megs used for pointers) while adding only a minimal amount of additional computation (600 milliseconds)… 80% less memory used for the cost of an additional 20% computation is generally a good trade.

Now the icing on the cake. Compile and run your code in Release mode to allow for even more compiler optimizations:

Repeated Late Bound String (Interned) - Release Mode App

Even less memory and even lower interning costs! It’s a win-win, and just a friendly reminder to always release your code in Release mode. 🙂

So, who loves interns? You do. Remember: when you must use the same string repeatedly in an application (perhaps when storing a huge collection of User objects with a FirstName property, where many of your users are “David”, “John”, “Tim”, etc.), intern the string. Why create N copies of the exact same object, all using up your precious managed memory?
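To make that concrete, here is a sketch with a made-up User type and LoadUsers method (illustrative only, not from any real codebase):

using System;
using System.Collections.Generic;

public class User
{
    public string FirstName { get; set; }
}

class LoadUsersDemo
{
    static List<User> LoadUsers(IEnumerable<string> firstNamesFromDatabase)
    {
        var users = new List<User>();
        foreach (string firstName in firstNamesFromDatabase)
        {
            // Intern on assignment so the thousands of "David"s
            // share a single string instance in the pool
            users.Add(new User { FirstName = string.Intern(firstName) });
        }

        return users;
    }

    static void Main()
    {
        var users = LoadUsers(new[] { "David", "John", "David", "Tim", "David" });
        Console.WriteLine(users.Count); // 5 users, but only 3 distinct name strings
    }
}

One caveat worth remembering: interned strings live in the pool for the lifetime of the process, so intern values that you expect to repeat a lot rather than arbitrary one-off strings.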