Performance of the “parameterless” constructor of List<T>

I’ve always tried to specify the size of my List<T> collections whenever I can.  I do this just because I “know” that I will get better performance by specifying the collection size up front, but I never knew exactly how much difference it really made.  I know other developers use the parameterless constructor without giving it a second thought, and I can’t think of a time I’ve ever encountered a noticeable performance difference either way.

Just due to an abundance of curiosity, I decided to test it.

As we all know, generic collections are just a bunch of sweet, hot, sexy goodness sitting on top of the same old-fashioned arrays we all grew up with.  Because of this, List<T> doesn’t actually do anything we couldn’t always have done in the past with arrays and lots of our own custom code.  That being said, I love me some generic collections.

However, I think a big pitfall that many developers are falling into with generic collections is overusing the parameterless constructor for List<T>, and other generic collections.  As we all know, it’s not possible to resize an array once it’s been allocated.  The only way to grow an array is to actually create a second, larger, array and copy the contents from the first array into it and then, ideally, to free up the resources used by the original array so they can be returned to the system.

Since List<T> is just using arrays under the hood, it is subject to this same inherent restriction.  How, then, does List<T> handle having data continuously added to it when we never told it what size the array should be?  To my knowledge, this is not documented as part of any spec from Microsoft, so all we can do is test it and see; and keep in mind that it could change at any time for any future version of .NET.

List<T> provides a .Capacity() method which returns an integer value that represents the .Length of the current underlying array.  I’ll be using that method extensively as we test further.

To start with, lets verify that my last statement is actually true with a simple console app.  For reference, I will be targeting .NET 4.5.2 x64 in all of my examples.

static void Main(string[] args)
{
    var myList = new List(50);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    Console.ReadKey();
}

This gives us the expected output of 50 for List capacity, and in the old-school array world would have looked something like this.

int[] myArr = new int[50];

But what about the parameterless constructor for List<T>, what is its capacity?  Let’s modify our console app and find out.

static void Main(string[] args)
{
    var myList = new List();
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    Console.ReadKey();
}

As it turns out, this gives us an output of 0.  Huh, is that even possible with arrays?  Well, it turns out that it is; but as soon as you try to use your 0 length array you’ll find yourself facing a friendly IndexOutOfRangeException.

static void Main(string[] args)
{
    int[] myArr = new int[0];
    myArr[0] = 5; //IndexOutOfRangeException

    Console.ReadKey();
}

So what happens when we add an item to a List?  Fortunately, .NET knows it cant add an item to a 0-length array, so it creates a new array for us with a capacity of 4 prior to performing our first .Add().

static void Main(string[] args)
{
    var myList = new List();

    //Starts with 0 capacity
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    //Add an item to the List
    myList.Add(5);

    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Now our capacity is 4

    Console.ReadKey();
}

So the obvious question is, what happens if I create a List and add 5 items to it?  The answer is, .NET doubles the capacity from 4 to 8.

static void Main(string[] args)
{
    var myList = new List();

    //Starts with 0 capacity
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    //Add an item to the List
    myList.Add(1);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Now our capacity is 4

    myList.Add(2);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is still 4

    myList.Add(3);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is still 4

    myList.Add(4);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is still 4

    myList.Add(5);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is now 8

    Console.ReadKey();
}

So remember that since List is using arrays under the hood, what actually happened is we created a 4 element array, filled it, realized it was too small, created an 8 element array, copied the 4 elements from our first array into the new array, marked our first array as eligible for garbage collection, then added our 5th item to the new 8 element array.

So, we’ve seen the capacity go from 4 –> 8, what happens if we exceed 8?  Does the capacity grow by an additional 4 to 12, or does something else happen?  It turns out the capacity doubles again to 16.

static void Main(string[] args)
{
    var myList = new List();

    for (int i = 0; i < 8; i++)
    {
        myList.Add(i);
    }

    Console.WriteLine("myList Capacity: {0}", myList.Capacity); //8

    myList.Add(27);

    Console.WriteLine("myList Capacity: {0}", myList.Capacity); //16


    Console.ReadKey();
}

So lets think about what happened here again.

  1. We added our first element, .NET allocated a 4 element array and put our value in the first slot.
  2. We added 3 more elements to finish filling out the 4 element array.
  3. We added a 5th element, .NET allocated an 8 element array, copied our 4 elements into it, then added the 5th element to it.  Our 4 element array now needs to be garbage collected.
  4. We added 3 more elements to finish filling up the new 8 element array.
  5. We added a ninth element, .NET allocated a 16 element array, copied our 8 elements into it, then added the 9th element to it.  Our 8 element array now needs to be garbage collected.

We’re really making the system work a lot harder than it needs to, assuming we had some idea of how many values were going to be in our collection from the beginning.

This pattern of array-capacity doubling as the List grows continues regardless of the number of elements.  4, 8, 16, 32, 64, 128, 256, and so on.  This means that if you have a List with 100 elements in it that was created using the parameterless constructor it will have gone through 5 array growth/copy cycles and be using up enough memory to hold 128 elements rather than the 100 you are actually using.

You can calculate the number of growth cycles using the following formula.  Round the result up to the nearest whole number, which will be the number of array growth cycles required using the parameterless constructor for List to fill to the specified number of elements.

Log(NumElements / 4) / Log(2)

So, what is the actual performance impact of all of this array growing?  Up until know we’ve been using ints, so lets continue with that for now and find out.

static void Main(string[] args)
{
    var sw = new System.Diagnostics.Stopwatch();
    int count = 1000;
    int tests = 1000;

    var ticks = new List(tests);

    //First call outside of our stopwatch so JIT Compiling can happen
    BuildList(count);


    for (int i = 0; i < tests; i++)
    {
        sw.Reset();
        sw.Start();
        var result = BuildList(count);
        sw.Stop();

        //Console.WriteLine("Count: {0}", result.Count);
        //Console.WriteLine("Capacity: {0}", result.Capacity);

        ticks.Add(sw.ElapsedTicks);

        sw.Reset();
    }

    Console.WriteLine();
    Console.WriteLine("Avg Ticks: {0}", ticks.Average());

    Console.ReadKey();
}


public static List<int> BuildList(int count)
{
    var list = new List<int>();

    for (int i = 0; i < count; i++)
    {
        list.Add(i);
    }

    return list;
}

So a List of 1000 integers is giving an average (1000 samples) of 27 ticks on my system.  (I7-4790K, 16 GB 2400mhz DDR3 RAM).  Of course this is due to integers being such a simple and fast datatype.  Upping the “count” parameter to 100000 makes the numbers more meaninful.  100,000 integers tested comes in at an average (1000 samples) of 2409.186 ticks.  Now I’ll change the app to use a specified list size rather than parameterless.  This requires only a single change to the app.

public static List<int> BuildList(int count)
{
    var list = new List<int>(count); //Size of list specified up front

    for (int i = 0; i < count; i++)
    {
        list.Add(i);
    }

    return list;
}

This change results in an average of 1544.961 ticks, a sizable decrease!

Lets try this same test with a more complicated object.

static void Main(string[] args)
{
    var sw = new System.Diagnostics.Stopwatch();
    int count = 10000;
    int tests = 1000;

    var ticks = new List(tests);

    //First call outside of our stopwatch so JIT Compiling can happen
    BuildList(count);


    for (int i = 0; i < tests; i++)
    {
        sw.Reset();
        sw.Start();
        var result = BuildList(count);
        sw.Stop();

        //Console.WriteLine("Count: {0}", result.Count);
        //Console.WriteLine("Capacity: {0}", result.Capacity);

        ticks.Add(sw.ElapsedTicks);

        sw.Reset();
    }

    Console.WriteLine();
    Console.WriteLine("Avg Ticks: {0}", ticks.Average());

    Console.ReadKey();
}


public static List<Employee> BuildList(int count)
{
    var list = new List<Employee>();

    for (int i = 0; i < count; i++)
    {
        list.Add(new Employee(i));
    }

    return list;
}


public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string ExtraData1;
    public string ExtraData2;
    public string ExtraData3;

    public Employee(int id)
    {
        this.ID = id;

        Name = "This is a person's name";
        ExtraData1 = "This is some more data about this employee";
        ExtraData2 = "More data about the employee, because we couldn't fit everything we wanted in ExtraData1";
        ExtraData3 = "Yet another field of custom information about the employee.  If this were from an actual HR system these field names might even be accurate";
    }
}

10,000 employees loaded into a List,  averaging 4359.531 ticks using a parameterless constructor.  Changing to a constructor specifying the List size up front results in an average of 1440.092 ticks, clearly the better choice.  Here is a screenshot comparing memory usage of the parameterless vs specified count test of bogus Employee objects.  The top image is of the parameterless List.  You can clearly see the excessive memory usage spikes and extra garbage collection.

List_mem_use

From my testing, the more complex your object is that you are putting in the List, the greater benefit you will see by specifying the List capacity yourself.

However, a word of caution.  The capacity “doubling” rule works the same for specified capacities as it does for default capacity Lists.  That means if you create your own List with a capacity of 10,000 to load your 9,997 employees from the database, and an HR user adds 4 new employees, .NET will up your List to a capacity of 20,000.

The moral of the story is, when creating a List<T> it may be worthwhile to think back to older times and consider how you would have sized an array for the same task.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s