Performance of the “parameterless” constructor of List<T>

I’ve always tried to specify the size of my List<T> collections whenever I can.  I do this just because I “know” that I will get better performance by specifying the collection size up front, but I never knew exactly how much difference it really made.  I know other developers use the parameterless constructor without giving it a second thought, and I can’t think of a time I’ve ever encountered a noticeable performance difference either way.

Out of an abundance of curiosity, I decided to test it.

As we all know, generic collections are just a bunch of sweet, hot, sexy goodness sitting on top of the same old-fashioned arrays we all grew up with.  Because of this, List<T> doesn’t actually do anything we couldn’t always have done in the past with arrays and lots of our own custom code.  That being said, I love me some generic collections.

However, I think a big pitfall that many developers fall into with generic collections is overusing the parameterless constructor for List<T> and other generic collections.  As we all know, it’s not possible to resize an array once it’s been allocated.  The only way to grow an array is to create a second, larger array, copy the contents from the first array into it, and then, ideally, free up the resources used by the original array so they can be returned to the system.
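
For illustration, here is roughly what that manual grow-and-copy dance looks like with a plain array.  This is just a minimal sketch; Array.Resize does the same allocate-and-copy for you behind the scenes.

int[] values = new int[4];
//...fill the array...

//Allocate a larger array, copy the existing elements over, and let the old
//array become eligible for garbage collection.
int[] bigger = new int[values.Length * 2];
Array.Copy(values, bigger, values.Length);
values = bigger;

//Or, equivalently:
//Array.Resize(ref values, values.Length * 2);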

Since List<T> is just using arrays under the hood, it is subject to this same inherent restriction.  How, then, does List<T> handle having data continuously added to it when we never told it what size the array should be?  To my knowledge, this is not documented as part of any spec from Microsoft, so all we can do is test it and see, keeping in mind that the behavior could change in any future version of .NET.

List<T> provides a Capacity property, an integer that reflects the .Length of the current underlying array.  I’ll be using that property extensively as we test further.

To start with, let’s verify that my last statement is actually true with a simple console app.  For reference, I will be targeting .NET 4.5.2 x64 in all of my examples.

static void Main(string[] args)
{
    var myList = new List<int>(50);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    Console.ReadKey();
}

This gives us the expected output of 50 for List capacity, and in the old-school array world would have looked something like this.

int[] myArr = new int[50];

But what about the parameterless constructor for List<T>?  What is its capacity?  Let’s modify our console app and find out.

static void Main(string[] args)
{
    var myList = new List<int>();
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    Console.ReadKey();
}

As it turns out, this gives us an output of 0.  Huh, is that even possible with arrays?  Well, it turns out that it is; but as soon as you try to use your zero-length array you’ll find yourself facing a friendly IndexOutOfRangeException.

static void Main(string[] args)
{
    int[] myArr = new int[0];
    myArr[0] = 5; //IndexOutOfRangeException

    Console.ReadKey();
}

So what happens when we add an item to a List?  Fortunately, .NET knows it can’t add an item to a zero-length array, so it creates a new array for us with a capacity of 4 prior to performing our first .Add().

static void Main(string[] args)
{
    var myList = new List<int>();

    //Starts with 0 capacity
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    //Add an item to the List
    myList.Add(5);

    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Now our capacity is 4

    Console.ReadKey();
}

So the obvious question is, what happens if I create a List and add 5 items to it?  The answer is, .NET doubles the capacity from 4 to 8.

static void Main(string[] args)
{
    var myList = new List<int>();

    //Starts with 0 capacity
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    //Add an item to the List
    myList.Add(1);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Now our capacity is 4

    myList.Add(2);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is still 4

    myList.Add(3);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is still 4

    myList.Add(4);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is still 4

    myList.Add(5);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is now 8

    Console.ReadKey();
}

So remember that since List<T> is using arrays under the hood, what actually happened is this: we created a 4-element array, filled it, realized it was too small, created an 8-element array, copied the 4 elements from the first array into the new array, marked the first array as eligible for garbage collection, and then added our 5th item to the new 8-element array.
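
If you wanted to mimic that behavior yourself, the growth policy we just observed boils down to something like the following.  This is only a simplified sketch based on the behavior above, not the actual List<T> source.

public static int[] GrowIfNeeded(int[] items, int count)
{
    //Still room in the current array, nothing to do
    if (count < items.Length)
        return items;

    //Observed policy: the first allocation is 4 elements, after that the capacity doubles
    int newCapacity = items.Length == 0 ? 4 : items.Length * 2;

    var bigger = new int[newCapacity];
    Array.Copy(items, bigger, items.Length); //copy the existing elements into the new array
    return bigger; //the old array is now eligible for garbage collection
}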

So, we’ve seen the capacity go from 4 to 8.  What happens if we exceed 8?  Does the capacity grow by an additional 4 to 12, or does something else happen?  It turns out the capacity doubles again, to 16.

static void Main(string[] args)
{
    var myList = new List<int>();

    for (int i = 0; i < 8; i++)
    {
        myList.Add(i);
    }

    Console.WriteLine("myList Capacity: {0}", myList.Capacity); //8

    myList.Add(27);

    Console.WriteLine("myList Capacity: {0}", myList.Capacity); //16


    Console.ReadKey();
}

So let’s think about what happened here again.

  1. We added our first element, .NET allocated a 4 element array and put our value in the first slot.
  2. We added 3 more elements to finish filling out the 4 element array.
  3. We added a 5th element, .NET allocated an 8 element array, copied our 4 elements into it, then added the 5th element to it.  Our 4 element array now needs to be garbage collected.
  4. We added 3 more elements to finish filling up the new 8 element array.
  5. We added a ninth element, .NET allocated a 16 element array, copied our 8 elements into it, then added the 9th element to it.  Our 8 element array now needs to be garbage collected.

We’re really making the system work a lot harder than it needs to, assuming we had some idea of how many values were going to be in our collection from the beginning.

This pattern of array-capacity doubling as the List grows continues regardless of the number of elements: 4, 8, 16, 32, 64, 128, 256, and so on.  This means that if you have a List with 100 elements in it that was created using the parameterless constructor, it will have gone through 5 array growth/copy cycles and will be using enough memory to hold 128 elements rather than the 100 you are actually using.

You can calculate the number of growth cycles using the following formula.  Round the result up to the nearest whole number; that is the number of array growth/copy cycles required for a parameterless List<T> to fill to the specified number of elements.

Log(NumElements / 4) / Log(2)
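
Here is a quick sketch of that calculation in code.  Rather than applying the log formula directly, it simply simulates the 4-then-double behavior we observed above, which gives the same answer without any floating-point rounding concerns.

public static int GrowthCycles(int numElements)
{
    //The initial 4-element array is an allocation, not a growth/copy cycle
    int cycles = 0;

    for (int capacity = 4; capacity < numElements; capacity *= 2)
        cycles++; //each doubling is one allocate-and-copy

    return cycles;
}

//GrowthCycles(100)  returns 5 (capacities 4, 8, 16, 32, 64, 128)
//GrowthCycles(1000) returns 8 (final capacity 1024)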

So, what is the actual performance impact of all of this array growing?  Up until now we’ve been using ints, so let’s continue with that for now and find out.

static void Main(string[] args)
{
    var sw = new System.Diagnostics.Stopwatch();
    int count = 1000;
    int tests = 1000;

    var ticks = new List<long>(tests);

    //First call outside of our stopwatch so JIT Compiling can happen
    BuildList(count);


    for (int i = 0; i < tests; i++)
    {
        sw.Reset();
        sw.Start();
        var result = BuildList(count);
        sw.Stop();

        //Console.WriteLine("Count: {0}", result.Count);
        //Console.WriteLine("Capacity: {0}", result.Capacity);

        ticks.Add(sw.ElapsedTicks);

        sw.Reset();
    }

    Console.WriteLine();
    Console.WriteLine("Avg Ticks: {0}", ticks.Average());

    Console.ReadKey();
}


public static List<int> BuildList(int count)
{
    var list = new List<int>();

    for (int i = 0; i < count; i++)
    {
        list.Add(i);
    }

    return list;
}

So a List of 1000 integers gives an average (1000 samples) of 27 ticks on my system (i7-4790K, 16 GB 2400 MHz DDR3 RAM).  Of course this is due to integers being such a simple and fast datatype.  Upping the “count” parameter to 100,000 makes the numbers more meaningful: 100,000 integers come in at an average (1000 samples) of 2409.186 ticks.  Now I’ll change the app to use a specified list size rather than the parameterless constructor.  This requires only a single change to the app.

public static List<int> BuildList(int count)
{
    var list = new List<int>(count); //Size of list specified up front

    for (int i = 0; i < count; i++)
    {
        list.Add(i);
    }

    return list;
}

This change results in an average of 1544.961 ticks, a sizable decrease!

Let’s try this same test with a more complicated object.

static void Main(string[] args)
{
    var sw = new System.Diagnostics.Stopwatch();
    int count = 10000;
    int tests = 1000;

    var ticks = new List<long>(tests);

    //First call outside of our stopwatch so JIT Compiling can happen
    BuildList(count);


    for (int i = 0; i < tests; i++)
    {
        sw.Reset();
        sw.Start();
        var result = BuildList(count);
        sw.Stop();

        //Console.WriteLine("Count: {0}", result.Count);
        //Console.WriteLine("Capacity: {0}", result.Capacity);

        ticks.Add(sw.ElapsedTicks);

        sw.Reset();
    }

    Console.WriteLine();
    Console.WriteLine("Avg Ticks: {0}", ticks.Average());

    Console.ReadKey();
}


public static List<Employee> BuildList(int count)
{
    var list = new List<Employee>();

    for (int i = 0; i < count; i++)
    {
        list.Add(new Employee(i));
    }

    return list;
}


public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string ExtraData1;
    public string ExtraData2;
    public string ExtraData3;

    public Employee(int id)
    {
        this.ID = id;

        Name = "This is a person's name";
        ExtraData1 = "This is some more data about this employee";
        ExtraData2 = "More data about the employee, because we couldn't fit everything we wanted in ExtraData1";
        ExtraData3 = "Yet another field of custom information about the employee.  If this were from an actual HR system these field names might even be accurate";
    }
}

10,000 employees loaded into a List averaged 4359.531 ticks using the parameterless constructor.  Changing to a constructor that specifies the List size up front results in an average of 1440.092 ticks, clearly the better choice.  Here is a screenshot comparing memory usage of the parameterless vs. specified-count test of bogus Employee objects.  The top image is of the parameterless List; you can clearly see the excessive memory usage spikes and extra garbage collection.

List_mem_use

From my testing, the more complex the object you are putting into the List, the greater the benefit you will see from specifying the List capacity yourself.

However, a word of caution.  The capacity “doubling” rule works the same for specified capacities as it does for default capacity Lists.  That means if you create your own List with a capacity of 10,000 to load your 9,997 employees from the database, and an HR user adds 4 new employees, .NET will up your List to a capacity of 20,000.
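
If you end up overshooting like that and the extra memory matters, List<T> also exposes a TrimExcess() method that reallocates the backing array down to the current Count.  Here is a quick sketch of how you might use it; keep in mind that TrimExcess performs one more allocate-and-copy itself, and it skips the reallocation entirely if the list is already at more than roughly 90% of capacity.

var employees = new List<Employee>(10000);
//...load 9,997 employees, then a few more get added and the capacity doubles to 20,000...

employees.TrimExcess(); //reallocates the backing array down to Count
Console.WriteLine("Capacity: {0}", employees.Capacity); //now roughly equal to Count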

The moral of the story is, when creating a List<T> it may be worthwhile to think back to older times and consider how you would have sized an array for the same task.

Quick Post: List<int> Intersect Performance

I noticed an interesting performance difference earlier today, so I thought I’d throw together a quick post about it just in case it could be useful for anyone else.  In a project that I’m working on, I need to do a good number of intersect operations on separate collections of integers.  I happened to notice that the performance of this was affected by the order in which the collections were specified.  So let’s jump right in and take a look.  I’m doing all of my tests with Visual Studio 2013 in a .NET 4.5 console application compiling for x64.  CPU: Intel Xeon E3-1230 V2 with HT enabled.

using System;
using System.Collections.Generic;
using System.Linq;

namespace CompareSpeed
{
    class Program
    {
        static void Main(string[] args)
        {
            int largeColSize = 10000;
            int smallColSize = 500;
            int iterations = 25000;

            Console.WriteLine("Total Rows: {0}", (largeColSize + smallColSize).ToString("#,0"));
            Console.WriteLine("Ratio of small to large: {0}", (decimal)smallColSize / (decimal)largeColSize);
            Console.WriteLine();

            var rand = new System.Random();
            var largeCollection = new List<int>(largeColSize);
            var smallCollection = new List<int>(smallColSize);

            //Fill large collection
            for (int i = 0; i < largeColSize; i++)
            {
                largeCollection.Add(rand.Next(int.MinValue, int.MaxValue));
            }

            //Fill small collection
            for (int i = 0; i < smallColSize; i++)
            {
                smallCollection.Add(rand.Next(int.MinValue, int.MaxValue));
            }


            var sw = new System.Diagnostics.Stopwatch();

            /*  CODE
             *  GOES
             *  HERE
             *  BASED
             *  ON
             *  TEST
             */

            Console.ReadKey();
        }
    }
}

One of the following code blocks gets placed in the commented section above based on which test I was running.

//Intersect large to small, single-threaded
sw.Start();
for (int i = 0; i < iterations; i++)
    largeCollection.Intersect(smallCollection).ToList();
sw.Stop();
Console.WriteLine("Intersect Lg->Sm Single Thread:\t {0}", sw.ElapsedMilliseconds);

//Intersect small to large, single-threaded
sw.Start();
for (int i = 0; i < iterations; i++)
    smallCollection.Intersect(largeCollection).ToList();
sw.Stop();
Console.WriteLine("Intersect Sm->Lg Single Thread:\t {0}", sw.ElapsedMilliseconds);

//Intersect large to small, parallel
sw.Start();
for (int i = 0; i < iterations; i++)
    largeCollection.AsParallel().Intersect(smallCollection.AsParallel()).ToList();
sw.Stop();
Console.WriteLine("Intersect Lg->Sm Parallel:\t {0}", sw.ElapsedMilliseconds);

//Intersect small to large, parallel
sw.Start();
for (int i = 0; i < iterations; i++)
    smallCollection.AsParallel().Intersect(largeCollection.AsParallel()).ToList();
sw.Stop();
Console.WriteLine("Intersect Sm->Lg Parallel:\t {0}", sw.ElapsedMilliseconds);

Here are the results.  All times are in milliseconds.  Green indicates the shortest duration, orange indicates the longest duration.

Results

Based on this testing, I think there’s a strong case to be made for performing an intersect using the larger collection on the left side.
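
If you don’t want to remember that ordering at every call site, a small helper can take care of it.  This is purely a hypothetical extension method based on the timings above, not anything built into the framework, and note that the order of the returned elements follows whichever collection ends up on the left side.

//Requires System.Collections.Generic and System.Linq
public static class IntersectExtensions
{
    //Always put the larger collection on the left side of the Intersect call
    public static IEnumerable<T> IntersectLargerFirst<T>(this ICollection<T> first, ICollection<T> second)
    {
        return first.Count >= second.Count
            ? first.Intersect(second)
            : second.Intersect(first);
    }
}

//Usage: var result = largeCollection.IntersectLargerFirst(smallCollection).ToList();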

Using the Managed Extensibility Framework (MEF) to implement a third-party plugin solution. Part 3 – Developing the plugins

Now that my application supports plugins, I’m ready to publish my plugin SDK out for other developers to make use of.  If I take a look in my .\bin\Debug folder for the MyPluginApp project (yes, I’m staying in Debug mode for the purposes of this example) I see the following.

Debug Folder Contents

MyPluginApp.exe of course is my Windows Forms application executable, but in addition to that there is a file called MyPluginApp.Plugins.dll.  This .dll is the compiled version of the Class Library named MyPluginApp.Plugins that I added to my solution in Part 2.  This is the .dll that I will need to make available to 3rd party developers (henceforth I will refer to this as my “SDK”), along, of course, with some thorough documentation outlining exactly what I expect them to implement and what metadata I expect.

Now I am going to create a new solution from the perspective of a 3rd party developer who wants to develop a plugin for my application.  I start by creating a new “Class Library” project, because I want my code to compile to a .dll that I can place in the Plugins directory for my application.  I’m going to call my new solution MyPluginApp_StarterPlugins.

Once my new solution and project have been created, I take the MyPluginApp.Plugins.dll file that I downloaded as part of the application developer’s SDK *wink wink* and place it in my project directory.  I then add the file to my project, making sure to set the Build Action to ‘None’ and Copy to Output Directory to ‘Do Not Copy.’  The step of explicitly adding the .dll file to my solution is not required; however, I think it adds some clarity to what the purpose of my project is.  Now that I’ve physically gotten the SDK into the appropriate place, I need to add a reference to it within the project.  I do this by adding a reference to the actual .dll file that I just moved into my project’s directory.  Also, just to keep Visual Studio from copying the .dll to my bin\Debug and bin\Release folders, I will set the “Copy Local” property of the reference to false.  I also need to add a reference to System.Windows.Forms because my intention is to use MessageBox when my plugins perform their tasks.  Finally, I need to add a reference to System.ComponentModel.Composition in order to gain access to the necessary MEF namespaces.

Project Directory

Project Directory Contents

Solution Explorer Containing File

Solution Explorer View

Reference

Solution Explorer References

I’m going to develop two simple plugins within this project.  The first will be called AdditionPlugin, and will add the two numbers together.  The second will be called MultiplicationPlugin, and will multiply the two numbers together.  These plugins will both implement the IPlugin interface, which we defined in the MyPluginApp.Plugins project and which is available to me as a 3rd party developer within the .dll that I have added to my project.

Now I’m ready to define my first plugin.  First I will create the AdditionPlugin.  I rename the Class1.cs default file to AdditionPlugin.cs and replace its contents with the following:

using System;

using System.ComponentModel.Composition; //MEF namespace for Exporting
using System.Windows.Forms; 

namespace MyPluginApp.Plugins
{
    [Export(typeof(IPlugin))] //**1
    [ExportMetadata("Name", "Addition Plugin")] //**2
    public class AdditionPlugin : IPlugin
    {
        public void HandleNumbers(decimal num1, decimal num2)
        {
            MessageBox.Show(string.Format("{0} + {1} = {2}", num1, num2, num1 + num2), "Addition Plugin");
        }
    }


    
    //**1 - This lets the CompositionContainer in MyPluginApp know that this class is of type IPlugin, which will be loaded
    //  when we call .GetExports<IPlugin, IPluginMetaData>() in the LoadPlugins() method.

    //**2 - This is how metadata is exported by the plugin.  The exported metadata needs to match the metadata interface we defined in MyPluginApp.
    //  In this case we have only 1 defined field in IPluginMetaData, called "Name", so that is the only piece of metadata that we want to export.
}

Now if this plugin is enabled (remember the Checked List Box in MyPluginApp?), its HandleNumbers method will be added to the MyButtonClicked event, which gets Invoked every time the button on my form gets clicked.

Similarly to the AdditionPlugin, I add a new .cs file to the project and name it MultiplicationPlugin.cs.  Here are the contents of that file:

using System;
using System.ComponentModel.Composition; //MEF namespace for Exporting
using System.Windows.Forms;

namespace MyPluginApp.Plugins
{
    [Export(typeof(IPlugin))]
    [ExportMetadata("Name", "Multiplication Plugin")]
    public class MultiplicationPlugin : IPlugin
    {
        public void HandleNumbers(decimal num1, decimal num2)
        {
            MessageBox.Show(string.Format("{0} * {1} = {2}", num1, num2, num1 * num2), "Multiplication Plugin");
        }
    }
}

Now I can build my solution.  After the build is complete, I find the following in the MyPluginApp_StarterPlugins bin\Debug folder:

image

MyPluginApp_StarterPlugins.dll needs only to be placed in the \Plugins directory of MyPluginApp prior to MyPluginApp being started up.  Here is what MyPluginApp looks like after I do just that.

MyPluginApp with loaded plugins

Clicking on MyButton at this point still yields the familiar output from before Plugin support was added.

MyPluginApp Output no plugins enabled

Clicking OK on the MessageBox dialog returns me to the application.  Now I change Number 1 to “8,” check both boxes, and then click MyButton.

MyPluginApp Enabled Plugins

MyPluginApp - App Output

MyPluginApp - Addition Plugin Output

MyPluginApp - Multiplication Plugin Output

Unchecking the box next to a plugin prior to clicking MyButton again causes that plugin to become unsubscribed from the event, so its MessageBox no longer pops up.

I’ve certainly only begun to scratch the surface of MEF in this series of posts.  But I hope that this simple tutorial showing a method of developing plugin .dll files completely independently of your original application’s code/solution will be of some value to someone.

I have posted a .zip file containing both complete solutions Here.

Using the Managed Extensibility Framework (MEF) to implement a third-party plugin solution. Part 2–Plugin Support in the Application

Now that I have a feature-complete application from my perspective, I am ready to add plugin support so that other developers can extend the functionality of my application.  I don’t want to make my application open source, so I want to expose just enough information about my application for them to be able to create plugins.

In order to give my application access to MEF, I need to add a reference to the appropriate namespace within my Windows Forms application.  The namespace I need to reference is System.ComponentModel.Composition.  Once I’ve done that, I need to add a new project to my solution of type “Class Library.”  This project will be my “Plugin API” and will compile to a .dll file that I can distribute to developers as a part of my SDK so that they can create plugins that adhere to my standard.  After creating my new project, I need to add a reference to that project from within my Windows Forms project.

At this point, my application solution looks like this:

Solution Explorer

Here are the contents of IPlugin.cs in the MyPluginApp.Plugins project:

using System;

namespace MyPluginApp.Plugins
{
    //This interface defines what we expect a plugin developer to implement in order for our application to make use of their plugin
    public interface IPlugin
    {
        //This method's signature matches the MyButtonClickedEventHandler in our application
        void HandleNumbers(decimal num1, decimal num2);
    }

    //MEF allows for metadata to be exported along with the object itself, so we need a separate interface to receive any metadata that we expect
    // plugin developers to include as well.
    public interface IPluginMetaData
    {
        string Name { get;}
    }
}

Next I need to update my UI so that it is able to display the list of available plugins and allow the user to enable/disable them at will.  I’ve opted to use a checked list box to achieve that goal.  The plugins will load initially disabled, and the user simply needs to check the box next to a plugin in order to enable it; similarly, unchecking the box will cause the plugin to be disabled.

Here is my New UI

MyPluginApp

And now the code for the form, including the code to enable plugin support.  New lines of code are highlighted for your convenience.

using System;
using System.Windows.Forms;

using System.Collections.Generic;
using MyPluginApp.Plugins;  //Our plugin namespace
using System.ComponentModel.Composition.Hosting; //MEF namespace

namespace MyPluginApp
{
    public partial class Form1 : Form
    {
        private delegate void MyButtonClickedEventHandler(decimal num1, decimal num2);
        private event MyButtonClickedEventHandler MyButtonClicked;

        private readonly Dictionary<string, IPlugin> _plugins = new Dictionary<string, IPlugin>();

        public Form1()
        {
            InitializeComponent();

            this.LoadPlugins();
        }

        //the Leave event handler for both text boxes
        private void textNum_Leave(object sender, EventArgs e)
        {
            decimal d;
            var textNum = (TextBox) sender;

            if (!decimal.TryParse(textNum.Text, out d))
            {
                textNum.Text = "5";
            }
        }

        //The KeyDown event handler for both text boxes
        private void txtNum_KeyDown(object sender, KeyEventArgs e)
        {
            if (e.KeyData == Keys.Enter || e.KeyData == Keys.Return)
            {
                btnMyButton.Focus(); //fires the Leave event for the active text box
                btnMyButton.PerformClick();
            }
        }

        //OnClick event handler for the button
        private void btnMyButton_Click(object sender, EventArgs e)
        {
            decimal num1;
            decimal num2;

            MessageBox.Show(string.Format("You entered {0} and {1}", txtNum1.Text, txtNum2.Text));

            if (decimal.TryParse(txtNum1.Text, out num1) && decimal.TryParse(txtNum2.Text, out num2))
            {
                //If our new event has any subscribers, invoke the event.
                if (this.MyButtonClicked != null)
                    this.MyButtonClicked.Invoke(num1, num2);
            }
        }

        private void LoadPlugins()
        {
            var pluginDir = System.IO.Path.GetDirectoryName(Application.ExecutablePath) + @"\Plugins\";

            //Create plugin directory if it doesn't exist
            System.IO.Directory.CreateDirectory(pluginDir);

            //DirectoryCatalog gets a list of all .dll files in our plugin directory
            var catalog = new DirectoryCatalog(pluginDir, "*.dll");

            //CompositionContainer parses all of the dlls in the catalog and identifies their exports
            var container = new CompositionContainer(catalog);

            //Iterate through each export that matches our IPlugin interface and add it to our plugin collection
            foreach (var plugin in container.GetExports<IPlugin, IPluginMetaData>())
            {
                this._plugins.Add(plugin.Metadata.Name, plugin.Value);
                this.clbPlugins.Items.Add(plugin.Metadata.Name, false);
            }

            //Best practice per MS is to dispose the CompositionContainer as soon as we are finished with it.
            container.Dispose();
            catalog.Dispose();
        }

        //ItemCheck event for clbPlugins (checked list box)
        private void clbPlugins_ItemCheck(object sender, ItemCheckEventArgs e)
        {
            var item = clbPlugins.Items[e.Index].ToString();


            if (e.NewValue == CheckState.Checked)
            {
                //If the user checked the box for the plugin, subscribe the plugin's handler to the event
                this.MyButtonClicked += this._plugins[item].HandleNumbers;
            }
            else
            {
                //If the user unchecked the box for the plugin, unsubscribe the plugin's handler from the event
                this.MyButtonClicked -= this._plugins[item].HandleNumbers;
            }
        }
    }
}

This concludes post #2.  In Post #3 I will create a few example plugins in their own solution without access to any of the code contained in Form1.cs, just as I would expect a 3rd party developer to be able to do.

Using the Managed Extensibility Framework (MEF) to implement a third-party plugin solution. Part 1–Introduction

I recently started working on a project that would benefit greatly from the ability to support plugins that were developed by someone other than myself or anyone else who necessarily had access to our source code. Having never really done anything like this myself I set off and tried to research it. I found a lot of information about loading compiled .dll files, using reflection and Activator.CreateInstance and so on. Then in the comments of those posts I’d see someone say “Why don’t you use MEF?” and that’s it; no other information.

In a quick nutshell, here’s what Wikipedia has to say about MEF as of August 2014:

Managed Extensibility Framework (MEF) is a component of .NET Framework 4.0 aiming to create lightweight, extensible applications. It allows application developers to discover and use extensions with no configuration required. It also lets extension developers easily encapsulate code and avoid fragile hard dependencies. MEF also allows extensions to be reused across applications. MEF was introduced as a part of .NET 4.0 and Silverlight 4.

Additionally, here is the MSDN page for your reference.

In my research, I found a lot of great examples of using MEF to quickly load external dll files and instantiate the types that they had defined within them. The problem was, all of the examples I found had all of the projects (application and plugins) contained within a single solution. I wanted to know how to enable someone else to take my plugin specification and implement it, without me having to distribute my entire solution to them. I never did find a true tutorial on that aspect of MEF, but I did figure out a way that works and that is the way I am going to present in this series. If anyone happens upon this series and has a better method, I’d be happy to learn from you!

In this series of posts I am going to go through the process of creating a .NET 4.0 Windows Forms application that implements a simple plugin system.  The plugin system will load any available plugins on startup and allow the user to enable/disable them.  I will then create an entirely separate solution containing some example plugins that can be loaded and used by my original program.  This should simulate the workflow that I would expect a third-party developer to go through to develop plugins for my application.

The series will consist of 3 posts.  The first post, which you are now reading, will serve as the introduction and cover the initial application (without plugin support).  In the second post, I will add plugin handling capabilities to the application, and in the third post I will create a couple sample plugins within their own solution.

I’m going to use Visual Studio 2010 for this project.  I will include zipped archives for both solutions at the end of the series.

To start out, I have created a simple application that allows the user to enter two numbers, then reports back to them what they entered.

Application

The Application

Result of pressing the button

Result of Button Press

And the code for my application

using System;
using System.Windows.Forms;

namespace MyPluginApp
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        //the Leave event handler for both text boxes
        private void textNum_Leave(object sender, EventArgs e)
        {
            decimal d;
            var textNum = (TextBox) sender;

            if (!decimal.TryParse(textNum.Text, out d))
            {
                textNum.Text = "5";
            }
        }

        //The KeyDown event handler for both text boxes
        private void txtNum_KeyDown(object sender, KeyEventArgs e)
        {
            if (e.KeyData == Keys.Enter || e.KeyData == Keys.Return)
            {
                btnMyButton.Focus(); //fires the Leave event for the active text box
                btnMyButton.PerformClick();
            }
        }

        //OnClick event handler for the button
        private void btnMyButton_Click(object sender, EventArgs e)
        {
            MessageBox.Show(string.Format("You entered {0} and {1}", txtNum1.Text, txtNum2.Text));
        }
    }
}

This concludes the first post of the series.  In the next post I will add the capability for my application to load and utilize plugins.

Renaming poorly named constraints in MS SQL

I recently found myself in a situation where I was working on tuning SQL Server performance in a SQL Server 2008 R2 environment. What made this environment different from others I had worked on in the past was that it had ~400 databases that were all (essentially) identical in structure. The company sells a remote-hosted solution, so whenever a new client comes on board they simply spin up a new, identically structured database that the client can then begin filling with their own data.

In the process of tuning the SQL performance, there were many instances where I needed to make changes to Primary and Foreign keys. Unfortunately, many of the keys did not have the same name in each database, because when the tables were originally created the constraints were not explicitly named and were instead named by SQL Server itself.

Consider these simple examples.

--Implicit
CREATE TABLE dbo.Users
(
	id_Users int IDENTITY(1,1) NOT NULL PRIMARY KEY
	, UserName nvarchar(20) NOT NULL
	, UserPassword nvarchar(20) NOT NULL
);

CREATE TABLE dbo.UserLoginHistory
(
	id_Users int NOT NULL FOREIGN KEY REFERENCES dbo.Users (id_Users)
	, LoginTime datetime2 NOT NULL
);
CREATE CLUSTERED INDEX cidx_dbo_UserLoginHistory ON dbo.UserLoginHistory (id_Users);

select name, 'pk' AS [type] FROM sys.key_constraints WHERE parent_object_id = OBJECT_ID('dbo.Users')
union all
select name, 'fk' AS [type] FROM sys.foreign_keys WHERE parent_object_id = OBJECT_ID('dbo.UserLoginHistory');

And here is an example of explicitly defined primary and foreign keys:

--Explicit
CREATE TABLE dbo.Users
(
	id_Users int IDENTITY(1,1) NOT NULL
	, UserName nvarchar(20) NOT NULL
	, UserPassword nvarchar(20) NOT NULL
	, CONSTRAINT pk_dbo_Users PRIMARY KEY CLUSTERED (id_Users)
);

CREATE TABLE dbo.UserLoginHistory
(
	id_Users int NOT NULL
	, LoginTime datetime2 NOT NULL
	, CONSTRAINT fk_dbo_UserLoginHistory_dbo_Users FOREIGN KEY (id_Users) REFERENCES dbo.Users (id_Users)
);
CREATE CLUSTERED INDEX cidx_dbo_UserLoginHistory ON dbo.UserLoginHistory (id_Users);

select name, 'pk' AS [type] FROM sys.key_constraints WHERE parent_object_id = OBJECT_ID('dbo.Users')
union all
select name, 'fk' AS [type] FROM sys.foreign_keys WHERE parent_object_id = OBJECT_ID('dbo.UserLoginHistory');

And the results from each:

Implicit
type	name
pk	PK__Users__B3F21DFE5BED93EA
fk	FK__UserLogin__id_Us__5ECA0095

Explicit
type	name
pk	pk_dbo_Users
fk	fk_dbo_UserLoginHistory_dbo_Users

When you allow SQL Server to name your constraints for you, it generates a name that is partially based on the object name and then appends a set of random hex characters to the end. In the case of the system I was working on, this resulted in 400 different names for the same constraints, so any maintenance I wanted to perform against those constraints would first have to dynamically identify the name of the constraint before the logic could proceed, and any future maintenance would require the same level of effort. I opted to create these two simple scripts, which name all Primary and Foreign keys based on a defined naming convention, thus bringing all of the databases into sync and making future maintenance much easier, so long as future development does not use implicit constraint creation techniques.

-------------------------------------------
--Rename all system generated primary keys using the following format
-- pk_schema_tablename
-------------------------------------------
DECLARE @oldname nvarchar(776)
        , @newname sysname;

DECLARE key_curse CURSOR FAST_FORWARD FOR (SELECT
                                                '[' + SCHEMA_NAME(kc.schema_id) + '].[' + kc.name + ']' AS [oldname]
                                                , 'pk_' + SCHEMA_NAME(kc.schema_id) + '_' + OBJECT_NAME(kc.parent_object_id) AS [newname]
                                                FROM sys.key_constraints kc
                                                WHERE
                                                    kc.type = 'PK'
                                                    AND kc.name <> 'pk_' + SCHEMA_NAME(kc.schema_id) + '_' + OBJECT_NAME(kc.parent_object_id));

OPEN key_curse

    FETCH NEXT FROM key_curse INTO @oldname, @newname;

    WHILE(@@FETCH_STATUS = 0)
    BEGIN

        print substring(@oldname, charindex('.', @oldname) + 1, len(@oldname)) + '     ' + @newname;
        EXEC sp_rename @objname = @oldname, @newname = @newname, @objtype = 'OBJECT';

        FETCH NEXT FROM key_curse INTO @oldname, @newname;
    END

CLOSE key_curse;
DEALLOCATE key_curse;
GO

-------------------------------------------
--Rename all system generated foreign keys using the following format
-- fk_schema1_tablename1_schema2_tablename2
-- where schema/table 1 = the ReferencING table
-- and schema/table 2 = the ReferencED table
-------------------------------------------
DECLARE @oldname nvarchar(776)
		, @newname sysname;

DECLARE key_curse CURSOR FAST_FORWARD FOR (select
											'[' + SCHEMA_NAME(o_ing.schema_id) + '].[' + fk.name + ']' AS [oldname]
											, 'fk_' + SCHEMA_NAME(o_ing.schema_id) + '_' + o_ing.name + '_' + SCHEMA_NAME(o_ed.schema_id) + '_' + o_ed.name AS [newname]
											FROM sys.foreign_keys fk
											JOIN sys.objects o_ing ON o_ing.object_id = fk.parent_object_id
											JOIN sys.objects o_ed ON o_ed.object_id = fk.referenced_object_id
											WHERE
												fk.name <> 'fk_' + SCHEMA_NAME(o_ing.schema_id) + '_' + o_ing.name + '_' + SCHEMA_NAME(o_ed.schema_id) + '_' + o_ed.name); --The format of our standard key

OPEN key_curse

	FETCH NEXT FROM key_curse INTO @oldname, @newname;

	WHILE(@@FETCH_STATUS = 0)
	BEGIN

		print SUBSTRING(@oldname, charindex('.', @oldname) + 1, len(@oldname)) + '     ' + @newname;
		EXEC sp_rename @objname = @oldname, @newname = @newname, @objtype = 'OBJECT';

		FETCH NEXT FROM key_curse INTO @oldname, @newname;
	END

CLOSE key_curse;
DEALLOCATE key_curse;
GO

There was already a system in place to apply a SQL script to every client database, so there was no need to employ sp_MSForeachdb (just kidding!) or Aaron Bertrand’s sp_foreachdb.

I haven’t tested these scripts against any other versions of MSSQL, but I think they should work fine on anything >= SQL Server 2005.

In closing, to all you developers out there, please take the time to explicitly name your constraints!

Index lookup or hash table? Part 2

I ended my last blog post touting the performance advantages of index lookup versus a hash table for data loading as well as data retrieval, but I added a caveat regarding the issue of removing items from the collection. With the hash table it’s pretty straightforward: Microsoft has provided a built-in method for removing an item based on its key that is relatively inexpensive to use. Removing an item from the index collection requires that we iterate through all of the items that came after the item we just removed and update their index values to reflect their new positions within the collection. If we have a separate client/server design, we must then also synchronize those index changes back to the client so that future calls from the client end up referencing the correct object.

Just how expensive is this maintenance, and could index lookup still be a viable solution for collections with a moderate to heavy amount of removals? Let’s find out.

I’m using the same objects and junk CSV file as before for my data set. See the files page to acquire the JunkData file for your own testing. Recall that this junk data set is 1 million rows of comma-separated data; the first column serves as a numeric row identifier, while the subsequent 6 columns contain random string data. Here are the HashObject and IndexObject definitions again for reference.

    //This class can be used to represent a single row
    sealed class HashObject
    {
        public int ID { get; set; }
        public string Col1 { get; set; }
        public string Col2 { get; set; }
        public string Col3 { get; set; }
        public string Col4 { get; set; }
        public string Col5 { get; set; }
        public string Col6 { get; set; }

        public HashObject(int id, string col1, string col2, string col3, string col4, string col5, string col6)
        {
            this.ID = id;
            this.Col1 = col1;
            this.Col2 = col2;
            this.Col3 = col3;
            this.Col4 = col4;
            this.Col5 = col5;
            this.Col6 = col6;
        }
    }

    //This class can be used to represent a single row
    sealed class IndexObject
    {
        //Added int property to record the object's index location within the List
        public int index { get; set; }

        public int ID { get; set; }
        public string Col1 { get; set; }
        public string Col2 { get; set; }
        public string Col3 { get; set; }
        public string Col4 { get; set; }
        public string Col5 { get; set; }
        public string Col6 { get; set; }

        public IndexObject(int id, string col1, string col2, string col3, string col4, string col5, string col6)
        {
            this.ID = id;
            this.Col1 = col1;
            this.Col2 = col2;
            this.Col3 = col3;
            this.Col4 = col4;
            this.Col5 = col5;
            this.Col6 = col6;
        }
    }

And here is the code I am using to test performance of item removal for both the index lookup and hashtable methods.

    class Program
    {
        static void Main(string[] args)
        {
            var hash = new Dictionary<int, HashObject>();
            var index = new List<IndexObject>();

            DateTime start;
            TimeSpan duration;

            using (var sr = new System.IO.StreamReader(@".\junkdata.csv"))
            {
                while (sr.Peek() != -1)
                {
                    var line = sr.ReadLine();

                    var fields = line.Split(',');

                    var hashobject = new HashObject(Convert.ToInt32(fields[0]), fields[1], fields[2], fields[3], fields[4], fields[5], fields[6]);
                    hash.Add(hashobject.ID, hashobject);

                    var indexobject = new IndexObject(Convert.ToInt32(fields[0]), fields[1], fields[2], fields[3], fields[4], fields[5], fields[6]);
                    indexobject.index = index.Count;
                    index.Add(indexobject);
                }
            }


            //For the removal speed test we are going to need two collections, since the ID/Index #s do not match up directly
            var idx_removes = new List<int>();
            var hash_removes = new List<int>();

            //Get the list of Keys in the hash table
            var hashkeys = hash.Keys.ToList();


            //Identify set of items to remove

            //End identify set of items to remove

            start = DateTime.Now;
            foreach (var i in hash_removes)
            {
                RemoveHash(i, hash);
            }
            duration = DateTime.Now - start;
            Console.WriteLine("hashtable: {0}", duration.Ticks);


            start = DateTime.Now;
            foreach (var i in idx_removes)
            {
                RemoveIndex(i, index);
            }
            duration = DateTime.Now - start;
            Console.WriteLine("index: {0}", duration.Ticks);



            Console.ReadKey();
        }

        static void RemoveHash(int ID, Dictionary<int, HashObject> hash)
        {
            hash.Remove(ID);
        }

        static void RemoveIndex(int idx, List<IndexObject> index)
        {
            //First we remove the indicated item
            index.RemoveAt(idx);

            //Removing the item will "shift" all of the following items 1 position towards the
            //  "front" of the List.  Since the old "idx" value was removed, the value that now sits
            //  in the position of idx will also need to have its index value decremented.
            for (int i = idx; i < index.Count; i++)
            {
                index[i].index--;
            }   
        }
    }

Take special note of the "Identify set of items to remove" section in the code above. From here on out I will post several different code snippets to place in that section, rather than posting the code for the entire program over and over. These snippets will test the effects of several different scenarios.

  • Removing an item at or near the “end” of the collection
  • Removing an item at or near the “beginning” of the collection
  • Removing an item at or near the “middle” of the collection

Once again for all of my tests I will be working in Visual Studio 2010 with a .NET 4.0 Command-Line Application in debug mode.

We’ll start with two different tests that remove items near the end of the collection. Test 1 will actually remove the “last” item, and Test 2 will remove an item that is 20000 positions from the end. Here is the code snippet for Test 1.

    idx_removes.Add(index.Count - 1);
    hash_removes.Add(hashkeys[index.Count - 1]);

And here are the results

hashtable: 0
index: 0

Moving right along to test #2, here’s the code.

    idx_removes.Add(index.Count - 20001);
    hash_removes.Add(hashkeys[index.Count - 20001]);

And the results

hashtable: 10032
index: 10032

As long as we’re working with items near the end of the collection, the index collection and hash table seem to be neck and neck.

In the next scenario, we’ll test the reverse of what we just tested. We’ll remove the “first” item, as well as an item that is 20000 positions into the collection from the beginning. Here is the code snippet for the first test.

    idx_removes.Add(0);
    hash_removes.Add(hashkeys[0]);

And here are the results

hashtable: 10032
index: 1003200

Moving right on to the second test of this scenario. Here’s the code.

    idx_removes.Add(19999);
    hash_removes.Add(hashkeys[19999]);

And the results

hashtable: 10032
index: 1003200

It’s beginning to become clear that the Remove method built into .NET’s hash table implementation has a fairly low cost regardless of the item’s position within the collection, while the index collection suffers from a variable amount of overhead that depends on how near the removed item is to the end of the collection. For our final scenario, we’ll perform a removal from the middle of the collection.

    idx_removes.Add(499999);
    hash_removes.Add(hashkeys[499999]);

And the results

hashtable: 10012
index: 490588
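
That pattern fits a rough cost model, which is an assumption based on the re-indexing loop in RemoveIndex above rather than a separate measurement: removing the item at a given position leaves every later item needing its stored index decremented, so the follow-up work shrinks to nothing as the removal point approaches the end of the collection.

//Approximate number of index updates required after RemoveIndex(); List<T>.RemoveAt
//also shifts roughly the same number of elements inside its backing array.
static int IndexUpdatesAfterRemoval(int count, int position)
{
    return count - position - 1;
}

//With 1,000,000 items:
//  IndexUpdatesAfterRemoval(1000000, 999999) ->       0 (last item)
//  IndexUpdatesAfterRemoval(1000000, 499999) -> 500,000 (middle)
//  IndexUpdatesAfterRemoval(1000000, 0)      -> 999,999 (first item)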

The hash table is the clear winner when it comes to removing items. While index lookup does have an advantage when it comes to the initial load and lookup of items, if you will be doing any significant number of remove operations you may want to look towards the hash table for your needs. However if item removals are rare, index lookup may still be the better choice because of its clear advantage in lookup speed.

As always, questions or comments are welcome.
-Alan