Category Archives: .NET

General .NET

Performance of the “parameterless” constructor of List<T>

I’ve always tried to specify the size of my List<T> collections whenever I can.  I do this just because I “know” that I will get better performance by specifying the collection size up front, but I never knew exactly how much difference it really made.  I know other developers use the parameterless constructor without giving it a second thought, and I can’t think of a time I’ve ever encountered a noticeable performance difference either way.

Just due to an abundance of curiosity, I decided to test it.

As we all know, generic collections are just a bunch of sweet, hot, sexy goodness sitting on top of the same old-fashioned arrays we all grew up with.  Because of this, List<T> doesn’t actually do anything we couldn’t always have done in the past with arrays and lots of our own custom code.  That being said, I love me some generic collections.

However, I think a big pitfall that many developers fall into with generic collections is overusing the parameterless constructor for List<T> and other generic collections.  As we all know, it's not possible to resize an array once it's been allocated.  The only way to grow an array is to create a second, larger array, copy the contents of the first array into it, and then, ideally, free up the resources used by the original array so they can be returned to the system.
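In array terms, a single growth step looks something like this:

```csharp
//Allocate and fill a 4-element array
int[] original = new int[4];
for (int i = 0; i < original.Length; i++)
    original[i] = i;

//To "grow" it, we must allocate a larger array and copy the contents over
int[] larger = new int[original.Length * 2];
Array.Copy(original, larger, original.Length);

//The old 4-element array is now unreferenced and eligible for garbage collection
original = larger;

Console.WriteLine(original.Length); //8
```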

Since List<T> is just using arrays under the hood, it is subject to this same inherent restriction.  How, then, does List<T> handle having data continuously added to it when we never told it what size the array should be?  To my knowledge, this is not documented as part of any spec from Microsoft, so all we can do is test it and see, keeping in mind that the behavior could change in any future version of .NET.

List<T> provides a Capacity property which returns an integer representing the Length of the current underlying array.  I'll be using that property extensively as we test further.

To start with, let's verify that my last statement is actually true with a simple console app.  For reference, I will be targeting .NET 4.5.2 x64 in all of my examples.

static void Main(string[] args)
{
    var myList = new List<int>(50);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    Console.ReadKey();
}

This gives us the expected output of 50 for List capacity, and in the old-school array world would have looked something like this.

int[] myArr = new int[50];

But what about the parameterless constructor for List<T>?  What is its capacity?  Let's modify our console app and find out.

static void Main(string[] args)
{
    var myList = new List<int>();
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    Console.ReadKey();
}

As it turns out, this gives us an output of 0.  Huh, is that even possible with arrays?  Well, it turns out that it is; but as soon as you try to use your 0-length array you'll find yourself facing a friendly IndexOutOfRangeException.

static void Main(string[] args)
{
    int[] myArr = new int[0];
    myArr[0] = 5; //IndexOutOfRangeException

    Console.ReadKey();
}

So what happens when we add an item to a List?  Fortunately, .NET knows it can't add an item to a 0-length array, so it creates a new array for us with a capacity of 4 prior to performing our first .Add().

static void Main(string[] args)
{
    var myList = new List<int>();

    //Starts with 0 capacity
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    //Add an item to the List
    myList.Add(5);

    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Now our capacity is 4

    Console.ReadKey();
}

So the obvious question is, what happens if I create a List and add 5 items to it?  The answer is, .NET doubles the capacity from 4 to 8.

static void Main(string[] args)
{
    var myList = new List<int>();

    //Starts with 0 capacity
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);

    //Add an item to the List
    myList.Add(1);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Now our capacity is 4

    myList.Add(2);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is still 4

    myList.Add(3);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is still 4

    myList.Add(4);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is still 4

    myList.Add(5);
    Console.WriteLine("myList Capacity: {0}", myList.Capacity);  //Capacity is now 8

    Console.ReadKey();
}

Remember, since List is using arrays under the hood, what actually happened is this: we created a 4-element array, filled it, realized it was too small, created an 8-element array, copied the 4 elements from our first array into the new array, marked our first array as eligible for garbage collection, and then added our 5th item to the new 8-element array.

So, we've seen the capacity go from 4 to 8.  What happens if we exceed 8?  Does the capacity grow by an additional 4 to 12, or does something else happen?  It turns out the capacity doubles again to 16.

static void Main(string[] args)
{
    var myList = new List<int>();

    for (int i = 0; i < 8; i++)
    {
        myList.Add(i);
    }

    Console.WriteLine("myList Capacity: {0}", myList.Capacity); //8

    myList.Add(27);

    Console.WriteLine("myList Capacity: {0}", myList.Capacity); //16


    Console.ReadKey();
}

So let's think about what happened here again.

  1. We added our first element, .NET allocated a 4 element array and put our value in the first slot.
  2. We added 3 more elements to finish filling out the 4 element array.
  3. We added a 5th element, .NET allocated an 8 element array, copied our 4 elements into it, then added the 5th element to it.  Our 4 element array now needs to be garbage collected.
  4. We added 3 more elements to finish filling up the new 8 element array.
  5. We added a ninth element, .NET allocated a 16 element array, copied our 8 elements into it, then added the 9th element to it.  Our 8 element array now needs to be garbage collected.

We’re really making the system work a lot harder than it needs to, assuming we had some idea of how many values were going to be in our collection from the beginning.

This pattern of array-capacity doubling as the List grows continues regardless of the number of elements.  4, 8, 16, 32, 64, 128, 256, and so on.  This means that if you have a List with 100 elements in it that was created using the parameterless constructor it will have gone through 5 array growth/copy cycles and be using up enough memory to hold 128 elements rather than the 100 you are actually using.

You can calculate the number of growth cycles using the following formula.  Round the result up to the nearest whole number, which will be the number of array growth cycles required using the parameterless constructor for List to fill to the specified number of elements.

Log(NumElements / 4) / Log(2)
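In C#, that calculation might look like this (GrowthCycles is just an illustrative helper name, not part of the BCL):

```csharp
//Illustrative helper: number of growth/copy cycles a parameterless List<T>
//goes through while filling to numElements, rounding the result up
static int GrowthCycles(int numElements)
{
    if (numElements <= 4)
        return 0; //everything fits in the initial 4-element array

    return (int)Math.Ceiling(Math.Log(numElements / 4.0) / Math.Log(2));
}

//100 elements: Log(100 / 4) / Log(2) = Log(25) / Log(2) ≈ 4.64, rounds up to 5
Console.WriteLine(GrowthCycles(100)); //5
```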

So, what is the actual performance impact of all of this array growing?  Up until now we've been using ints, so let's continue with that for now and find out.

static void Main(string[] args)
{
    var sw = new System.Diagnostics.Stopwatch();
    int count = 1000;
    int tests = 1000;

    var ticks = new List<long>(tests);

    //First call outside of our stopwatch so JIT Compiling can happen
    BuildList(count);


    for (int i = 0; i < tests; i++)
    {
        sw.Reset();
        sw.Start();
        var result = BuildList(count);
        sw.Stop();

        //Console.WriteLine("Count: {0}", result.Count);
        //Console.WriteLine("Capacity: {0}", result.Capacity);

        ticks.Add(sw.ElapsedTicks);

        sw.Reset();
    }

    Console.WriteLine();
    Console.WriteLine("Avg Ticks: {0}", ticks.Average());

    Console.ReadKey();
}


public static List<int> BuildList(int count)
{
    var list = new List<int>();

    for (int i = 0; i < count; i++)
    {
        list.Add(i);
    }

    return list;
}

So a List of 1000 integers is giving an average (1000 samples) of 27 ticks on my system (i7-4790K, 16 GB 2400MHz DDR3 RAM).  Of course this is due to integers being such a simple and fast datatype.  Upping the "count" parameter to 100,000 makes the numbers more meaningful: 100,000 integers tested comes in at an average (1000 samples) of 2409.186 ticks.  Now I'll change the app to use a specified list size rather than the parameterless constructor.  This requires only a single change to the app.

public static List<int> BuildList(int count)
{
    var list = new List<int>(count); //Size of list specified up front

    for (int i = 0; i < count; i++)
    {
        list.Add(i);
    }

    return list;
}

This change results in an average of 1544.961 ticks, a sizable decrease!

Let's try this same test with a more complicated object.

static void Main(string[] args)
{
    var sw = new System.Diagnostics.Stopwatch();
    int count = 10000;
    int tests = 1000;

    var ticks = new List<long>(tests);

    //First call outside of our stopwatch so JIT Compiling can happen
    BuildList(count);


    for (int i = 0; i < tests; i++)
    {
        sw.Reset();
        sw.Start();
        var result = BuildList(count);
        sw.Stop();

        //Console.WriteLine("Count: {0}", result.Count);
        //Console.WriteLine("Capacity: {0}", result.Capacity);

        ticks.Add(sw.ElapsedTicks);

        sw.Reset();
    }

    Console.WriteLine();
    Console.WriteLine("Avg Ticks: {0}", ticks.Average());

    Console.ReadKey();
}


public static List<Employee> BuildList(int count)
{
    var list = new List<Employee>();

    for (int i = 0; i < count; i++)
    {
        list.Add(new Employee(i));
    }

    return list;
}


public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string ExtraData1;
    public string ExtraData2;
    public string ExtraData3;

    public Employee(int id)
    {
        this.ID = id;

        Name = "This is a person's name";
        ExtraData1 = "This is some more data about this employee";
        ExtraData2 = "More data about the employee, because we couldn't fit everything we wanted in ExtraData1";
        ExtraData3 = "Yet another field of custom information about the employee.  If this were from an actual HR system these field names might even be accurate";
    }
}

10,000 employees loaded into a List averages 4359.531 ticks using the parameterless constructor.  Changing to a constructor specifying the List size up front results in an average of 1440.092 ticks, clearly the better choice.  Here is a screenshot comparing memory usage of the parameterless vs. specified-count tests of bogus Employee objects.  The top image is of the parameterless List.  You can clearly see the excessive memory usage spikes and extra garbage collection.

List_mem_use

From my testing, the more complex the object you are putting in the List, the greater the benefit you will see from specifying the List capacity yourself.

However, a word of caution.  The capacity “doubling” rule works the same for specified capacities as it does for default capacity Lists.  That means if you create your own List with a capacity of 10,000 to load your 9,997 employees from the database, and an HR user adds 4 new employees, .NET will up your List to a capacity of 20,000.
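A quick sketch of that scenario (using int IDs as stand-in employee records):

```csharp
var employees = new List<int>(10000); //stand-in for 10,000 employee records

for (int i = 0; i < 10000; i++)
    employees.Add(i);

Console.WriteLine(employees.Capacity); //10000 - exactly what we asked for

employees.Add(10000); //HR adds one more employee past our specified capacity

Console.WriteLine(employees.Capacity); //20000 - the capacity doubled
```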

The moral of the story is, when creating a List<T> it may be worthwhile to think back to older times and consider how you would have sized an array for the same task.

Quick Post: List<int> Intersect Performance

I noticed an interesting performance difference earlier today, so I thought I'd throw together a quick post about it in case it could be useful for anyone else.  In a project that I'm working on, I need to do a good amount of intersect operations on separate collections of integers.  I happened to notice that the performance was affected by the order in which the collections were specified.  So let's jump right in and take a look.  I'm doing all of my tests with Visual Studio 2013 in a .NET 4.5 console application compiling for x64.  CPU: Intel Xeon E3-1230 V2 with HT enabled.

using System;
using System.Collections.Generic;
using System.Linq;

namespace CompareSpeed
{
    class Program
    {
        static void Main(string[] args)
        {
            int largeColSize = 10000;
            int smallColSize = 500;
            int iterations = 25000;

            Console.WriteLine("Total Rows: {0}", (largeColSize + smallColSize).ToString("#,0"));
            Console.WriteLine("Ratio of small to large: {0}", (decimal)smallColSize / (decimal)largeColSize);
            Console.WriteLine();

            var rand = new System.Random();
            var largeCollection = new List<int>(largeColSize);
            var smallCollection = new List<int>(smallColSize);

            //Fill large collection
            for (int i = 0; i < largeColSize; i++)
            {
                largeCollection.Add(rand.Next(int.MinValue, int.MaxValue));
            }

            //Fill small collection
            for (int i = 0; i < smallColSize; i++)
            {
                smallCollection.Add(rand.Next(int.MinValue, int.MaxValue));
            }


            var sw = new System.Diagnostics.Stopwatch();

            /*  CODE
             *  GOES
             *  HERE
             *  BASED
             *  ON
             *  TEST
             */

            Console.ReadKey();
        }
    }
}

One of the following code blocks gets placed in the commented section above based on which test I was running.

//Intersect large to small, single-threaded
sw.Start();
for (int i = 0; i < iterations; i++)
    largeCollection.Intersect(smallCollection).ToList();
sw.Stop();
Console.WriteLine("Intersect Lg->Sm Single Thread:\t {0}", sw.ElapsedMilliseconds);

//Intersect small to large, single-threaded
sw.Start();
for (int i = 0; i < iterations; i++)
    smallCollection.Intersect(largeCollection).ToList();
sw.Stop();
Console.WriteLine("Intersect Sm->Lg Single Thread:\t {0}", sw.ElapsedMilliseconds);

//Intersect large to small, parallel
sw.Start();
for (int i = 0; i < iterations; i++)
    largeCollection.AsParallel().Intersect(smallCollection.AsParallel()).ToList();
sw.Stop();
Console.WriteLine("Intersect Lg->Sm Parallel:\t {0}", sw.ElapsedMilliseconds);

//Intersect small to large, parallel
sw.Start();
for (int i = 0; i < iterations; i++)
    smallCollection.AsParallel().Intersect(largeCollection.AsParallel()).ToList();
sw.Stop();
Console.WriteLine("Intersect Sm->Lg Parallel:\t {0}", sw.ElapsedMilliseconds);

Here are the results.  All times are in milliseconds.  Green indicates shortest duration, orange indicates longest duration.

Results

Based on this testing, I think there’s a strong case to be made for performing an intersect using the larger collection on the left side.
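This lines up with how Enumerable.Intersect works in the .NET reference source: the second sequence is loaded into a hash set, and the first sequence is then streamed against it.  Here's a rough sketch of the idea (my own illustration, not the actual BCL code):

```csharp
//Rough sketch of the idea behind Enumerable.Intersect - not the actual BCL code
static IEnumerable<T> IntersectSketch<T>(IEnumerable<T> first, IEnumerable<T> second)
{
    //The SECOND sequence is loaded into a hash set up front...
    var set = new HashSet<T>(second);

    //...then the FIRST sequence is streamed against it, yielding each match once
    foreach (var item in first)
    {
        if (set.Remove(item))
            yield return item;
    }
}
```

With the large collection on the left, the hash set is built from the small collection (500 inserts) and the large collection is merely enumerated (10,000 lookups), which appears to be the cheaper arrangement.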

Using the Managed Extensibility Framework (MEF) to implement a third-party plugin solution. Part 3 – Developing the plugins

Now that my application supports plugins, I’m ready to publish my plugin SDK out for other developers to make use of.  If I take a look in my .\bin\Debug folder for the MyPluginApp project (yes, I’m staying in Debug mode for the purposes of this example) I see the following.

Debug Folder Contents

MyPluginApp.exe, of course, is my Windows Forms application executable, but in addition to that there is a file called MyPluginApp.Plugins.dll.  This .dll is the compiled version of the Class Library named MyPluginApp.Plugins that I added to my solution in Part 2.  This is the .dll that I will need to make available to 3rd party developers (henceforth I will refer to this as my "SDK"), along, of course, with some thorough documentation outlining exactly what I am expecting them to implement and what metadata I am expecting.

Now I am going to create a new solution from the perspective of a 3rd party developer who wants to develop a plugin for my application.  I start by creating a new “Class Library” project, because I want my code to compile to a .dll that I can place in the Plugins directory for my application.  I’m going to call my new solution MyPluginApp_StarterPlugins.

Once my new solution and project have been created, I take the MyPluginApp.Plugins.dll file that I downloaded as part of the application developer's SDK *wink wink* and place it in my project directory.  I then add the file to my project, making sure to set the Build Action to 'None' and Copy to Output Directory to 'Do Not Copy.'  The step of explicitly adding the .dll file to my solution is not required, however I think it adds some clarity to the purpose of my project.  Now that I've physically gotten the SDK into the appropriate place, I need to add a reference to it within the project.  I do this by adding a reference to the actual .dll file that I just moved into my project's directory.  Also, just to keep Visual Studio from copying the .dll to my bin\Debug and bin\Release folders, I will set the "Copy Local" property of the reference to false.  I also need to add a reference to System.Windows.Forms because my intention is to use MessageBox when my plugins perform their tasks.  Finally, I need to add a reference to System.ComponentModel.Composition in order to gain access to the necessary MEF namespaces.

Project Directory

Project Directory Contents

Solution Explorer Containing File

Solution Explorer View

Reference

Solution Explorer References

I'm going to develop two simple plugins within this project.  The first will be called AdditionPlugin, and will add the two numbers together.  The second will be called MultiplicationPlugin, and will multiply the two numbers together.  These plugins will both implement the IPlugin interface, which we defined in the MyPluginApp.Plugins project and which is available to me as a 3rd party developer within the .dll that I have added to my project.

Now I’m ready to define my first plugin.  First I will create the AdditionPlugin.  I rename the Class1.cs default file to AdditionPlugin.cs and replace its contents with the following:

using System;

using System.ComponentModel.Composition; //MEF namespace for Exporting
using System.Windows.Forms; 

namespace MyPluginApp.Plugins
{
    [Export(typeof(IPlugin))] //**1
    [ExportMetadata("Name", "Addition Plugin")] //**2
    public class AdditionPlugin : IPlugin
    {
        public void HandleNumbers(decimal num1, decimal num2)
        {
            MessageBox.Show(string.Format("{0} + {1} = {2}", num1, num2, num1 + num2), "Addition Plugin");
        }
    }


    
    //**1 - This lets the CompositionContainer in MyPluginApp know that this class is of type IPlugin, which will be loaded 
    //  when we call .GetExports<IPlugin, IPluginMetaData>() in the LoadPlugins() method.

    //**2 - This is how metadata is exported by the plugin.  The exported metadata needs to match the metadata interface we defined in MyPluginApp.
    //  In this case we have only 1 defined field in IPluginMetaData called "Name", so that is the only piece of metadata that we want to export.
}

Now if this plugin is enabled (remember the Checked List Box in MyPluginApp?), its HandleNumbers method will be added to the MyButtonClicked event, which gets Invoked every time the button on my form gets clicked.

Similarly to the AdditionPlugin, I add a new .cs file to the project and name it MultiplicationPlugin.cs.  Here are the contents of that file:

using System;
using System.ComponentModel.Composition; //MEF namespace for Exporting
using System.Windows.Forms;

namespace MyPluginApp.Plugins
{
    [Export(typeof(IPlugin))]
    [ExportMetadata("Name", "Multiplication Plugin")]
    public class MultiplicationPlugin : IPlugin
    {
        public void HandleNumbers(decimal num1, decimal num2)
        {
            MessageBox.Show(string.Format("{0} * {1} = {2}", num1, num2, num1 * num2), "Multiplication Plugin");
        }
    }
}

Now I can build my solution.  After the build is complete, I find the following in MyPluginApp_StarterPlugins bin\Debug folder:

image

MyPluginApp_StarterPlugins.dll needs only to be placed in the \Plugins directory of MyPluginApp prior to MyPluginApp being started up.  Here is what MyPluginApp looks like after I do just that.

MyPluginApp with loaded plugins

Clicking on MyButton at this point still yields the familiar output from before Plugin support was added.

MyPluginApp Output no plugins enabled

Clicking OK on the MessageBox dialog returns me to the application.  Now I change Number 1 to “8,” check both boxes, and then click MyButton.

MyPluginApp Enabled Plugins

MyPluginApp - App Output

MyPluginApp - Addition Plugin Output

MyPluginApp - Multiplication Plugin Output

Unchecking the box next to a plugin prior to clicking MyButton again causes that plugin to become unsubscribed from the event, so its MessageBox no longer pops up.

I’ve certainly only begun to scratch the surface of MEF in this series of posts.  But I hope that this simple tutorial showing a method of developing plugin .dll files completely independently of your original application’s code/solution will be of some value to someone.

I have posted a .zip file containing both complete solutions here.

Using the Managed Extensibility Framework (MEF) to implement a third-party plugin solution. Part 2–Plugin Support in the Application

Now that I have a feature-complete application from my perspective, I am ready to add plugin support so that other developers can extend the functionality of my application.  I don’t want to make my application open source, so I want to just expose enough information about my application to them that they are able to create plugins.

In order to give my application access to MEF, I need to add a reference to the appropriate namespace within my Windows Forms application.  The namespace I need to reference is System.ComponentModel.Composition.  Once I've done that, I need to add a new project to my solution of type "Class Library."  This project will be my "Plugin API" and will compile to a .dll file that I can distribute to developers as part of my SDK so that they can create plugins that adhere to my standard.  After creating my new project, I need to add a reference to that project from within my Windows Forms project.

At this point, my application solution looks like this:

Solution Explorer

Here are the contents of IPlugin.cs in the MyPluginApp.Plugins project:

using System;

namespace MyPluginApp.Plugins
{
    //This interface defines what we expect a plugin developer to implement in order for our application to make use of their plugin
    public interface IPlugin
    {
        //This method's signature matches the MyButtonClickedEventHandler in our application
        void HandleNumbers(decimal num1, decimal num2);
    }

    //MEF allows metadata to be exported along with the object itself, so we need a separate interface to receive any metadata that we expect
    // plugin developers to include as well.
    public interface IPluginMetaData
    {
        string Name { get;}
    }
}

Next I need to update my UI so that it is able to display the list of available plugins and allow the user to enable/disable them at will.  I've opted to use a checked list box to achieve that goal.  The plugins will load initially disabled, and the user simply needs to check the box next to a plugin in order to enable it; similarly, unchecking the box will cause the plugin to be disabled.

Here is my New UI

MyPluginApp

And now the code for the form, including the code to enable plugin support.  New lines of code are highlighted for your convenience.

using System;
using System.Windows.Forms;

using System.Collections.Generic;
using MyPluginApp.Plugins;  //Our plugin namespace
using System.ComponentModel.Composition.Hosting; //MEF namespace

namespace MyPluginApp
{
    public partial class Form1 : Form
    {
        private delegate void MyButtonClickedEventHandler(decimal num1, decimal num2);
        private event MyButtonClickedEventHandler MyButtonClicked;

        private readonly Dictionary<string, IPlugin> _plugins = new Dictionary<string, IPlugin>();

        public Form1()
        {
            InitializeComponent();

            this.LoadPlugins();
        }

        //the Leave event handler for both text boxes
        private void textNum_Leave(object sender, EventArgs e)
        {
            decimal d;
            var textNum = (TextBox) sender;

            if (!decimal.TryParse(textNum.Text, out d))
            {
                textNum.Text = "5";
            }
        }

        //The KeyDown event handler for both text boxes
        private void txtNum_KeyDown(object sender, KeyEventArgs e)
        {
            if (e.KeyData == Keys.Enter || e.KeyData == Keys.Return)
            {
                btnMyButton.Focus(); //fires the Leave event for the active text box
                btnMyButton.PerformClick();
            }
        }

        //OnClick event handler for the button
        private void btnMyButton_Click(object sender, EventArgs e)
        {
            decimal num1;
            decimal num2;

            MessageBox.Show(string.Format("You entered {0} and {1}", txtNum1.Text, txtNum2.Text));

            if (decimal.TryParse(txtNum1.Text, out num1) && decimal.TryParse(txtNum2.Text, out num2))
            {
                //If our new event has any subscribers, invoke the event.
                if(this.MyButtonClicked != null)
                    this.MyButtonClicked.Invoke(num1, num2);
            }
        }

        private void LoadPlugins()
        {
            var pluginDir = System.IO.Path.GetDirectoryName(Application.ExecutablePath) + @"\Plugins\";

            //Create plugin directory if it doesn't exist
            System.IO.Directory.CreateDirectory(pluginDir);

            //DirectoryCatalog gets a list of all .dll files in our plugin directory
            var catalog = new DirectoryCatalog(pluginDir, "*.dll");

            //CompositionContainer parses all of the dlls in the catalog and identifies their exports
            var container = new CompositionContainer(catalog);

            //Iterate through each export that matches our IPlugin interface and add it to our plugin collection
            foreach (var plugin in container.GetExports<IPlugin, IPluginMetaData>())
            {
                this._plugins.Add(plugin.Metadata.Name, plugin.Value);
                this.clbPlugins.Items.Add(plugin.Metadata.Name, false);
            }

            //Best practice per MS is to dispose the CompositionContainer as soon as we are finished with it.
            container.Dispose();
            catalog.Dispose();
        }

        //ItemCheck event for clbPlugins (checked list box)
        private void clbPlugins_ItemCheck(object sender, ItemCheckEventArgs e)
        {
            var item = clbPlugins.Items[e.Index].ToString();

            if (e.NewValue == CheckState.Checked)
            {
                //If the user checked the box for the plugin, subscribe the plugin's handler to the event
                this.MyButtonClicked += this._plugins[item].HandleNumbers;
            }
            else
            {
                //If the user unchecked the box for the plugin, unsubscribe the plugin's handler from the event
                this.MyButtonClicked -= this._plugins[item].HandleNumbers;
            }
        }
            }
        }
    }
}

This concludes post #2.  In Post #3 I will create a few example plugins in their own solution without access to any of the code contained in Form1.cs, just as I would expect a 3rd party developer to be able to do.

Using the Managed Extensibility Framework (MEF) to implement a third-party plugin solution. Part 1–Introduction

I recently started working on a project that would benefit greatly from the ability to support plugins that were developed by someone other than myself or anyone else who necessarily had access to our source code. Having never really done anything like this myself I set off and tried to research it. I found a lot of information about loading compiled .dll files, using reflection and Activator.CreateInstance and so on. Then in the comments of those posts I’d see someone say “Why don’t you use MEF?” and that’s it; no other information.

In a quick nutshell, here’s what Wikipedia has to say about MEF as of August 2014:

Managed Extensibility Framework (MEF) is a component of .NET Framework 4.0 aiming to create lightweight, extensible applications. It allows application developers to discover and use extensions with no configuration required. It also lets extension developers easily encapsulate code and avoid fragile hard dependencies. MEF also allows extensions to be reused across applications. MEF was introduced as a part of .NET 4.0 and Silverlight 4.

Additionally, here is the MSDN page for your reference.

In my research, I found a lot of great examples of using MEF to quickly load external dll files and instantiate the types that they had defined within them. The problem was, all of the examples I found had all of the projects (application and plugins) contained within a single solution. I wanted to know how to enable someone else to take my plugin specification and implement it, without me having to distribute my entire solution to them. I never did find a true tutorial on that aspect of MEF, but I did figure out a way that works and that is the way I am going to present in this series. If anyone happens upon this series and has a better method, I’d be happy to learn from you!

In this series of posts I am going to go through the process of creating a .NET 4.0 Windows Forms application that implements a simple plugin system.  The plugin system will load any available plugins on startup and allow the user to enable/disable them.  I will then create an entirely separate solution containing some example plugins that can be loaded and used by my original program.  This should simulate the workflow that I would expect a third-party developer to go through to develop plugins for my application.

The series will consist of 3 posts.  The first post, which you are now reading, will serve as the introduction and cover the initial application (without plugin support).  In the second post, I will add plugin handling capabilities to the application, and in the third post I will create a couple sample plugins within their own solution.

I’m going to use Visual Studio 2010 for this project.  I will include zipped archives for both solutions at the end of the series.

To start out, I have created a simple application that allows the user to enter two numbers, then reports back to them what they entered.

Application

The Application

Result of pressing the button

Result of Button Press

And the code for my application

using System;
using System.Windows.Forms;

namespace MyPluginApp
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        //the Leave event handler for both text boxes
        private void textNum_Leave(object sender, EventArgs e)
        {
            decimal d;
            var textNum = (TextBox) sender;

            if (!decimal.TryParse(textNum.Text, out d))
            {
                textNum.Text = "5";
            }
        }

        //The KeyDown event handler for both text boxes
        private void txtNum_KeyDown(object sender, KeyEventArgs e)
        {
            if (e.KeyData == Keys.Enter || e.KeyData == Keys.Return)
            {
                btnMyButton.Focus(); //fires the Leave event for the active text box
                btnMyButton.PerformClick();
            }
        }

        //OnClick event handler for the button
        private void btnMyButton_Click(object sender, EventArgs e)
        {
            MessageBox.Show(string.Format("You entered {0} and {1}", txtNum1.Text, txtNum2.Text));
        }
    }
}

This concludes the first post of the series.  In the next post I will add the capability for my application to load and utilize plugins.

Index lookup or hash table? Part 2

I ended my last blog post touting the performance advantages of index lookup versus a hash table for data loading as well as data retrieval; however, I added a caveat regarding the issue of removing items from the collection. With the hash table it's pretty straightforward: Microsoft provides a built-in method for removing an item based on its key that is relatively inexpensive to use. Removing an item from the index collection, on the other hand, requires that we iterate through all of the items that came after the item we just removed and update their index values to reflect their new positions within the collection. If we have a separate client/server design, we must then also synchronize those index changes back to the client so that future calls from the client end up referencing the correct object.

Just how expensive is this maintenance? Could index lookup still be a viable solution for collections that see a moderate to heavy amount of removals? Let's find out.

I’m using the same objects and junk CSV file as before for my data set. See the files page to acquire the JunkData file for your own testing. Recall that this junk data set is 1 million rows of comma-separated data; the first column serves as a numeric row identifier, while the subsequent 6 columns contain random string data. Here are the HashObject and IndexObject definitions again:

    //This class can be used to represent a single row
    sealed class HashObject
    {
        public int ID { get; set; }
        public string Col1 { get; set; }
        public string Col2 { get; set; }
        public string Col3 { get; set; }
        public string Col4 { get; set; }
        public string Col5 { get; set; }
        public string Col6 { get; set; }

        public HashObject(int id, string col1, string col2, string col3, string col4, string col5, string col6)
        {
            this.ID = id;
            this.Col1 = col1;
            this.Col2 = col2;
            this.Col3 = col3;
            this.Col4 = col4;
            this.Col5 = col5;
            this.Col6 = col6;
        }
    }

    //This class can be used to represent a single row
    sealed class IndexObject
    {
        //Added int property to record the object's index location within the List
        public int index { get; set; }

        public int ID { get; set; }
        public string Col1 { get; set; }
        public string Col2 { get; set; }
        public string Col3 { get; set; }
        public string Col4 { get; set; }
        public string Col5 { get; set; }
        public string Col6 { get; set; }

        public IndexObject(int id, string col1, string col2, string col3, string col4, string col5, string col6)
        {
            this.ID = id;
            this.Col1 = col1;
            this.Col2 = col2;
            this.Col3 = col3;
            this.Col4 = col4;
            this.Col5 = col5;
            this.Col6 = col6;
        }
    }

And here is the code I am using to test the performance of item removal for both the index lookup and hash table methods.

    class Program
    {
        static void Main(string[] args)
        {
            var hash = new Dictionary<int, HashObject>();
            var index = new List<IndexObject>();

            DateTime start;
            TimeSpan duration;

            using (var sr = new System.IO.StreamReader(@".\junkdata.csv"))
            {
                while (sr.Peek() != -1)
                {
                    var line = sr.ReadLine();

                    var fields = line.Split(',');

                    var hashobject = new HashObject(Convert.ToInt32(fields[0]), fields[1], fields[2], fields[3], fields[4], fields[5], fields[6]);
                    hash.Add(hashobject.ID, hashobject);

                    var indexobject = new IndexObject(Convert.ToInt32(fields[0]), fields[1], fields[2], fields[3], fields[4], fields[5], fields[6]);
                    indexobject.index = index.Count;
                    index.Add(indexobject);
                }
            }


            //For the removal speed test we are going to need two collections, since the ID/Index #s do not match up directly
            var idx_removes = new List<int>();
            var hash_removes = new List<int>();

            //Get the list of Keys in the hash table
            var hashkeys = hash.Keys.ToList();


            //Identify set of items to remove

            //End identify set of items to remove

            start = DateTime.Now;
            foreach (var i in hash_removes)
            {
                RemoveHash(i, hash);
            }
            duration = DateTime.Now - start;
            Console.WriteLine("hashtable: {0}", duration.Ticks);


            start = DateTime.Now;
            foreach (var i in idx_removes)
            {
                RemoveIndex(i, index);
            }
            duration = DateTime.Now - start;
            Console.WriteLine("index: {0}", duration.Ticks);



            Console.ReadKey();
        }

        static void RemoveHash(int ID, Dictionary<int, HashObject> hash)
        {
            hash.Remove(ID);
        }

        static void RemoveIndex(int idx, List<IndexObject> index)
        {
            //First we remove the indicated item
            index.RemoveAt(idx);

            //Removing the item "shifts" all of the following items 1 position towards the
            //  "front" of the List. The value that now sits at position idx, and every value
            //  after it, needs to have its index value decremented.
            for (int i = idx; i < index.Count; i++)
            {
                index[i].index--;
            }   
        }
    }

Take special note of the "Identify set of items to remove" section in the code above. From here on out I will post several different code snippets to place in that section, rather than posting the code for the entire program over and over. These snippets will test the effects of several different scenarios.

  • Removing an item at or near the “end” of the collection
  • Removing an item at or near the “beginning” of the collection
  • Removing an item at or near the “middle” of the collection

Once again for all of my tests I will be working in Visual Studio 2010 with a .NET 4.0 Command-Line Application in debug mode.

We’ll start with two different tests that remove items near the end of the collection. Test 1 will actually remove the “last” item, and Test 2 will remove an item that is 20000 positions from the end. Here is the code snippet for Test 1.

    idx_removes.Add(index.Count - 1);
    hash_removes.Add(hashkeys[index.Count - 1]);

And here are the results

hashtable: 0
index: 0

Moving right along to test #2, here’s the code.

    idx_removes.Add(index.Count - 20001);
    hash_removes.Add(hashkeys[index.Count - 20001]);

And the results

hashtable: 10032
index: 10032

As long as we’re working with items near the end of the collection, the index collection and hash table seem to be neck and neck.
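Incidentally, the 0-tick readings above are a symptom of DateTime.Now’s coarse resolution (roughly 10-16 ms on most Windows systems). If you want to reproduce these tests with finer granularity, System.Diagnostics.Stopwatch is a better fit. A minimal sketch of the same kind of measurement (not the harness used for the numbers in this post):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class StopwatchDemo
{
    static void Main()
    {
        var list = new List<int>();
        for (int i = 0; i < 1000000; i++)
        {
            list.Add(i);
        }

        // Stopwatch uses the high-resolution performance counter when one is
        // available, so it can time operations far shorter than DateTime.Now's
        // ~15 ms clock tick.
        var sw = Stopwatch.StartNew();
        list.RemoveAt(list.Count - 1);   // remove the "last" item
        sw.Stop();

        Console.WriteLine("High-resolution timer: {0}", Stopwatch.IsHighResolution);
        Console.WriteLine("Elapsed ticks: {0}", sw.ElapsedTicks);
    }
}
```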

In the next scenario, we’ll test the reverse of what we just tested. We’ll remove the “first” item, as well as an item that is 20000 positions into the collection from the beginning. Here is the code snippet for the first test.

    idx_removes.Add(0);
    hash_removes.Add(hashkeys[0]);

And here are the results

hashtable: 10032
index: 1003200

Moving on to the second test of this scenario, here’s the code.

    idx_removes.Add(19999);
    hash_removes.Add(hashkeys[19999]);

And the results

hashtable: 10032
index: 1003200

It’s beginning to become clear that the Remove method built into .NET’s hash table implementation has a fairly low cost regardless of the item’s position within the collection, while the index collection suffers a variable amount of overhead that grows the farther the removed item is from the end of the collection. For our final scenario, we’ll perform a removal from the middle of the collection.

    idx_removes.Add(499999);
    hash_removes.Add(hashkeys[499999]);

And the results

hashtable: 10012
index: 490588

The hash table is the clear winner when it comes to removing items. While index lookup does have an advantage when it comes to the initial load and lookup of items, if you will be doing any significant number of remove operations you may want to look towards the hash table for your needs. However if item removals are rare, index lookup may still be the better choice because of its clear advantage in lookup speed.
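One mitigation worth noting, though it was not used in the tests above: if your application doesn’t care about the order of items within the List, you can move the last element into the vacated slot and then trim the tail, which avoids the re-indexing loop entirely. A sketch, using a cut-down stand-in for the IndexObject defined earlier:

```csharp
using System;
using System.Collections.Generic;

class SwapRemoveDemo
{
    // Minimal stand-in for the IndexObject defined earlier in the post.
    sealed class IndexObject
    {
        public int index { get; set; }
        public int ID { get; set; }
    }

    // O(1) removal: swap the doomed slot with the last element, then trim the
    // tail. Only valid when item ordering doesn't matter, because the last
    // element is relocated into the vacated slot.
    static void SwapRemove(int idx, List<IndexObject> index)
    {
        int last = index.Count - 1;
        if (idx != last)
        {
            index[idx] = index[last];  // move the tail element into the hole
            index[idx].index = idx;    // only one index value needs fixing up
        }
        index.RemoveAt(last);          // removing the tail shifts nothing
    }

    static void Main()
    {
        var index = new List<IndexObject>();
        for (int i = 0; i < 5; i++)
        {
            index.Add(new IndexObject { index = i, ID = i * 10 });
        }

        SwapRemove(1, index);   // removes ID 10; ID 40 now lives at position 1

        Console.WriteLine("Count: {0}", index.Count);        // 4
        Console.WriteLine("index[1].ID: {0}", index[1].ID);  // 40
    }
}
```

The trade-off, of course, is that any client holding an index to the relocated element now points at the wrong object, so this only works when indexes are re-fetched or the collection is private to one consumer.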

As always, questions or comments are welcome.
-Alan

Index Lookup or hash table?

I recently shared out the link to my blog to a few friends and colleagues. After all, that’s how you get readers, right? Anyhow, my best friend, who happens to be a PowerShell guru, was talking to me this morning about my Index Lookup post; and his question was, “why didn’t you just use a hashtable?”

We don’t, at least I don’t, think of the Hashtable very often anymore, ever since .NET 2.0 came along and swept us off our feet with the wonderful happiness that is generics. So when a PowerShell guy says “hash table,” what you will probably think of from a .NET perspective is a Dictionary<TKey, TValue>; this is simply the generic version of the Hashtable of old. In simplest terms, a hash table provides the ability to “map” a key to a specific value, and then quickly retrieve the value when you supply the key. For an in-depth discussion of hash tables, see Wikipedia; a good discussion of the Dictionary<TKey, TValue> specifically can be found on MSDN.
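To make the key-to-value mapping concrete, here it is in miniature (the keys and values are made up purely for illustration):

```csharp
using System;
using System.Collections.Generic;

class DictionaryDemo
{
    static void Main()
    {
        // A Dictionary<TKey, TValue> maps each key to exactly one value and
        // retrieves that value by key in (amortized) constant time.
        var map = new Dictionary<int, string>();
        map.Add(1, "first row");
        map[2] = "second row";   // the indexer also inserts

        Console.WriteLine(map[1]);   // "first row"

        // TryGetValue avoids a KeyNotFoundException when the key may be absent.
        string value;
        if (map.TryGetValue(2, out value))
        {
            Console.WriteLine(value);   // "second row"
        }
    }
}
```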

I have to admit, my confidence in my Index Lookup post was somewhat shaken when he asked me about it so matter-of-factly. Had I made a mistake? Was a hash table a better solution? He and I spent a little over an hour this morning collaborating on various test scenarios, which are the subject of this post. First, we needed a data set, so I did something I should have done years ago: I generated a CSV file containing 1 million rows of junk data, the first column of which is a semi-sequential integer ID, followed by 6 columns of random string data. I’ll post a link to this dataset on the files page. The file looks like 1 million rows of this:

1,tfxnvwcyycrgth,ggqbrhvgd,ltpdhe,udygxoarwof,aiozgrspzejyyukjdm,ryelqidjspjvsqid
2,xtjloonptahderoju,zvwxqciaapjr,prpljdic,rvxiwfnyv,ycjhyzdgdoo,gbsrlpfffllpdaaigc
4,btwtfgjqlgx,nutpvpe,ysqgmqtexzvzabme,imlcytzifx,klfsbjoidqgpmfsmk,fmhofmdtrzzryhsytm
5,cxqndbtcxyxggokz,nvfsphsojlr,wolgsizoyenlkfvttsd,hglyganpyzuq,dbjla,fklnjiuijmyv
8,idrpdnzsujpzjx,gdlhtexdw,bpmrddaddhyllgya,lbvnzllbgbqclkauw,xnaq,pvfpkvs
10,jsjlbbf,ebqxc,bsxyqpfbnv,ugstwagnwhjcyytq,smksdetnsledzhfoqhv,iiajjgdsqya
11,yfiblcqbzgepckwzen,rydqklai,abwtlgetyedoqtnbdl,zjdocqjy,oxpkttbyjkdvgcba,tafqpgatjn
13,nudchjeapsvxsprthl,hcnrwitezqorjfe,phbnwxalsy,naxnth,guzcgtoffjablr,eclfxwjcczye
15,dbdvjtyk,nbhzqrqkrnl,wvbnkuwrwciphbdgfrs,dspkyxfrmrilshh,bhzzbsp,tsspnjezus
16,dgrpmoffeuwwzuyfx,trlderffqdcnjqz,dzjsmkwnkpw,supdggbbjabzjgxbc,lgwnrywffiuxto,jmvnzgcoyccxkjfmv

This structure can be represented by the following objects, for storage in a Dictionary and a List respectively:

    //This class can be used to represent a single row
    sealed class HashObject
    {
        public int ID { get; set; }
        public string Col1 { get; set; }
        public string Col2 { get; set; }
        public string Col3 { get; set; }
        public string Col4 { get; set; }
        public string Col5 { get; set; }
        public string Col6 { get; set; }

        public HashObject(int id, string col1, string col2, string col3, string col4, string col5, string col6)
        {
            this.ID = id;
            this.Col1 = col1;
            this.Col2 = col2;
            this.Col3 = col3;
            this.Col4 = col4;
            this.Col5 = col5;
            this.Col6 = col6;
        }
    }

    //This class can be used to represent a single row
    sealed class IndexObject
    {
        //Added int property to record the object's index location within the List
        public int index { get; set; }

        public int ID { get; set; }
        public string Col1 { get; set; }
        public string Col2 { get; set; }
        public string Col3 { get; set; }
        public string Col4 { get; set; }
        public string Col5 { get; set; }
        public string Col6 { get; set; }

        public IndexObject(int id, string col1, string col2, string col3, string col4, string col5, string col6)
        {
            this.ID = id;
            this.Col1 = col1;
            this.Col2 = col2;
            this.Col3 = col3;
            this.Col4 = col4;
            this.Col5 = col5;
            this.Col6 = col6;
        }
    }

Clearly there are two different metrics to test. The first is the amount of memory used by each method; the second is the amount of time required to access a given number of objects. We’ll start with memory. .NET doesn’t give us many good ways to determine the amount of memory a given object or collection is using, so for our purposes I am simply going to use Task Manager to determine the amount of memory the process is using. I will be running all of my code with Visual Studio 2010 as a .NET Framework 4.0 command-line application in debug mode.
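For what it’s worth, you can also get a rough managed-heap figure from inside the process with GC.GetTotalMemory. It won’t match Task Manager’s working-set number, but it is repeatable and doesn’t require eyeballing another window. A sketch (the 100000-row loop is just filler data, not the junkdata.csv load):

```csharp
using System;
using System.Collections.Generic;

class MemoryDemo
{
    static void Main()
    {
        // GC.GetTotalMemory(true) forces a collection first, then returns the
        // number of bytes believed to be allocated on the managed heap.
        long before = GC.GetTotalMemory(true);

        var list = new List<string>();
        for (int i = 0; i < 100000; i++)
        {
            list.Add("row number " + i);
        }

        long after = GC.GetTotalMemory(true);
        Console.WriteLine("Approximate bytes used: {0}", after - before);

        GC.KeepAlive(list); // keep the list alive until after the reading
    }
}
```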

First, we’ll measure the memory usage of the hash table implementation. I’ve omitted the HashObject definition for brevity.

    class Program
    {
        static void Main(string[] args)
        {
            var hash = new Dictionary<int, HashObject>(); //keyed by the int ID column

            using (var sr = new System.IO.StreamReader(@".\junkdata.csv"))
            {
                while (sr.Peek() != -1)
                {
                    var line = sr.ReadLine();

                    var fields = line.Split(',');

                    var hashobject = new HashObject(Convert.ToInt32(fields[0]), fields[1], fields[2], fields[3], fields[4], fields[5], fields[6]);
                    hash.Add(hashobject.ID, hashobject);
                }
            }            

            Console.ReadKey();
        }
    }

I repeated this process three times and took averages of both memory usage and the amount of time required to fully load the .csv into the hash table. The average amount of memory allocated to my application’s process was 327212 KB, and the average load time was 2763 ms. Now let’s repeat the process with the “index” version. Again, I will omit the IndexObject definition for brevity’s sake.

    class Program
    {
        static void Main(string[] args)
        {
            var index = new List<IndexObject>();

            using (var sr = new System.IO.StreamReader(@".\junkdata.csv"))
            {
                while (sr.Peek() != -1)
                {
                    var line = sr.ReadLine();

                    var fields = line.Split(',');

                    var indexobject = new IndexObject(Convert.ToInt32(fields[0]), fields[1], fields[2], fields[3], fields[4], fields[5], fields[6]);
                    indexobject.index = index.Count;
                    index.Add(indexobject);
                }
            }            

            Console.ReadKey();
        }
    }

Again the process was executed three times and the results averaged; memory usage was 310432 KB on average, and the average load time was 2673 ms. In both cases the index method seems to be more efficient, but are savings of 16780 KB of memory and 90 ms of CPU time really significant? What if the hash table can satisfy item lookups more quickly? Let’s continue onward and find out.

Here is the code snippet I am using to get a distributed list of items to perform a lookup on. Changing the modulo factor (the ‘7’ in the i % 7 == 0 check below) to a larger number will decrease the sample size, while decreasing it will increase the sample size.

    //For the lookup speed test we are going to need two collections, since the ID/Index #s do not match up directly
    var idx_lookups = new List<int>();
    var hash_lookups = new List<int>();

    //Get the list of Keys in the hash table
    var hashkeys = hash.Keys.ToList();

    for (int i = 0; i < index.Count; i++)
    {
        //This will give us an approximately 14% sample size
        if (i % 7 == 0)
        {
            idx_lookups.Add(i);

            //This will add the corresponding "ID" to the hash_lookups
            // collection, so both the index and hash method will be looking
            // up the same items
            hash_lookups.Add(hashkeys[i]);
        }
    }

Here is the full code of my program at this point. I’m excluding the definitions for HashObject and IndexObject to save space; they have not changed from how they were originally posted above. This code will load both the hash table and my index lookup list, then identify approximately 14% of the records and look them up in both the hash table and the list by index. The amount of time in ticks (1 tick = 100 nanoseconds) will be printed to the screen for each method.

    class Program
    {
        static void Main(string[] args)
        {
            var hash = new Dictionary<int, HashObject>();
            var index = new List<IndexObject>();

            DateTime start;

            using (var sr = new System.IO.StreamReader(@".\junkdata.csv"))
            {
                while (sr.Peek() != -1)
                {
                    var line = sr.ReadLine();

                    var fields = line.Split(',');
                    
                    var hashobject = new HashObject(Convert.ToInt32(fields[0]), fields[1], fields[2], fields[3], fields[4], fields[5], fields[6]);
                    hash.Add(hashobject.ID, hashobject);
                    
                    var indexobject = new IndexObject(Convert.ToInt32(fields[0]), fields[1], fields[2], fields[3], fields[4], fields[5], fields[6]);
                    indexobject.index = index.Count;
                    index.Add(indexobject);
                }
            }


            //For the lookup speed test we are going to need two collections, since the ID/Index #s do not match up directly
            var idx_lookups = new List<int>();
            var hash_lookups = new List<int>();

            //Get the list of Keys in the hash table
            var hashkeys = hash.Keys.ToList();

            for (int i = 0; i < index.Count; i++)
            {
                //This will give us an approximately 14% sample size
                if (i % 7 == 0)
                {
                    idx_lookups.Add(i);

                    //This will add the corresponding "ID" to the hash_lookups
                    // collection, so both the index and hash method will be looking
                    // up the same items
                    hash_lookups.Add(hashkeys[i]);
                }
            }

            start = DateTime.Now;
            foreach (var i in hash_lookups)
            {
                var obj = hash[i];
            }
            Console.WriteLine("hashtable: {0}", (DateTime.Now - start).Ticks);


            start = DateTime.Now;
            foreach (var i in idx_lookups)
            {
                var obj = index[i];
            }
            Console.WriteLine("index: {0}", (DateTime.Now - start).Ticks);

            
            Console.ReadKey();
        }
    }

On my system, this is the output:

hashtable: 50000
index: 20000

These results seem to remain consistent across different modulo values, with the index lookup method always being between 2 and 2.5 times faster than the hash table lookup.

So in summary

  • The index lookup method uses less memory
  • The index lookup method loads more quickly
  • The index lookup method is able to access items more quickly

The index lookup method does have one big disadvantage compared to the hash table, which I should mention in all fairness since I’ve been wailing on the hash table pretty hard so far. Removing items from the collection is a very dicey proposition, since it invalidates the index properties of every item that came “after” the one you removed. It’s up to you to decide which method is most appropriate for your given development situation.

Questions? Comments? Just leave a post.

Until next time!
-Alan