Recently in .NET Category

Generic Accumulators in C#, Redux


It turns out one of the most-trafficked pages on this site is my discussion of generic accumulators in C#. It occurs to me that it could use a bit of an update, as some newer features like lambdas and the predefined Func<> family simplify things quite a bit:

using System;
using System.Diagnostics;

class Program
{
    public static Func<T, T> MakeAccumulator<T>(T start, Func<T, T, T> addFunction)
    {
        return inc => start = addFunction(start, inc);
    }

    static void Main(string[] args)
    {
        var intAccumulator = MakeAccumulator(0, (i, j) => i + j);

        Debug.Assert(0 == intAccumulator(0));
        Debug.Assert(1 == intAccumulator(1));
        Debug.Assert(11 == intAccumulator(10));
        Debug.Assert(55 == intAccumulator(44));

        var floatAccumulator = MakeAccumulator(0.0, (i, j) => i + j);

        Debug.Assert(0 == floatAccumulator(0.0));
        Debug.Assert(0.1 == floatAccumulator(0.1));
        Debug.Assert(1.1 == floatAccumulator(1.0));
        Debug.Assert(5.5 == floatAccumulator(4.4));

        var stringAccumulator = MakeAccumulator("", (i, j) => i + j);

        Debug.Assert("" == stringAccumulator(""));
        Debug.Assert("ZYZ" == stringAccumulator("ZYZ"));
        Debug.Assert("ZYZZY" == stringAccumulator("ZY"));
        Debug.Assert("ZYZZYVA" == stringAccumulator("VA"));

        Console.WriteLine("Success!");
        Console.ReadLine();
    }
}

So there's that. Still not terribly useful, but I do like shortening code.

Normal function calls are easy to write; you call DoSomething( ), it executes and returns, and you continue on your merry way.

Asynchronous function calls seem, at first blush, only a little more difficult — instead of calling DoSomething() and executing your follow-up code after it returns, you pass in a callback: BeginDoSomething( Action callback ).

So, problem solved, let's go home. Unless you need a return value, that is. But even then it seems simple; to turn a synchronous method like int CalculateSomething( ) into an asynchronous method, you just pass it a delegate that takes a parameter: void BeginCalculateSomething( Action<int> callback ).

So is that it? Nope. Because all of that is wrong.

Even though the original DoSomething( ) method had no return type, it still had a return path — it could throw an exception. Let's imagine that BeginDoSomething looks something like this:


public void BeginDoSomething(Action callback)
{
    PrepareForLongRunningOperation();

    ThreadPool.QueueUserWorkItem(
        delegate
        {
            LongRunningOperation();

            callback();
        });
}

A handy way to think about this sort of thing is to figure out where a thrown exception would emerge.

If something goes wrong within the call to PrepareForLongRunningOperation, that happens in the same context as the calling code — any exceptions will throw up to the calling code and come out of its call to BeginDoSomething. The same applies to the call to ThreadPool.QueueUserWorkItem — no problem there.

But what if LongRunningOperation throws?

LongRunningOperation would throw up into whatever internal part of the ThreadPool implementation actually launched it. That exception can't come out of ThreadPool.QueueUserWorkItem, because by the time the asynchronous anonymous delegate is running ThreadPool.QueueUserWorkItem has already returned. And since the exception can't come out of ThreadPool.QueueUserWorkItem, it also can't come out of BeginDoSomething — which means there's no way for the calling code to get the exception.
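To make the failure concrete, here's a sketch of calling code that looks safe but isn't (the callback body and messages are illustrative only):

try
{
    BeginDoSomething(delegate { Console.WriteLine("Done!"); });
}
catch (Exception)
{
    // This looks like it guards LongRunningOperation, but it never will:
    // by the time the thread-pool delegate runs and throws, this try
    // block has long since exited.
}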

There are two main approaches to this problem — error handlers and completion calls.

Error Handlers

Instead of passing your begin method one callback, pass two: a callback to be invoked if everything goes to plan, and an exception-accepting callback to which errors will be passed.


public void BeginDoSomething(Action callback, Action<Exception> errorHandler)
{
    PrepareForLongRunningOperation();

    ThreadPool.QueueUserWorkItem(
        delegate
        {
            try
            {
                LongRunningOperation();
                callback();
            }
            catch (Exception ex)
            {
                errorHandler(ex);
            }
        });
}

// Sample usage:

BeginDoSomething(
    delegate
    {
        // Do something now that we're done
    },
    delegate(Exception ex)
    {
        // Do something with the error
    });

There are strengths and weaknesses to this approach. The biggest strength of this model is that it forces the calling code to think about error handling — the prompt for it is right there in the method signature. Error handling tends to fall through the cracks in any sort of code, but it's especially easy to overlook in an asynchronous context (it's also a lot more dangerous there, because dropping a callback invocation will often leave a process spinning forever, sucking down resources and accomplishing nothing).

Separating the success case from the failure case may be either a strength or a weakness, depending on the particular task. Sometimes it makes your code much cleaner, but it often happens that your success and error handler need to share context and implementation, which can make for some very ugly code.

Completion Calls

Instead of just invoking a parameter-less Action callback or a single-parameter Action<TReturn> callback, your code calls a single-parameter callback and passes it an Action or Func<TReturn> that the callback in turn invokes.


public void BeginDoSomething(Action<Action> callback)
{
    PrepareForLongRunningOperation();

    ThreadPool.QueueUserWorkItem(
        delegate
        {
            try
            {
                LongRunningOperation();
                callback(delegate { });
            }
            catch (Exception ex)
            {
                callback(delegate { throw ex; });
            }
        });
}

// Sample Usage:

BeginDoSomething(
    delegate(Action complete)
    {
        try
        {
            complete();
        }
        catch (Exception ex)
        {
            // Do something with the error
            return;
        }
        // Do something now that we're done
    });

At first blush, this seems like a much clumsier solution; you're essentially trusting the calling code to call your completion method. That's true, at least in this case.

Where completion calls really shine is with asynchronous calls that return values; instead of calling their callback and handing in an Action, you call their callback and give it a Func<TReturn>, which they must then invoke to get their result. That gives you an opportunity to throw exceptions that they can't cleverly bypass:


public void BeginCalculateSomething(Action<Func<int>> callback)
{
    PrepareForLongRunningCalculation();

    ThreadPool.QueueUserWorkItem(
        delegate
        {
            try
            {
                int result = LongRunningCalculation();
                callback(() => result);
            }
            catch (Exception ex)
            {
                callback(delegate { throw ex; });
            }
        });
}

// Sample Usage:

BeginCalculateSomething(
    delegate(Func<int> complete)
    {
        int result;
        try
        {
            result = complete();
        }
        catch (Exception ex)
        {
            // Do something with the error
            return;
        }
        // Do something with the content of result;
    });

Personally I prefer completion calls, mainly because the pattern works so well for return values. In practice, any time you need this sort of pattern it's because you care about return values; if you need to ensure that X happens after Y, it's generally because X depends on the result of Y. If X doesn't depend on Y, that's often a sign that you're being too linear in your analysis and that the tasks should be happening in parallel.

Microsoft seems to have collectively reached the same conclusion; IHttpAsyncHandler, the asynchronous methods off of SqlCommand, and the asynchronous forms of WebRequest all use the completion call approach.
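You can see the shape of this in WebRequest: EndGetResponse is, in effect, the completion call, and any exception from the request waits until you invoke it. A rough sketch (the URL is just a placeholder):

WebRequest request = WebRequest.Create("http://example.com/");

request.BeginGetResponse(
    delegate(IAsyncResult asyncResult)
    {
        WebResponse response;
        try
        {
            // The completion call: a failed request throws here, not in
            // BeginGetResponse.
            response = request.EndGetResponse(asyncResult);
        }
        catch (Exception ex)
        {
            // Do something with the error
            return;
        }
        // Do something with the response
    },
    null);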

Functional Programming

public void BeginGetSingle(TIdentityCriteria identityCriteria, CompletionCallback<TItem> callback)
{
    TFilterCriteria filterCriteria =
        CriteriaUtilities.UpgradeCriteria<TIdentityCriteria, TFilterCriteria>(identityCriteria);

    RestClient.BeginGet<TItem>(
        CriteriaUtilities.CriteriaToUrl(
            filterCriteria,
            m_Map,
            m_ServiceUrlBase,
            r => typeof (TItem).FullName.Equals(r.OutputPayloadClass) && r.AllowedVerbs.Contains(Verb.Get)),
        completionFunction => callback(() => completionFunction().Payload));        
}

My function takes a function and when done it calls that function, passing a function that the calling function calls for the result.

Now in fact my function calls another function taking a function accepting a function to call for its result, and to that function it passes a function which when called calls the first function passing a new function that when called calls the function that was passed to the function that my function passed to the other function, thereby returning the result of that function to the function that called my function.

And they say you can't write Lisp code in C#.

Constant Documentation


The only C++ feature I seriously miss in other languages is const. Not in the simple 'this variable is a constant' usage, where it's nothing but a type-checked #define — no, I miss the const keyword in the interaction of const methods and const objects.

• • •

First some background — I don't expect folks who haven't spent time with C++ to know much about this, and a surprising number of veteran C++ coders seem to be unaware of it as well. In C++, const can be many things. One role is as a type modifier that means "I'm not going to change this". A common usage is to accept a const reference:


int calculateAverage( const std::vector<int> & values );

That const means "I'm just going to look at it, I'm not going to mutate it" (or, if you like thinking functionally, "I'm a real function, hand me my immutable input" — more on that angle later). What does that mean, in terms of actual code? It means the body of that function can't call (for instance) values.push_back( 12 ) and mutate the data it was handed. And when I say "can't" I don't mean "shouldn't", I mean the compiler will actually flag that as an error and stop you. If you keep trying it'll even call you names.

Now obviously the body of the function still needs a way to call methods on that vector — it'll need some way to get at the contents, whether it's running a for loop from zero to values.size( ) and calling operator[] in between, or looping from begin() to end(), or whatever. So what makes those methods okay, and push_back, pop_back, insert, swap and various other mutators verboten?

The answer (as you've doubtless cleverly deduced already) is the const keyword. In C++, const isn't just a type modifier for variables and parameters; it's also a permitted modifier on methods. The size( ) method on a vector is declared like this:


size_type size() const

That const means "you can call me on a constant object, object reference, or object pointer! I promise I'm safe!". And that isn't an idle promise, it's a promise enforced by the compiler. If you declare a const method in a class and then try to alter member variables, the compiler's going to call it an error (unless that member is declared mutable, but that's getting into advanced C++ and beyond what I'm interested in talking about here).

The final crucial piece of the big, tasty, const-in-C-plus-plus pie is the fact that you can overload by const-ness. The aforementioned std::vector actually declares not one but two operator []s (or should that be 'operators []'?): a non-const one that returns a mutable reference into the underlying array, and a const one that only lets you read the element.

• • •

So that's what I mean by const when I say "I like const in C++".

The usual justification for this feature set is efficiency; the compiler can differentiate between the const operator[], which only has to support reading, and the non-const operator[], which must return either a mutable reference to an internal data structure or a proxy object, neither of which is necessarily easy or fast.

But me, I like const for a different reason. And that's documentation.

• • •

Backwards digression, the second: Defining the types and parameters of a function or method isn't a fundamental necessity; you could define a language where functions access the parameters they were passed by indexing into an array and assuming things are where they belong. It's very easy to do this in JavaScript, for example — the language doesn't verify parameters, and a function is free to get them out of the local arguments array:


function sayHello() {
	alert("Hello, " + arguments[0] + "!");
}
sayHello('World');

There are two reasons to specify typed parameters: to tell the compiler the semantics of the function, and to tell the human reading the code the semantics of the function. I'm phrasing those in parallel for a reason, because they're two aspects of the same principle; function argument lists exist to document the code. This just happens to be one of the (distressingly rare) cases where the documentation is so intrinsic to the language that we don't realize it's documentation.
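For contrast, here's the same function in C#; nothing but the signature, and yet the signature tells you exactly how to call it:

// The parameter list is the documentation: one string, no more, no less.
static void SayHello(string name)
{
    Console.WriteLine("Hello, " + name + "!");
}

SayHello("World");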

• • •

There's something special in that. Microsoft recognizes how important self-documenting code is; that's how IntelliSense works, and I think you (or I, at least) could argue very plausibly that (a) the entire .NET architecture can be viewed as a means to improve and extend IntelliSense and (b) this may well represent Microsoft's biggest strategic advantage over its competitors. The core idea that makes IntelliSense work is that the best documentation of the code is the implementation itself; the more information you can mine out of that, the better.

• • •

The back-end of one of my employer's new (award-winning!) products is a suite of RESTful web service APIs. One of the core parts of any such system is the routing engine; given the URI of an incoming request, how do we map that to the bunch of code that represents the appropriate response? Our system uses XML-formatted resource maps that contain the information necessary to do that; a resource map contains not only URI paths, but the parameters to extract from those URI paths and the types of those parameters, as well as the required and optional query parameters and their types... it's pretty rich and pretty nifty.

But the really nifty part is that I had a moment of insight one day and realized I could hardcode a mock resource into the router that ran the resource map through an XSL transformer and returned a pretty (or at least, legible) HTML page that documents the system. Every URI, every parameter, all right there.

And — and this is the important bit — nobody ever has to worry about updating this documentation. It's automatically derived from the actual implementation of the system; to add something new to the system, you need to add it to the resource map, and bam, there it is in the help page.

• • •

This sort of implementation-derived documentation is the primary advantage that strong, static typing has over more dynamic approaches. IntelliSense for a more dynamic language means you either execute the code as it's being typed and use real-time introspection (as proposed by Steve Yegge in a talk at Stanford — and probably other places, but that was the first time I saw it in print) or you start trying to layer a more static type system on top of it. Either way has its problems. Running the code so you can IntelliSense it has sandboxing problems, particularly if you're doing anything that's concurrent or distributed, and laying a static type system on top of a dynamic type system is kind of an advanced, compiler-powered form of missing the point, if you ask me. I have a lot of vaguely-connected thoughts on that last point that I should probably bundle together at some point. The Reader's Digest version is that adding classes to JavaScript is like adding wheels to a unicycle, in that your 'addition' has the unavoidable side effect of squashing what made the original unique. But that's a rant for another day.

Languages like C#, C++, and Java don't have those problems when it comes to rich syntax acting as embedded documentation. It's still not easy, but it's a well-studied problem, and there's existing code you can point to and say "here, here's where that feature goes!".

And it is happening, albeit slowly.

Generics in Java are widely (and correctly) criticized for not being integrated into the VM the way .NET 2.0 generics are, but on the other hand they do serve as a form of compiler-checked documentation; this ArrayList is of Marklars, and that ArrayList is of Widgets. The compiler will force me to keep those types distinct, and I can look at a method signature and know what it expects. Yes, it's really just an annotation for the programmer and the compiler and doesn't actually change the generated code, but in a sense that's exactly the point: const in C++ doesn't change the generated code either most of the time, it's just letting programmers express both constraints and intent in a richer and more robust fashion.

Critics of static typing frequently argue that it provides a false sense of security, and they're right to an extent. But that doesn't mean that static typing is bad, it simply means that static typing is insufficient. Choosing between compiler-powered semantic tests and human-written unit tests is akin to choosing between seat belts and airbags — the correct choice, anticlimactic though it may seem, is both.

• • •

So what's my point? Good languages let me do more than just write code; they let me codify what I expect my code to do and how I expect other code to call it. They let me express in a simple and clear form every assumption I'm making, and then they explicitly check my assumptions with all the thoroughness they can muster. That's important today for writing good code, and it's only going to get more important.

Imagine syntax-level support for concurrency.

There are methods in the next version of iClan that are designed to only be called on the main thread; any time I want to activate them from elsewhere I call performSelectorOnMainThread:withObject:waitUntilDone: (boy that's a mouthful). If Objective-C had syntax for concurrency, I could mark my methods as 'mainthreadonly', and the compiler could stop me from calling them from elsewhere. Or better yet, the compiler could note from the headers that the UI-related methods I'm calling in those methods are mainthreadonly, and from that deduce which of my methods must be mainthreadonly, and either warn me for not labeling them or just cascade the analysis all the way up the call graph.

Similar story: I recently built a lockless queue in C#; it's built around Eric Lippert's immutable queue code, and it uses Interlocked.CompareExchange to swap a new immutable value for the old immutable value (Or rather, it repeatedly tries to make that swap, regenerating a new new immutable queue from the new old immutable queue if it fails... It's fun code, I should write about it someday — it turns out not to be terribly useful, but fun nonetheless.). The exercise would've been a lot easier on a lot of levels if C# gave the const keyword all the magic powers it has in C++.
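The heart of that queue is a retry loop around Interlocked.CompareExchange. Here's a minimal sketch, assuming an immutable queue class along the lines of Lippert's (ImmutableQueue<T> is a stand-in name, not his actual type):

public class LocklessQueue<T>
{
    // The immutable queue: Enqueue returns a new queue and leaves the
    // old one untouched.
    private ImmutableQueue<T> m_Queue = ImmutableQueue<T>.Empty;

    public void Enqueue(T item)
    {
        while (true)
        {
            ImmutableQueue<T> oldQueue = m_Queue;
            ImmutableQueue<T> newQueue = oldQueue.Enqueue(item);

            // Publish newQueue only if no other thread swapped in a
            // different queue since we read oldQueue; otherwise retry.
            if (oldQueue == Interlocked.CompareExchange(
                ref m_Queue, newQueue, oldQueue))
            {
                return;
            }
        }
    }
}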

You'll notice both of those examples have a concurrency angle, and that's not a coincidence. All of the self-anointed experts agree that the age of concurrency is upon us, and for a change the experts who aren't self-anointed and are too smart and humble to actually call themselves experts and the nobodies like myself seem to be in pretty general agreement with them. And the thing about concurrency and distributed systems is, well, they're hard. Maybe not hard like NP-hard hard, but still pretty bloody difficult. And the more the compiler and the runtime and the VM and the OS can do to check our work for us, the better.

Generic Accumulators in C#

This was written in C# 2.0; if you're using 3.0 you might want to check out my updated version

I fell down a chain of links today and wound up reading one of Paul Graham's numerous pro-Lisp essays. I responded as I typically do by attempting to implement his examples in whatever language I have handy, which today was C# 2.0.

The example Graham cites in the appendix is a function for generating accumulators:

(defun foo (n)
  (lambda (i) (incf n i)))

For those of me who always find reading Lisp akin to reading the opening chapters of the Bible (but with parentheses standing in for the voluminous 'begats'), he also helpfully provides equivalents in a few other languages, such as JavaScript:

function foo(n) { 
  return function (i) { 
		   return n += i } }

An accumulator is a function that maintains internal state; when you invoke it with a new value it adds that value to its internal state and returns the new sum. An example usage would be to initialize it to zero and invoke it with each value in an array of integers; the result of the final invocation would be the sum of the integers -- or rather, the accumulation of the values.

For instance:

var myAcc = foo(0); // Initialize to zero
myAcc(1); // 0 + 1 = 1
myAcc(10); // 0 + 1 + 10 = 11
myAcc(44); // 0 + 1 + 10 + 44 = 55

So, can we build that in C#? The answer is an enthusiastic 'kinda'!

[TestFixture]
public class AccumulatorTest
{
	private delegate int Accumulator(int inc);

	private static Accumulator MakeAccumulator(int start)
	{
		int n = start;

		return delegate(int inc)
		{
			n = n + inc;
			return n;
		};
	}

	[Test]
	public void TestAccumulator()
	{
		Accumulator acc = MakeAccumulator(0);

		Assert.AreEqual(0 + 0, acc(0));
		Assert.AreEqual(0 + 0 + 1, acc(1));
		Assert.AreEqual(0 + 0 + 1 + 10, acc(10));
		Assert.AreEqual(0 + 0 + 1 + 10 + 44, acc(44));
	}
}

It's not actually that long -- the TestAccumulator function at the bottom is a quickie NUnit test case to verify it's doing what I think it should. It's still longer than the Lisp or JavaScript version though.

But there's one other major difference from those versions -- typing. The C# version only works on integers, whereas the Lisp and JavaScript versions will work on anything that can be added. But if we try to genericize the C# code we'll run into problems at the n = n + inc statement, because you can't add two objects and thus can't add two generic objects without a generic constraint guaranteeing you can. And since there's no IAddable interface that integers and floats and strings and everything implement (and since C# still doesn't have a more generalized form of where clause that would let us specify constraints as duck types rather than static types), we're basically SOL with a generic C# accumulator.
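To see exactly where it falls down, try writing the obvious generic helper:

static T Add<T>(T a, T b)
{
    // error CS0019: Operator '+' cannot be applied to operands of
    // type 'T' and 'T'
    return a + b;
}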

Or are we? (Bum-bum-BUM!)

What if we passed our MakeAccumulator function two parameters: a starting value, and a delegate function to use for incrementing?

[TestFixture]
public class AccumulatorTest
{
	private delegate T Accumulator<T>(T inc);

	private delegate T AccumulationDelegate<T>(T a, T b);

	private static Accumulator<T> MakeAccumulator<T>(T start, AccumulationDelegate<T> accumulationDelegate)
	{
		T n = start;

		return delegate(T inc)
		{
			n = accumulationDelegate(n, inc);
			return n;
		};
	}

	[Test]
	public void TestIntAccumulator()
	{
		Accumulator<int> acc = MakeAccumulator(0, delegate(int a, int b) { return a + b; });

		Assert.AreEqual(0 + 0, acc(0));
		Assert.AreEqual(0 + 0 + 1, acc(1));
		Assert.AreEqual(0 + 0 + 1 + 10, acc(10));
		Assert.AreEqual(0 + 0 + 1 + 10 + 44, acc(44));
	}

	[Test]
	public void TestFloatAccumulator()
	{
		Accumulator<float> acc = MakeAccumulator((float)0, delegate(float a, float b) { return a + b; });

		Assert.AreEqual(0 + 0, acc(0));
		Assert.AreEqual(0 + 0 + 1, acc(1));
		Assert.AreEqual(0 + 0 + 1 + 10, acc(10));
		Assert.AreEqual(0 + 0 + 1 + 10 + 44, acc(44));
	}

	[Test]
	public void TestStringAccumulator()
	{
		Accumulator<string> acc = MakeAccumulator("", delegate(string a, string b) { return a + b; });

		Assert.AreEqual("" + "", acc(""));
		Assert.AreEqual("" + "" + "marklar", acc("marklar"));
		Assert.AreEqual("" + "" + "marklar" + "smurf", acc("smurf"));
		Assert.AreEqual("" + "" + "marklar" + "smurf" + "foo", acc("foo"));
	}
}

Yay!

So what is it good for? Absolutely nothing! But it sure was fun!

In C++, I could write a template like the following:


std::map<string, string> PropertyList;

template<class T>
T GetProperty(const string & propertyName) {
    return T::Parse(PropertyList[propertyName]);
}

And it would work so long as it was used properly; if you tried to fetch a property of a type that lacked a Parse method, you'd fail to compile. The error message would be a hideous mess, but as long as you the programmer made sure that all your Ts could .Parse, you were golden.

Compare and contrast with C# (2.0):


Dictionary<String, String> PropertyList;

T GetProperty<T>(String propertyName)
{
    return T.Parse(PropertyList[propertyName]);
}

That won't work, because the .NET compiler wants to be sure that every potential T that this generic method could ever meet has a .Parse method. Really sure. This appears to be because the .NET 2.0 runtime contains intrinsic support for generics, which is actually a really neat feature; in C++ each instantiation of a template becomes its own chunk of code, and the space requirements grow far faster than most programmers expect. In this sense at least C# generics are far superior, as they propagate the savings in time and effort from the programmer all the way through to the execution environment, while C++ saves the programmer some time but makes the end user pay full freight.

So what's Microsoft's solution to the problem? Pretty much what you'd expect from a company that likes solving every problem with a database: they added a 'where' clause.


T GetProperty<T>(String propertyName) where T : IParseable
{
    return T.Parse(PropertyList[propertyName]);
}

Oh, neat! That earns me two things: first, the IDE will stop me if I try to invoke this for a type that doesn't implement the IParseable interface and will do so with a polite error message; second, I can now call any method defined by the IParseable interface within my generic method.

Problem solved, right? Nope! Because there isn't an IParseable. Ain't no such beasty. And if I need to support arbitrary built-in types for my property-fetching system (and it turns out that in the situation that inspired this missive, I do) I'm pretty-much S-O-L. Even if I could jump forward in time to get a release copy of the new Orcas build of Visual Studio and upgrade to C# 3.0 with those shiny new extension-methods (which, by the way, Objective-C has had for ages under the name 'Categories' -- but that's another rant) I'd still be screwed. Extension methods would allow me to put a .Parse method on any and every class I want, but they don't provide me a mechanism to inject interfaces into existing classes. The best I could do is inject .Parse into the object base class and then override it appropriately for each case I care about. Of course that depends on the fact that built-in methods with the same signature as extension methods override the extension, and from what I've read that particular behavior's still a little bit up in the air.

It also leaves me in a situation that's worse than the one C++ started from, at least from a maintenance standpoint; I can now attempt to query any object of any type using my GetProperty method, and it'll compile just fine. But if it's a type for which my generic object.Parse injection won't work, I'll find out at run time instead of compile time. Really the best thing I could do would be to make my default object.Parse method throw a NotImplementedException with a helpful error message, and I think anyone who's worked on a large project will agree that that kind of solution is a special new kind of suck.

So what's the solution? Well, it's to cheat:


T GetProperty<T>(String propertyName)
{
    // No constraint this time; there's nothing valid to constrain on.
    // Reflection hunts down the static Parse(string) at run time.
    return (T)typeof(T)
        .GetMethod("Parse", new Type[] { typeof(string) })
        .Invoke(null, new object[] { PropertyList[propertyName] });
}

I feel dirty writing that, but at least it compiles and works -- as long as you remember not to pass in something without a Parse method, because that's a one-way-trip to UnhandledExceptionville.
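Usage then looks like any other generic call (the property names here are hypothetical):

int timeout = GetProperty<int>("TimeoutSeconds"); // int.Parse exists; works
DateTime due = GetProperty<DateTime>("DueDate");  // so does DateTime.Parse
// GetProperty<Thread>("Oops") compiles happily -- and dies at run time,
// because Thread has no static Parse(string) for reflection to find.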
