May 2009 Archives

Dear Reader,

Your code sucks.

It's okay though, so does everyone else's code, including mine. But a big part of improving is recognizing what you're doing wrong, and a big part of that is having random bald guys berate you. So in the spirit of improving the world, here are five thoughts on writing code that sucks less. Heavily C#-centric because that's what I've been reading lately, but I think most of it's meaningful in any environment modulo some terminological adjustments.

1. Dumb Comments Are Dumb

A comment that doesn't say anything useful is worse than worthless; it's actively harmful.

If you have a property named Marklar with a comment telling me that it 'gets or sets the Marklar', you haven't told me anything of value, and reading your useless comment has cost me precious seconds of my life that I'll never get back. You've also probably raised my blood pressure a bit, but I suppose that's more on me than on you.

Now if a Marklar is an odd or confusing type of thing, and you can't reasonably expect everyone to know what it is or why it's there, by all means explain it. Similarly, if this property has unusual behavior in some respect — perhaps it's lazily loaded — mention that in a comment. But don't just put a bloody comment on every property because some never-been-employed professor once told you that that's the right thing to do. It isn't. If someone can't tell you that the EyeColor property lets you get or set the color of the receiver's eyes, they have no business writing code in the first place.

You should only put comments where they're needed, and generally if a comment is legitimately needed it's a sign that you're writing bad code. Don't name the property Age and then add a comment explaining that it's in seconds; name it AgeInSeconds and skip the comment entirely. That saves you typing, saves me reading, and has the further benefit of being completely impossible for future programmers to miss unless they're severely mentally impaired (in which case, see above, they have no business writing code in the first place).

Similarly, if you feel compelled to add a comment explaining that the next five lines of code are stochastically approximating the chronosynclastic infundibulumation of the yada, yada... that's your sign that those five lines should be extracted into a method with an appropriate name like 'ApproximateTechnobabble'. That makes it easier for me to read the original method because I can temporarily ignore the hard stuff, while also making it easier for me to understand the hard stuff later on if I need to (since it's now isolated with clear, labeled inputs).
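A minimal C# sketch of that extraction, with hypothetical names and stand-in math:

```csharp
public class Marklar
{
    public double Yada { get; set; }

    public double Process()
    {
        // ...the easy stuff stays readable here...
        return ApproximateTechnobabble(Yada);
    }

    // The hard stuff, isolated behind a name, with a labeled input
    // and a labeled output instead of a five-line apology comment.
    public static double ApproximateTechnobabble(double yada)
    {
        // (stand-in for the five gnarly lines)
        return yada * 2.0;
    }
}
```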

Finally, dumb comments are dumb not just in the code, but everywhere. If your checkin note is "I changed the twelfth character of line 37 from an 'e' to a 'z'", you have so completely and cleanly missed the point of checkin notes that I can't even scrape together a proper disparaging simile. Checkin notes are there so I can get a high-level understanding of what you've checked in, without having to open the bloody file. "Ran a match along a patch of bark" is a lousy checkin note when what you actually did was "burned down the forest". Tell me about the forest, I couldn't care less about the trees.

2. Every Time You Type '#region' I Die A Little Inside

Regions suck. If I were king I'd ban 'em outright, so sincere and total is my dislike. They're a trick you play on yourself to let you pretend that your bad code is good, and they tell the world that you're a good enough coder to know a three-thousand-line file stinks, but that you're either not good enough to fix it or you're just plain too lazy. In either situation the solution is to step up, not to add a region and sweep part of the bad code under the rug.

A special note on regions within methods: wrapping a region around your twelve constructors to hide the excess complexity is a venial sin. Putting a region inside your method to hide 3,000 of its 4,000 lines is a mortal sin.

Some people might ask why extracting methods (like ApproximateTechnobabble above) is good but collapsing regions within a method is evil. The simple answer is encapsulation; a region isn't a new scope, so it has access to all the fields and variables of the method it's within. That greatly increases the difficulty of understanding the code, because now I need to actively read through and figure out how the code inside the region interacts with the code outside the region. An extracted method doesn't have that problem, because I can look at exactly what parameters are passed in and exactly what comes out, without having to trace through and worry about side effects. Regions superficially seem to simplify the complexity of your code; extracted methods actually do.

3. Static Methods Are Your Friends, And You Should Want More Of Them

A good static method is a pure function in the mathematical sense: a black box that takes some input and returns some output. Remember where I said methods are better than regions because they simplify complexity? The same logic recommends static methods over non-static methods. Static methods help you make sure you're not weaving a tangled web of side effects and dependencies with each method call.

Imagine coming upon a call to myComplexMarklar.CalculateInfundibulumation(). Now, if you don't know what infundibulumation is (incidentally, the state of being or the act of becoming funnel-like or akin to a funnel), and have no idea how you would calculate it (no idea), that method doesn't tell you a whole lot. Now imagine stepping through that rather lengthy method and figuring out how it works. You are potentially entering a world of pain.

Now imagine a static method, Marklar.CalculateInfundibulumation(float topRadius, float bottomRadius, float height), and further imagine that the non-static method works by calling the static method and passing along the three appropriate values. It has instantly become a whole lot easier for me to understand what infundibulumation is, because now I know that no matter how gnarly the math is it's just a function of those three values. Now not only can I better understand that particular method, but a lot of the meaning of the Marklar class in general has become more clear.
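A sketch of that shape in C#; the formula here is a stand-in (the slant height of the funnel wall), since the real infundibulumation math is left to the reader's imagination:

```csharp
using System;

public class Marklar
{
    public float TopRadius;
    public float BottomRadius;
    public float Height;

    // Instance method: a thin wrapper that just forwards the state.
    public float CalculateInfundibulumation()
        => CalculateInfundibulumation(TopRadius, BottomRadius, Height);

    // Static method: however gnarly the math gets, it's visibly a
    // function of exactly these three values and nothing else.
    public static float CalculateInfundibulumation(float topRadius, float bottomRadius, float height)
    {
        float dr = topRadius - bottomRadius;
        return (float)Math.Sqrt(dr * dr + height * height); // stand-in math
    }
}
```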

In an object-oriented language, every instance method is getting more parameters than you've listed in the declaration line; it's also getting access to all of the instance's member fields. Imagine seeing that written out — would you deliberately write a method that takes a hundred parameters but only uses two of them? That might be a good sign that it's time for some judiciously-applied static methods.

Static methods have other benefits. For one, they're a lot more reusable; if the static CalculateInfundibulumation( tR, bR, h ) method is general, you might find you need it in another class. Depending on your situation, you could either make it public and call it right out of Marklar, move it to a new common base class, or move it to a utility class. Each of those solutions would be far more complicated if it were non-static. I've been in situations where I needed a complex non-static method from a pre-existing class that I couldn't reasonably modify, and ended up constructing an instance with a bunch of bogus data, setting the handful of fields that the calculation actually depends on, and calling the instance method on that. That's hideous — ugliness wrapped in brittleness wrapped in unmaintainable complexity. Don't make me do that again.

Static methods also lend themselves incredibly well to a form of optimization known as memoization. The root of memoization is basically the realization that the next time you run the same calculation, you'll reach the same result (yeah, I know, 'duh', right?). The tie-in to static methods is that static methods let you know precisely what the inputs are to the calculation you're running, while instance methods don't. Memoizing a static method is 'easy' (in the sense of being a solved problem), while memoizing instance methods can be a buggy pain in the butt.
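Here's one hedged way to memoize such a static method, caching on a tuple of the complete input list (the math is the same stand-in as above):

```csharp
using System;
using System.Collections.Generic;

public static class Memoized
{
    // The cache is keyed on every input -- possible precisely because a
    // static method's inputs are all right there in its parameter list.
    private static readonly Dictionary<(float, float, float), float> cache = new();

    public static int CallCount; // exposed only to demonstrate the cache working

    public static float CalculateInfundibulumation(float topRadius, float bottomRadius, float height)
    {
        var key = (topRadius, bottomRadius, height);
        if (cache.TryGetValue(key, out float cached))
            return cached;

        CallCount++;
        float dr = topRadius - bottomRadius;
        float result = (float)Math.Sqrt(dr * dr + height * height); // stand-in math
        cache[key] = result;
        return result;
    }
}
```

Note this is only safe because the function is pure; memoizing an instance method means deciding which fields are "really" inputs, which is exactly the buggy pain in the butt described above.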

3b. Subverting Static Methods Is Bad And People Who Do It Should Be Forced To Write Pascal

Sometimes you'll see a static class, full of static fields and static methods. That's not what I'm talking about when I say static is good. That's not good, it's the other thing. Bad. It's a crappy-ass degenerate-global-variable-singleton. Don't do that. Smart people hate singletons.


4. Properties Are Properties, Methods Are Methods

A property is intended to represent an attribute of an object, while a method models an action that the object may take or that may be taken upon the object. These are different concepts, and you need to think about which is appropriate for a particular case. If you have a property with a verb in the name, you're almost certainly thinking about things the wrong way.

Some people will tell you that properties should just be get/set wrappers around your fields. Others will say it's okay to do some field validation. Others go hog wild. It's hard to draw bright lines because the simple fact is that it's sometimes a judgement call. A good heuristic is to think like someone using your class. The default assumption is that properties are simple get/set wrappers around a private field, and everything you do to violate that assumption nudges the scale towards building a method.

Let's say your class has a ServerConnection property that returns some sort of communications channel to a remote server. As a caller, my mindset is that this connection already exists, and the property is giving me access to it. But if only a subset of the calling code ever needs this connection, it might make sense to lazily connect when the property is first accessed rather than connecting in the constructor and keeping around a connection that may never actually be needed. But that's changing the semantics of the property a little bit, because the mindset of the calling code is still 'this already exists, the property is just handing it to me', and the calling code may not expect this property to take a long time to execute. Maybe now there should be a progress bar, or the call should happen in another thread.
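One hedged sketch of that lazy approach, with hypothetical types; Lazy&lt;T&gt; defers the (possibly slow) connect until first access, while callers still see a plain property, which is exactly the semantic ambiguity at issue:

```csharp
using System;

// Hypothetical stand-in for a real communications channel.
public class ServerConnection
{
    public static int ConnectionsOpened; // visible for demonstration only
    public ServerConnection() { ConnectionsOpened++; }
}

public class RemoteService
{
    // Nothing connects until somebody actually reads the property;
    // subsequent reads get the same already-built connection.
    private readonly Lazy<ServerConnection> connection =
        new Lazy<ServerConnection>(() => new ServerConnection());

    public ServerConnection ServerConnection => connection.Value;
}
```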

Like I said, judgement call. In situations like that one, asking ten developers will get you fifteen different opinions. There isn't a 'right' answer, it's just a matter of getting it right enough. All anyone can ask is that you think.

But in other cases, it's totally cut and dried. I've seen code with a LastSaveDate property that saves to the database and returns DateTime.Now. If you're not physically ill at the thought of that, you need to figure out why and get sick fast.
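For contrast, a minimal sketch of the sane shape: the action lives in a method with a verb in its name, and the property merely reports (the Document type and the elided database write are hypothetical):

```csharp
using System;

public class Document
{
    // The property just reports; it never touches the database.
    public DateTime? LastSaveDate { get; private set; }

    // The side effect lives in a method, where callers expect one.
    public void Save()
    {
        // ...actually write to the database here...
        LastSaveDate = DateTime.Now;
    }
}
```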

5. Public Methods Are Contracts, Which Should Not Be Entered Lightly

When you make a method public, you're inviting the world to use it and you're promising that it will continue to do exactly what it does now. Forever.

Let's say you're creating a CarFactory class that creates new Car objects. CarFactory.CreateCar( ) was starting to get a little long, so you started extracting out helper methods: CreateTransmission( ), CreateBody( ), CreateWheel( ), etc. That's good, keep doing it.

Now let's say you decided that since CreateCar( ) was public you might as well make CreateTransmission( ) et al public too. That's not-so-much with the good. Unless you know right now that someone else needs to create a transmission without creating a full car, leave that method private.

To the calling code, a CarFactory and a Car are black boxes. The calling code doesn't need to know that a car contains a transmission, and next year when you change CarFactory and Car to build fully-electric vehicles, which don't need a transmission, you won't have to worry about supporting that public method that a bunch of your coworkers have just spent twelve months linking against.
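A sketch of the visibility split (types and helpers hypothetical):

```csharp
public class Car { /* internals intentionally opaque to callers */ }

public class CarFactory
{
    // The only contract: give me a car. This signature is forever.
    public Car CreateCar()
    {
        var car = new Car();
        CreateTransmission(car);
        CreateBody(car);
        return car;
    }

    // Private helpers: free to change, rename, or delete outright
    // when the factory goes fully electric next year.
    private void CreateTransmission(Car car) { /* ... */ }
    private void CreateBody(Car car) { /* ... */ }
}
```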

There's a more general lesson here: Share the hot dog, not the making thereof.

Which, come to think of it, is a good moral on which to end any story.

Twit, Twit, Twit


I 'got' Twitter pretty quickly, because I was actually using it before it existed. I used to carry on lossy low-priority conversations with people via AIM status messages; it was a good way to say stuff that wasn't important enough to warrant a message and also to broadcast things in the vein of "hey, here's a fun link" without needing to directly spam people.

I'm pretty sure this is roughly what the designers of Twitter had in mind, incidentally. If you take a look at the XML or JSON feeds provided by the Twitter API, each individual tweet is actually called a 'status' — the URI for a particular tweet is <username>/status/<some_big_number> — the <OL> element that contains the tweets on the Twitter HTML page is of CSS class 'statuses'... the list goes on. Obviously the original intent of the tool doesn't define what the tool is actually used for — I don't think AIM away messages were intended as an out-of-band communications protocol, at least at first — but it's interesting nonetheless.

What's also interesting is that Twitter users seem to fall into two separate and largely non-overlapping groups. Group A are people using it much like I am, as a non-ACKed broadcast medium for random thoughts that aren't important enough for a reliable messaging protocol like IM, email, or a blog post. Group B are the people who see Twitter as a marketing/communication tool that will allow them to get the word out to members of Group A, for whatever value of 'the word' is meaningful to them.

I see three ways this can go: the two groups diverge completely and group B develops a self-sustaining ecosystem of their own; or group B recognizes that marketing in this context is futile, dwindles, and dies; or group B is so persistent in their efforts that group A abandons Twitter and finds a new bicycle.

If you get the 'new bicycle' thing, you're definitely in group A, though not getting it doesn't necessarily relegate you to group B.

But if you know you're in group B and want to understand group A, then you should figure out what it means, why you don't get it, and what you can do to make sure you get to autotune the new new bicycle... and then despair of ever marketing effectively on something like Twitter.

Constant Documentation


The only C++ feature I seriously miss in other languages is const. Not in the simple 'this variable is a constant' usage, where it's nothing but a type-checked #define — no, I miss the const keyword in the interaction of const methods and const objects.

• • •

First some background — I don't expect folks who haven't spent time with C++ to know much about this, and a surprising number of veteran C++ coders seem to be unaware of it as well. In C++, const can be many things. One role is as a type modifier that means "I'm not going to change this". A common usage is to accept a const reference:


int calculateAverage( const std::vector<int> & values );

That const means "I'm just going to look at it, I'm not going to mutate it" (or, if you like thinking functionally, "I'm a real function, hand me my immutable input" — more on that angle later). What does that mean, in terms of actual code? It means the body of that function can't call (for instance) values.push_back( 12 ) and mutate the data it was handed. And when I say "can't" I don't mean "shouldn't", I mean the compiler will actually flag that as an error and stop you. If you keep trying it'll even call you names.

Now obviously the body of the function still needs a way to call methods on that vector — it'll need some way to get at the contents, whether it's running a for loop from zero to values.size( ) and calling operator[] in between, or looping from begin() to end(), or whatever. So what makes those methods okay, and push_back, pop_back, insert, swap and various other mutators verboten?

The answer (as you've doubtless cleverly deduced already) is the const keyword. In C++ const isn't just a type modifier for variables and parameters, it's a permitted modifier on methods too. The size( ) method on a vector is declared like this:


size_type size() const

That const means "you can call me on a constant object, object reference, or object pointer! I promise I'm safe!". And that isn't an idle promise, it's a promise enforced by the compiler. If you declare a const method in a class and then try to alter member variables, the compiler's going to call it an error (unless that member is declared as volatile, but that's getting into advanced C++ and beyond what I'm interested in talking about here).

The final crucial piece of the big, tasty, const-in-C-plus-plus pie is the fact that you can overload by const-ness. The aforementioned std::vector actually declares not one but two operator []s (or should that be 'operators []'?), one of which returns a mutable reference into the underlying array, and one of which returns a const reference that only permits reading.
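To make overload-by-const-ness concrete, here's a small home-grown C++ sketch; the std::vector signatures in the comment are simplified from memory:

```cpp
#include <cassert>

// Simplified from the real std::vector declarations:
//
//     reference       operator[](size_type pos);
//     const_reference operator[](size_type pos) const;
//
// A minimal demonstration of the same idea:
struct Counter {
    int value = 0;
    int& get()       { return value; }  // picked for mutable objects
    int  get() const { return value; }  // picked for const objects and references
};
```

The compiler chooses the overload based on the const-ness of the object you call through: a mutable Counter gets the writable reference, a const Counter& gets the read-only value.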

• • •

So that's what I mean by const when I say "I like const in C++".

The usual justification for this feature set is efficiency; the compiler can differentiate between the const operator[], which only has to support reading, and the non-const operator[], which must return something assignable: either a reference to an internal data structure or a proxy object, neither of which is necessarily easy or fast.

But me, I like const for a different reason. And that's documentation.

• • •

Backwards digression, the second: Defining the types and parameters of a function or method isn't a fundamental necessity; you could define a language where functions access the parameters they were passed by indexing into an array and assuming things are where they belong. It's very easy to do this in JavaScript, for example — the language doesn't verify parameters, and a function is free to get them out of the local arguments object:


function sayHello() {
	alert("Hello, " + arguments[0] + "!");
}
sayHello('World');

There are two reasons to specify typed parameters: to tell the compiler the semantics of the function, and to tell the human reading the code the semantics of the function. I'm phrasing those in parallel for a reason, because they're two aspects of the same principle; function argument lists exist to document the code. This just happens to be one of the (distressingly rare) cases where the documentation is so intrinsic to the language that we don't realize it's documentation.

• • •

There's something special in that. Microsoft recognizes how important self-documenting code is; that's how IntelliSense works, and I think you (or I, at least) could argue very plausibly that (a) the entire .NET architecture can be viewed as a means to improve and extend IntelliSense and (b) this may well represent Microsoft's biggest strategic advantage over its competitors. The core idea that makes IntelliSense work is that the best documentation of the code is the implementation itself; the more information you can mine out of that, the better.

• • •

The back-end of one of my employer's new (award-winning!) products is a suite of RESTful web service APIs. One of the core parts of any such system is the routing engine; given the URI of an incoming request, how do we map that to the bunch of code that represents the appropriate response? Our system uses XML-formatted resource maps that contain the information necessary to do that; a resource map contains not only URI paths, but the parameters to extract from those URI paths and the types of those parameters, as well as the required and optional query parameters and their types... it's pretty rich and pretty nifty.

But the really nifty part is that I had a moment of insight one day and realized I could hardcode a mock resource into the router that ran the resource map through an XSL transformer and returned a pretty (or at least, legible) HTML page that documents the system. Every URI, every parameter, all right there.

And — and this is the important bit — nobody ever has to worry about updating this documentation. It's automatically derived from the actual implementation of the system; to add something new to the system, you need to add it to the resource map, and bam, there it is in the help page.

• • •

This sort of implementation-derived documentation is the primary advantage that strong, static typing has over more dynamic approaches. IntelliSense for a more dynamic language means you either execute the code as it's being typed and use real-time introspection (as proposed by Steve Yegge in a talk at Stanford — and probably other places, but that's the first time I saw it in print) or you start trying to layer a more static type system on top of the language. Either way has its problems. Running the code so you can IntelliSense it has sandboxing problems, particularly if you're doing anything that's concurrent or distributed, and layering a static type system on top of a dynamic type system is kind of an advanced, compiler-powered form of missing the point, if you ask me. I have a lot of vaguely-connected thoughts on that last point that I should probably bundle together at some point. The Reader's Digest version is that adding classes to JavaScript is like adding wheels to a unicycle, in that your 'addition' has the unavoidable side effect of squashing what made the original unique. But that's a rant for another day.

Languages like C#, C++, and Java don't have those problems when it comes to rich syntax acting as embedded documentation. It's still not easy, but it's a well-studied problem, and there's existing code you can point to and say "here, here's where that feature goes!".

And it is happening, albeit slowly.

Generics in Java are widely (and correctly) criticized for not being integrated into the VM the way .NET 2.0 generics are, but on the other hand they do serve as a form of compiler-checked documentation; this ArrayList is of Marklars, and that ArrayList is of Widgets. The compiler will force me to keep those types distinct, and I can look at a method signature and know what it expects. Yes, it's really just an annotation for the programmer and the compiler and doesn't actually change the generated code, but in a sense that's exactly the point: const in C++ doesn't change the generated code either, most of the time; it's just letting programmers express both constraints and intent in a richer and more robust fashion.

Critics of static typing frequently argue that it provides a false sense of security, and they're right to an extent. But that doesn't mean that static typing is bad, it simply means that static typing is insufficient. Choosing between compiler-powered semantic tests and human-written unit tests is akin to choosing between seat belts and airbags — the correct choice, anticlimactic though it may seem, is both.

• • •

So what's my point? Good languages let me do more than just write code; they let me codify what I expect my code to do and how I expect other code to call it. They let me express in a simple and clear form every assumption I'm making, and then they explicitly check my assumptions with all the thoroughness they can muster. That's important today for writing good code, and it's only going to get more important.

Imagine syntax-level support for concurrency.

There are methods in the next version of iClan that are designed to only be called on the main thread; any time I want to activate them from elsewhere I call performSelectorOnMainThread:withObject:waitUntilDone: (boy that's a mouthful). If Objective-C had syntax for concurrency, I could mark my methods as 'mainthreadonly', and the compiler could stop me from calling them from elsewhere. Or better yet, the compiler could note from the headers that the UI-related methods I'm calling in those methods are mainthreadonly, and from that deduce which of my methods must be mainthreadonly, and either warn me for not labeling them or just cascade the analysis all the way up the call graph.

Similar story: I recently built a lockless queue in C#; it's built around Eric Lippert's immutable queue code, and it uses Interlocked.CompareExchange to swap a new immutable value for the old immutable value (Or rather, it repeatedly tries to make that swap, regenerating a new new immutable queue from the new old immutable queue if it fails... It's fun code, I should write about it someday — it turns out not to be terribly useful, but fun nonetheless.). The exercise would've been a lot easier on a lot of levels if C# gave the const keyword all the magic powers it has in C++.
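A sketch of the compare-and-swap retry loop at the heart of that approach; I've swapped the immutable queue for a trivial immutable linked list (so it behaves as a stack) to keep it short, but the loop shape is the same:

```csharp
using System.Threading;

// An immutable node: once built, never modified. Standing in for the
// immutable queue; the point here is the CompareExchange retry loop.
public class Node
{
    public readonly int Value;
    public readonly Node Next;
    public Node(int value, Node next) { Value = value; Next = next; }
}

public class LocklessStack
{
    private Node head; // only ever replaced wholesale, never mutated

    public void Push(int value)
    {
        Node oldHead, newHead;
        do
        {
            oldHead = head;
            newHead = new Node(value, oldHead);
            // Swap in the new immutable value only if nobody beat us to it;
            // otherwise rebuild from the fresh head and try again.
        } while (Interlocked.CompareExchange(ref head, newHead, oldHead) != oldHead);
    }

    public bool TryPop(out int value)
    {
        Node oldHead, newHead;
        do
        {
            oldHead = head;
            if (oldHead == null) { value = 0; return false; }
            newHead = oldHead.Next;
        } while (Interlocked.CompareExchange(ref head, newHead, oldHead) != oldHead);
        value = oldHead.Value;
        return true;
    }
}
```

A C++-style enforced const on Node would let the compiler, rather than discipline, guarantee the "only ever replaced wholesale" invariant.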

You'll notice both of those examples have a concurrency angle, and that's not a coincidence. All of the self-anointed experts agree that the age of concurrency is upon us, and for a change the experts who aren't self-anointed and are too smart and humble to actually call themselves experts, and the nobodies like myself, seem to be in pretty general agreement with them. And the thing about concurrency and distributed systems is, well, they're hard. Maybe not hard like NP-hard hard, but still pretty bloody difficult. And the more the compiler and the runtime and the VM and the OS can do to check our work for us, the better.
