Solving Embarrassingly Obvious Problems In Erlang

There are two main primary reasons I like Erlang:

  • The Erlang VM is an operating system for your code
  • The Erlang language encourages rigor in problem solving

I’ll be discussing the second point here.

Refactoring Using Functions

Here’s an expression I often use in life:

if
    deliciousness(Beer) >= ?EXCELLENT ->
        drink(Beer);
    true ->
        obstain
end

This is an example of Erlang’s if expression. The true clause is effectively the else clause in other languages — it always matches and is therefore evaluated when the previous clauses are false.

If you find this syntactically slightly hard to process, you’re not alone.

Aside from illustrating the point that you may want to avoid if expressions in Erlang altogether, this examples serves a higher purpose: it’s a perfect straw man we can refactor to make things obvious.

Here’s another version, which replaces the if expression with a chain of function calls:

 maybe_drink(if_standard_met(deliciousness(Beer), ?EXCELLENT), Beer)

This version is equivalent to the first, but formalizes the logic of the “deliciousness comparison” using the function if_standard_met. That logic was implicit in the first version, but explicit in this version.

Here’s the process I used for the refactor:

  • I read the first version and asked, “what is actually going on here?”
  • After some thought I concluded that “some standard is being applied to the activity of drinking beer”
  • Using functions, I wrote what I thought is actually going on
  • I read the final version a few times and became all the more convinced of its truth
  • I poured myself a Smuttynose Pale Ale and ran some tests

This technique — decomposing complex functions into smaller, constituent functions — can be used to recast hard-to-solve problems into a series of embarassingly obvious problems. Erlang in particular lends itself well to this.

Myths About Erlang

In the Peabody Award winning video The Ghetto, the protagonist is confronted with several terrible obstacles that stand between him and the mythical technology he seeks.

Here are some obstacles one might face in Erlang:

  • Single assignment variables
  • No loops
  • No return statement
  • Bizarre if expressions

I wonder. How many developers have fled in horror, having encountered these on their journey to obtain Erlang?

It’s a pity, because each of these obstacles is, in fact, a feature that would have helped them become better software developers!

The gist is this:

  • The problems presented by each obstacle can be solved using functions
  • If you don’t use functions to overcome these obstacles, your code will indeed look terrible, even hideous
  • If you take the effort to write functions that clearly describe your problems, your code will look great
  • In Erlang, great looking code is often great, while terrible looking code is always terrible

If you adopt this One True Religion, you’ll discover something amazing: you can generally spot quality code by looking at a single metric — lines of code per function.

If your functions have more than a few lines of code (four is a good benchmark), you need to look closely at them — chances are you have an opportunity to factor them into smaller, more tightly focused functions.

But why?

Functions let you symbolically describe the problem you’re solving. If your functions contain too many lines, chances are you’re solving multiple problems without articulating them clearly. If you take the opportunity to articulate them, you’ll have to think rigorously.

Thinking rigorously about problems is the process of software development.

You know you’re done when there’s nothing left to think about — the problem you’re solving becomes so utterly trivial, your code is obviously correct.

True Religion.

A Real Life Example - Factoring Terrible Code

This morning I spotted some code I wrote about a year ago. I obviously didn’t have religion then. It’s easy to recognize as terrible code because it’s so damn long:

handle_amqp(#message{name="db.create"}=Msg, State) ->
    e2_log:info({db_create, stax_service:to_proplist(Msg)}),
    Name = get_required_attr("name", Msg),
    verify_db_name(Name),
    User = get_required_attr("user", Msg),
    Password = get_required_attr("password", Msg),
    Options =
        case get_attr("cluster", Msg) of
            undefined -> [];
            Cluster -> [{cluster, Cluster}]
        end,
    case stax_mysql_controller:create_database(
           Name, User, Password, Options) of
        {ok, HostInfo} ->
            Attrs = [{"slaves", ""}|host_info_attrs(HostInfo)],
            {reply, message_response(Msg, Attrs), State};
        {error, Err} ->
            e2_log:error({db_create, Err, erlang:get_stacktrace()}),
            {error, Err, State}
    end;

This code might work but I have no idea — I’d need to subject it to a barrage of tests, or run it in production for a while.

Worse, I can’t possibly know if this code is correct — I don’t even know what it’s meant to do!

While I think this code is “clean” syntactically (it doesn’t make my eyes bleed the way Erlang code can) the problem is not with the syntax — it’s with the meaning. It has no clear meaning!

What a great opportunity on a Sunday afternoon to thinking rigorously!

So let’s refactor.

Step 1 - What’s Going On Here?

I’m going to factor this using a top-down decomposition. The coding mechanics are very simple: describe what’s going on as clearly as possible. As we’ll see again and again, the work here consists of answering the question: “what’s really going on here?”

So what’s really going on with this terrible function?

Simple: I’m handling a particular message type — “db.create”.

So let’s do that:

handle_amqp(#message{name="db.create"}=Msg, State) ->
    handle_db_create_msg(Msg, State);

Done!

Seriously. I am done here! I’ve solved my problem completely and unequivocally. There’s no debating. There’s no head scratching. Hell, I don’t even need to test this — I’m completely done!

Step 2 - Handling “DB Create”

Okay, I’m not done. I need to actually handle the “DB Create” message, whatever that means.

Moving the original code into a seperate function, I have this:

handle_db_create_msg(Msg, State) ->
    e2_log:info({db_create, stax_service:to_proplist(Msg)}),
    Name = get_required_attr("name", Msg),
    verify_db_name(Name),
    User = get_required_attr("user", Msg),
    Password = get_required_attr("password", Msg),
    Options =
        case get_attr("cluster", Msg) of
            undefined -> [];
            Cluster -> [{cluster, Cluster}]
        end,
    case stax_mysql_controller:create_database(
           Name, User, Password, Options) of
        {ok, HostInfo} ->
            Attrs = [{"slaves", ""}|host_info_attrs(HostInfo)],
            {reply, message_response(Msg, Attrs), State};
        {error, Err} ->
            e2_log:error({db_create, Err, erlang:get_stacktrace()}),
            {error, Err, State}
    end.

What’s going on here?”

Let me think a moment, out loud:

  • I first log the pending operation
  • Then I collect and validate input from the message
  • Then I create the database and handle the result

So after all, this is actually pretty simple — but you wouldn’t know it without reading the code carefully! Let’s use functions to make it completely obvious!

After some tinkering and a few misfires I came up with this:

handle_db_create_msg(Msg, State) ->
    log_operation(db_create, Msg),
    handle_db_create_result(create_db(create_db_args(Msg)), Msg, State).

Take a moment and compare it to my “thinking out loud” exercise above. It’s exactly what’s going on — symbolically represented by four distinct functions!

Funny, this little piece of code took work to get right!

But I’m done!

Step 3 - Logging

Okay, not done. Need a log_operation function.

log_operation(Type, Msg) ->
    e2_log:info({Type, stax_service:to_proplist(Msg)}).

We could easily do without this function — but it’s two lines of code that makes “logging an operation” drop-dead obvious when you see it. That’s an easy spend.

Step 4 - Converting “DB Create” Message to Args

Next we’ll implement create_db_args.

Here’s what I have, mostly copied from the original function:

create_db_args(Msg) ->
    Name = get_required_attr("name", Msg),
    verify_db_name(Name),
    User = get_required_attr("user", Msg),
    Password = get_required_attr("password", Msg),
    Options =
        case get_attr("cluster", Msg) of
            undefined -> [];
            Cluster -> [{cluster, Cluster}]
        end,
    [Name, User, Password, Options].

Looking at the last line, this function returns a list of arguments that can be used upstream to create the database.

But it’s too much code for one function! Too much to read and process in one chunk. Let’s use some functions to make this obviously correct.

What’s going on here?”

I’m processing a series of message attributes, converting each to some value I can use upstream.

Maybe something like this:

create_db_args(Msg) ->
    [create_db_arg(Arg, Msg)
     || Arg <- [name, user, password, options]].

This function uses list comprehension to process a series of “args” for a given message. Each arg corresponds to an element in the returned list.

This is exactly what I want to do — it’s so obvious!

Let’s process each arg:

create_db_arg(name, Msg) ->
    verify_db_name(get_required_attr("name", Msg));
create_db_arg(user, Msg) ->
    get_required_attr("user", Msg);
create_db_arg(password, Msg) ->
    get_required_attr("password", Msg);
create_db_arg(options, Msg) ->
    case get_attr("cluster", Msg) of
        undefined -> [];
        Cluster -> [{cluster, Cluster}]
    end.

This is identical to what we had originally, but now it’s in a form that makes each arg conversion explicit and easy to see.

But while easy to see, it’s dense and not totally obvious. What to do?

I could implement each function clause as a single call like this:

create_db_arg(name, Msg) -> db_name(Msg);
create_db_arg(user, Msg) -> db_user(Msg);
...

But the extra typing! My precious coding fingers!

Stop your whining! This religion has no room for lazy hedonists! To reach your spiritual goals, you must endure suffering:

create_db_arg(name, Msg) -> db_name(Msg);
create_db_arg(user, Msg) -> db_user(Msg);
create_db_arg(password, Msg) -> db_password(Msg);
create_db_arg(options, Msg) -> db_create_options(Msg).

This function is now just a mapping of argument names to conversion functions.

Crystal clear focus. So simple. So correct.

Here are the conversion functions, each clearly separated:

db_name(Msg) ->
    verify_db_name(get_required_attr("name", Msg)).

db_user(Msg) ->
    get_required_attr("user", Msg).

db_password(Msg) ->
    get_required_attr("password", Msg).

db_create_options(Msg) ->
    case get_attr("cluster", Msg) of
        undefined -> [];
        Cluster -> [{cluster, Cluster}]
    end.

This is where many will walk away, claiming the religion is too hard. Hell, I probably would have skipped this myself, if I wasn’t writing a blog post about it.

Now that I see the result, I’m glad I took the extra two minutes to type it out!

But that case expression is bothering me. I bet there’s some logic hidden in there that I haven’t thought about.

What’s going on there?”

Looking at db_create_options I read this: if a “cluster” is specified in the message, I want to provide that cluster as an option.

So let’s do that:

db_create_options(Msg) ->
    db_create_cluster_options(get_attr("cluster", Msg)).

db_create_cluster_options(undefined) -> [];
db_create_cluster_options(Cluster) -> [{cluster, Cluster}].

This is arguably an extreme focus on minutiae. Perhaps.

But it’s part and parcel of our “make it obvious” exercise:

  • The case expression is like an if expression — there’s almost always “something going on” that you can clarify using a function
  • Refactoring the case expression forced me to think about what was actually going on
  • The refactor wasn’t hard, once I understood what was going on

Step 5 - Create the Database

Next is create_db:

create_db([Name, User, Password, Options]) ->
    stax_mysql_controller:create_database(Name, User, Password, Options).

This is simply a pass through to the external function call.

While we could live without this indirection, it provides an important role in making things totally obvious. The inputs to this operation are created by something called “create DB args” and the result is handled by “handle create DB result”. Don’t we need a function named “create DB” for it to come into focus?

In our One True Religion, it’s no sin to trade key strokes, which are cheap, for clarity, which is precious.

Step 6 - Handle the Database Create Result

Most Erlang developers use case expression to handle function results. Nothing wrong with this:

case read_something() of
    {ok, Result} -> Result;
    {error, Error} -> error(Error)
end

But now curious, you ask, “what’s really going on here?”

Well, I’m translating the result of read_something into something else.

So I’ll say that:

 translate_read_something_result(read_something())

And then do it:

translate_read_something_result({ok, Result}) -> Result;
translate_read_something_result({error, Error}) -> error(Error).

If you’re using case expressions, you’re missing an opportunity to articulate something. You’ll be shocked at times to discover hidden, important logic when you try to name that “something” in a function.

Of course “translate_read_something_result” is a silly name, though accurate. In such cases, I use a “handle” naming convention. So it’d be spelled something like “handle_read_result”.

With that background, here’s what I have for the create_db result handler:

handle_create_db_result({ok, HostInfo}, Msg, State) ->
    Attrs = [{"slaves", ""}|host_info_attrs(HostInfo)],
    {reply, message_response(Msg, Attrs), State};
handle_create_db_result({error, Err}, _Msg, State) ->
    e2_log:error({db_create, Err, erlang:get_stacktrace()}),
    {error, Err, State}.

This is a mechanical refactoring of the case statement in the original handle_amqp function. It’s not bad — just a few lines of code. But we need to beat on this.

What’s going on here?”

There are two obviously different functions: handling an “ok” case and handling an “error” case.

So let’s say that:

handle_create_db_result({ok, HostInfo}, Msg, State) ->
    handle_db_created(HostInfo, Msg, State);
handle_create_db_result({error, Err}, _Msg, State) ->
    handle_db_create_error(Err, State).

An do it — first our success case handler:

handle_db_created(HostInfo, Msg, State) ->
    Attrs = [{"slaves", ""}|host_info_attrs(HostInfo)],
    {reply, message_response(Msg, Attrs), State}.

Not much code, but the code is weird. What’s going on with the “slaves” attribute? I’m not sure actually. Remember, this is real code that I’m refactoring — I wrote this a while ago and don’t recall why it’s there.

Ah — I remember! It’s there to maintain a service level contract. The original API included a list of “slaves”. The new API does not. Rather than just drop the information, I’m providing an empty value.

That’s a flaw with the code: it’s not obvious that I’m implementing a backward compatibility layer. I can use a function to elaborate that.

But first, let’s make things easier to read:

handle_db_created(HostInfo, Msg, State) ->
    {reply, db_created_message_response(HostInfo, Msg), State}.

The Attrs variable was an indicator that I was doing some work that was probably suited for a function. It turns out I don’t need Attrs at all!

db_created_message_response(HostInfo, Msg) ->
    message_response(Msg, db_created_response_attrs(HostInfo)).

The act of creating the “Attrs” list is now, properly, performed as a function:

db_created_response_attrs(HostInfo) ->
    db_created_legacy_attrs(host_info_attrs(HostInfo)).

And there I’ve snuck in the legacy support — explicitly, in case I forget, or someone else reads this!

db_created_legacy_attrs(Attrs) ->
    [{"slaves", ""}|Attrs].

But you rage, “four functions to replace one, you twit!”

I say, “four distinct logical operations — once hidden, now revealed!”

Our religion emphasizes clarity over economy.

And finally, our error case handler:

handle_db_create_error(Err, State) ->
    e2_log:error({db_create, Err, erlang:get_stacktrace()}),
    {error, Err, State}.

While this seems trivial, there’s actually a bug. I didn’t notice it earlier because I wasn’t fixating obsessively.

What’s going on here?”

I’m handing an error case by logging it and returning an appropriate response for the handle_amqp callback.

Before tackling the bug, let’s use a log_error function to make this slightly clearer:

handle_db_create_error(Err, State) ->
    log_error(db_create, Err),
    {error, Err, State}.

log_error(Type, Err) ->
    e2_log:error({Type, Err, erlang:get_stacktrace()}).

Now, the bug.

In log_error what’s going on with erlang:get_stracktrace()?

At casual glance, it looks like it would return the current stack trace. Does that mean I’m going to log stack trace of the current operation? Why do I care about that?

Reading up on erlang:get_stacktrace/0 we see:

Get the call stack back-trace (stacktrace) of the last exception in the calling process… If there has not been any exceptions in a process, the stacktrace is [].

This is certainly not what I want! I’m not handling an exception in this function, so the result is either [] or some random exception from a previous operation. Pointless!

I suspect this code may have once lived inside an exception catch block, which would have made sense. But in some refactor it was lost amidst the surrounding code sprawl.

In the light of day, it’s obvious what I want:

log_error(Type, Err) ->
    e2_log:error({Type, Err}).

Done! Actually, truly done!

Refactor Summary

Here’s the original, terrible function:

handle_amqp(#message{name="db.create"}=Msg, State) ->
    e2_log:info({db_create, stax_service:to_proplist(Msg)}),
    Name = get_required_attr("name", Msg),
    verify_db_name(Name),
    User = get_required_attr("user", Msg),
    Password = get_required_attr("password", Msg),
    Options =
        case get_attr("cluster", Msg) of
            undefined -> [];
            Cluster -> [{cluster, Cluster}]
        end,
    case stax_mysql_controller:create_database(
           Name, User, Password, Options) of
        {ok, HostInfo} ->
            Attrs = [{"slaves", ""}|host_info_attrs(HostInfo)],
            {reply, message_response(Msg, Attrs), State};
        {error, Err} ->
            e2_log:error({db_create, Err, erlang:get_stacktrace()}),
            {error, Err, State}
    end; ...

Here’s the refactored code:

handle_amqp(#message{name="db.create"}=Msg, State) ->
    handle_db_create_msg(Msg, State); ...

handle_db_create_msg(Msg, State) ->
    log_operation(db_create, Msg),
    handle_db_create_result(create_db(create_db_args(Msg)), Msg, State).

log_operation(Type, Msg) ->
    e2_log:info({Type, stax_service:to_proplist(Msg)}).

create_db_args(Msg) ->
    [create_db_arg(Arg, Msg)
     || Arg <- [name, user, password, options]].

create_db_arg(name, Msg) -> db_name(Msg);
create_db_arg(user, Msg) -> db_user(Msg);
create_db_arg(password, Msg) -> db_password(Msg);
create_db_arg(options, Msg) -> db_create_options(Msg).

db_name(Msg) ->
    verify_db_name(get_required_attr("name", Msg)).

db_user(Msg) ->
    get_required_attr("user", Msg).

db_password(Msg) ->
    get_required_attr("password", Msg).

db_create_options(Msg) ->
    db_create_cluster_options(get_attr("cluster", Msg)).

db_create_cluster_options(undefined) -> [];
db_create_cluster_options(Cluster) -> [{cluster, Cluster}].

create_db(Name, User, Password, Options) ->
    stax_mysql_controller:create_database(Name, User, Password, Options).

handle_create_db_result({ok, HostInfo}, Msg, State) ->
    handle_db_created(HostInfo, Msg, State);
handle_create_db_result({error, Err}, _Msg, State) ->
    handle_db_create_error(Err, State).

handle_db_created(HostInfo, Msg, State) ->
    {reply, db_created_message_response(HostInfo, Msg), State}.

db_created_message_response(HostInfo, Msg) ->
    message_response(Msg, db_created_response_attrs(HostInfo)).

db_created_response_attrs(HostInfo) ->
    db_created_legacy_attrs(host_info_attrs(HostInfo)).

db_created_legacy_attrs(Attrs) ->
    [{"slaves", ""}|Attrs].

handle_db_create_error(Err, State) ->
    log_error(db_create, Err),
    {error, Err, State}.

log_error(Type, Err) ->
    e2_log:error({Type, Err}).

Whooooa! This is better?? No way in Hades is this better!

Original “terrible” code:

  • Functions: 1
  • Lines of code: 20
  • Average lines per function: 20

Refactored “awesome” code:

  • Functions: 18
  • Lines of code: 43
  • Average lines per function: 2.4

Code bloat! 20 lines to 43 lines! “Fired, you twit!”

Hang on there.

We actually reduced the lines of from 20 to just 3!

This is our new function — what we’re actually doing:

handle_db_create_msg(Msg, State) ->
    log_operation(db_create, Msg),
    handle_create_db_result(create_db(create_db_args(Msg)), Msg, State).

Our total line count did go up by 2x. This is bad if you argue that line count correlates positively to bug count, which of course it must if all else is held constant. But in this case, all else is not constant — we beat the living hell out of our code, reducing the lines per function a full eight fold!

My argument here is not empirical but intuitive: if your code is reduced to solving embarrassingly obvious problems, you’re going to have fewer bugs. Really, you should have no bugs.

Cost and Benefit

There’s an obvious cost to this approach: time spent making things obvious.

It’s true — a lot thought and effort went into reducing each problem to something trivial and obvious.

But there’s a very tangible payoff: you don’t need to write and maintain tests.

No tests!

Or at least very few tests.

This is also where a lot of developers will walk away. A few years ago, they’d have walked away if you told them to use test driven development — now that everyone spends half or more of his or her time writing tests, the pendulum has swung the other way.

In my experience, code written with this level of rigor — with a fixation on elaborating problems until each is embarrassingly simple — is typically correct the first time. Some nominal effort running the code in a REPL or tracing the code is usually all it takes to flush out remaining problems.

Some developers I know spend an incredible amount of time and energy on test related activity. In some cases it’s breathtaking in cost. I’ve seen teams spend months building complete test stubs for their runtime environment — and maintain them slavishly — all the while cursing the universe for how hard it is to “properly” test their systems.

That’s great for members of our order — we can take 10% of that time, maybe less — and spend it on OCD style “rigorous thinking”. We can skip the tests and instead make out code correct!

Bravado!” you say.

Maybe a little — but my experience generally confirms this. I might some day change my mind and return to the generally pointless, soul crushing regime of test driven development.

For for now, this religion is more fun!

Summary

  • One of Erlang’s best qualities is that it’s easy to spot terrible code — just look for the long functions
  • You can write great code by whittling terrible code into smaller and smaller pieces, each obviously correct, simply by asking “what’s going on here” — “what’s really going on?” until you know exactly what’s going on, everywhere
  • Obviously correct code will generally have no bugs
  • Obviously correct code will generally not need tests
  • Developers who write fewer tests have more time to write obviously correct code

A final thought.

People who bitch about Erlang’s hideous syntax are writing long functions with lots of variable assignments, case statements, if statements — trying to force an imperative style of programming into a functional language.

I’ve looked at their code.

But let us not judge. Our love for the obvious and the correct must emanate like a beacon of light for these lost souls to follow. Some day they may enter our fold and we will welcome them joyfully as returning prodigals.

Our mission remains ever clear: we relentlessly seek out the obvious, and then do the obvious, until only the obvious remains.

And always drink excellent beer. Never terrible beer.

Amen

Postscript

The first published version of the solution was slightly different from the version you see above:

  • handle_db_create_msg contained both a bug and an API design flaw
  • The flaw seemed okay to me because it resulted in simpler code, which was better suited for illustrating the points above — but it was wrong
  • I totally missed the bug
  • One careful reader was kind enough to point it all out

I fixed the problems, but I want to elaborate on what happened — it’s a good example how striving for obvious correctness can be used to think about and fix bugs.

The original version of handle_db_create_msg:

handle_db_create_msg(Msg, State) ->
    log_operation(db_create, Msg),
    create_db(create_db_args(Msg), State).

It looks enticingly simple and correct. It’s not.

The error is that create_db is made both responsible for creating the database and translating the result to a value appropriate to this function.

This is the correct function:

handle_db_create_msg(Msg, State) ->
    log_operation(db_create, Msg),
    handle_create_db_result(create_db(create_db_args(Msg)), Msg, State).

Note how “create” is clearly separated from “handle result” — that’s correct because that’s what this function does, by definition.

It’s pretty obvious that something was wrong if you look at the original spec for create_db:

create_db([Name, User, Password, Options, Msg], State) -> CreateResult

This is an incoherent definition of “create DB” because two arguments have nothing to do with creating a DB: Msg and State.

Those arguments are used in “handling the DB create result”, which is where they appear in the final, correct version.

As punishment for my muddled thinking, a bug lurked that would have blown everything to high heaven.

This was the original create_db_args function:

create_db_args(Msg) ->
    [create_db_arg(Arg, Msg)
     || Arg <- [name, user, password, options, msg]].

Note that the msg atom — the last element in the args list — it is never handled by create_db_arg:

create_db_arg(name, Msg) -> db_name(Msg);
create_db_arg(user, Msg) -> db_user(Msg);
create_db_arg(password, Msg) -> db_password(Msg);
create_db_arg(options, Msg) -> db_create_options(Msg).

If it was handled, I’d have this function clause:

create_db_arg(msg, Msg) -> Msg

What rot!

The correct reading of this function clause is, “you are not thinking clearly — you’ve tested too much beer — msg is not a create DB arg!”

As the clause was missing, I was in for a rude “function clause” error the first time I called create_db_args!

Erlang would have held the code up to my face and shouted, “What are you saying here? Are you an idiot? Do you have any idea how illogical you’re being right now? I refuse to work with this person!”

And I would say, “er, um, well… but…”

And then I’d think about what I was doing. And eventually get it right.

Thanks to bossek for spotting this!

comments powered by Disqus