More Embarrassingly Obvious Problems

This is a followup to Solving Embarrassingly Obvious Problems In Erlang, which provides the religious basis for what follows.

I use Python on occasion when I want to indulge my inner hacker. Python doesn’t ask you questions or give you guff — it does what it’s told. You want to do this, Python lets you do this.

It fits your brain.

And for those of you who have special brains, there’s ML.

So this is a simple program I wrote to print out the row counts for each of the tables in a database:

import sys
import MySQLdb

def count(db_name):
    db = MySQLdb.connect()
    db.select_db(db_name)
    cur = db.cursor()
    cur.execute("show tables")
    tables = [t[0] for t in cur]
    for table in tables:
        cur.execute("select count(*) from %s" % table)
        count = cur.fetchone()[0]
        print "%s %i" % (table, count)

if __name__ == "__main__":
    count(sys.argv[1])

And I can tell you it was fun to write! I typed and typed and hacked and tweaked and eventually it all just worked!

And then the guilt set in.

Just because I’m using Python — and this code looks pretty clean — I’m never justified in hiding perfectly good logic inside a long list of instructions!

God of Clarity, forgive my sins. Give me the strength to work until my code is embarrassingly obvious…”

Looking over the code, the intent is pretty simple: I want to print a list of tables for a database, along with each table’s row count.

After a short meditation, these operations came into view:

  • Connect to the database and get a cursor

  • Enumerate the tables in the database

  • Get the row counts for each table

  • Print the tables along with their respective row counts

And after some fiddling:

def print_db_table_row_counts(db_name):
    cur = db_cursor(connect_db(db_name)
    print_row_counts(row_counts(all_tables(cur), cur))

When I stare at this code for just a few seconds, it looks really obvious to me! Each of the bullets above is neatly reflected in the function calls. It says exactly what I mean!

Next time I look at this code and ask, “what’s going on here?” I’ll squint for a few seconds and say, “ah, okay — just what the function name says it does.” No documentation, no comments — just expression-of-intent using functions!

Now the implementation.

connect_db looks like this:

def connect_db(db_name):
    return MySQLdb.connect(db=db_name)

The skeptic will say, “What a waste of a typing! The implementation is so trivial that it can be used directly!”

The abstractionists1 will say, “Brilliant! The function hides the implementation, which lets you change the database driver without breaking the rest of the code!”

Both wrong!

To the skeptic, the point of the function is to represent an essential logical operation, which is to connect to the database. While MySQLdb.connect(db=db_name) does that, it doesn’t say that. It says, “use the MySQL Python driver to connect to a MySQL server and select a particular database.” That’s not what I mean at all! I just want to connect to a database. I don’t want to cloud the picture with a driver, a method invocation, and a particular keyword argument!

To the abstractionist, the point of the function is not to allow me to change the implementation! Hiding an implementation detail in a function doesn’t magically make the implementation details go away. The function returns a MySQL database object, which other parts of the module relies on. There’s a lot more work to even attempt such an abstraction.2

The function is there to represent the operation. That’s all.

To drive this point home, here’s the function that returns a cursor, given a database:

def db_cursor(db):
    return db.cursor()

Meditate on this and your path will become illuminated.

Now that we have our cursor we can pass it along to any operation that needs to query the database.

$$code(lang=python) def all_tables(cur): cur.execute(“show tables”) return [row[0] for row in cur]

Using Erlang style decomposition, I’d be tempted to factored this function to this:

def all_tables(cur):
    return handle_show_tables_result(
        execute_query("show tables", cur), cur)

def execute_query(sql, cur):
    return cur.execute(sql)

def handle_show_tables_result(_, cur):
    return [row[0] for row in cur]

These functions articulate the separate logical operations: execute the query and process the results.

But I don’t like this in Python, at all. The function pipelining pattern makes more sense in Erlang, which provides pattern matching function clauses and encourages side-effect free functions. Python however, uses side effects everywhere and this pattern, to me, splits up logic that belongs together.

So I’ll stick with the first version. It’s far more Pythonic and will induce much less head-scratching.

For row_counts, we can use a straight forward list comprehension: 3

def row_counts(tables, cur):
    return [(table, row_count(table, cur)) for table in tables]

In row_count, as in all_tables, we’ll run a query and handle the result in the same function:

def row_count(table, cur):
    cur.execute("select count(*) from %s" % table)
    return cur.fetchone()[0]

It still feels like we’re doing too much here, but I think it’s the right thing in Python.

Finally, our print function:

def print_row_counts(counts):
    for table, count in counts:
        print "%s %i" % (table, count)

Yes, we’re using a for statement. But this is Python — and there’s no sane alternative. Functional purists might be tempted to use a higher order function, but we don’t encourage self flagellation in our order.

There’s one more niggly problem with the original code:

if __name__ == "__main__":
    count(sys.argv[1])

This of course is is incomprehensible. This is what I want to say:

def db_name_arg():
    return sys.argv[1]

if __name__ == "__main__":
    print_db_table_row_counts(db_name_arg())

And so I present the refactored, far more obviously correct code:

import sys
import MySQLdb

def print_db_table_row_counts(db_name):
    cur = db_cursor(connect_db(db_name))
    print_row_counts(row_counts(all_tables(cur), cur))

def connect_db(db_name):
    return MySQLdb.connect(db=db_name)

def db_cursor(db):
    return db.cursor()

def all_tables(cur):
    cur.execute("show tables")
    return [row[0] for row in cur]

def row_counts(tables, cur):
    return [(table, row_count(table, cur)) for table in tables]

def row_count(table, cur):
    cur.execute("select count(*) from %s" % table)
    return cur.fetchone()[0]

def print_row_counts(counts):
    for table, count in counts:
        print "%s %i" % (table, count)

def db_name_arg():
    return sys.argv[1]

if __name__ == "__main__":
    print_db_table_row_counts(db_name_arg())

A few points jump out at me:

  • This is Python, not weirdo Erlang — any language that supports functions can be used to solve embarrassingly obvious problems!

  • Line count per function still seems to be a good proxy for code quality

  • Looking at the Python code I’ve written over the years, I have sinned gravely

Fortunately, the sins of our past can be washed away, if we only apply ourselves to rigorous thinking — and endure typing these small, embarrassingly obvious functions!

Footnotes


  1. A member of a religious sect that views all code as an opportunity to anticipate change. They spend their time solving problems that are not yet defined. 

  2. In the garden, man and woman were tricked by the ORM, which was disguised as a serpent. The ORM promises ease of use and productivity, but poisons applications with deformed, grotesque, incomprehensible SQL. It is the cause of incalculable suffering and terrible evil. 

  3. Some claim that list comprehension makes Python a functional language. This is a false teaching. 

comments powered by Disqus