haskell style tips
------------------

when writing haskell code, you may sometimes find your code
turning into disorganized spaghetti, even if you're doing all
the other "expert" things, like using monad stacks and lenses
and keeping IO functions separate from pure ones. this spaghettification
is made even worse by the fact that most haskell syntax doesn't
require any form of delimiters to block sections of code off from one
another. consider the following code, lifted from an old project of mine:

      
    typecheck Fix{unFix = Term'VarDecl name tags} = do
      tags' <- mapM (mapM $ mapM appTy) tags
      let ty = Type'Variant name (M.fromList tags')
      ty' <- generalize ty
      mapM_ ((flip addTag) ty') (map fst tags')
      addTypeVar name ty'
      return Fix{unFix = Term'VarDecl name tags `Typed` Type'None}

this doesn't have all the problems that this guide will talk about, but
it demonstrates quite a few of them. here are some rules to follow that
will help you avoid bad code like what's displayed above:

1. avoid nesting!!! this means that in most cases (with the major
   exception being when you're using a domain-specific language, because
   those usually involve lots of infix operators), you shouldn't have
   more than two or three functions being applied on one line. if you
   have to nest parentheses, then you'll probably want to refactor that
   later. other bad forms of nesting include nesting of do, case, let,
   and where clauses (and proc clauses, haha).

2. shorten local variable names. this runs counter to what's advised in
   other languages, but in haskell it helps keep functions very terse
   and generic-looking. it's perfectly acceptable to write something
   like swap (a, b) = (b, a) instead of
   swap (firstElem, secondElem) = (secondElem, firstElem).

3. use lots of local variables. by using where and let clauses within
   functions, you can break them up into smaller pieces and help avoid
   the need for deep nesting of function applications.

4. keep functions short. short functions are much easier to understand.

now for the complicated step that's mostly complicated just because I
like to make haskell jokes way too much:

5. use typeclasses to unkludge things. to be more specific, the "kludgy
   things" I'm talking about are those times when you need to
   mapM_ . mapM . fromList . mapM . mapM . toList . mapM . uncurry . flip . fmap
   in order to apply a monadic action to specific elements of a data
   structure. it's worth the time to dig around and see if you can find
   typeclasses and instances that handle whatever bizarre mapping
   behavior you need to accomplish. if you can't find anything, then you
   should define your own typeclass that lets you rummage around in data
   structures in whatever cursed way your program needs to.

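to make rule 5 a bit more concrete, here's a minimal sketch of what such a class might look like. everything in it is a toy stand-in rather than code from the project: the two-constructor Type, the Either-based Typechecker, the alias-resolving appTy, and the sample tags are all assumptions made up for illustration. the point is only that the nested mapM chain moves into instances, so the call site shrinks to a single word.

```haskell
-- toy stand-ins (hypothetical, not the project's real definitions)
data Type = TyCon String | TyAlias String
  deriving (Eq, Show)

-- a toy "typechecker" monad: an error message or a result
type Typechecker = Either String

-- hypothetical type-level action: resolve one known alias, reject others
appTy :: Type -> Typechecker Type
appTy (TyAlias "number") = Right (TyCon "Int")
appTy (TyAlias other)    = Left ("unknown alias: " ++ other)
appTy t                  = Right t

-- "apply appTy to every Type inside this structure"
class ApplyTypes a where
  applyTypes :: a -> Typechecker a

instance ApplyTypes Type where
  applyTypes = appTy

-- lists: visit every element
instance ApplyTypes a => ApplyTypes [a] where
  applyTypes = traverse applyTypes

-- pairs: leave the first component (the tag name) alone,
-- visit only the second
instance ApplyTypes b => ApplyTypes (a, b) where
  applyTypes = traverse applyTypes

-- sample data shaped like the variant tags in the example above
tags :: [(String, [Type])]
tags =
  [ ("Circle", [TyAlias "number"])
  , ("Point",  [TyAlias "number", TyCon "Int"])
  ]

-- before: mapM (mapM $ mapM appTy) tags
-- after:  applyTypes tags
```

with these definitions, applyTypes tags and mapM (mapM $ mapM appTy) tags produce the same result, but only the second one makes the reader count parentheses. the instances lean on Traversable from base for lists and pairs, which is the "dig around for an existing typeclass" half of the rule.
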
even if you feel bad making a class with a silly name like ApplyType,
it's worth the abstraction. typeclasses can be more humble than Comonad
and the other elder gods that haunt the haskell blogosphere. of course,
you can make typeclasses more abstract once you start seeing patterns
arise, but don't get caught up trying to discover a
SuperYonedaGADTHindleyMilnad, unless you're a researcher in desperate
need of something to publish. we can't all be optomekmettrists.

with that out of the way, let's have a look at an improved version of
the aforementioned (monadic) hell-function. my old project is big and
clunky, so I haven't dusted it off to test this revision yet. while it
might not be technically correct, the structure itself is improved.
also, I wrote this at 2:00am, so...

    -- wraps typechecking around a Fix constructor.
    typecheck (Fix t) = do
      t' <- process t
      return (Fix t')

    -- typechecks a term, and then runs functions on the
    -- result in order to update the type context. returns
    -- the updated term attached to its type.
    process t = do
      (t', ty) <- typecheck' t
      decl <- catalogueDecl t' ty
      -- if we processed a declaration, it has no final type.
      -- (a fresh name: haskell's let is recursive, so rebinding
      -- ty in terms of itself would loop)
      let ty' = if decl then Type'None else ty
      return (t' `Typed` ty')

    -- the typechecking case for a variant declaration.
    typecheck' t@(Term'VarDecl s xs) = do
      xs' <- applyTypes xs
      ty <- generalize (Type'Variant s xs')
      return (t, ty)

    -- records the declaration of a type; this is
    -- the specific case for a variant declaration.
    -- other cases of this function that aren't for
    -- declarations will not record anything. the function
    -- returns @True@ or @False@ based on whether anything was
    -- recorded.
    catalogueDecl (Term'VarDecl s xs) ty =
      catTVars >> catTags >> return True
      where
        catTVars = addTypeVar s ty
        catTags = recordTags ty xs

    -- here are some typeclasses (not fully defined, because that
    -- would take forever) that we can make instances of in order to
    -- make mapping monadic actions across complicated datastructures easier

    -- behavior for applying type-level functions to the types within
    -- the datastructure @a@
    class ApplyTypes a where
      applyTypes :: a -> Typechecker a

    -- behavior for recording the type that variant tags are associated with
    class RecordTags a where
      recordTags :: Type -> a -> Typechecker a

this hypothetical refactor alleviated at least some of the guilt I feel
for having written such a mess of a haskell project. sure, the code is
longer now, but it's more abstract and easier for the human mind to
parse. this length also benefits us in the long run, because now every
single typechecking case for each term will be shorter. the code for
updating the type context is now decoupled from typecheck', so
typecheck' exists for the sole purpose of determining a term's type.
with the extra baggage removed, typechecking cases essentially just map
over the relevant elements of each term, and then perform unification.

while it's true that some function names could be changed to better
describe their role in relation to other functions, the bigger picture
has improved. to be honest, a lot of this improvement came from basic
refactoring rather than the allegedly super-special principles that this
blogpost was written to promote. /shrugs

it should also be noted that there are broader improvements that could
be made to the structure of this code; for example, it would have been
wiser for my past self to have designed or used a library that
implements typechecking and unification in smaller, more generic steps,
so that these boilerplate functions wouldn't be necessary. essentially,
I've realized in hindsight that the typechecking process could have been
decoupled into two parts: one for running the generic algorithms and
building up a type context, and another for handling the case-by-case
typing rules for the terms specific to the programming language I was
implementing.
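
as a rough illustration of that two-part split, here's a minimal sketch on a toy language. none of it comes from the project: Ty, Ctx, lookupVar, bind, expect, infer, and the term constructors are all hypothetical names, and expect is only an equality check standing in for real unification. the shape is what matters: part 1 knows nothing about terms, and part 2 is nothing but per-term rules.

```haskell
-- part 1: generic machinery. knows about types and contexts,
-- but nothing about the term language.

data Ty = TInt | TBool
  deriving (Eq, Show)

-- the type context: variable names paired with their types
type Ctx = [(String, Ty)]

lookupVar :: Ctx -> String -> Either String Ty
lookupVar ctx x =
  maybe (Left ("unbound variable: " ++ x)) Right (lookup x ctx)

bind :: String -> Ty -> Ctx -> Ctx
bind x t ctx = (x, t) : ctx

-- stand-in for unification: on this toy type language,
-- two types "unify" only if they are equal
expect :: Ty -> Ty -> Either String ()
expect want got
  | want == got = Right ()
  | otherwise   = Left ("expected " ++ show want ++ ", got " ++ show got)

-- part 2: language-specific typing rules, one short case per term.

data Term
  = Lit Int
  | BoolLit Bool
  | Var String
  | Add Term Term
  | If Term Term Term
  | Let String Term Term

infer :: Ctx -> Term -> Either String Ty
infer _   (Lit _)     = Right TInt
infer _   (BoolLit _) = Right TBool
infer ctx (Var x)     = lookupVar ctx x
infer ctx (Add a b)   = do
  ta <- infer ctx a
  expect TInt ta
  tb <- infer ctx b
  expect TInt tb
  return TInt
infer ctx (If c t e)  = do
  tc <- infer ctx c
  expect TBool tc
  tt <- infer ctx t
  te <- infer ctx e
  expect tt te
  return tt
infer ctx (Let x e body) = do
  te <- infer ctx e
  infer (bind x te ctx) body
```

for example, infer [] (Let "x" (Lit 1) (Add (Var "x") (Lit 2))) gives Right TInt, while infer [] (If (Lit 1) (Lit 2) (Lit 3)) gives Left "expected TBool, got TInt". swapping expect for a real unifier, or Ctx for something fancier, wouldn't touch the rules in part 2, which is roughly the decoupling described above.
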