Elixir, etc. - Typespecs

Written by Rich Morin.

Contents: (hide) (show)

Path:  AreasContentOverviews

Precis:  introduction to Typespecs, Dialyzer, etc.

This page is an introduction to the use of Typespecs (type specifications), as supported by Dialyzer and related tooling. Most of the examples and advice are based on my experience in adding type specifications to Pete’s Alley. For specific details on our current usage, please see Pete’s Alley - Typespecs.

Motivation

Using Elixir’s Typespec attributes (e.g., @spec, @type), a project can document how its internal interfaces use data types. The attributes are located in the working code base, typically next to the functions they document. They are also integrated into ExDoc’s output (e.g., API documentation, online help). This provides developers with a convenient and (mostly) reliable set of reference information upon which to base additions and changes.

Each @spec attribute (aka spec) specifies the data types of the parameters and return value of a named function. Erlang’s Dialyzer utility allows these specifications to be verified to a substantial and customizable degree. For extra credit, @type attributes can also be added, defining custom, namespaced data types. This makes these data types explicit and modular, raising them to the same level of definition and documentation as named functions.

Dialyzer works by detecting conflicts between a function’s usage and its type specification. Errors are reported in a detailed (if sometimes murky) report. Dialyzer can detect a broad and configurable range of type errors. Type information can be refined incrementally, tightening the constraints on data types and usage. Custom data types can be made arbitrarily precise, defining multiple layers of structure and type information.

Like the compiler and test suite, Dialyzer is used during development and maintenance. In particular, it should be considered as part of an acceptance testing suite. However, because the generated documentation can be useful during the development effort, some projects may add it to their continuous integration suite.

Having a gradual type checker on hand may also influence program design. Sorbet (recently open sourced by Stripe) is a type checker for Ruby. In a recent podcast (Introducing Sorbet), Sorbet developer Paul Tarjan said (roughly):

When you start coding a method, and you write down the type signature, if you say “this returns a string, or an integer, or maybe …”. When you write that type signature, you start to second-guess yourself. “Maybe I should write a couple of functions or maybe I should shape my code in a different way that makes the API act a little differently”. We’ve found that literally writing the type signature makes you think about your APIs a little bit more.

In summary, using Dialyzer can help to make a project’s code base more robust and maintainable, by adding verifiable type documentation for named functions. Because it can detect errors before the code is run, it can be considered as a code quality evaluation tool, working with the compiler and test suite to detect errors.

Background

After Erlang had been in use for a number of years, with many lines of code in production, serious efforts were made to retrofit it with a type system. In order to avoid false positives, any practical solution would have to accept “reasonable” type usage found in existing programs. Dialyzer’s approach to this constraint uses gradual typing of named functions. As long as no type specifications and/or usage conflict, Dialyzer will remain silent. This allows existing projects to add static type declarations in a controlled and incremental fashion.

Theory

Like Erlang, Elixir is a strongly typed programming language. So, for example, a program cannot simply append a number to a string. Rather, it must first specify how the number should be coerced into the string data type.

Type errors can be detected in various ways, at various points in the development process. For example:

However, it’s still quite possible for type errors to remain undetected well into production use of the code. For example, an error-logging function might be called with a number, rather than the expected text string. Because the type error crosses a function boundary, the compiler might not catch it. If the relevant behavior is not covered by a test, the type error might crash the offending process at runtime, preventing the desired diagnostic information from being displayed.

Using a form of gradual static typing, Dialyzer can detect many type errors before the code is run. So, it provides a useful addition to the checks isted above. Dialyzer’s notion of “success typing” evaluates every function call against the union of its type signatures. Calls which conform to any relevant type signature “succeed”; the remainder are reported as errors.

Practice

Dialyzer performs program-wide detection of type conflicts, based on type information it infers from the source code and any added specs. The generated diagnostics are generally reliable, in the sense that the program seldom produces false positives. However, they can also be confusing, incomplete, opaque, and voluminous.

Specifically, its messages seldom provide a clear and direct description of the underlying problem, let alone what to do about it. So, interpreting and resolving them takes both study and practice. Also, Dialyzer’s approach means that a lack of error messages only indicates that no type conflicts were detected. In particular, adding or tightening specs might well reveal errors.

If no specs have been defined, Dialyzer can only detect conflicts based on type information that it can infer. Running Dialyzer in this manner (and cleaning up any errors it reports) is a useful precaution before adding your own specs. Otherwise, pre-existing errors may cause confusion as you try to understand the ones your additions are causing.

Syntax

Each spec corresponds to a function of the same name and arity. Typically, it is placed just before the relevant function clause(s). Here is a spec and function definition for foo/2:

@spec foo(atom, number) :: map

def foo(key, val), do: %{ key => val }

The spec declares the intended data type (e.g., atom) for each argument (e.g., key), as well as the return value. Primitive types (e.g., atom, map, number) may be appropriate for many scalar variables and simple data structures, but they don’t provide much information about more complex structures. For example, although a tree of maps can be described as a map, this doesn’t say much about its internal structure or intended role within the program.

The @type and @typep attributes let the programmer define derived types, give them comments and mnemonic names, etc. The @typedoc attribute supports documentation for @type (but not @typep). Typespec attributes are generally placed in *.ex files. They can be added to *.exs files, as documentation, but will not be checked by Dialyzer.

Tooling

Jeremy Huffman’s Dialyxir package provides “Mix tasks to simplify use of Dialyzer in Elixir projects”. It also reworks the (Erlang-style) generated reports, making them more comprehensible to Elixir programmers. Dialyxir’s default output is rather verbose, so we use it as follows:

mix dialyzer --quiet

Sean Cribbs’ Dialyzex package is another alternative which offers some interesting features. Sadly, Dialyxir and Dialyzex both lay claim to mix dialyzer (etc), so they can’t be used together in a single Mix project. Grumble…

Organization

The code base for Pete’s Alley makes organized and pervasive use of Typespec attributes. Our specs use both primitive and derived types. In general, the latter are used for app-specific data structures, most of which involve trees of maps. Derived types are commonly defined in app-specific modules (e.g., Common.Types, InfoToml.Types) which contain mostly @type and @typedoc attributes. A few private types may be also defined by @typep attributes.

As suggested by Dave Thomas, each app in Pete’s Alley has its own implementation directory (.../<app>/lib/<app>). This gives us a logical place to store our type definition files (e.g., _common_t.ex). For brevity and convenience, we alias the defined modules for use in any other modules where they are needed:

alias Common.Types, as: CT

The types are then used as follows:

@spec keyss( %{ CT.map_key => any } ) :: [String.t]

If a type has a long name and/or appears multiple times in a spec, we tend to use a shorthand name. This is defined by a when clause on the following line:

@spec do_files(st, (st -> {st, sc})) :: [sc]
  when sc: ITT.schema, st: String.t

Here are some shorthand names we commonly use:

pc    Plug.Conn.t
st    String.t
tl    [tuple]     # tuple list

Example

Here is a real-world example of how we use (and define) some derived types. The function get_kv_info/1 returns a map, so we could write its spec as:

@spec get_kv_info(atom) :: map

However, this doesn’t say anything about the map’s internals. So, we specify that it is actually an ITT.kv_info structure (used for storing key/value information):

alias InfoToml.Types, as: ITT

@spec get_kv_info(atom) :: ITT.kv_info

This structure is defined (along with some others), in InfoToml.Types. Because @type doesn’t support the when syntax, we use @typep to privately define some shorthand names:

@typep nni      :: non_neg_integer
@typep st       :: String.t

Here are some public types we’ll need to support kv_info:

@type kv_cnts   :: %{ atom  => nni }
@type kv_descs  :: %{ atom  => st }
@type kv_tuple  :: {atom, st, nni}

Finally, here’s how kv_info is defined:

@type kv_info ::
  %{
    :kv_cnts    => kv_cnts,       # %{ <key>: <count> }
    :kv_descs   => kv_descs,      # %{ <key>: <desc> }
    :kv_list    => [kv_tuple],    # [ { <key>, <val>, <cnt> }, ... ]
    :kv_map     =>                # %{ <key => %{ <val> => <cnt> } }
      %{ atom => %{ st => nni } },
  }

In practice, each @type attribute is preceded by the corresponding @typedoc attribute.

Advice

My initial development spike for Pete’s Alley paid little attention to documentation, let alone testing or Typespecs. So, I had the joy of reverse engineering about 10K lines of code and retrofitting a set of specs and type definitions. The following advice is based on that experience.

Before writing any specs, run Dialyzer. If it complains about anything, congratulations! You’ve just been told about some bugs you can fix. Clean up any errors it complains about (iteratively) until it goes quiet.

Now, you can start (incrementally!) adding specs. Handing Dialyzer large numbers of untried specs is asking for trouble (i.e., a good way to amuse the gods). So, add a few specs, run Dialyzer, fix errors, rinse, repeat… If need be, just try to get one new or changed spec working at a time.

As you proceed, you’re certain to encounter Dialyzer’s syntax for describing the “success typing” of a function. For example:

Foo.bar(
  _inp_map :: %{atom() | binary() => %{atom() | binary() => binary() | map()}},
  ...
)

I find it helps quite a bit to copy this into a text editor and add some strategic white space:

Foo.bar(_inp_map ::
  %{
    atom() | binary() => %{
      atom() |
      binary() => binary() |
      map()
    }
  }, ... )

Progressive Refinement

In a production system, every named function in every *.ex file should have a tight spec using well-documented data types. However, this may be too big a burden for a development spike, prototype, or retrofit. So, give yourself a break and start by creating loose specs, using nothing but primitive types.

For example, it’s OK to call something a map at this stage, but you might want to flag the spec for a revisit. (FWIW, I use #!V comments for this, indicating “vague” code.) Once a function has a working spec, you can tighten it up. Here is a progression of tightening steps, based on the example above:

@spec get_kv_info(any)  :: map                      #!V - any, map
@spec get_kv_info(atom) :: map                      #!V - map
@spec get_kv_info(atom) :: %{ any  => any }         #!V - any
@spec get_kv_info(atom) :: %{ atom => any }         #!V - any
@spec get_kv_info(atom) :: %{ atom => list | map }  #!V - list, map
@spec get_kv_info(atom) :: ITT.kv_info

Dialyxir will pass command-line options on to Dialyzer, allowing you to enable or disable certain classes of checks. For example:

mix dialyzer --unmatched_returns

For more information, see Dialyzer Warnings. Meanwhile, here are some options you may want to try out:

Code Smells

As you proceed, stay alert for assorted code smells. Some types (e.g., any, number) can be fixed at this point. The remainder can be flagged for later attention. Here are some common offenders:

You can also help yourself by keeping the code tidy and DRY:

Inspection

As you proceed, you’ll need to clarify your understanding of the data types. In many cases, you’ll be able to do this by inspecting the code and comments. With practice, Dialyzer’s error messages will also start to make sense. If you get confused, however, remember that IO.inspect/2 is your friend. FWIW, I use it so often that I created ii/2 as a helper function:

def ii(thing, label), do: IO.inspect(thing, label: label)

I often use this to examine the data flowing through (or from) a pipeline:

...
|> ii(:foo)
...

Using Libraries

Well-mannered Elixir and Erlang libraries typically define the types that their functions expect and return. Using these types in your own specs can save effort and reduce the chance of getting something wrong. Before trying to create your own types, check the documentation!

Of course, it’s also possible that a library’s spec will (gasp!) contain a bug. For example, I recently tried to use Floki’s new traverse_and_update/3 function and found that its spec had a tiny glitch. After filing an issue on GitHub, my next challenge was to override the spec so that I could continue cleaning up my own errors.

Although there may be a simpler approach, I found one that seemed to work well. First, clone the library’s repo in GitHub. Next, find and edit the spec definition(s) as needed. Finally, point your deps entry (in each relevant mix.exs file) to the edited version:

# Temporary redirect until Hex.pm gets a new version of Floki...
#     { :floki,               "~> 0.25" },
      { :floki, override: true,
        git: "https://github.com/RichMorin/floki.git" },

Once you’re confident that your change is correct, file a pull request.

Afterword

Although Dialyzer has found a few minor errors in my code base, the combination of careful coding, strong typing, and strategic testing seems to leave little for it to find. So, it would be hard to justify the effort of adding hundreds of specs and dozens of types on that basis alone.

However, the ability to create and maintain verified documentation of data types is a huge win, in terms of maintainability. Considering the programs in state/transition terms, the data types define the states and the code defines the transitions. Or, as Niklaus Wirth put it, Algorithms + Data Structures = Programs. In short, verified documentation of data types helps projects to treat their data as a first-class citizen.

If you have questions about Dialyzer, or your project could use help in using it, please feel free to get in touch.

Resources

There is quite a bit of documentation on Dialyzer and related tooling. As a gentle introduction, I also recommend Jason Voegele’s talk, Dialyzer: Optimistic Type Checking for Erlang and Elixir. Although it’s not as gentle, I also like Sean Cribbs’ talk: Chemanalysis - Dialyzing Elixir.

Once you’re feeling comfortable with the basic ideas, I’d recommend at least skimming some of the papers listed on the project web site. I particularly recommend Practical Type Inference Based on Success Typing.

Dialyzer - Elixir

Dialyzer - Erlang

Typespecs

Wrappers