FIT parsing and code generation in Elixir

Feb 19, 2024

Flexible and Interoperable Data Transfer (FIT), developed by Garmin, is a format used to store activity and workout files. Garmin provides SDKs in many languages, and there are even more third-party libraries based on the specification, some open source, some not. The specification consists of the FIT Protocol page and a huge spreadsheet in the downloadable SDK that describes the types and messages a FIT file may contain.


At the end of last year, I started implementing a FIT parser in pure Elixir. Erlang’s excellent binary pattern matching made it very pleasant to do, and at this moment the decoder can parse many FIT files, even though some encoders don’t follow the specification to the letter. I’ve scavenged various FIT files from issue reports, SDK examples and my own years of running. I wanted to fully support anything a file may contain (components, dev fields, subfields) and it does, mostly matching what the official fit2csv tool outputs.
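To give a taste of why binary pattern matching is so pleasant here, below is a minimal sketch that matches the FIT file header as laid out in the FIT Protocol doc (header size, protocol version, profile version, data size, the ".FIT" marker and an optional header CRC). FitHeaderSketch and the map keys are names made up for this example, not the parser’s actual API.

```elixir
defmodule FitHeaderSketch do
  # 14-byte header: header size, protocol version, profile version (little endian),
  # data size (little endian), the ASCII marker ".FIT" and a 2-byte header CRC.
  def parse(
        <<14, protocol, profile::little-16, data_size::little-32, ".FIT", crc::little-16,
          rest::binary>>
      ) do
    {:ok, %{protocol: protocol, profile: profile, data_size: data_size, crc: crc}, rest}
  end

  # Older 12-byte header: same layout, but without the CRC.
  def parse(<<12, protocol, profile::little-16, data_size::little-32, ".FIT", rest::binary>>) do
    {:ok, %{protocol: protocol, profile: profile, data_size: data_size, crc: nil}, rest}
  end

  # Anything else is not a FIT header.
  def parse(_other), do: {:error, :invalid_header}
end

# A hand-made header followed by three bytes of fake record data.
header = <<14, 0x20, 2140::little-16, 1024::little-32, ".FIT", 0::little-16>>
{:ok, info, rest} = FitHeaderSketch.parse(header <> <<1, 2, 3>>)
```

The sketch does not verify the CRC or the data size; it only shows how the fixed layout maps onto a single function head.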

This post is about one specific prerequisite for even starting to parse a FIT file: the “Profile”. The profile is based on the spreadsheet in the SDK. It describes all the fields and records, their types, scales, offsets, bit offsets and more. This spreadsheet is also used to generate all the encoders and decoders in the official SDKs for C, C++, Java, Python, JavaScript and more. The profile in the official JavaScript SDK is a whopping 25k (yes, k) lines of code, with a similar amount in Python.

I also explored fitparser in Rust (almost 30k lines) and fit in Go (only 1.5k lines!). In my extremely naive attempt in Elixir so far, the profile is around 45k lines. That’s a lot of lines.

Taking shortcuts with inspect/2

The very basic and simplified input for codegen looks somewhat like this:

types = [
  ["activity", "enum", [["manual", 0], ["auto_multi_sport", 1]]],
  [
    "activity_class",
    "enum",
    [
      ["level_max", 100],
      ["level", 127],
      ["athlete", 128, "0 to 100"]
    ]
  ],
  ["workout_hr", "uint32", [["bpm_offset", 100]]],
  ["weight", "uint16", [["calculating", 65534]]]
  # ... 2000 more types
]

In reality, it looks nothing like this and it comes from an xlsx file, but it should be enough to imagine what we’re working with.

Each element of the list is a field with its type and values. For example, for the first element, an activity field is always 0 or 1, and each value corresponds to a specific label after parsing: manual or auto_multi_sport in this case. In a FIT file there actually won’t be any activity string, but something like 51, which is described by another field… It’s a fun and efficient binary format! See the FIT Protocol doc.
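To make the value-to-label mapping concrete, here is a minimal sketch of a lookup built from the simplified list above. TypeLookup and label/2 are hypothetical names for this example, and the real profile has far more to it (scales, offsets, components), so treat it as an illustration rather than how the parser actually works.

```elixir
defmodule TypeLookup do
  # Same simplified shape as the list above: [name, base_type, values].
  @types [
    ["activity", "enum", [["manual", 0], ["auto_multi_sport", 1]]],
    ["weight", "uint16", [["calculating", 65534]]]
  ]

  # Build a %{type_name => %{raw_value => label}} index once, at compile time.
  # The optional third element of a value (a comment) is ignored.
  @index Map.new(@types, fn [name, _base_type, values] ->
           {name, Map.new(values, fn [label, raw | _comment] -> {raw, label} end)}
         end)

  # Returns the label for a known raw value, or the raw value unchanged.
  def label(type_name, raw) do
    case @index do
      %{^type_name => %{^raw => label}} -> label
      _ -> raw
    end
  end
end
```

With this in place, TypeLookup.label("activity", 1) gives the auto_multi_sport label, while a weight that is not one of the special values passes through as a plain number.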

In my naive attempt to get the job done, I used Elixir’s Inspect protocol, which is implemented for all the built-in data types. It can be used to dump data structures into strings and read them back later, matching the original data structure. And as you’ll see below, it mostly works.

iex> field = %{type: "uint32", name: "distance"}
%{name: "distance", type: "uint32"}
iex> inspect(field) |> Code.eval_string()
{%{name: "distance", type: "uint32"}, []}
iex> IO.puts(inspect(field))
%{name: "distance", type: "uint32"}

If you try to do that for a very large data structure (like the one in FIT’s spreadsheet), inspect will truncate some data. This is easily fixed by adding limit: :infinity to the inspect/2 call.

We also need to be on the lookout for lists of integers, which may look an awful lot like a charlist, or maybe… a charlist looks like a list of integers… all the same. Let’s ensure the output stays consistent with the charlists option.

iex> inspect([?h, ?e, ?l, ?l, ?o, 112])
"~c\"hellop\""
iex> inspect([?h, ?e, ?l, ?l, ?o, 255])
"[104, 101, 108, 108, 111, 255]"
iex> inspect([?h, ?e, ?l, ?l, ?o], charlists: :as_lists)
"[104, 101, 108, 108, 111]"

It’s, of course, all the same.

iex> 'hello' == [?h, ?e, ?l, ?l, ?o]
true
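Putting those options together, a dump-and-load round trip could look roughly like the sketch below. InspectDump is a made-up module name, and printable_limit: :infinity is an extra precaution not mentioned above, since long strings are cut off by default as well.

```elixir
defmodule InspectDump do
  # Options that keep inspect/2 output complete and unambiguous.
  @opts [limit: :infinity, printable_limit: :infinity, charlists: :as_lists]

  # Turn a plain data structure into Elixir source.
  def dump(term), do: inspect(term, @opts)

  # Evaluate the source again and return the resulting term.
  def load(source) do
    {term, _bindings} = Code.eval_string(source)
    term
  end
end

# A list longer than the default limit, plus a list that looks like a charlist.
data = %{values: Enum.to_list(1..200), bytes: [104, 101, 108, 108, 111]}
source = InspectDump.dump(data)
round_trip_ok? = InspectDump.load(source) == data
```

This only works for terms whose inspected form is valid Elixir, which is exactly the limitation described next.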

This approach of dumping data structures into profile.ex worked! My parser successfully uses the module with all the types and messages from the spreadsheet, but…

It’s really annoying to depend on inspect/2. The output has to stay valid Elixir, so I cannot define custom Inspect implementations for my structs without passing additional options, which makes debugging harder when everything is extremely verbose and structs contain many fields.

Another big disadvantage is that while it may work for data structures, I cannot easily dump references to functions or calls to other modules.

iex> f = & &1
#Function<42.3316493/1 in :erl_eval.expr/6>
iex> inspect(f)
"#Function<42.3316493/1 in :erl_eval.expr/6>"
iex> f.(1)
1
iex> inspect(f) |> Code.eval_string()
{nil, []}

I needed a better approach, and I didn’t want to fall back to dumping Elixir-like strings into profile.ex while deeply traversing the data structure. I wanted to use the power of Elixir: macros! After all, they can be used to generate other Elixir code.

Switch to macros

There’s a lot to Elixir macros. I still don’t fully grasp all their intricate details, but maybe it’s for the best. They’re very powerful. The official docs and, more importantly, the Understanding Elixir Macros series by Saša Jurić were extremely helpful in making my prototype work. Macros can be used to capture the AST of any Elixir code and then dump it into a file.

How do we switch from inspect to macros? Easy!

#### instead of this
iex> field = %{type: "uint32", name: "distance"}
#### do
iex> field = quote(do: %{type: "uint32", name: "distance"})
{:%{}, [], [type: "uint32", name: "distance"]}
iex> field |> Macro.to_string()
"%{type: \"uint32\", name: \"distance\"}"
#### or
iex> field = %{type: "uint32", name: "distance"}
iex> Macro.escape(field) |> Macro.to_string()
"%{name: \"distance\", type: \"uint32\"}"

This works for creating modules too! A quoted expression is still proper Elixir code: it works with syntax highlighting, autocomplete and any other Elixir code. The one exception is comments, which are not part of the AST.

field = quote(do: %{type: "uint32", name: "distance"})
quote do
  defmodule Exgen.Profile.ExampleField do
    @field unquote(field)

    def field, do: @field
  end
end
|> Macro.to_string()
|> IO.iodata_to_binary()
|> IO.puts()

# results in

defmodule Exgen.Profile.ExampleField do
  @field %{type: "uint32", name: "distance"}
  def field do
    @field
  end
end

This brings us closer to proper code generation for the profile. Macros also allow us to properly generate function calls!

iex> field = %{type: quote(do: Exgen.Types.by_name("uint32")), name: "distance"}
iex> Macro.escape(field) |> Macro.to_string() |> IO.puts()
%{
  name: "distance",
  type: {
    {:., [], [
      {:__aliases__, [alias: false], [:Exgen, :Types]}, 
      :by_name
    ]}, [], ["uint32"]
  }
}

…or not? That’s not right. Our quoted function call is already AST, and Macro.escape/1 escapes it once more, so it ends up as plain data instead of a call. To fix this, in this specific case:

iex> Macro.escape(%{field | type: {:unquote, [], [field.type]}}, unquote: true)
iex> v() |> Macro.to_string() |> IO.puts()
%{name: "distance", type: Exgen.Types.by_name("uint32")}

That’s much better! In the case of profile generation, it’s pretty straightforward to know what needs to be marked with :unquote for proper code generation. For a more general approach, it should be possible to deeply walk the data structure and wrap every quoted fragment in :unquote where needed.
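One possible shape for that deep walk is sketched below. Since a quoted fragment is just tuples and lists, it cannot be told apart from ordinary data reliably, so this sketch assumes every fragment is wrapped in an explicit {:code, ast} marker when the data is built. The module name, the marker and both function names are assumptions made for this example.

```elixir
defmodule EscapeSketch do
  # Wrap a quoted fragment so the walker can recognise it later.
  def code(ast), do: {:code, ast}

  # Walk the data, turn markers into unquote nodes, then escape everything else.
  def to_quoted(data), do: data |> mark() |> Macro.escape(unquote: true)

  # A marked fragment becomes an unquote node and is not walked any further.
  defp mark({:code, ast}), do: {:unquote, [], [ast]}

  defp mark(tuple) when is_tuple(tuple) do
    tuple |> Tuple.to_list() |> Enum.map(&mark/1) |> List.to_tuple()
  end

  defp mark(list) when is_list(list), do: Enum.map(list, &mark/1)

  # Only plain maps are walked; structs are left for Macro.escape/2 as they are.
  defp mark(map) when is_map(map) and not is_struct(map) do
    Map.new(map, fn {key, value} -> {key, mark(value)} end)
  end

  defp mark(other), do: other
end

field = %{
  name: "distance",
  type: EscapeSketch.code(quote(do: Exgen.Types.by_name("uint32")))
}

generated = field |> EscapeSketch.to_quoted() |> Macro.to_string()
```

Here generated should come out as the same map source as in the manual fix above, with the type rendered as a real function call. The explicit marker costs a bit of ceremony when building the data, but it avoids guessing which three-element tuples are meant to be code.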

Formatting

To tie the whole code generation together, there’s one last thing needed to make it pretty: formatting. We don’t want one massive unreadable line. Fear not, Elixir has a built-in formatter, mix format, and it’s available not only in the CLI but also as a function call, Code.format_string!/2. Before the final dump into a file, it can be used to nicely format all the generated code in any way you want.
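As a rough sketch of that last step, the generated AST can be turned into a string, passed through Code.format_string!/2 (note the bang, it raises on invalid code) and converted from iodata into a binary. FormatSketch, render/2 and the line_length value are choices made for this example only.

```elixir
defmodule FormatSketch do
  # Quoted code in, formatted source out.
  def render(quoted, opts \\ [line_length: 98]) do
    quoted
    |> Macro.to_string()
    |> Code.format_string!(opts)
    |> IO.iodata_to_binary()
  end
end

ast =
  quote do
    defmodule Exgen.Profile.ExampleField do
      def field, do: %{type: "uint32", name: "distance"}
    end
  end

source = FormatSketch.render(ast)

# The final dump would then be something like:
# File.write!("profile.ex", source <> "\n")
```

Formatter options such as line_length are the same ones accepted in .formatter.exs, so the generated file can follow the rest of the project.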

Conclusion

The approach with macros is much more readable, testable and less error-prone. If something is wrong with the syntax, it’s more obvious in a macro than in a string combined with inspect, like defmodule Profile do #{inspect(types)} end.

One last thing I’m still missing is how to attach comments. Elixir’s AST doesn’t contain comments; the formatter handles them with Code.string_to_quoted_with_comments/2. I don’t see any easy way to attach them to the AST from a macro, but in the end, it may not be needed. I want to experiment with switching to functions with @doc annotations and creating proper named structs for each type, instead of a generic one with a name: attribute.

The generated profile is now a lot smaller, and it’s still in progress due to the above fiddling with @doc. At this moment the rest of the parser is still in rough shape, missing some niceties like CRC calculation or speed (first make it work!). When this is resolved, I plan to release it as an open source library and maybe get some help with adding an encoder. It would be awesome to be able to decode, edit and re-encode FIT files.


My favorite Elixir macro is dbg/2, or maybe def/2… It’s all macros! 😱