This Week in Glean: Glean in 2021

Dec 18, 2020 · 5 minute read

(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean.)

All "This Week in Glean" blog posts are listed in the TWiG index (and on the Mozilla Data blog). This article is cross-posted on the Mozilla Data blog.

A year ago the Glean project was different. We had just released Glean v22.1.0 and Fenix (aka Firefox for Android aka Firefox Daylight) was not released yet, Project FOG was just an idea for the year to come.

2020 changed that, but 2020 changed a lot. What didn't change was my main assignment: I kept working all throughout the year on the Glean SDK, fixing bugs, expanding its capabilities, enabling more platforms and integrating it into more products. Of course this was only possible because the whole team did that as well.

In September I took over the tech lead role for the SDK from Alessio. One part of this role includes thinking bigger and sketching out the future for the Glean SDK.

Let's look at this future and what ideas we have for 2021.

(One note: right now these are ideas more than a plan. It neither includes a timeline nor does it include all the other things maintenance includes.)

The vision

The Glean SDK is a fully self-servable telemetry SDK, usable across different platforms. It enables product owners and engineers to instrument their products and rely on their data collection, while following Mozilla policies & privacy standards.

The ideas

In the past weeks I started a list of things I want the Glean SDK to do next. This is a short and incomplete list of not-yet-proposed or accepted ideas. For 2021 we will need to fit this in with the larger Glean project, including plans and ideas for the pipeline and tooling. Oh, and I need to talk with my team to actually decide on the plan and allocate who does what when.

Metric types

When we set out to revamp our telemetry system we build it with the idea of offering higher-level metric types, that give more meaning to individual data points, allowing more correct data collection and better aggregation, analysis and visualisation of this data. We're not there yet. We currently support more than a dozen metric types across all platforms equally, with the same user-friendly APIs in all supported languages. We also know that this still does not cover all intended use cases that, for example, Firefox Desktop wants. In 2021 we will probably need to work on a few more types, better ergonomics and especially documentation on when which metric type is appropriate.

Revamped testing APIs

Engineers should instrument their code where appropriate and use the collected data to analysis behavior and performance in the wild. The Glean SDK ensures their data is reliably collected and sent to our pipeline. But we cannot ensure that the way the data is collected is correct or whether the metric even makes sense. That's why we encourage that each metric recording is accompanied by tests to ensure data is collected under the right circumstances and that its the right data under the given test case. Only that way folks will be able to verify and analyse the incoming data later and rely on their results.

The available testing APIs in the Glean SDK provide a bare minimum. For each metric type one can check which data it currently holds. That's probably easy to validate for simpler metrics such as strings, booleans and counters, but as soon as you have more complex one like any of the distributions or a timespan it gets a bit more complex. Additionally we offer a way to check if any errors occured during recording, which are usually also reported in the data. But here again developers need to know which errors can happen under what circumstances for which metric type.

And at last we want developers to reach for custom pings when the default pings are not sufficient. Testing these is currently near impossible.

Testing these different usages of the SDK can and should be improved. We already have the first proposals and bugs for this:

Custom Ping Unit Tests proposes a simple callback-like API to validate data in custom pings
Testing parameter in Glean metric definitions proposes to make existing tests/testing documentation for metrics more visible
Improvements to Glinter, the Glean linter (e.g. opinions on naming of things)

(Note: as of writing these document might be inaccessible to the wider public, but they will be made public once we gather wider feedback)

For sure we will have more ideas on this in 2021.

UniFFI - generate all the code

The Glean SDK is cross-platform by shipping a core library written in Rust with language bindings on top that connect it with the target platform. This has the advantage that most of the logic is written once and works across all the targets we have. But this also has the downside that each language binding sort of duplicates the user-visible API in their target language again. Currently all metric type implementations need to happen in Rust first, followed by implementations in the language bindings.

All of the implementation work today is mostly manual, but usually follows the same approach. The language binding implementations should not hold additional state, but currently some still do.

Implementation of new metric types and fixing bugs or improving the recording APIs of existing ones results in a lot of busy work replicating the same code patterns in all the languages (of which we now have have 7: Kotlin, Swift, Python, C#, Rust, JavaScript and C++). If we also come up with new metric types next year this becomes worse.

A while ago some of my colleagues started the UniFFI project, a multi-language bindings generator for Rust. For a couple of reasons the Glean SDK cannot rely on that yet.

In 2021 I want us to work towards the goal of using UniFFI (or something like it) to reduce the manual work required on each language binding. This should reduce our general workload supporting several languages and help us avoid accidental bugs on implementing (and maintaining) metric types.

This is a rather short list of things that only focus on the SDK. Let's see what we get tackled in 2021.

End of 2020

The This Week in Glean series is going into winter hiatus until the second week of January 2021. Thanks to everyone who contributed to 25 This Week in Glean blog posts in 2020:

Alessio, Travis, Mike, Chutten, Bea, Raphael, Anthony, Will & Daosheng