oelf - Mach-O support for sqlelf

sqlelf lets you explore ELF objects through the power of SQL. It turns any executable into a queryable database. Any? No, just those in ELF format, the standard binary file format on most Unix and Linux systems. But not on macOS. macOS relies on the Mach object file format, Mach-O for short, and sqlelf doesn't support that. The library it uses for parsing does in theory, but it failed on my machine. It depends on a heavy C++ library and I didn't want to bother figuring out how to build and change that.

I still wanted sqlelf for Mach-O binaries. Luckily the hardest part was settled early on: naming.

<@fnordfish@mastodon.social> @jer sollte auf jeden Fall „Ölf“ heißen!
(translation: it should definitely be called „Ölf“)

So oelf exists now. The source code is on GitHub and my fork of sqlelf adds it to sqlelf for easy use.

Install

I have released pre-built versions of oelf, but nothing is upstreamed to sqlelf yet. You can install it from git:

pip install git+https://github.com/badboy/sqlelf@with-macho-support#egg=sqlelf

On my M1 MacBook sqlelf doesn't work out of the box. sqlelf depends on Capstone, and the library bundled with the Python wrapper is x86_64 only, so it won't load.

That's fixable. Assuming you installed into a Python venv .venv:
Install capstone from Homebrew, remove the bundled library and link to the global one instead:

brew install capstone
rm .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
ln -s $(brew --cellar capstone)/5.0.1/lib/libcapstone.5.dylib .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
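
The commands above hard-code the Python version in the path. Here is a small sketch for finding where the bundled library would live in whatever environment is currently active. It assumes the wrapper keeps its library under capstone/lib/libcapstone.dylib, as in the commands above; the helper name is made up for illustration.

```python
import sysconfig
from pathlib import Path


def bundled_capstone_dylib() -> Path:
    """Return where the capstone wheel would place its bundled dylib.

    Assumption: the layout matches the commands above, i.e.
    <site-packages>/capstone/lib/libcapstone.dylib.
    """
    # "platlib" is the site-packages directory for packages with native code.
    site_packages = Path(sysconfig.get_paths()["platlib"])
    return site_packages / "capstone" / "lib" / "libcapstone.dylib"


path = bundled_capstone_dylib()
print(path, "(exists)" if path.exists() else "(not found)")
```

Run it with the venv's Python so that it reports the venv's site-packages rather than the system one.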

Usage

Invoke sqlelf and pass any number of Mach-O binaries as arguments. This gives you an SQLite REPL, or you specify SQL commands with --sql.

For example, sqlelf knows about the libraries referenced by the binary:

$ sqlelf /usr/bin/grep --sql 'select * from macho_libs'
┌───────────────┬────────────────────────────┐
│     path      │            lib             │
│ /usr/bin/grep │ self                       │
│ /usr/bin/grep │ /usr/lib/libbz2.1.0.dylib  │
│ /usr/bin/grep │ /usr/lib/liblzma.5.dylib   │
│ /usr/bin/grep │ /usr/lib/libz.1.dylib      │
│ /usr/bin/grep │ /usr/lib/libSystem.B.dylib │
└───────────────┴────────────────────────────┘

None of those /usr/lib/*.dylib files actually exist on the filesystem though, because Apple now ships them in one big bundled cache file (the dyld shared cache) instead.

I have not yet documented the schema, nor is it anywhere near complete. Use .schema to get an overview.

$ sqlelf /usr/bin/grep --sql '.schema macho_headers'
CREATE TABLE macho_headers(
  path,
  magic,
  cputype,
  cpusubtype,
  filetype,
  ncmds,
  sizeofcmds,
  flags,
  reserved
);

Tables are persisted views over the data, so everything is in memory. Most values are the raw values read from the file, so you will have to look up what those values mean.

For example the headers include all sorts of magic numbers and file types as integers:

$ sqlelf /usr/bin/grep --sql 'select * from macho_headers'
┌───────────────┬────────────┬──────────┬────────────┬──────────┬───────┬────────────┬─────────┬──────────┐
│     path      │   magic    │ cputype  │ cpusubtype │ filetype │ ncmds │ sizeofcmds │  flags  │ reserved │
│ /usr/bin/grep │ 4277009103 │ 16777223 │ 3          │ 2        │ 21    │ 1688       │ 2097285 │ 0        │
└───────────────┴────────────┴──────────┴────────────┴──────────┴───────┴────────────┴─────────┴──────────┘
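
To give an idea of what looking up those values involves, here is a rough sketch that decodes the row above. The constants are a small hand-picked subset from Apple's mach-o/loader.h and mach/machine.h; the function names are my own and not part of oelf or sqlelf.

```python
# Subset of constants from <mach-o/loader.h> and <mach/machine.h>.
MAGICS = {0xFEEDFACE: "MH_MAGIC", 0xFEEDFACF: "MH_MAGIC_64"}
CPU_TYPES = {
    7: "x86",
    12: "arm",
    0x01000007: "x86_64",  # CPU_TYPE_X86 | CPU_ARCH_ABI64
    0x0100000C: "arm64",   # CPU_TYPE_ARM | CPU_ARCH_ABI64
}
FILETYPES = {1: "MH_OBJECT", 2: "MH_EXECUTE", 6: "MH_DYLIB", 8: "MH_BUNDLE"}
FLAGS = {
    0x1: "MH_NOUNDEFS",
    0x4: "MH_DYLDLINK",
    0x80: "MH_TWOLEVEL",
    0x200000: "MH_PIE",
}


def decode_flags(flags):
    """Split the flags bitfield into names; unknown bits are kept as hex."""
    names = [name for bit, name in FLAGS.items() if flags & bit]
    known = sum(bit for bit in FLAGS if flags & bit)
    leftover = flags & ~known
    if leftover:
        names.append(hex(leftover))
    return names


def decode_header(magic, cputype, filetype, flags):
    """Map raw macho_headers integers to names, falling back to hex."""
    return {
        "magic": MAGICS.get(magic, hex(magic)),
        "cputype": CPU_TYPES.get(cputype, hex(cputype)),
        "filetype": FILETYPES.get(filetype, hex(filetype)),
        "flags": decode_flags(flags),
    }


# Values taken from the /usr/bin/grep row above.
print(decode_header(4277009103, 16777223, 2, 2097285))
```

So the grep slice shown here is a 64-bit x86_64 executable, built position-independent and dynamically linked with two-level namespaces.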

You can slice and dice the data as you wish[1].

$ sqlelf /usr/bin/grep --sql "select name, type, global, n_value from macho_symbols where path = '/usr/bin/grep' limit 3"
┌─────────────────────┬────────┬────────┬────────────┐
│        name         │  type  │ global │  n_value   │
│ radr://5614542      │ N_PBUD │ 0      │ 90260802   │
│ __mh_execute_header │ N_SECT │ 1      │ 4294967296 │
│ _BZ2_bzRead         │ N_UNDF │ 1      │ 0          │
└─────────────────────┴────────┴────────┴────────────┘
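
Since it is all plain SQL, filtering and aggregating work as usual. The sketch below does not call sqlelf at all: it rebuilds a stand-in macho_symbols table with Python's sqlite3 module, filled by hand with the three rows above, just to show the kind of queries that become possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table: only the columns used in the query above.
conn.execute('CREATE TABLE macho_symbols(path, name, type, "global", n_value)')
conn.executemany(
    "INSERT INTO macho_symbols VALUES (?, ?, ?, ?, ?)",
    [
        ("/usr/bin/grep", "radr://5614542", "N_PBUD", 0, 90260802),
        ("/usr/bin/grep", "__mh_execute_header", "N_SECT", 1, 4294967296),
        ("/usr/bin/grep", "_BZ2_bzRead", "N_UNDF", 1, 0),
    ],
)

# Count the global symbols per symbol type.
per_type = conn.execute(
    'SELECT type, count(*) FROM macho_symbols '
    'WHERE "global" = 1 GROUP BY type ORDER BY type'
).fetchall()
print(per_type)

# Addresses are easier to read in hex.
for name, n_value in conn.execute(
    "SELECT name, n_value FROM macho_symbols WHERE type = 'N_SECT'"
):
    print(name, hex(n_value))
```

The same SELECT statements can be passed to sqlelf through --sql against the real table.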

Status

I hacked together oelf in a matter of days. I'm using the excellent pyo3 to wrap goblin's functionality into a Python package, built with maturin. It works reliably (yay for great tooling written in and for Rust!), but so far I haven't documented much. oelf itself is a bit inconsistent in how it exposes different data. The sqlelf integration is really simple, thanks to the project's nicely extensible code structure. Exposing a new piece of oelf functionality now only requires defining a table schema and mapping the retrieved data to its columns.
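
To illustrate the "schema plus mapping" idea, here is a minimal sketch. It is not the actual sqlelf code and does not use oelf's real API: it uses the standard sqlite3 module, and ParsedBinary is a hypothetical stand-in for whatever the parser hands back.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ParsedBinary:
    """Hypothetical stand-in for a parsed Mach-O file."""
    path: str
    libs: list = field(default_factory=list)


def register_table(conn, name, columns, rows):
    """Create a table from a column list and fill it from an iterable of tuples."""
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {name}({column_list})")
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)


def lib_rows(binaries):
    """The mapping step: one (path, lib) tuple per referenced library."""
    for binary in binaries:
        for lib in binary.libs:
            yield (binary.path, lib)


binaries = [
    ParsedBinary("/usr/bin/grep", ["self", "/usr/lib/libz.1.dylib"]),
]
conn = sqlite3.connect(":memory:")
# The schema step: just the column names.
register_table(conn, "macho_libs", ("path", "lib"), lib_rows(binaries))
print(conn.execute("SELECT * FROM macho_libs").fetchall())
```

Each additional table would then be one more column tuple and one more generator like lib_rows.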

I have yet to use sqlelf more myself to actually explore binaries and all the data in them. I also have only a bare understanding of the ELF format and much, much less of the Mach-O format. I'm just barely good at plugging together existing things.

Some things that might be good to do:


Footnotes: