Pair Your Compilers at the ABI Café

130 points
1/20/1970
14 days ago
by nrabulinski

Comments


jcranmer

I can definitely feel the pain of trying to work out ABI mismatch concerns. It doesn't help that it often isn't clear from the output what some of the underlying assumptions are--expected stack alignment, for example, or structs being broken up into registers or not.

It would be nice if compilers could output some sort of metadata that basically says "ah yes, here's a struct, it requires this alignment, fields are at these offsets and have these sizes, and are present in these cases" (the latter option being able to support discriminated unions) or "function call, parameters are here, here, and here." You'd think this is what DWARF itself provides, but if you play around with DWARF for a bit, you discover that it actually lacks a lot of the low-level ABI details you want to uncover; instead, it's more of a format that's meant to be generic enough to convey to the debugger the AST of the source program along with some hint of how to map the binary code to that AST--you can't really write a language-agnostic DWARF-based debugger.
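For what it's worth, here is a rough sketch of the kind of metadata dump described above, using Python's ctypes, which follows the platform's C layout rules. The `Example` struct and the `describe_layout` helper are hypothetical names made up for illustration; this is not something any compiler emits.

```python
import ctypes


class Example(ctypes.Structure):
    # Hypothetical struct, roughly: struct { char tag; int32_t value; int16_t flags; }
    _fields_ = [
        ("tag", ctypes.c_char),
        ("value", ctypes.c_int32),
        ("flags", ctypes.c_int16),
    ]


def describe_layout(struct_type):
    """Return size, alignment, and per-field offset/size for a ctypes struct.

    Assumes no bitfields (ctypes encodes their size differently).
    """
    fields = []
    for field in struct_type._fields_:
        name = field[0]
        descriptor = getattr(struct_type, name)  # field descriptor with .offset/.size
        fields.append(
            {"name": name, "offset": descriptor.offset, "size": descriptor.size}
        )
    return {
        "size": ctypes.sizeof(struct_type),
        "alignment": ctypes.alignment(struct_type),
        "fields": fields,
    }


if __name__ == "__main__":
    layout = describe_layout(Example)
    print("size:", layout["size"], "alignment:", layout["alignment"])
    for f in layout["fields"]:
        print(f["name"], "offset", f["offset"], "size", f["size"])
```

This only covers struct layout as ctypes models it. It says nothing about how arguments are split across registers, which is the other half of the wish list.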

11 days ago

orivej

Some Common Lisp FFIs have opted to coax this information out of the compiler. https://github.com/rpav/c2ffi is a C++ tool that links to libclang-cpp and literally outputs JSON with sizes and alignments. (It is then used by https://github.com/rpav/cl-autowrap to autogenerate a Lisp wrapper.) The older CFFI Groveller [1] works by generating C code which is compiled by the system C compiler (e.g. GCC or Clang) and, when executed, prints Lisp code that contains resolved values of constants, sizes, alignments, etc.

[1] https://cffi.common-lisp.dev/manual/html_node/The-Groveller....

11 days ago

ngcc_hk

Very Lisp: the program basically reprograms itself. Unfortunately, this isn't applicable to code in languages like C, Rust, etc.?

11 days ago

orivej

In fact, in a way C and Rust do the same thing!

When you run ./configure or cmake for a C program, it often prints something like "configure: checking size of long long" or "-- Check size of long long". This is done by generating, compiling and running a short C program that prints sizeof long long. The result goes into an autogenerated config.h.
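A minimal sketch of that generate/compile/run loop, written in Python rather than shell or CMake. The helper names (`probe_source`, `run_probe`, `config_define`) and the `SIZEOF_...` macro spelling are assumptions for illustration, and it assumes a C compiler is available as `cc`; real configure scripts are more careful (e.g. about cross-compilation).

```python
import os
import subprocess
import tempfile


def probe_source(type_name):
    """Generate a tiny C program that prints sizeof(type_name)."""
    return (
        "#include <stdio.h>\n"
        "int main(void) {\n"
        f'    printf("%zu\\n", sizeof({type_name}));\n'
        "    return 0;\n"
        "}\n"
    )


def run_probe(type_name, cc="cc"):
    """Compile and run the probe; return the size the C compiler reports."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "probe.c")
        exe = os.path.join(workdir, "probe")
        with open(src, "w") as handle:
            handle.write(probe_source(type_name))
        subprocess.run([cc, src, "-o", exe], check=True)
        result = subprocess.run([exe], check=True, capture_output=True, text=True)
        return int(result.stdout.strip())


def config_define(type_name, size):
    """Render the measured size as a line for a generated header."""
    macro = "SIZEOF_" + type_name.upper().replace(" ", "_")
    return f"#define {macro} {size}"


if __name__ == "__main__":
    print(config_define("long long", run_probe("long long")))
```

The point is just that the answer comes from the same compiler that will build the rest of the program, instead of being hard-coded.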

In Rust, the first example of build.rs usage [1] compiles and runs a C program during the build of the crate, and the next page [2] shows how to use autogenerated Rust code with the include! macro.

Lisp is more similar to C or Rust than you might think. Code generation typically happens while the library or program source code is being loaded, and it is orchestrated by a declaration in an .asd file, which is analogous to meson.build, but looks more like Cargo.toml, e.g. [3]

[1] https://doc.rust-lang.org/cargo/reference/build-scripts.html

[2] https://doc.rust-lang.org/cargo/reference/build-script-examp...

[3] https://github.com/rpav/cl-freetype2/blob/b7871aed0c5244fc3b...

10 days ago

PartiallyTyped

What you are asking for sounds quite a bit like what rustc does.

Rustc (outside `#[repr(C)]`) offers no guarantees on the ordering of the fields; however, it guarantees that every instance of struct A will have the same ordering during that particular compilation. This allows rustc to compile external crates (as long as no monomorphization is needed) in a consistent manner across all crates that depend on them.
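To illustrate why a compiler might want the freedom to reorder fields, here is a small sketch using Python's ctypes, which applies C layout rules in declaration order. `AsDeclared`, `Reordered`, and `padding_bytes` are hypothetical names, and the "largest first" order is only an example of what a reordering compiler could choose, not a description of rustc's actual algorithm.

```python
import ctypes


class AsDeclared(ctypes.Structure):
    # C layout rules keep declaration order: u8, u32, u8
    _fields_ = [
        ("a", ctypes.c_uint8),
        ("b", ctypes.c_uint32),
        ("c", ctypes.c_uint8),
    ]


class Reordered(ctypes.Structure):
    # Same fields, largest first, as a compiler free to reorder might pick
    _fields_ = [
        ("b", ctypes.c_uint32),
        ("a", ctypes.c_uint8),
        ("c", ctypes.c_uint8),
    ]


def padding_bytes(struct_type):
    """Bytes in the struct that are not occupied by any field."""
    used = sum(getattr(struct_type, f[0]).size for f in struct_type._fields_)
    return ctypes.sizeof(struct_type) - used


if __name__ == "__main__":
    for t in (AsDeclared, Reordered):
        print(t.__name__, "size", ctypes.sizeof(t), "padding", padding_bytes(t))
```

Both structs hold the same six bytes of data; any difference in size is padding forced by field order. That is exactly the kind of detail two sides of an ABI boundary have to agree on.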

11 days ago

menaerus

Most of the ABI issues arise when you start to mix and match shared libraries produced by different compilers, or even libraries produced by different versions of the same compiler.

Rust has none of that, nor does it support dynamic linking, so I fail to understand what rustc can offer in that solution space. There is nothing.

11 days ago

bluGill

Rust works around the issue by not allowing all the useful things that get you there. There are other useful things, like sharing pointers across threads, that Rust will not let you do, for both better and worse. (Better in that you avoid a lot of problems for something you rarely need; worse for those few cases where you actually need to do those things and cannot.)

11 days ago

Georgelemental

You can share pointers across threads in Rust, it's just `unsafe`.

11 days ago

zX41ZdbW

I think the right way to avoid this problem is to avoid using ABI at runtime or build time.

At runtime, it means - don't use shared libraries. At build time, it means - build every library from the source, don't use pre-built artifacts.

This sounds controversial... But it allows you to change compiler or compiler options at any time, and you don't have to bother. It also enables cross-compilation, reproducible builds, and portable binaries. You no longer have to ask developers to set up a complex build environment on a specific Linux distribution because it works everywhere.

I use this approach for ClickHouse.

11 days ago

menaerus

This can work only if you own the entire codebase and have all the external dependencies that you depend on statically linked (compiled) within your product.

I also very much prefer this way of handling dependencies, but it's not a solution for all ABI problems, since it also implies that you need to statically link (compile) against all the transitive dependencies. These include, at the very minimum, libc++ or libstdc++. And with this requirement in place, this already isn't possible for many of the codebases out there.

And it also brings another issue to the table: X version of libc++/libstdc++ depends on Y version of libc.

Since you generally cannot statically link against libc, and you don't own it since it's part of the OS, this becomes a hairy problem. You really need to make sure that your code works across different versions, and combinations thereof, of libc++/libstdc++/libc.

And then there's ... a bunch of other different platforms which aren't Linux.

11 days ago

JonChesterfield

Glibc is obstructive to static linking but musl is not. That gives you a binary that relies on the Linux syscall interface and nothing else. I believe the BSDs' libc statically links without problems as well.

Libc++ is set up for static linking out of the box (if you manage to find or guess the many cmake flags).

OSX and Windows insist on libc iirc but they're closed systems anyway so controlling your dependency graph is unavailable.

11 days ago

menaerus

The problem with musl is its malloc implementation. It is subpar when compared to glibc malloc. And we know that glibc malloc is generally subpar when compared to jemalloc. For large-scale applications with many (concurrent) (small) allocations this will be a big performance problem.

I guess (?) it would be possible to fix this by replacing musl's built-in malloc implementation with jemalloc, statically linked in.

11 days ago

JonChesterfield

Changing malloc for a different one you like more would work fine in musl. In particular, because it has a sane bootstrap structure, which means the dynamic loader can call normal C functions, you can swap out the code and carry on as normal. I'd suggest replacing it in libc itself instead of requiring a separate library though.

11 days ago

intelVISA

That's a good point, haven't looked too deep as I avoid allocations but musl+jemalloc is workable in Rust at least.

11 days ago

comex

Even then, you still need ABI consistency between compilers if you want to link together codebases written in different languages (e.g. C and Rust).

In practice this almost always 'just works' because most cross-language calls simply don't use the kinds of complicated types discussed in the blog post. They tend to stick to simple integer and pointer types, where ABI consistency is usually a given.
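As a small illustration of the "simple integer and pointer types" case, here is a sketch using Python's ctypes: it declares C-ABI function pointer types and calls through them, so the arguments really do cross a C calling-convention boundary (via libffi). The names `BinaryIntFn`, `SumFn`, `c_add`, and `c_sum` are made up for this example.

```python
import ctypes

# C-ABI function pointer type: int (*)(int, int)
BinaryIntFn = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)

# C-ABI function pointer type: long (*)(int *, size_t)
SumFn = ctypes.CFUNCTYPE(
    ctypes.c_long, ctypes.POINTER(ctypes.c_int), ctypes.c_size_t
)


def add(a, b):
    return a + b


def sum_ints(values, count):
    # `values` arrives as a pointer to int; index it like a C array.
    return sum(values[i] for i in range(count))


# Wrap the Python functions in C-callable thunks. Keeping these objects
# alive at module level keeps the underlying function pointers valid.
c_add = BinaryIntFn(add)
c_sum = SumFn(sum_ints)


if __name__ == "__main__":
    numbers = (ctypes.c_int * 3)(1, 2, 3)
    print(c_add(2, 3))
    print(c_sum(numbers, 3))
```

Nothing here needs struct-by-value passing, unusual alignments, or 128-bit integers, which is why this sort of interface tends to work across toolchains without anyone thinking about it.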

Though you can still get into trouble when passing function pointers, especially when combined with some modern control-flow integrity systems.

11 days ago

tester756

>Even then, you still need ABI consistency between compilers if you want to link together codebases written in different languages (e.g. C and Rust).

Let's talk over HTTP, a queue, or some other IPC-ish way.

11 days ago

dwattttt

You _still_ need a consistent way to talk about values; IPC systems tackle the same problems under the name marshalling & de/serialisation. They just tend to take much more conservative options to deal with exactly this kind of problem (you don't have to care about integer endian-ness if integers are expressed as strings).
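A quick sketch of the difference, using Python's struct module. The function names are hypothetical; the point is only that the binary form needs both sides to agree on byte order and width, while the text form sidesteps byte order at the cost of size and parsing.

```python
import struct


def encode_binary(value, byte_order):
    """Pack a 32-bit unsigned integer; byte_order is '<' (little) or '>' (big)."""
    return struct.pack(byte_order + "I", value)


def decode_binary(payload, byte_order):
    """Unpack a 32-bit unsigned integer using the given byte order."""
    return struct.unpack(byte_order + "I", payload)[0]


def encode_text(value):
    """Conservative option: ASCII decimal digits, no byte order to agree on."""
    return str(value).encode("ascii")


def decode_text(payload):
    return int(payload.decode("ascii"))


if __name__ == "__main__":
    value = 0x01020304
    wire = encode_binary(value, "<")
    # A reader that assumes the wrong byte order silently gets another number.
    print(hex(decode_binary(wire, "<")), hex(decode_binary(wire, ">")))
    print(decode_text(encode_text(value)))
```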

11 days ago

0xdeafbeef

You can't pass values in registers using this model

11 days ago

jcranmer

Okay, how do you propose to talk to your kernel then?

11 days ago

lmm

Who wants a kernel? Distribute a bootable unikernel image that can be talked to via gRPC or something.

Obviously there are plenty of things you can't build that way (e.g. drivers), but for a server application that's intended to be accessed over the network anyway, like ClickHouse, I'm increasingly thinking that's the way to go.

11 days ago

anonymoushn

Over an spsc queue (unfortunately you cannot mmap this way yet, and you cannot set up the spsc queue itself this way)

11 days ago

vlovich123

io_uring?

11 days ago

eddd-ddde

Your kernel will likely have well defined interfaces. You don't need libraries to talk to the kernel.

11 days ago

jcranmer

But how can you use those interfaces without an ABI?

Fundamentally, an ABI is the way you define interfaces.

11 days ago

vlovich123

Except for Linux, those well-defined interfaces sit behind a C API.

11 days ago

czarit

Not really - Linux syscalls are stable, so you are free to run your binary with a statically compiled libc and never touch the installed one. You can also handcraft your syscalls in assembly.

This will not work on Windows, where the kernel API is a DLL and syscall numbers are routinely changed.

11 days ago

vlovich123

That's what I said - on Linux the syscall API is stable while on all other OSes you have to go through libc to talk to the kernel.

10 days ago

gigel82

Tell me you never worked in a big codebase without telling me you never worked in a big codebase.

11 days ago

not2b

There is a specified common C++ ABI that gcc, clang, Intel's proprietary compiler, and others use. It was originally developed for the Itanium processor but is now used by gcc and clang for everything. See

https://itanium-cxx-abi.github.io/cxx-abi/abi.html

Unfortunately this ABI didn't specify how __int128 (and other nonstandard types) are to be passed.

11 days ago

jcranmer

The Itanium ABI effectively specifies how to lower the C++ ABI to an assumed C ABI, and the C ABI is given by what is known as the "psABI" (processor-specific ABI).

The (not-most-recent) x86-64 ABI is here: https://raw.githubusercontent.com/wiki/hjl-tools/x86-psABI/x..., and it does actually explain how to pass __int128.

11 days ago

kevingadd

WebAssembly also has a de-facto standard ABI: https://github.com/WebAssembly/tool-conventions/blob/main/Ba...

11 days ago

gigel82

I struggled with this many times and at the end of the day threw in the towel and just wrapped everything in plain C exports. That's the only way I know to get ABI compatibility across different compilers/toolsets/versions. COM-like constructs come in a close second.

It's an unfortunate state.

11 days ago

w10-1

Also function pointers, errors & exception handling, async/channels/thread-locals, Go stacks, Swift @objc, @cdecl and C++ interop, FFI dialects...

It's not really pain anymore; it's a kind of hilarity

11 days ago

tialaramex

If I understand correctly there's also an ABI problem for synchronization rules.

Within a compiler, it actually doesn't matter whether we think about a problem as A mustn't happen after B, or as B mustn't happen before A. But expressing this across an ABI we have to be careful that we don't have buck passing. Suppose Language #1 thinks of it the first way, and Language #2 the second way, now if Language #1 is responsible for B while Language #2 is responsible for A, each may believe the other will have taken care of ordering and no synchronization is actually implemented.

Overall, "ABI" turns out to mean something like "Every assumption you've made which can be detected by other software, including assumptions you didn't realise you had". Discovering all your assumptions is hard; accepting that other people assumed differently and aren't just wrong is also surprisingly hard.

11 days ago

JonChesterfield

Yes, if you're brave enough to have fence position as part of the calling convention. I think you're safe if ordering is expressed within a given function, or by sequence of calls to functions with the same ideas of fences.

11 days ago

gumby

It's Interop for compilers!

11 days ago

fowl2

I mean there's tons of IDLs, so many no one can agree on which one!

oblig: https://xkcd.com/927/

7 days ago