Emscripten: run C code in Javascript environments

Like it or not, Javascript is the "Esperanto", the universal language. Every relevant platform has an embeddable JS interpreter, and a number of platforms, like the Web and Electron, employ JS as their primary language. If you are going to write a portable library today, you should write it in JS.

But this is a recent phenomenon. A lot of C and C++ code has been written since the 1970s. Rewriting this code would cost money and time, only to deliver something you already have. Joel Spolsky taught us this some 20 years ago: never rewrite!

At least in Node.js/Electron, a sensible way of reusing legacy code is to write a wrapper module. But the glue code must be written in C++ and you have to deal with DLL compilation on multiple platforms. Yes, there are helper tools like node-gyp, but it is still a non-trivial task.

There is another way: compile C code into Javascript using the Emscripten compiler. It sounds crazy, but it works.

Emscripten is a big project, and many libraries and frameworks have been ported to it. For example, an SDL-based program (games generally use SDL) can be readily compiled and run in a browser, since SDL has already been ported, and an HTML5 graphics backend is included!

My interest in Emscripten was more humble: just to port a certain C library. It is a bit like libpng or libjpeg, a "deaf" and "blind" library that does not interact with the platform; it just does a single in-memory task when requested. Such libraries are written like this on purpose to make porting easier, and this is very convenient when we need to port to JS.

Part of Emscripten is very well documented on its site, e.g. how to install and use it, as well as how to call C from JS and JS from C, passing and returning numbers. (BTW, Linux distros and Homebrew package it, making installation easier than explained in the documentation.)

The plot thickens when we need to pass strings and arrays back and forth between the two languages. Documentation is sparse and terse. I had to search around, read many blog posts and pieces of example code to find out how to proceed in each case. (Interestingly enough, I had the same problem with every embeddable Javascript interpreter I had to work with.)

The result of my efforts is condensed in this GitHub project. If you are used to these things and are in a hurry, you don't need to keep reading this article. Go straight to GitHub.

The GitHub examples have Node.js as target, and I have chosen the asm.js "binary format", which is pure Javascript with some annotations that can be used by an optimizer. The default output is WebAssembly, whose support is recent in Node.js. It is advisable to take a look at my Makefile, namely the compilation flags. The adequate flags for your project may be quite different.

Interfacing two languages looks easy because there are only two cases: language A calls function written in B, and vice-versa. Problem is, each function can pass and return data, and handling is different in every direction. There are three main data types: numbers, strings and buffers/arrays. It is necessary to define who is responsible for free()ing the latter two. The 2 cases turned out to be 20, and we haven't spoken about closures and function pointers yet.
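As an illustration of the ownership question, here is a minimal sketch of one possible convention: the C side allocates the string it returns, and also exports a matching function that the caller must use to release it. The names `greeting_new` and `greeting_free` and the `EXPORT` macro are hypothetical, not taken from the GitHub project; `EMSCRIPTEN_KEEPALIVE` is the Emscripten macro that keeps a function from being removed as dead code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#define EXPORT EMSCRIPTEN_KEEPALIVE
#else
#define EXPORT
#endif

/* Hypothetical example function: returns a newly allocated string.
 * Convention assumed here: the caller owns the result and must pass
 * it back to greeting_free() when done. Returns NULL if out of memory. */
EXPORT char *greeting_new(const char *name)
{
    static const char prefix[] = "Hello, ";
    const size_t prefix_len = sizeof(prefix) - 1;
    char *out;

    assert(name != NULL);
    out = malloc(prefix_len + strlen(name) + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, prefix, prefix_len);
    strcpy(out + prefix_len, name);   /* also copies the final '\0' */
    return out;
}

/* Companion of greeting_new(): releases a string returned by it. */
EXPORT void greeting_free(char *s)
{
    free(s);
}
```

With a pair like this, the JS side never has to guess which allocator produced the pointer: whatever `greeting_new` returns goes back to `greeting_free`, and nothing else.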

An Emscripten module, when compiled as asm.js, loads asynchronously. I had to resort to a global function to detect when the module is ready. If someone knows a better technique, I accept pull requests.

The C heap of an Emscripten module is simply a byte array directly accessible from Javascript via m.HEAPU8.buffer where m is the module. Pointers are simply offsets inside this array, and can be passed around as numbers.
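To make the "pointers are offsets" idea concrete, here is a small sketch of a C function that receives a buffer. The name `sum_bytes` and the `EXPORT` macro are hypothetical. Seen from Javascript, the `buf` argument is just a number: the offset inside the heap array where the bytes were placed, typically after allocating room for them with the module's exported `_malloc`.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#define EXPORT EMSCRIPTEN_KEEPALIVE
#else
#define EXPORT
#endif

/* Hypothetical example function: adds up len bytes starting at buf.
 * The caller keeps ownership of the buffer; this function only reads it. */
EXPORT uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t total = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        total += buf[i];
    }
    return total;
}
```

Since the function does not keep the pointer after returning, whoever allocated the buffer can free it right after the call.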

It is the ultimate "VAXocentric" machine in this regard. (See this page about "VAXocentrism", items 3 and 5.) On the other hand, Emscripten is anti-Vax in item 6. It chokes on something that is indeed undefined behavior, but works on most other platforms: unaligned pointers. The line below fails on Emscripten when buffer_position is odd:

*((uint16_t *) buffer_position) = value;

This kind of thing is the worst fear when porting C code to a new platform. It is important to enable as many assertion and safety flags as possible, and not to use optimization flags unless absolutely necessary. Otherwise, the code above fails silently. Once the problem is found, it is easy to do the right thing:

memcpy(buffer_position, &value, 2);
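The same idea can be wrapped in a pair of small helpers, so that every 16-bit access to a byte buffer goes through memcpy() regardless of alignment. This is only a sketch; the names `store_u16` and `load_u16` are hypothetical, and the byte order is simply whatever the platform uses natively.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stores a 16-bit value at dst, which may be at any address, odd or even. */
static void store_u16(void *dst, uint16_t value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}

/* Reads back a 16-bit value from src, which may also be unaligned. */
static uint16_t load_u16(const void *src)
{
    uint16_t value;

    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}
```

Compilers usually turn a fixed-size memcpy() like this into a plain load or store where the platform allows it, so the safe version rarely costs anything.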

Here is a short description of techniques employed in the example's source code.