kripken edited this page Jul 19, 2012 · 29 revisions

The original emscripten compiler was written in JavaScript, which was very useful for quickly prototyping new ideas during development of the various new methods needed for effective compilation to JavaScript (the relooper, longjmp tricks, C++ exceptions in JS, etc.). It is also quite stable at this point and generates very good code. However, it has a few downsides:

  • Compiler speed. The generated code is fast, but generating the code is not so fast. Especially with full optimizations on, builds can be quite slow. This is not an issue for tens of thousands of lines of code, and is annoying but not horrible for hundreds of thousands, but it is a serious problem for millions.
  • LLVM backends integrate more closely with LLVM, and can leverage LLVM's internal code analysis and optimization. The original compiler just parses LLVM bitcode externally, so it cannot benefit from internal capabilities of LLVM.
  • An upstream LLVM backend is easier for people to use than a separate project. Compiling to JS should, as much as possible, be just another backend in a compiler.

The plan is to start work over Summer 2012.

Guidelines and issues

  • We will use the C++ Relooper implementation https://github.com/kripken/Relooper
  • Focus on the C-style memory layout method. Other approaches (no typed arrays, unaliasing typed arrays) will only be done by the original compiler.
  • When possible, do native JS function calls f(x,y,z) rather than reads/writes from the C stack. This is tricky with varargs, but perhaps possible even there with internal LLVM changes.
  • It is far better to emit x = (a+b)/z than t = a+b ; x = t/z. It is unclear how easy that is to do in an LLVM backend.
  • More advanced C++ static analysis than the current compiler should allow removal of a lot of unnecessary address shifting
  • See https://bugzilla.mozilla.org/show_bug.cgi?id=771106 for some optimizations we should implement. Also https://bugzilla.mozilla.org/show_bug.cgi?id=771285#c5
  • To get started, we will not create an object format for JavaScript. We can continue to use the emcc wrapper, which uses clang with LLVM bitcode as the intermediate object format. So the initial goal is just to generate JS in the backend directly, that is, from LLVM IR in memory.
  • We still need to support linking with JS libraries (src/library*.js in current emscripten). The reason is that JS is unique compared to other backends: no one normally writes system libraries in low-level x86 or ARM; at most they add some inline assembly for those CPUs to a C library. JS, however, is a high-level language, and people do want to write system libraries in it (we have written libc, SDL, etc. in JS in emscripten). So while, as per the previous point, we do not want to invent a JS object format for linking, we do want to link in symbols in a simple way, as the current emscripten compiler does.
  • Some initial work by Ehsan on Emscripten support in LLVM and clang is in:
  • https://github.com/ehsan/llvm/commit/ad4c8c52f68a1694cbb66fe861f325928ca04d7c
  • https://github.com/ehsan/clang/commit/3a8eff2f5646605d949222032422a12967b34790
  • LLVM already has a target triple ArchType of le32, with the comment "generic little-endian 32-bit CPU (PNaCl / Emscripten)". We should presumably use that?
  • Of the existing backends, the simplest is CppBackend, but it might be too simple. Sparc seems to be the smallest "real" backend.
  • Should we call this backend plus Emscripten "Emscripten 2.0"?
  • Should we call the LLVM backend itself "JS" or "Emscripten" internally in LLVM?
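
To make the code-generation guidelines above more concrete (native calls instead of C-stack argument passing, nested expressions instead of temporaries, fewer address shifts), here is a small hand-written sketch of the two styles side by side. It is not output from either compiler: HEAP32 and STACKTOP follow the naming used in emscripten's generated code, but the buffer size and every function name are assumptions made only for this illustration.

```javascript
// Illustrative sketch only: the buffer size and all function names below
// are assumptions for this example, not actual compiler output.

// C-style memory layout: one buffer, viewed through a typed array.
const buffer = new ArrayBuffer(1024);
const HEAP32 = new Int32Array(buffer);
let STACKTOP = 512; // byte address of the next free C-stack slot

// (1) Arguments passed through the C stack: each call writes and reads memory.
function addFromStack(argPtr) {
  return (HEAP32[argPtr >> 2] + HEAP32[(argPtr + 4) >> 2]) | 0;
}
function callAddViaStack(a, b) {
  const argPtr = STACKTOP;
  STACKTOP = (STACKTOP + 8) | 0; // reserve two 32-bit slots
  HEAP32[argPtr >> 2] = a;
  HEAP32[(argPtr + 4) >> 2] = b;
  const result = addFromStack(argPtr);
  STACKTOP = argPtr; // release the slots
  return result;
}

// (1') The same operation as a native JS call: no memory traffic.
function addNative(a, b) {
  return (a + b) | 0;
}

// (2) One statement per instruction, with a temporary...
function divWithTemp(a, b, z) {
  const t = (a + b) | 0;
  const x = (t / z) | 0;
  return x;
}
// (2') ...versus a single nested expression.
function divNested(a, b, z) {
  return (((a + b) | 0) / z) | 0;
}

// (3) The byte address is shifted on every access...
function sumShiftEachTime(ptr, count) {
  let sum = 0;
  for (let i = 0; i < count; i++) {
    sum = (sum + HEAP32[(ptr + i * 4) >> 2]) | 0;
  }
  return sum;
}
// (3') ...versus hoisting the shift out of the loop.
function sumShiftOnce(ptr, count) {
  const base = ptr >> 2;
  let sum = 0;
  for (let i = 0; i < count; i++) {
    sum = (sum + HEAP32[base + i]) | 0;
  }
  return sum;
}
```

Each pair computes the same result; the second form of each is what the guidelines would prefer the backend to emit where analysis shows it is safe.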

Method

First Steps

  • Get emcc to generate human-readable sparc assembly using the sparc backend (done)
  • Modify the sparc backend to generate something resembling JavaScript

Setting up and testing

This is still at a very, very early experimental stage, but if you want to see the current state, first get and build LLVM:

  • git clone git://github.com/kripken/llvm-js.git
  • cd llvm-js
  • cd tools
  • git clone git://github.com/kripken/clang-js.git clang
  • cd ..
  • ./configure --enable-targets=x86,sparc
  • make

Then get emscripten's llvm-js branch:

  • Go to the emscripten directory
  • git checkout llvm-js

You can now try to run emcc, but nothing will fully work yet.
