Pathtracer 17: WebAssembly

Introduction and index of this series is here.

Someone at work posted a “Web Development With Assembly” meme as a joke, and I pulled out a “well, actually” card pointing to WebAssembly. At that point I just had to make my toy path tracer work there.

So here it is: aras-p.info/files/toypathtracer

Porting to WebAssembly

The “porting” process was super easy; I was quite impressed by how painless it was. Basically it was:

  1. Download & install the official Emscripten SDK, and follow the instructions there.
  2. Compile my source files. This is very similar to invoking gcc or clang on the command line, except that the Emscripten compiler is called emcc. This was the full command line I used:
     emcc -O3 -std=c++11 -s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -s EXTRA_EXPORTED_RUNTIME_METHODS='["cwrap"]' -o toypathtracer.js main.cpp ../Source/Maths.cpp ../Source/Test.cpp
  3. Modify the existing code to make both threads & SIMD (two things that Emscripten/WebAssembly lack at the moment) optional. It was just a couple dozen lines of code, starting in this commit.
  4. Write the “main” C++ entry point file that is specific for WebAssembly, and the HTML page to host it.
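
To give an idea of what “optional threads & SIMD” in step 3 can look like, here is a minimal sketch. It is not the code from the commit: the macro names TOY_HAS_THREADS / TOY_HAS_SIMD and both helper functions are made up for illustration. The only thing it relies on from Emscripten is that its compiler defines `__EMSCRIPTEN__`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical feature switches (illustrative names, not from the original code).
// Emscripten's compiler defines __EMSCRIPTEN__, so both features are off there.
#if defined(__EMSCRIPTEN__)
    #define TOY_HAS_THREADS 0
    #define TOY_HAS_SIMD 0
#else
    #define TOY_HAS_THREADS 1
    #if defined(__SSE__) || defined(_M_X64)
        #define TOY_HAS_SIMD 1
    #else
        #define TOY_HAS_SIMD 0
    #endif
#endif

#if TOY_HAS_THREADS
#include <thread>
#endif
#if TOY_HAS_SIMD
#include <xmmintrin.h>
#endif

// Sum of squares: 4 floats at a time with SSE when available, scalar otherwise.
float SumOfSquares(const float* v, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
#if TOY_HAS_SIMD
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
    {
        __m128 x = _mm_loadu_ps(v + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(x, x));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar path: handles everything without SIMD, or just the tail with it.
    for (; i < n; ++i)
        sum += v[i] * v[i];
    return sum;
}

// Calls job(rowStart, rowEnd) so that all rows in [0, rowCount) are covered once:
// split across threads when they are available, one plain call otherwise.
template <typename Job>
void ForEachRowRange(int rowCount, Job job)
{
    assert(rowCount >= 0);
    if (rowCount == 0)
        return;
#if TOY_HAS_THREADS
    int workers = (int)std::thread::hardware_concurrency();
    if (workers < 1)
        workers = 1;
    int chunk = (rowCount + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (int start = 0; start < rowCount; start += chunk)
    {
        int end = std::min(start + chunk, rowCount);
        threads.emplace_back(job, start, end);
    }
    for (std::thread& t : threads)
        t.join();
#else
    job(0, rowCount);
#endif
}
```

The point of this shape is that the calling code stays the same on every platform; only the bodies behind the two switches change.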

How to structure the main thing in C++ vs HTML? I basically followed the “Emscripting a C library to Wasm” doc by Google, and “Update a canvas from wasm” Rust example (my case is not Rust, but things were fairly similar). My C++ entry file is here (main.cpp), and the HTML page is here (toypathtracer.html). All pretty simple.
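
For a rough idea of the C++ side of such a setup, here is a minimal sketch. It is not the actual main.cpp: the function name `render_frame` and the gradient “shading” are placeholders I made up, and a real entry point would call into the path tracer instead. `EMSCRIPTEN_KEEPALIVE` comes from emscripten.h and keeps the function from being stripped; natively it is defined to nothing here so the same file still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>          // provides EMSCRIPTEN_KEEPALIVE
#else
#define EMSCRIPTEN_KEEPALIVE     // no-op when building natively
#endif

// RGBA8 pixels, kept alive between calls so the page can read them.
static std::vector<uint8_t> g_pixels;

// Placeholder for the real tracer: writes a simple gradient.
static void ShadePixel(int x, int y, int w, int h, uint8_t* rgba)
{
    rgba[0] = (uint8_t)(w > 1 ? 255 * x / (w - 1) : 0);
    rgba[1] = (uint8_t)(h > 1 ? 255 * y / (h - 1) : 0);
    rgba[2] = 128;
    rgba[3] = 255; // opaque
}

extern "C" {

// Hypothetical exported function: renders one frame and returns a pointer
// to width*height*4 bytes inside the WebAssembly heap.
EMSCRIPTEN_KEEPALIVE
uint8_t* render_frame(int width, int height)
{
    assert(width > 0 && height > 0);
    g_pixels.resize((size_t)width * (size_t)height * 4);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            ShadePixel(x, y, width, height, &g_pixels[((size_t)y * width + x) * 4]);
    return g_pixels.data();
}

} // extern "C"
```

On the HTML side, the page would wrap this with `Module.cwrap('render_frame', 'number', ['number', 'number'])`, view the returned pointer as a `Uint8ClampedArray` over `Module.HEAPU8.buffer`, and pass that to `ImageData` / `putImageData` on a canvas 2D context. Since the build uses ALLOW_MEMORY_GROWTH, it is safer to re-create that view after every call rather than cache it.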

And that’s basically it!

Ok how fast does it run?

At the moment WebAssembly does not have SIMD, and does not have “typical” (shared memory) multi-threading support.

The Web almost got multi-threading at the start of 2018, but then Spectre and Meltdown happened, and threading got promptly turned off. As soon as you have the ability to run fast atomic instructions on a thread, you can build a really high precision timer, and as soon as you have a high precision timer, you can start measuring things that reveal what sort of thing got into the CPU caches. Having “just” that is enough to start building basic forms of these attacks.
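
The “atomics give you a timer” part can be sketched in a few lines of native C++. This is only an illustration of the idea, not browser code and not an attack: one thread increments a shared atomic counter as fast as it can, and any other thread can read that counter as a clock. The class and function names are made up for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// A "clock" that is nothing more than a thread spinning on an atomic counter.
class CounterClock
{
public:
    CounterClock()
        : m_Ticks(0)
        , m_Running(true)
        , m_Worker([this] {
            while (m_Running.load(std::memory_order_relaxed))
                m_Ticks.fetch_add(1, std::memory_order_relaxed);
        })
    {
    }

    ~CounterClock()
    {
        m_Running.store(false, std::memory_order_relaxed);
        m_Worker.join();
    }

    // Current tick count; the unit is "however fast the counter thread runs".
    uint64_t Now() const { return m_Ticks.load(std::memory_order_relaxed); }

private:
    // Declaration order matters: the counter and flag must be initialized
    // before the worker thread that uses them is started.
    std::atomic<uint64_t> m_Ticks;
    std::atomic<bool> m_Running;
    std::thread m_Worker;
};

// Number of ticks that elapse while `work` runs.
template <typename Work>
uint64_t MeasureTicks(const CounterClock& clock, Work work)
{
    uint64_t t0 = clock.Now();
    work();
    uint64_t t1 = clock.Now();
    return t1 - t0;
}
```

The tick counts are in arbitrary units, but they only need to be fine-grained enough to tell a fast operation from a slow one, which is why browsers reacted by reducing timer precision and disabling shared memory between threads.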

By now the whole industry (CPU, OS, browser makers) has scrambled to fix these vulnerabilities, and threading might be coming back to the Web soon. However, at this time it’s not enabled by default in any browser.

All this means that the performance numbers of WebAssembly will be substantially lower than other CPU implementations – after all, it will be running on just one CPU core, and without any of the SIMD speedups we have done earlier.

Anyway, the results I have are below (higher numbers are better). You can try it yourself at aras-p.info/files/toypathtracer

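Device                                    OS             Browser       Mray/s
----------------------------------------  -------------  ------------  ------
Intel Core i9 8950HK 2.9GHz (MBP 2018)    macOS 10.13    Safari 11     5.8
Intel Core i9 8950HK 2.9GHz (MBP 2018)    macOS 10.13    Chrome 70     5.3
Intel Core i9 8950HK 2.9GHz (MBP 2018)    macOS 10.13    Firefox 63    5.1
Intel Xeon W-2145 3.7GHz                  Windows 10     Chrome 70     5.3
AMD ThreadRipper 1950X 3.4GHz             Windows 10     Firefox 64    4.7
AMD ThreadRipper 1950X 3.4GHz             Windows 10     Chrome 70     4.6
AMD ThreadRipper 1950X 3.4GHz             Windows 10     Edge 17       4.5
iPhone XS / XR (A12)                      iOS 12         Safari        4.4
iPhone 8+ (A11)                           iOS 12         Safari        4.0
iPhone SE (A9)                            iOS 12         Safari        2.5
Galaxy Note 9 (Snapdragon 845)            Android 8.1    Chrome        2.0
iPhone 6 (A8)                             iOS 12         Safari        1.7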

For reference, if I turn off threading & SIMD in the regular C++ version, I get 7.0 Mray/s on the Core i9 8950HK MacBookPro. So WebAssembly at 5.1-5.8 Mray/s (roughly 73-83% of the native number) is slightly slower, but not “a lot”. Is nice!

All code is on GitHub at the 17-wasm tag.