commit 8a03585de7c77e623b2d50a4224de9994a1fdd03
Author: Max Rottenkolber <max@mr.gy>
Date:   Wed Nov 26 11:45:13 2025 +0100

    ....

diff --git a/blog/raptorjit-vm-musttail-preserve_none.mk2 b/blog/raptorjit-vm-musttail-preserve_none.mk2
index 4c9a83f..66cf7e7 100644
--- a/blog/raptorjit-vm-musttail-preserve_none.mk2
+++ b/blog/raptorjit-vm-musttail-preserve_none.mk2
@@ -21,7 +21,7 @@ interpreter in C.
 [reverberate.org 1](https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html),
 [2](https://blog.reverberate.org/2025/02/10/tail-call-updates.html)
 
-#code The _dispatch_ logic ({musttail} was introduced in LLVM 17).#
+#code The _dispatch_ logic ({musttail} was introduced in LLVM 17).#
 __attribute__((always_inline))
 void dispatch
 (Bytecode *pc, Bytecode bc, Value *stack, Subroutine *disp)

commit a6ac1ce2fd5558092a801fa637dbaa8644d3eb47
Author: Max Rottenkolber <max@mr.gy>
Date:   Wed Nov 26 11:33:04 2025 +0100

    blog/raptorjit-vm-musttail-preserve_none: publish

diff --git a/blog/raptorjit-vm-musttail-preserve_none.meta b/blog/raptorjit-vm-musttail-preserve_none.meta
index 4306d62..0609137 100644
--- a/blog/raptorjit-vm-musttail-preserve_none.meta
+++ b/blog/raptorjit-vm-musttail-preserve_none.meta
@@ -2,4 +2,4 @@
            :author "Max Rottenkolber <max@mr.gy>"
            :index-p nil
            :index-headers-p nil)
-:publication-date nil
\ No newline at end of file
+:publication-date "2025-11-26 11:32"
\ No newline at end of file

commit ea6c690993df024e3718b7156747968ccd1dc987
Author: Max Rottenkolber <max@mr.gy>
Date:   Wed Nov 26 11:00:49 2025 +0100

    blog/raptorjit-vm-musttail-preserve_none: edits

diff --git a/blog/raptorjit-vm-musttail-preserve_none.mk2 b/blog/raptorjit-vm-musttail-preserve_none.mk2
index d2acac9..4c9a83f 100644
--- a/blog/raptorjit-vm-musttail-preserve_none.mk2
+++ b/blog/raptorjit-vm-musttail-preserve_none.mk2
@@ -1,11 +1,8 @@
 A contemporary bytecode interpreter might be written in assembly language and look like this:
-{N} bytecodes are implemented by adjacent subroutines {R}0..{N}-1_ aligned evenly to {M} bytes.
+{N} bytecodes are implemented by adjacent subroutines {R0}..{N}-1 aligned evenly to {M} bytes.
 Each subroutine ends with dispatch code inlined that does a
 
-#code where {i} is 0..{N}-1#
-jump R0 + i * M
-#
-
+_jump_ {R0} + {i} * {M}; where {i} is 0..{N}-1
 
 By convention certain registers will hold important values across bytecode subroutines.
 The expected-to-be-common cases will be implemented directly in each subroutine, while so-called

commit 7e04fbbf7adb251e2e9364a6ddf9fccd3dd40661
Author: Max Rottenkolber <max@mr.gy>
Date:   Tue Nov 25 21:59:35 2025 +0100

    blog/raptorjit-vm-musttail-preserve_none: edits

diff --git a/blog/raptorjit-vm-musttail-preserve_none.mk2 b/blog/raptorjit-vm-musttail-preserve_none.mk2
index d748e44..d2acac9 100644
--- a/blog/raptorjit-vm-musttail-preserve_none.mk2
+++ b/blog/raptorjit-vm-musttail-preserve_none.mk2
@@ -1,7 +1,11 @@
 A contemporary bytecode interpreter might be written in assembly language and look like this:
-_N_ bytecodes are implemented by adjacent subroutines _R0..N-1_ aligned evenly to _M_ bytes.
-Each subroutine ends with dispatch code inlined that does a {jump R0 + i * M}
-where _i_ is 0.._N_-1.
+{N} bytecodes are implemented by adjacent subroutines {R}0..{N}-1_ aligned evenly to {M} bytes.
+Each subroutine ends with dispatch code inlined that does a
+
+#code where {i} is 0..{N}-1#
+jump R0 + i * M
+#
+
 
 By convention certain registers will hold important values across bytecode subroutines.
 The expected-to-be-common cases will be implemented directly in each subroutine, while so-called
@@ -10,16 +14,15 @@ eventually return to the bytecode subroutine.
 
 The LuaJIT interpreter follows this design with one extra layer of indirection:
 instead of jumping to evenly aligned subroutines, it looks up the addresses of bytecode
-subroutines in a dispatch table. Being the backbone of a tracing JIT compilers, the
-LuaJIT interpreter used this dispatch table to dynamically swap out bytecode subroutines
+subroutines in a dispatch table. Being the backbone of a tracing JIT compiler, the
+LuaJIT interpreter uses this dispatch table to dynamically swap out bytecode subroutines
 depending on whether it is currently recording a trace or not.
 
 For RaptorJIT we would like to rewrite the interpreter in a high-level language.
 Recently (relatively speaking) added features of LLVM allow us to construct an equivalent
 interpreter in C.
-[1](https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html),
-[2](https://transactional.blog/copy-and-patch/how-it-works),
-[3](https://blog.reverberate.org/2025/02/10/tail-call-updates.html)
+[reverberate.org 1](https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html),
+[2](https://blog.reverberate.org/2025/02/10/tail-call-updates.html)
 
 #code The _dispatch_ logic ({musttail} was introduced in LLVM 17).#
 __attribute__((always_inline))
@@ -63,9 +66,9 @@ saved and restored to and from the stack between tail calls.
 We can use the {preserve_none} calling convention for all interpreter subroutines instead
 by tagging their definitions with {__attribute__((preserve_none))} (available since LLVM 21).
 This directs LLVM to use a calling convention where all registers are caller-save, and
-lets enables penalty-free use of previously callee-save registers.
+enables penalty-free use of previously callee-save registers.
 
 We found that with {-fzero-call-used-regs} (which appears to be enabled by default on some
 installations) LLVM will zero-initialize many registers in between tail calls when using
 {preserve_none} in LLVM 21. In our case this caused redundant code to be emitted and we
-resorted to {-fzero-call-used-regs=skip} to inhibit this behavior.
+resort to {-fzero-call-used-regs=skip} to inhibit this behavior.

commit e22f021a909b60b74421f1f4544ba3baab609069
Author: Max Rottenkolber <max@mr.gy>
Date:   Tue Nov 25 17:58:58 2025 +0100

    blog/raptorjit-vm-musttail-preserve_none

diff --git a/blog/raptorjit-vm-musttail-preserve_none.meta b/blog/raptorjit-vm-musttail-preserve_none.meta
new file mode 100644
index 0000000..4306d62
--- /dev/null
+++ b/blog/raptorjit-vm-musttail-preserve_none.meta
@@ -0,0 +1,5 @@
+:document (:title "Notes on rewriting the RaptorJIT interpreter using LLVM’s musttail and preserve_none attributes"
+           :author "Max Rottenkolber <max@mr.gy>"
+           :index-p nil
+           :index-headers-p nil)
+:publication-date nil
\ No newline at end of file
diff --git a/blog/raptorjit-vm-musttail-preserve_none.mk2 b/blog/raptorjit-vm-musttail-preserve_none.mk2
new file mode 100644
index 0000000..d748e44
--- /dev/null
+++ b/blog/raptorjit-vm-musttail-preserve_none.mk2
@@ -0,0 +1,71 @@
+A contemporary bytecode interpreter might be written in assembly language and look like this:
+_N_ bytecodes are implemented by adjacent subroutines _R0..N-1_ aligned evenly to _M_ bytes.
+Each subroutine ends with dispatch code inlined that does a {jump R0 + i * M}
+where _i_ is 0.._N_-1.
+
+By convention certain registers will hold important values across bytecode subroutines.
+The expected-to-be-common cases will be implemented directly in each subroutine, while so-called
+“slow paths” will cause ABI calls to functions—possibly written in a high-level language—which
+eventually return to the bytecode subroutine.
+
+The LuaJIT interpreter follows this design with one extra layer of indirection:
+instead of jumping to evenly aligned subroutines, it looks up the addresses of bytecode
+subroutines in a dispatch table. Being the backbone of a tracing JIT compilers, the
+LuaJIT interpreter used this dispatch table to dynamically swap out bytecode subroutines
+depending on whether it is currently recording a trace or not.
+
+For RaptorJIT we would like to rewrite the interpreter in a high-level language.
+Recently (relatively speaking) added features of LLVM allow us to construct an equivalent
+interpreter in C.
+[1](https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html),
+[2](https://transactional.blog/copy-and-patch/how-it-works),
+[3](https://blog.reverberate.org/2025/02/10/tail-call-updates.html)
+
+#code The _dispatch_ logic ({musttail} was introduced in LLVM 17).#
+__attribute__((always_inline))
+void dispatch
+(Bytecode *pc, Bytecode bc, Value *stack, Subroutine *disp)
+{
+    bc = *pc++;
+    __attribute__((musttail))
+    return disp[bc.op](pc, bc, stack, disp);
+}
+#
+
+The interpreter dispatch can be expressed as a tail call,
+where the arguments hold important values across bytecode subroutines.
+Fast-path logic can be inlined, while slow paths can be contained in tail calls
+to slow-path functions.
+
+#code An exemplary bytecode subroutine.#
+void op_ADD
+(Bytecode *pc, Bytecode bc, Value *stack, Subroutine *disp)
+{
+    if (!isnumber(stack[bc.b]) || !isnumber(stack[bc.c]))
+        __attribute__((musttail))
+        return add_slowpath(pc, bc, stack, disp);
+    
+    stack[bc.a] = stack[bc.b].num + stack[bc.c].num;
+    __attribute__((musttail))
+    return dispatch(pc, bc, stack, disp);
+}
+#
+
+Using this technique we end up with machine code comparable to the LuaJIT assembler VM.
+By enabling frame pointer omission ({-fomit-frame-pointer}) we can avoid saving the
+frame pointer to the stack for most functions.
+There is one caveat, namely that code generated for the bytecode subroutines follows
+the default calling convention.
+Under the default convention, caller-save registers are usually limited.
+Once LLVM runs out of those it will resort to callee-save registers which have to be
+saved and restored to and from the stack between tail calls.
+
+We can use the {preserve_none} calling convention for all interpreter subroutines instead
+by tagging their definitions with {__attribute__((preserve_none))} (available since LLVM 21).
+This directs LLVM to use a calling convention where all registers are caller-save, and
+lets enables penalty-free use of previously callee-save registers.
+
+We found that with {-fzero-call-used-regs} (which appears to be enabled by default on some
+installations) LLVM will zero-initialize many registers in between tail calls when using
+{preserve_none} in LLVM 21. In our case this caused redundant code to be emitted and we
+resorted to {-fzero-call-used-regs=skip} to inhibit this behavior.
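
The tail-call dispatch described in the post can be tried in isolation with a sketch like the one below. It is not the RaptorJIT code. The opcode set (`OP_ADD`, `OP_RET`), `struct Table`, the `MUSTTAIL` macro and the `run` entry point are hypothetical names introduced for illustration, and `Value` is simplified to a plain `long` instead of a tagged value. Handlers return a `long` rather than `void` so the result is easy to inspect. Where the compiler does not report support for `musttail`, the macro expands to nothing and the calls are ordinary calls, so tail-call elimination is then not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Use musttail only where the compiler reports support for it; otherwise
   fall back to a plain call (tail-call elimination is then not guaranteed). */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical two-instruction bytecode set, for illustration only. */
enum { OP_ADD, OP_RET, OP_COUNT };

typedef struct { uint8_t op, a, b, c; } Bytecode;
typedef long Value; /* simplification: untagged values */

struct Table;
typedef long Subroutine(const Bytecode *pc, Bytecode bc, Value *stack,
                        const struct Table *disp);
struct Table { Subroutine *op[OP_COUNT]; };

/* Fetch the next bytecode and tail call its subroutine via the table. */
static inline __attribute__((always_inline))
long dispatch(const Bytecode *pc, Bytecode bc, Value *stack,
              const struct Table *disp)
{
    bc = *pc++;
    MUSTTAIL return disp->op[bc.op](pc, bc, stack, disp);
}

/* stack[a] = stack[b] + stack[c], then continue with the next bytecode. */
static long op_ADD(const Bytecode *pc, Bytecode bc, Value *stack,
                   const struct Table *disp)
{
    stack[bc.a] = stack[bc.b] + stack[bc.c];
    MUSTTAIL return dispatch(pc, bc, stack, disp);
}

/* Leave the interpreter and return stack[a]. */
static long op_RET(const Bytecode *pc, Bytecode bc, Value *stack,
                   const struct Table *disp)
{
    (void)pc; (void)disp;
    return stack[bc.a];
}

/* Entry point: an ordinary call into the first subroutine. */
long run(const Bytecode *code, Value *stack)
{
    static const struct Table table = {
        .op = { [OP_ADD] = op_ADD, [OP_RET] = op_RET }
    };
    assert(code != NULL && stack != NULL);
    Bytecode bc = *code;
    assert(bc.op < OP_COUNT);
    return table.op[bc.op](code + 1, bc, stack, &table);
}
```

As in the post, both the call to `dispatch` and the indirect call inside it are marked as tail calls, since every subroutine shares one signature. Calling `run` with the program `ADD 2,0,1; ADD 2,2,2; RET 2` and the initial stack `{3, 4, 0}` leaves 14 in slot 2 and returns it.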