Inference engineer

Mirai On Device AI

Posted on Apr 9, 2026

We're looking for engineers who can bridge the gap between ML research and high-performance inference.

You'll work across our inference engine and model conversion toolkit, implementing new model architectures, supporting new modalities, writing optimized kernels, and building a wide range of features such as function calling and batch decoding.

This role is ideal for someone who reads papers for fun, enjoys writing high-performance code, and gets excited about constant learning.

Nobody knows everything, and we'd rather you know one area deeply than everything superficially. If you're strong in at least a couple of these areas, you're a great fit:

  • JAX / Equinox / Pallas stack

  • Rust systems programming with a focus on developer experience

  • Writing Metal / Vulkan kernels

  • Neural codecs and voice model architectures

  • Trellis-based quantization approaches

  • Advanced speculative decoding methods, such as EAGLE

  • Deep understanding of Transformer / SSM / Diffusion / vision-language model architectures

  • Benchmarking inference performance and model quality

  • Strong linear algebra, optimization methods, and probability theory

And of course, solid engineering fundamentals: we will ship a lot of code 🙃

We welcome applications from students and early-career engineers. If you've participated in projects that demonstrate systems thinking and ML understanding, we want to hear from you!