Jonah Bernard

i want to build systems that advance civilization and make it easier for humans to live

i'm currently exploring llm inference

hi! i'm jonah bernard, a senior at cornell studying computer science

i am interested in the world and in computing systems

i am currently exploring llm inference with a focus on optimizing software for different hardware backends

i am interested in building technology systems that advance civilization and help us be more productive, stay safer, and lead more meaningful lives

please reach out to me if you want to chat!

Experience

AMD

Software Engineer Intern

Summer 2026

Cubic Transportation Systems

Software Engineer Intern

Summer 2025

Worked on building technology systems that advance civilization and make it easier for humans to live. Contributed to various software engineering projects.

Education

Cornell University

B.S. in Computer Science

2022-2026

Senior studying computer science. Exploring LLM inference with a focus on optimizing software for different hardware backends.

open source work is important to me as it makes it easier for young people to bring their ideas to life

alongside my open-source contributions, i write technical blog posts that help newcomers contribute more effectively to ai infrastructure

My open-source contributions

SGLang

Major projects

Full list of my PRs to SGLang

sgl-project/sglang #23006: [Pipeline Parallelism][Bug] Fix scheduler hang in pipeline parallelism setup
by jonahbernard, merged last week
sgl-project/sglang #22159: Add MLX profiling to bench_one_batch.py
by jonahbernard, merged 2 weeks ago
sgl-project/sglang #14105: [LoRA][III] Add LoRA support for MoE layers and enable TP
by jonahbernard, merged last month
sgl-project/sglang #19711: [LoRA][II] Add fused MOE LoRA Triton kernel and tests
by yushengsu-thu, merged last month
sgl-project/sglang #19710: [LoRA][I] Add MOE LoRA JIT alignment kernel and tests
by yushengsu-thu, merged last month
sgl-project/sglang #18492: [args] Add Expert Parallelism Argument To SRT Runner
by jonahbernard, merged last month
sgl-project/sglang #19266: [MoE Refactor] Refactor FlashInferFusedMoE into FusedMoE and flashinfer_trtllm.py
by jonahbernard, merged 2 months ago
sgl-project/sglang #16280: Add MoE Integration Tests For CUTLASS Coverage
by jonahbernard, merged 3 months ago
sgl-project/sglang #12090: [MoE] Add Comprehensive MoE Integration Tests
by jonahbernard, merged 5 months ago
sgl-project/sglang #12165: [Qwen3 VL] Add LoRA support for Qwen 3 VL
by jonahbernard, merged 5 months ago
sgl-project/sglang #12227: [Test] Add parameters to SRTRunner
by jonahbernard, merged 5 months ago
sgl-project/sglang #11560: docs(server-arguments): add allowed options for each argument
by jonahbernard, merged 6 months ago
sgl-project/sglang #11795: Refactor Triton-kernel MoE runner integration
by jonahbernard, merged 6 months ago
sgl-project/sglang #11485: docs(router): add token-bucket rate limiting to the docs
by jonahbernard, merged 6 months ago
sgl-project/sglang #11483: [router] Add Rust CLI flags for queue size, timeout, and rate limit for token bucket rate limiter
by jonahbernard, merged 6 months ago

personal projects

My tech stack

these are the tools and technologies i reach for when building, debugging, and exploring

Languages

Python
my daily driver for ml infra, scripting, and most of my sglang work
C++
for low-level work
CUDA
writing custom kernels for nvidia gpus, mostly around moe and lora
Triton
use it to write kernels for both AMD and NVIDIA gpus
Metal (MSL)
writing kernels for apple silicon as part of the sglang port
OCaml
picked it up at cornell
Java
my first programming language

ML & Inference

SGLang
the inference engine i contribute to most actively
PyTorch
model definitions, custom ops, and most experimentation
vLLM
reference point when i'm thinking about scheduler and kv-cache design
Hugging Face
model weights, tokenizers, and quick benchmarking
ONNX
portable model format for moving across runtimes and hardware
ONNX Runtime
running onnx models with hardware-specific execution providers

Tools

MacBook Pro M1
my daily machine for development
git
all of my open-source work flows through it
tmux
how i keep long-running training and bench jobs alive
Claude Code / Cursor / Antigravity
i switch between them depending on the most recent updates
Docker
reproducing inference environments across machines
LLVM
wrote a few compiler passes
MLIR
wrote a few compiler passes
Linux
everyone uses linux
Bash
necessary to do work on my mac
LangChain
used it to build an agent to help me contribute to sglang
PostgreSQL
my default for relational storage
MinIO
my default object store

i'm a software engineer because i want to make the world a better place through technology

to ensure i know the best way to do this, i carve out time to learn about law, history and business

recent reads...

the english and their history, the common law, a history of american law, legality, the conquest of happiness

what i listen to...

supreme court oral arguments, acquired, we the people, the rest is history

it's also important to focus on where the rubber meets the road

near and dear to me is keeping up with how new technology helps detect lead contamination and prevent car crashes