
yasp is pioneering the future of software development with a compiler that leverages agentic AI for advanced optimization and code generation. We are growing quickly, and we are looking for a Senior CUDA Engineer to push the limits of deep learning inference on modern GPU hardware.
This role is deep in the stack. You will own the low-level GPU execution layer of a system that takes models with unsupported or non-standard layers and makes them run at full hardware speed — through Torch FX tracing, ONNX export, TensorRT conversion, and custom CUDA plugin development. You will work closely with the ML research and inference teams, receive technical context and direction, and own the implementation, optimization, and reliability of the GPU execution pipeline. If you enjoy living between the framework and the metal, this role is for you.
What You'll Do
Write and optimize custom CUDA kernels and TensorRT plugins for layers with no native framework support
Own the end-to-end pipeline from Torch FX graph tracing to ONNX export to TensorRT engine compilation, including all edge cases and failure modes
Handle non-trivial tracing challenges including dynamic control flow, unsupported ops, and custom autograd functions
Implement high-performance GEMM paths using CUTLASS and cuBLASLt, including epilogue fusion and algorithm search
Profile and tune kernel performance using Nsight Compute and Nsight Systems — occupancy, memory bandwidth, warp efficiency, and instruction throughput
Experiment with and integrate low-precision and emerging numeric formats including INT8, FP8, BF16, and NF4 / FP4
Identify performance bottlenecks, regressions, and numerical accuracy issues early and surface them clearly with supporting data
Collaborate with ML researchers to onboard new and non-standard model architectures into the inference pipeline quickly and reliably
Drive technical decisions around GPU execution strategy and set a high bar for performance and code quality across the team
Reduce friction in the model-to-deployment pipeline and ensure the team's work moves forward consistently week over week
What We're Looking For
5+ years of experience in GPU software engineering, CUDA development, or ML inference systems
Expert-level CUDA C++ — you write kernels from scratch, debug warp divergence, reason about memory coalescing, and understand the full GPU execution model
Strong hands-on experience with TensorRT — custom plugins, engine building, layer fusion, precision calibration, and debugging
Solid experience with Torch FX for model tracing, graph manipulation, and custom transformation passes
Experience exporting models to ONNX including custom ops, dynamic shapes, opset compatibility, and non-standard layer handling
Proficiency with cuBLAS and cuBLASLt for production GEMM workloads
Deep understanding of modern GPU architecture — Ampere, Ada, Hopper — memory hierarchy, Tensor Core utilization, and warp execution
Strong communicator who can explain low-level performance findings clearly to both technical and non-technical stakeholders
Comfort working in a fast-moving environment where requirements evolve and ownership is expected
Nice to Have
Hands-on experience with CUTLASS for custom high-performance GEMM kernels and fused epilogue implementations
Familiarity with emerging quantization and weight formats such as NF4, FP4, or GPTQ-style low-bit packing
Experience with TensorRT-LLM or FasterTransformer for large language model inference
Deployment experience on NVIDIA Orin, DRIVE AGX, or Jetson edge platforms
Knowledge of Triton (OpenAI) for GPU kernel authoring in Python
Open source contributions to CUDA, inference systems, or ML compiler projects
Hot sauce lover
Perks and Benefits
Competitive salary
Opportunities for professional development and growth
Flexible work hours
Dynamic and collaborative work environment
Cutting-edge software and hardware platforms
Why Join Us?
Ownership & Autonomy
We give you the space to lead, experiment, and make decisions; we don't micromanage.
Growth Culture
You won't be static. We support your learning with coaching, resources, and exposure to new challenges.
Stable & Ethical
We operate with transparency, accountability, and financial responsibility. No surprises, no hollow promises.
Belonging & Diversity
We believe diversity of background, thought, and experience makes us stronger. At yasp, we're committed to creating an inclusive and diverse culture where everyone has an equal opportunity to thrive.
Long-Term Vision
We plan and invest for the long run — in the business, in our people, and in our community.
Why join us?
We’re looking for people who are excited by deep tech challenges, passionate about pushing the boundaries of AI performance, eager to build products that make a real impact, and at their best when working with a team.
If that sounds like you, we’d love to connect. Apply now!