Note: This is a remote position open to candidates in the USA. Modular is on a mission to revolutionize AI infrastructure by rebuilding the AI software stack. The company is seeking a Senior AI Kernel Engineer to lead the design and optimization of high-performance kernels for AI inference on GPUs and custom accelerators, collaborating closely with compiler and runtime teams to improve performance.
Responsibilities
- Design, implement, and optimize performance-critical kernels for AI inference workloads (e.g., GEMM, attention, communication, fusion)
- Lead kernel-level optimization efforts across single-GPU, multi-GPU, and heterogeneous hardware environments
- Make informed trade-offs between latency, throughput, memory footprint, and numerical precision
- Drive adoption of new hardware features (e.g., Tensor Cores, asynchronous execution, advanced memory spaces)
- Analyze performance using profilers, hardware counters, and microbenchmarks; translate insights into concrete improvements
- Work closely with compiler and runtime teams to influence code generation, scheduling, and kernel fusion strategies
- Review and mentor other engineers on kernel design, performance tuning, and best practices
- Contribute to technical roadmaps and long-term performance strategy for AI inference
Skills
- 5+ years of experience in performance-critical systems or kernel development (or equivalent depth of expertise)
- Strong proficiency in C/C++ and low-level programming
- Extensive hands-on experience with GPU kernel programming (CUDA, HIP, or equivalent)
- Deep understanding of GPU architecture, including memory hierarchies, synchronization, and execution models
- Proven track record of delivering measurable performance improvements in production systems
- Strong problem-solving skills and ability to work independently on complex, ambiguous performance challenges
- Experience with PTX, assembly-level tuning, or code generation frameworks (e.g., Triton)
- Experience optimizing distributed or multi-GPU inference pipelines
- Familiarity with custom AI accelerators or domain-specific hardware
- Understanding of modern AI models (e.g., transformers, LLMs, diffusion) from a systems and performance perspective
- Contributions to open-source kernel libraries, compilers, or performance tools
- Experience collaborating directly with hardware or compiler teams
Benefits
- Premier insurance plans
- Up to 5% 401k matching
- Flexible paid time off
- Stock options
- Annual target bonus
- Equity
- Team-building events
- Regular team onsites and local meetups in Los Altos, CA as well as different cities
- Traveling 2-4 times a year is expected for all roles
Company Overview
Modular provides AI infrastructure for deployment, serving, and programming GPUs. It was founded in 2022 and is headquartered in Palo Alto, California, USA, with a workforce of 51-200 employees. Its website is https://www.modular.com.
Company H1B Sponsorship
Modular has a track record of offering H1B sponsorships: 3 in 2026, 10 in 2025, 6 in 2024, 8 in 2023, and 4 in 2022. Please note that this does not guarantee sponsorship for this specific role.