Job description
<p style="min-height:1.5em">Our mission is to automate coding. The first step in our journey is to build the best tool for professional programmers, using a combination of inventive research, design, and engineering. Our organization is very flat, and our team is small and talent dense. We particularly like people who are truth-seeking, passionate, and creative. We enjoy spirited debate, crazy ideas, and shipping code.</p><h3>About the Role</h3><p style="min-height:1.5em">You will lead the Model Routing & Inference team at Cursor, owning the inference platform that powers every AI interaction in the product. This team owns the full inference path: making Cursor's AI faster, more reliable, and more cost-effective at a scale few teams in the world get to operate at. Every agent session, every tab completion, and every chat message flows through your stack.</p><p style="min-height:1.5em">You'll set technical direction for cluster management, inference optimization, and traffic egress, building the platform that lets the rest of the company move fast without worrying about provider complexity. 
You'll lead a team of strong engineers, set strong direction for the business, and make the calls that balance latency, cost, reliability, and user experience across millions of daily requests.</p><p style="min-height:1.5em"></p><h3><strong>What you’ll do</strong></h3><ul style="min-height:1.5em"><li><p style="min-height:1.5em">Building and evolving our inference gateway, a single abstraction over every provider's API semantics, so model onboarding becomes a config change.</p></li><li><p style="min-height:1.5em">Building the systems that dynamically select the best model for each request based on cost, latency, and quality.</p></li><li><p style="min-height:1.5em">Managing GPU cluster utilization and capacity planning across providers, optimizing for cost and performance.</p></li><li><p style="min-height:1.5em">Designing routing backpressure and admission control so traffic spikes don't cascade into providers.</p></li><li><p style="min-height:1.5em">Hiring and growing the team: sourcing, interviewing, and closing top inference and systems talent, while developing your engineers through coaching, mentorship, and high-leverage project assignments.</p></li></ul><h3>You may be a fit if</h3><ul style="min-height:1.5em"><li><p style="min-height:1.5em">You have led engineering teams building high-throughput, low-latency distributed systems, especially in inference serving, traffic routing, or real-time data pipelines.</p></li><li><p style="min-height:1.5em">You're comfortable reasoning about cost/performance tradeoffs at scale (GPU utilization, provider economics, capacity planning) and making decisions with incomplete information.</p></li><li><p style="min-height:1.5em">You have strong software engineering fundamentals and enjoy shipping production systems that handle millions of requests.</p></li><li><p style="min-height:1.5em">Experience with model serving frameworks (vLLM, TensorRT-LLM, TGI), load balancing, or building resilient multi-provider architectures is a 
plus.</p></li><li><p style="min-height:1.5em">You make good calls in the gray area: weighing reliability, cost, latency, and user experience when there isn't a single "right" answer.</p></li></ul>