June 23 - 25, 2025
Denver, Colorado
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for Open Source Summit North America 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

All session times are shown in Mountain Daylight Time (UTC-6).

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Wednesday June 25, 2025 11:00am - 11:40am MDT
Triton Inference Server has long been a reliable tool for AI model deployment. As generative AI unfolds its transformative potential, Triton continues to evolve, offering both time-tested features and new capabilities tailored for large language models and more complex agentic workflows.

This session explores how Triton’s core strengths continue to play a crucial role in optimizing generative AI deployments. These include its robust multi-framework support, dynamic batching, concurrent model execution, and the capability to deploy complex inference pipelines through model ensembling and business logic scripting.

We’ll also cover recent enhancements: an OpenAI-compatible frontend, which allows easy integration with existing OpenAI-based applications; Python-based backends, which standardize the deployment of Python models without writing a custom C++ backend; the Triton CLI, which simplifies model deployment and management; and distributed inference enhancements for data center scale.

Throughout the presentation, we’ll share practical examples and best practices, equipping attendees with the knowledge to use Triton Inference Server effectively to optimize the performance and efficiency of AI workloads.
Speakers
Olga Andreeva

Senior Software Engineer, NVIDIA
Olga Andreeva is a senior software engineer, specializing in machine learning inferencing. With a PhD in Computer Science from the University of Massachusetts Boston and experience in both academia and industry, Olga specializes in translating cutting-edge ML research into robust...
Ryan McCormick

Senior Software Engineer, NVIDIA
Ryan McCormick is a senior software engineer working at the intersection of machine learning, systems software and distributed systems at NVIDIA. He is responsible for developing scalable and performant inference solutions, with a current focus on the Triton Inference Server and Triton...
Bluebird Ballroom 3F
  Open AI + Data
  • Audience Experience Level Any
