June 23 - 25, 2025
Denver, Colorado
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for Open Source Summit North America 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in Mountain Daylight Time (UTC/GMT -6). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Wednesday June 25, 2025 11:00am - 11:40am MDT
As generative AI unfolds its transformative potential and enterprise needs shift toward large-scale, distributed deployments, the requirements for inference serving have fundamentally changed. To meet these new demands, NVIDIA has introduced Dynamo, a high-throughput, low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments.

This session provides a technical overview of Dynamo’s architecture, focusing on how its design addresses the core challenges of large-scale, distributed generative AI inference. We will walk through concrete deployment scenarios—including disaggregated serving and dynamic GPU scheduling—and examine how Dynamo manages resource allocation, request routing, and memory efficiency for high-throughput, low-latency inference.
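To give a flavor of the disaggregated-serving idea discussed above, here is a minimal, hypothetical sketch: prompt processing (prefill) and token generation (decode) are handled by separate worker pools, with requests routed between them. All names are illustrative assumptions for this sketch and do not reflect Dynamo's actual API.

```python
# Hypothetical sketch of disaggregated serving (not Dynamo's real API):
# prefill and decode phases run on separate worker pools, and a router
# assigns each phase of a request to a worker in the matching pool.

from dataclasses import dataclass


@dataclass
class Request:
    prompt: str


class WorkerPool:
    """A named pool of workers with simple round-robin scheduling."""

    def __init__(self, name: str, size: int):
        self.name = name
        self.size = size
        self._next = 0

    def pick(self) -> str:
        # Round-robin worker selection stands in for a real scheduler
        # that would consider load, KV-cache locality, etc.
        worker = f"{self.name}-{self._next}"
        self._next = (self._next + 1) % self.size
        return worker


def route(request: Request, prefill_pool: WorkerPool, decode_pool: WorkerPool):
    # Phase 1: prefill (prompt processing) on one pool ...
    prefill_worker = prefill_pool.pick()
    # Phase 2: ... then decode (token generation) on another pool;
    # in a real system the KV cache is transferred between the two.
    decode_worker = decode_pool.pick()
    return prefill_worker, decode_worker


prefill = WorkerPool("prefill", 2)
decode = WorkerPool("decode", 4)
print(route(Request("hello"), prefill, decode))  # ('prefill-0', 'decode-0')
```

The separation lets the two phases scale independently: prefill is compute-bound while decode is memory-bandwidth-bound, so the pools can be sized and placed on different GPUs.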

We will also share practical implementation examples and discuss engineering best practices for optimizing workload performance, scalability, and cost using Dynamo. We’ll outline the steps and considerations for deploying Dynamo, highlighting key architectural differences and compatibility factors. By the end of the session, attendees will have a clear understanding of how to deploy and operate Dynamo in production environments to support advanced AI workloads.
Speakers

Olga Andreeva

Senior Software Engineer, NVIDIA
Olga Andreeva is a senior software engineer, specializing in machine learning inferencing. With a PhD in Computer Science from the University of Massachusetts Boston and experience in both academia and industry, Olga specializes in translating cutting-edge ML research into robust...

Ryan McCormick

Senior Software Engineer, NVIDIA
Ryan McCormick is a senior software engineer working at the intersection of machine learning, systems software and distributed systems at NVIDIA. He is responsible for developing scalable and performant inference solutions, with a current focus on the Triton Inference Server and Triton...
Bluebird Ballroom 3F
  Open AI + Data
  • Audience Experience Level Any

