Loading…
June 23 - 25, 2025
Denver, Colorado
View More Details & Registration
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for Open Source Summit North America 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in Mountain Daylight Time (UTC/GMT -6). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Venue: Bluebird Ballroom 3E clear filter
Monday, June 23
 

11:20am MDT

EdgeLake-FL: An Automated Federated Learning Platform for the Edge - Roy Shadmon & Moshe Shadmon, AnyLog
Monday June 23, 2025 11:20am - 12:00pm MDT
Edge AI today relies on centralizing data from edge devices to the cloud, but this is impractical due to costs and privacy constraints. Federated Learning (FL) is a viable alternative: edge nodes collaboratively train a ML model without transferring or exposing proprietary data. Instead, only model weights are shared, allowing each entity to develop a model that outperforms what it could train independently. Despite its potential, FL is largely academic due to the complexity of integrating expertise across the technology stack. Additionally, decentralized data can be heterogeneous, requiring non-generalizable, application-specific solutions. EdgeLake-FL is a hardware-agnostic framework leveraging EdgeLake, an LF Edge project, to automate the continuous learning FL workflow. With EdgeLake as the data management layer, decentralized data appears centralized and data heterogeneity is resolved. Using EdgeLake-FL, an ML engineer publishes a training application, and Edge nodes with relevant data autonomously train, share, and aggregate models. Each node can then leverage the aggregated models for inference directly at the Edge. In this talk, I will demo EdgeLake-FL in a real use case.
Speakers
avatar for Roy Shadmon

Roy Shadmon

System Architect at AnyLog & EdgeLake TSC Member, AnyLog
Roy Shadmon is an EdgeLake contributor and a System Architect at AnyLog where he leads blockchain and ML initiatives. He is also a Ph.D. candidate at UC Santa Cruz, and his research focus is at the intersection of Bayesian statistics, distributed systems, and Byzantine fault tole... Read More →
avatar for Moshe Shadmon

Moshe Shadmon

CEO, AnyLog
Moshe Shadmon, CEO at Anylog. AnyLog’s Virtual Edge Data Network is a Plug & Play software, deployed at the edge, allowing real-time insight without centralizing the data. AnyLog enables deployment of applications and AI at the distributed edge. Prior to AnyLog, Moshe was the CEO... Read More →
Monday June 23, 2025 11:20am - 12:00pm MDT
Bluebird Ballroom 3E
  Open AI + Data
  • Audience Experience Level Any

1:30pm MDT

Tutorial: From Planning To Production-Ready RAG With OPEA - Andreas Kollegger, Neo4j; Ezequiel Lanza & Katherine Druckman, Intel
Monday June 23, 2025 1:30pm - 3:05pm MDT
Enterprises struggle to integrate fragmented generative AI (GenAI) technologies. Due to its rapid evolution and diverse implementations, even top LLMs hallucinate when answering Kubernetes-related questions.

The Open Platform for Enterprise AI (OPEA), a Linux Foundation project, accelerates GenAI adoption with an orchestration framework that composes microservices via customizable blueprints to deploy or create GenAI applications.

In this hands-on tutorial, developers will deploy advanced retrieval-augmented generation (RAG) applications using Kubernetes. They’ll explore how OPEA orchestrates AI workloads via a microservices architecture and build a production-ready RAG chatbot. Attendees will go beyond deployment, enhancing vector search with knowledge graphs, customizing OPEA components, and scaling AI solutions efficiently on Kubernetes—all while integrating AI agents for more intelligent automation.
Speakers
avatar for Ezequiel Lanza

Ezequiel Lanza

Open Source AI Evangelist, Intel
Passionate about helping people discover the exciting world of artificial intelligence, Ezequiel is a frequent AI conference presenter and the creator of use cases, tutorials, and guides that help developers adopt open source AI tools.
avatar for Katherine Druckman

Katherine Druckman

Open Source Evangelist, Intel Corporation
Katherine Druckman is an Open Source Evangelist at Intel, where she enjoys sharing her passion for a variety of open source topics. She currently combines her enthusiasm for software security and emerging AI technology as the OPEA Security Working Group Lead and Co-Chair of the OpenSSF... Read More →
avatar for Andreas Kollegger

Andreas Kollegger

GenAI Lead for Developer Relations, Neo4j
Andreas is a technological humanist. Starting at NASA, Andreas designed systems from scratch to support science missions. Then in Zambia, he built medical informatics systems to apply technology for social good. Now with Neo4j, he is democratizing graph databases to validate and extend... Read More →
Monday June 23, 2025 1:30pm - 3:05pm MDT
Bluebird Ballroom 3E
  Open AI + Data

3:35pm MDT

The MODERN Modern Data Stack: Building an Open Distributed Data Warehouse Beyond Data Lakes - David Aronchick, Expanso
Monday June 23, 2025 3:35pm - 4:15pm MDT
Organizations face a critical challenge: data is growing exponentially across distributed locations, but traditional centralized processing approaches are becoming unsustainable. With the majority of enterprise data going unused, companies struggle with massive transfer costs, compliance issues, and network reliability problems when moving data to centralized infrastructure.

This talk introduces a paradigm shift: bringing compute to where data lives. Using the open-source Bacalhau project, we'll demonstrate how to:

- Deploy distributed processing jobs across clouds, edge devices, and on-premises infrastructure
- Reduce data movement costs while maintaining centralized control
- Ensure compliance by processing sensitive data in place
- Enable real-time analytics at the edge

Through real-world examples, including an energy company managing 15,000 microgrids and cities processing camera feeds, attendees will learn practical patterns for modernizing their data infrastructure. We'll explore architectural patterns, security considerations, and best practices for implementing compute-over-data architectures.
Speakers
avatar for David Aronchick

David Aronchick

CEO, Expanso
David Aronchick is CEO of Expanso, the distributed computing company built on Bacalhau ([https://bacalhau.org](https://bacalhau.org/)). Previously, he led Compute over Data at Protocol Labs, Open Source Machine Learning Strategy at Azure, was a product management for Kubernetes... Read More →
Monday June 23, 2025 3:35pm - 4:15pm MDT
Bluebird Ballroom 3E
  Open AI + Data

4:30pm MDT

Agents in Action: Advancing Open Source AI Missions - Hema Veeradhi, Red Hat
Monday June 23, 2025 4:30pm - 5:10pm MDT
AI Agents have become a buzzword in the world of generative AI—but what exactly are agents, and how are they advancing open source AI? Agents are autonomous systems that extend the capabilities of Large Language Models (LLMs) by leveraging external tools and real-time data to perform dynamic actions for solving complex, multi-step tasks. They are at the forefront of transforming open source AI by enabling adaptability, automation and more intelligent decision-making.
In this talk, we’ll explore popular open source agent frameworks like LangChain and Haystack, which are driving rapid development and community-powered innovation in AI agents. Using real-world examples and a live demo, we’ll demonstrate how these frameworks can be leveraged to build a variety of generative AI applications, including RAG, IT operations automation and dynamic, context-aware chatbots.
Attendees will gain a clear insight into why AI agents are at the forefront of innovation and how open source collaboration is driving its evolution. Whether you’re an AI developer, OSPO leader or open source enthusiast, this session will highlight the potential of agents to power the next wave of open source AI missions.
Speakers
avatar for Hema Veeradhi

Hema Veeradhi

Principal Data Scientist, Red Hat
Hema Veeradhi is a Principal Data Scientist working in the Emerging Technologies team part of the office of the CTO at Red Hat. Her work primarily focuses on implementing innovative open AI and machine learning solutions to help solve business and engineering problems. Hema is a staunch... Read More →
Monday June 23, 2025 4:30pm - 5:10pm MDT
Bluebird Ballroom 3E
  Open AI + Data
  • Audience Experience Level Any
 
Tuesday, June 24
 

11:00am MDT

Open AI (Two Words): The Only Path Forward for AI - Matt White, Linux Foundation
Tuesday June 24, 2025 11:00am - 11:40am MDT
The exponential growth in artificial intelligence capabilities has been fundamentally driven by open science and collaborative research. From the publication of the "Attention Is All You Need" paper that introduced the Transformer architecture to OpenAI's strategic release of GPT-2, openness has repeatedly catalyzed breakthrough innovations while enabling crucial public discourse around AI's implications.

This talk presents a compelling case for why open source development is not just beneficial but essential for the future of safe and equitable AI. We'll examine how the open-source ecosystem has democratized access to AI technology, enabled transparency and innovation, and fostered a global community of researchers working to ensure AI systems are robust and aligned with human values.

Through concrete examples, we'll demonstrate how open-source initiatives have already begun addressing critical challenges in AI development. The Model Openness Framework has established clear standards for transparency, while the pioneering OpenMDW license has created a legal framework for responsible sharing of AI artifacts.
Speakers
avatar for Matt White

Matt White

GM of AI, Executive Director, PyTorch, Linux Foundation
Matt White is the Executive Director of the PyTorch Foundation and GM of AI at the Linux Foundation. He is also the Director of the Generative AI Commons. Matt has nearly 30 years of experience in applied research and standards in AI and data in telecom, media and gaming industries... Read More →
Tuesday June 24, 2025 11:00am - 11:40am MDT
Bluebird Ballroom 3E
  Open AI + Data

11:55am MDT

The Responsible Generative AI Framework Pathways: Where Do We Go From Here? - Ofer Hermoni, iForAI & Oita Coleman, Open Voice TrustMark Initiative
Tuesday June 24, 2025 11:55am - 12:35pm MDT
The Responsible Generative AI Framework (RGAF) lays the foundation for ethical and transparent AI development, but what comes next? This panel will explore the Responsible AI Pathways, a set of strategic directions designed to move from framework to implementation.

Panelists will discuss the four key pathways shaping the future of responsible AI:
• Big-Picture Alignment – Understanding AI’s role in humanity’s future and aligning LF AI & Data initiatives with ethical AI progress.
• Ecosystem Mapping – Identifying gaps, overlaps, and collaboration opportunities within the global Responsible AI landscape.
• Deep Dive into Core Dimensions – Addressing AI safety, security, sustainability, and other critical aspects for responsible development.
• Practical Implementation – Grounding principles in real-world use cases, industry applications, and open-source tooling.
This session will also highlight the role of AI safety and security in responsible AI adoption and provide attendees with insights, strategies, and next steps to ensure AI innovation remains transparent, accountable, and trustworthy.
Join us for a forward-thinking discussion on how to shape the future of responsible Gen AI.
Speakers
avatar for Ofer Hermoni

Ofer Hermoni

Founder, Chief AI Officer, iForAI
Dr. Ofer Hermoni is a visionary AI leader with a Ph.D. in Computer Science and 60+ patents in AI, security, networking, and blockchain. He co-founded the Linux Foundation AI and served as its inaugural technical chair, shaping the global AI ecosystem. A two-time startup founder, he... Read More →
avatar for Oita Coleman

Oita Coleman

Project Lead / Senior Advisor, Open Voice TrustMark Initiative
Oita Coleman is the Project Lead/Senior Advisor at the Open Voice TrustMark Initiative, a global Linux Foundation project dedicated to educating and advocating for open standards and best practices for conversational AI technologies. In her role, she is responsible for developing... Read More →
Tuesday June 24, 2025 11:55am - 12:35pm MDT
Bluebird Ballroom 3E
  Open AI + Data

2:10pm MDT

Lightning Talk: Streaming and Processing Edge Vision Data in Real Time - Joyce Lin, Viam
Tuesday June 24, 2025 2:10pm - 2:20pm MDT
Edge-based computer vision gives us real-time insights, but getting that data where it needs to go without high bandwidth, lag, or hardware strain is a big challenge. Learn how to build a fast, event-driven vision pipeline using WebRTC for real-time streaming and gRPC for lightweight commands. Whether for security cameras or IoT, you'll gain a practical blueprint for a scalable, open-source vision system to stay responsive at the edge while being cost-effective, adaptable, and cloud-independent.
Speakers
avatar for Joyce Lin

Joyce Lin

Head of developer relations, Viam
Joyce Lin is the head of developer relations at Viam, a robotics platform that connects software with smart machines in the physical world. Based in San Francisco, she is also a Tiktok influencer, dog mom, cat mom, and writer.
Tuesday June 24, 2025 2:10pm - 2:20pm MDT
Bluebird Ballroom 3E
  Open AI + Data

2:25pm MDT

Lightning Talk: Serving Guardrail Detectors on Vllm - Evaline Ju, IBM
Tuesday June 24, 2025 2:25pm - 2:35pm MDT
With the increase in generative AI model use, there is a growing concern of how models can divulge information or generate inappropriate content. This concern is leading to the development of technologies to “guardrail” user interactions with models. Some of these guardrails models are simple classification models, while others like IBM’s Granite Guardian or Meta’s Llama Guard are themselves generative models, able to identify multiple risks. As new models appear, a variety of large language model serving solutions are being developed and optimized. An open-sourced example, vllm, has become an increasingly popular serving engine.

In this talk I’ll discuss how we built an open-sourced adapter on top of vllm to serve an API for guardrails models, so that models like Granite Guardian and Llama Guard can be easily applied as guardrails in generative AI workflows.
Speakers
avatar for Evaline Ju

Evaline Ju

Senior Software Engineer, IBM
Evaline is a senior engineer working on the watsonx platform engineering team of IBM Research and based in Denver, Colorado. She currently focuses on building guardrails infrastructure for large language model workflows. Her previous experience includes MLOps for IBM’s cloud ML... Read More →
Tuesday June 24, 2025 2:25pm - 2:35pm MDT
Bluebird Ballroom 3E
  Open AI + Data

2:40pm MDT

Lightning Talk: Future-Proofing Compliance: Leveraging Knowledge Graphs and AI in Cybersecurity - Zeyno Dodd, Conjectura R&D
Tuesday June 24, 2025 2:40pm - 2:50pm MDT
Traditional approaches to cybersecurity compliance are being redefined in an era marked by rapidly evolving cybersecurity threats and stringent compliance requirements. This session explores the innovative integration of Knowledge Graphs (KG) and Retrieval Augmented Generation (RAG) with Generative AI to address the ever-evolving complexities of cybersecurity frameworks like NIST CSF v2.0, NIST 800-171, and CMMC. I will briefly delve into an open-source proof-of-concept demonstrating how these technologies can automate the discovery of compliance relationships and streamline cross-framework assessments. Join me in discovering how we can significantly enhance cybersecurity measures by harnessing open-source tools and AI, reducing the resource burden, and maintaining timely and robust adherence to evolving standards.
Speakers
avatar for Zeyno Dodd

Zeyno Dodd

R&D Architect, Conjectura R&D
Cloud Solution Architect and Researcher with 25+ years in software development and research. Committed to leveraging AI to address complex real-world challenges with societal impact. Specializes in applying Graph Neural Networks (GNN) within Cloud/Edge/Hybrid Machine Learning frameworks... Read More →
Tuesday June 24, 2025 2:40pm - 2:50pm MDT
Bluebird Ballroom 3E
  Open AI + Data
  • Audience Experience Level Any

3:05pm MDT

Universal AI: Execute Your Models Where Your Data (And Users) Are - David Aronchick, Expanso
Tuesday June 24, 2025 3:05pm - 3:45pm MDT
Data is exploding across distributed locations, but centralized processing is increasingly unsustainable. This talk explores "compute over data" architectures that bring ML to your data, unlocking new possibilities through real-world examples.
Speakers
avatar for David Aronchick

David Aronchick

CEO, Expanso
David Aronchick is CEO of Expanso, the distributed computing company built on Bacalhau ([https://bacalhau.org](https://bacalhau.org/)). Previously, he led Compute over Data at Protocol Labs, Open Source Machine Learning Strategy at Azure, was a product management for Kubernetes... Read More →
Tuesday June 24, 2025 3:05pm - 3:45pm MDT
Bluebird Ballroom 3E
  Open AI + Data

4:20pm MDT

Gotta Cache 'em All: Scaling AI Workloads With Model Caching in a Hybrid Cloud - Rituraj Singh & Jin Dong, Bloomberg
Tuesday June 24, 2025 4:20pm - 5:00pm MDT
AI models are evolving rapidly, while also growing exponentially in size and complexity. As AI workloads become larger, it is crucial to address the challenges of rapidly scaling inference services during peak hours and how to ensure optimal GPU utilization for fine-tuning workloads. To tackle this, Bloomberg’s Data Science Platform team has implemented a “Model Cache” feature in the open source KServe project for caching large models on GPUs in a multi-cloud and multi-cluster cloud-native environment.

This talk discusses the challenges faced with hosting large models for inference and fine-tuning purposes, and how model caching can help mitigate some of these challenges by reducing load times during auto-scaling of services, improving resource utilization, and boosting data scientists’ productivity. The talk dives into how Bloomberg integrated KServe’s Model Cache into its AI workloads and built an API on top of Karmada to manage cache federation. AI infrastructure engineers will learn about the profound impact of enabling model caching and how teams can adopt this feature in their own AI infrastructure environment.
Speakers
avatar for Rituraj Singh

Rituraj Singh

Software Engineer, Bloomberg LP
Rituraj Singh is a software engineer on Bloomberg’s Data Science Platform engineering team, which is focused on enabling large-scale AI model training on GPUs. Rituraj graduated from Carnegie Mellon University with a master's degree in computer engineering.
avatar for Jin Dong

Jin Dong

Software Engineer, Bloomberg
Jin Dong is a software engineer at Bloomberg. He works on building an inference platform for machine learning with KServe.
Tuesday June 24, 2025 4:20pm - 5:00pm MDT
Bluebird Ballroom 3E
  Open AI + Data
 
Wednesday, June 25
 

11:00am MDT

Guarding the LLM Galaxy: Security, Privacy, and Guardrails in the AI Era - Jigyasa Grover, BORDO AI & Rishabh Misra, Attentive Mobile Inc
Wednesday June 25, 2025 11:00am - 11:40am MDT
The widespread adoption of Large Language Models (LLMs) like GPT-4, Claude, and Gemini has introduced unprecedented capabilities and equally unprecedented risks. Organizations are increasingly deploying LLMs to handle sensitive tasks, from processing medical records to analyzing financial documents. This talk examines the evolving landscape of LLM security and privacy, combining theoretical foundations with a walkthrough of example implementations.

Through real-world case studies of both attacks and defenses and practical implementation guidance using popular security tools, we'll explore critical vulnerabilities and proven defensive techniques. Special attention will be given to securing fine-tuned and domain-specific LLMs, with live examples using NVIDIA’s NeMo Guardrails, LangChain's security tools, and Microsoft's guidance library.
Speakers
avatar for Jigyasa Grover

Jigyasa Grover

Lead, AI & Research, BORDO AI
10-time award winner in Artificial Intelligence and Open Source and the co-author of the book 'Sculpting Data For ML', Jigyasa Grover is a powerhouse brimming with passion to make a dent in this world of technology and bridge the gaps. AI & Research Lead, she has years of ML engineering... Read More →
avatar for Rishabh Misra

Rishabh Misra

Lead Machine Learning Engineer, Attentive Mobile Inc
Author of the book "Sculpting Data for ML", I am a Lead ML Engineer & Researcher recognized by the US Government for outstanding contribution to ML research. I have extensively published and reviewed research at top AI conferences in NLP (LLMs / GenAI), Deep Learning, and Applied... Read More →
Wednesday June 25, 2025 11:00am - 11:40am MDT
Bluebird Ballroom 3E
  Open AI + Data

11:55am MDT

Fast Inference, Furious Scaling: Leveraging VLLM With KServe - Rafael Vasquez, IBM
Wednesday June 25, 2025 11:55am - 12:35pm MDT
In this talk, we will introduce two open-source projects vLLM and KServe and explain how they can be integrated to leverage better performance and scalability for LLMs in production. The session will include a demo showcasing their integration.

vLLM is a high-performance library specifically designed for LLM inference and serving, offering cutting-edge throughput and efficiency through techniques such as PagedAttention, continuous batching, and optimized CUDA kernels, making it ideal for production environments that demand fast, large-scale LLM serving.

KServe is a Kubernetes-based platform designed for scalable model deployment. It provides robust features for managing AI models in production, including autoscaling, monitoring, and model versioning.

By combining vLLM's inference optimizations with KServe's scalability, organizations can deploy LLMs effectively in production environments, ensuring fast, low-latency inference and seamless scaling across cloud platforms.
Speakers
avatar for Rafael Vasquez

Rafael Vasquez

Open Source Software Developer, IBM
Rafael Vasquez is a software developer on the Open Technology team at IBM. He previously completed an MASc. working on self-driving car research and transitioned from a data scientist role in the retail field to his current role where he continues to grow his passion for MLOps and... Read More →
Wednesday June 25, 2025 11:55am - 12:35pm MDT
Bluebird Ballroom 3E
  Open AI + Data

2:10pm MDT

Building Your (Local) LLM Second Brain - Olivia Buzek, IBM
Wednesday June 25, 2025 2:10pm - 2:50pm MDT
LLMs are hotter than ever, but most LLM-based solutions available to us require you to use models trained on data with unknown provenance, send your most important data off to corporate-controlled servers, and use prodigious amounts of energy every time you write an email.

What if you could design a “second brain” assistant with OSS technologies, that lives on your laptop?

We’ll walk through the OSS landscape, discussing the nuts and bolts of combining Ollama, LangChain, OpenWebUI, Autogen and Granite models to build a fully local LLM assistant. We’ll also discuss some of the particular complexities involved when your solution involves a local quantized model vs one that’s cloud-hosted.

In this talk, we'll build on the lightning talk to include complexities like:
* how much latency are you dealing with when you're running on a laptop?
* does degradation from working with a 7-8b model reduce effectiveness?
* how do reasoning + multimodal abilities help the assistant task?
Speakers
avatar for Olivia Buzek

Olivia Buzek

STSM watsonx.ai - IBM Research, IBM
Olivia has been building machine learning and natural language processing models since before it was cool. She's spent several years at IBM working on opening up Watson tech, around the country and around the world.
Wednesday June 25, 2025 2:10pm - 2:50pm MDT
Bluebird Ballroom 3E
  Open AI + Data

3:05pm MDT

AI Pipelines With OPEA: Best Practices for Cloud Native ML Operations - Ezequiel Lanza, Intel & Melissa McKay, JFrog
Wednesday June 25, 2025 3:05pm - 3:45pm MDT
The Open Platform for Enterprise AI (OPEA) is an open source project intended to assist organizations with the realities of enterprise-grade deployments of GenAI apps. Beginning from scratch is a costly endeavor, and the ability to quickly iterate on a solution and determine its viability for your organization is essential to ensure you are making the best moves forward.

During this session, Ezequiel and Melissa will introduce you to the OPEA platform and how to empower your team to build, deploy, and manage AI pipelines more effectively. Attendees will gain insights into best practices for handling complex AI/ML workloads, automating dependency management, and integrating Kubernetes for efficient resource utilization. With a focus on real-world applications, this talk not only showcases the transformative potential of these tools but also encourages attendees to explore new ways to contribute, innovate, and collaborate in driving the future of AI adoption in enterprise environments.
Speakers
avatar for Ezequiel Lanza

Ezequiel Lanza

Open Source AI Evangelist, Intel
Passionate about helping people discover the exciting world of artificial intelligence, Ezequiel is a frequent AI conference presenter and the creator of use cases, tutorials, and guides that help developers adopt open source AI tools.
avatar for Melissa McKay

Melissa McKay

Head of Developer Relations, JFrog
Melissa is passionate about Java, DevOps and Continuous Delivery. She is currently Head of Developer Relations for JFrog and a member of the Technical Steering Committee of the Open Platform for Enterprise AI (OPEA). Melissa has been recognized as a Java Champion and a Docker Captain... Read More →
Wednesday June 25, 2025 3:05pm - 3:45pm MDT
Bluebird Ballroom 3E
  Open AI + Data

4:20pm MDT

Scalable and Efficient LLM Serving With the VLLM Production Stack - Junchen Jiang, University of Chicago & Yue Zhu, IBM Research
Wednesday June 25, 2025 4:20pm - 5:00pm MDT
Large Language Models (LLMs) are reshaping how we build applications; however, efficiently serving them at scale remains a major challenge.

The vLLM serving engine, historically focused on single-node deployments, is now being extended into a full-stack inference system through our open-source project, **vLLM Production Stack**. This extension enables any organization to deploy vLLM at scale with high reliability, high throughput, and low latency.
Code: https://github.com/vllm-project/production-stack

At a high level, the vLLM Production Stack project allows users to easily deploy to their Kubernetes cluster through a single command. vLLM Production Stack's optimizations include KV cache sharing to speed up inference (https://github.com/LMCache/LMCache), prefix-aware routing that directs inference queries to vLLM instances holding the corresponding KV caches, and robust observability features for monitoring engine status and autoscaling.

Attendees will discover best practices and see real-time demonstrations of how these optimizations work together to enhance LLM inference performance.
Speakers
avatar for Junchen Jiang

Junchen Jiang

Assistant Professor, University of Chicago
Junchen Jiang is an Assistant Professor of CS at the University of Chicago. His research pioneers new approaches to LLM inference systems (https://github.com/vllm-project/production-stack and https://github.com/LMCache/LMCache). He received his Ph.D. from CMU in 2017 and his bachelor’s... Read More →
avatar for Yue Zhu

Yue Zhu

Staff Research Scientist, IBM Research
Yue Zhu is a Staff Research Scientist specializing in foundation model systems and distributed storage systems. Yue obtained a Ph.D. in Computer Science from Florida State University in 2021 and has consistently contribute to sustainability for foundation models and scalable and efficient... Read More →
Wednesday June 25, 2025 4:20pm - 5:00pm MDT
Bluebird Ballroom 3E
  Open AI + Data
 
  • Filter By Date
  • Filter By Venue
  • Filter By Type
  • Audience Experience Level
  • Timezone

Share Modal

Share this link via

Or copy link

Filter sessions
Apply filters to sessions.