Open Source Summit North America 2025
June 23 - 25, 2025
Denver, Colorado
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for Open Source Summit North America 2025 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

This schedule is automatically displayed in Mountain Daylight Time (UTC/GMT -6). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date."

IMPORTANT NOTE: Timing of sessions and room locations are subject to change.

Venue: Bluebird Ballroom 3E
Wednesday, June 25
 

11:00am MDT

Guarding the LLM Galaxy: Security, Privacy, and Guardrails in the AI Era - Jigyasa Grover, BORDO AI & Rishabh Misra, Attentive Mobile Inc
Wednesday June 25, 2025 11:00am - 11:40am MDT
The widespread adoption of Large Language Models (LLMs) like GPT-4, Claude, and Gemini has introduced unprecedented capabilities and equally unprecedented risks. Organizations are increasingly deploying LLMs to handle sensitive tasks, from processing medical records to analyzing financial documents. This talk examines the evolving landscape of LLM security and privacy, combining theoretical foundations with a walkthrough of example implementations.

Through real-world case studies of both attacks and defenses and practical implementation guidance using popular security tools, we'll explore critical vulnerabilities and proven defensive techniques. Special attention will be given to securing fine-tuned and domain-specific LLMs, with live examples using NVIDIA’s NeMo Guardrails, LangChain's security tools, and Microsoft's guidance library.
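
To make the walkthrough concrete, here is a minimal, hypothetical sketch of the kind of NeMo Guardrails wiring the talk covers; the config directory and its contents are assumptions, not material from the session.

```python
# Minimal NeMo Guardrails sketch (hypothetical; config contents are assumptions).
# Expects a ./config directory with a config.yml defining the model and rails
# (e.g., input rails for jailbreak detection, output rails for PII filtering).
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# The rails intercept both the request and the response around the LLM call.
response = rails.generate(messages=[
    {"role": "user", "content": "Summarize this patient record for billing."}
])
print(response["content"])
```

LangChain's security tooling and Microsoft's guidance library slot into the same place in the pipeline: between untrusted user input and the model's raw output.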
Speakers
Jigyasa Grover

Lead, AI & Research, BORDO AI
A 10-time award winner in Artificial Intelligence and Open Source and co-author of the book 'Sculpting Data For ML', Jigyasa Grover is a powerhouse brimming with passion to make a dent in the world of technology and bridge its gaps. As AI & Research Lead, she has years of ML engineering…
Rishabh Misra

Lead Machine Learning Engineer, Attentive Mobile Inc
Author of the book "Sculpting Data for ML", I am a Lead ML Engineer & Researcher recognized by the US Government for outstanding contributions to ML research. I have extensively published and reviewed research at top AI conferences in NLP (LLMs / GenAI), Deep Learning, and Applied…
Bluebird Ballroom 3E
  Open AI + Data

11:55am MDT

Fast Inference, Furious Scaling: Leveraging vLLM With KServe - Rafael Vasquez, IBM
Wednesday June 25, 2025 11:55am - 12:35pm MDT
In this talk, we will introduce two open source projects, vLLM and KServe, and explain how they can be integrated to deliver better performance and scalability for LLMs in production. The session will include a demo showcasing their integration.

vLLM is a high-performance library specifically designed for LLM inference and serving, offering cutting-edge throughput and efficiency through techniques such as PagedAttention, continuous batching, and optimized CUDA kernels, making it ideal for production environments that demand fast, large-scale LLM serving.

KServe is a Kubernetes-based platform designed for scalable model deployment. It provides robust features for managing AI models in production, including autoscaling, monitoring, and model versioning.

By combining vLLM's inference optimizations with KServe's scalability, organizations can deploy LLMs effectively in production environments, ensuring fast, low-latency inference and seamless scaling across cloud platforms.
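
For orientation before the demo, below is a minimal sketch of vLLM's offline inference API, which is what KServe wraps and scales when the two are integrated; the model name is an illustrative assumption, and the KServe side is configured declaratively (an InferenceService resource) rather than through this Python API.

```python
# Minimal vLLM offline-inference sketch (model name is an assumption).
from vllm import LLM, SamplingParams

# PagedAttention, continuous batching, and the optimized CUDA kernels
# all live inside the engine; callers just submit prompts.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What does KServe add on top of vLLM?"], params)
for out in outputs:
    print(out.outputs[0].text)
```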
Speakers
Rafael Vasquez

Open Source Software Developer, IBM
Rafael Vasquez is a software developer on the Open Technology team at IBM. He previously completed an MASc working on self-driving car research and transitioned from a data scientist role in the retail field to his current role, where he continues to grow his passion for MLOps and…
Bluebird Ballroom 3E
  Open AI + Data

2:10pm MDT

Building Your (Local) LLM Second Brain - Olivia Buzek, IBM
Wednesday June 25, 2025 2:10pm - 2:50pm MDT
LLMs are hotter than ever, but most LLM-based solutions available to us require you to use models trained on data with unknown provenance, send your most important data off to corporate-controlled servers, and use prodigious amounts of energy every time you write an email.

What if you could design a “second brain” assistant with OSS technologies, that lives on your laptop?

We’ll walk through the OSS landscape, discussing the nuts and bolts of combining Ollama, LangChain, OpenWebUI, Autogen and Granite models to build a fully local LLM assistant. We’ll also discuss some of the particular complexities involved when your solution involves a local quantized model vs one that’s cloud-hosted.

In this talk, we'll build on the lightning talk to cover complexities like the following (a minimal sketch of the local stack appears after this list):
* how much latency are you dealing with when running on a laptop?
* how much effectiveness do you lose to the quality degradation of a 7-8B parameter model?
* how do reasoning and multimodal abilities help the assistant task?
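
As a flavor of the stack discussed above, here is a minimal sketch of a fully local chat loop using the ollama Python client; the Granite model tag is an assumption and varies by Ollama release.

```python
# Minimal local "second brain" sketch via the ollama Python client.
# Assumes the Ollama daemon is running and a Granite model has been pulled,
# e.g. `ollama pull granite3.1-dense:8b` (the tag is an assumption).
import ollama

history = [{"role": "system", "content": "You are my local second-brain assistant."}]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    # Inference runs entirely on the laptop; no data leaves the machine.
    reply = ollama.chat(model="granite3.1-dense:8b", messages=history)
    answer = reply["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("Summarize yesterday's meeting notes."))
```

Latency here is dominated by the quantized model's token rate on local hardware, which is exactly the trade-off the talk examines.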
Speakers
Olivia Buzek

STSM watsonx.ai - IBM Research, IBM
Olivia has been building machine learning and natural language processing models since before it was cool. She's spent several years at IBM working on opening up Watson tech, around the country and around the world.
Bluebird Ballroom 3E
  Open AI + Data

3:05pm MDT

AI Pipelines With OPEA: Best Practices for Cloud Native ML Operations - Ezequiel Lanza, Intel & Melissa McKay, JFrog
Wednesday June 25, 2025 3:05pm - 3:45pm MDT
The Open Platform for Enterprise AI (OPEA) is an open source project intended to help organizations with the realities of enterprise-grade GenAI deployments. Building from scratch is a costly endeavor, and the ability to quickly iterate on a solution and determine its viability for your organization is essential to ensure you are making the best moves forward.

During this session, Ezequiel and Melissa will introduce you to the OPEA platform and how to empower your team to build, deploy, and manage AI pipelines more effectively. Attendees will gain insights into best practices for handling complex AI/ML workloads, automating dependency management, and integrating Kubernetes for efficient resource utilization. With a focus on real-world applications, this talk not only showcases the transformative potential of these tools but also encourages attendees to explore new ways to contribute, innovate, and collaborate in driving the future of AI adoption in enterprise environments.
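
To ground the pipeline idea, here is a hedged sketch of querying an already-deployed OPEA ChatQnA megaservice over HTTP; the host, port, and payload shape follow the project's published examples but should be treated as assumptions that vary by deployment.

```python
# Hypothetical client call to a deployed OPEA ChatQnA megaservice.
# Endpoint path and payload shape mirror OPEA's GenAIExamples;
# the host and port are assumptions about a particular deployment.
import requests

resp = requests.post(
    "http://localhost:8888/v1/chatqna",  # deployment-specific assumption
    json={"messages": "What does OPEA provide for enterprise GenAI?"},
    timeout=120,
)
resp.raise_for_status()
print(resp.text)  # plain or streamed completion, depending on configuration
```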
Speakers
Ezequiel Lanza

Open Source AI Evangelist, Intel
Passionate about helping people discover the exciting world of artificial intelligence, Ezequiel is a frequent AI conference presenter and the creator of use cases, tutorials, and guides that help developers adopt open source AI tools.
Melissa McKay

Head of Developer Relations, JFrog
Melissa is passionate about Java, DevOps, and Continuous Delivery. She is currently Head of Developer Relations for JFrog and a member of the Technical Steering Committee of the Open Platform for Enterprise AI (OPEA). Melissa has been recognized as a Java Champion and a Docker Captain…
Bluebird Ballroom 3E
  Open AI + Data

4:20pm MDT

Scalable and Efficient LLM Serving With the vLLM Production Stack - Junchen Jiang, University of Chicago & Yue Zhu, IBM Research
Wednesday June 25, 2025 4:20pm - 5:00pm MDT
Large Language Models (LLMs) are reshaping how we build applications; however, efficiently serving them at scale remains a major challenge.

The vLLM serving engine, historically focused on single-node deployments, is now being extended into a full-stack inference system through our open source project, vLLM Production Stack. This extension enables any organization to deploy vLLM at scale with high reliability, high throughput, and low latency.
Code: https://github.com/vllm-project/production-stack

At a high level, the vLLM Production Stack project lets users deploy vLLM to their Kubernetes cluster with a single command. Its optimizations include KV cache sharing to speed up inference (https://github.com/LMCache/LMCache), prefix-aware routing that directs inference queries to the vLLM instances holding the corresponding KV caches, and robust observability features for monitoring engine status and autoscaling.

Attendees will discover best practices and see real-time demonstrations of how these optimizations work together to enhance LLM inference performance.
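
Because the stack fronts its vLLM replicas with an OpenAI-compatible router, a deployed cluster can be exercised with any OpenAI client; the sketch below is a hypothetical example in which the base URL and model name are assumptions about a particular deployment.

```python
# Querying a vLLM Production Stack router via its OpenAI-compatible API.
# The router applies prefix-aware routing, steering each request toward the
# vLLM replica already holding a matching KV cache.
from openai import OpenAI

# Base URL and model name are deployment-specific assumptions.
client = OpenAI(base_url="http://localhost:30080/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Why does KV-cache sharing help?"}],
)
print(completion.choices[0].message.content)
```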
Speakers
Junchen Jiang

Assistant Professor, University of Chicago
Junchen Jiang is an Assistant Professor of CS at the University of Chicago. His research pioneers new approaches to LLM inference systems (https://github.com/vllm-project/production-stack and https://github.com/LMCache/LMCache). He received his Ph.D. from CMU in 2017 and his bachelor's…
Yue Zhu

Staff Research Scientist, IBM Research
Yue Zhu is a Staff Research Scientist specializing in foundation model systems and distributed storage systems. Yue obtained a Ph.D. in Computer Science from Florida State University in 2021 and has consistently contributed to sustainability for foundation models and scalable and efficient…
Bluebird Ballroom 3E
  Open AI + Data
 