SREcon21 - SRE for ML: The First 10 Years and the Next 10

Todd Underwood, Google
Over 10 years ago we started building SRE for a large multi-model ML service at Google. We faced many interesting challenges, including:
Defining scope: Why do these services need ML anyway?
Unclear SLOs: What are we measuring and how can we actually be responsible for those things?
Fuzzy demarcation with our modeling teams: How do we tell a model quality problem caused by infrastructure from one caused by the model or the data?
With the explosion of ML training and serving platforms, the choices we faced now confront many SRE teams across the industry. I will review that history, focusing on the decisions we made, why they made sense to us at the time, and why they might make sense for others. And I'll try to answer the question of whether there is a real need for SRE for ML at all.
View the full SREcon21 program at www.usenix.org...
