Load balancers are a staple of scalable, high-throughput, high-availability architectures, and they work great for scaling web services. When requests take longer, though, things get complicated: requests pile up on some backends, bursts of traffic send latency through the roof, and by the time autoscaling kicks in, it may be too late, too expensive, or both.
Asynchronous architectures and message queues, combined with event-driven autoscaling, can help a lot here.
We're going to see how to implement that pattern on Kubernetes, leveraging:
- A popular LLM to generate thousands of completions;
- RabbitMQ and PostgreSQL to store requests and responses;
- Bento to implement API servers, producers, and consumers without writing code;
- Prometheus, Grafana, and KEDA for observability, dashboards, and autoscaling;
- Helm and Helmfile to automate deployment as much as possible.
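The autoscaling piece of that stack can be sketched with a KEDA ScaledObject that scales consumers based on RabbitMQ queue depth. This is an illustrative fragment, not the webinar's actual manifest: the Deployment name, queue name, and thresholds are hypothetical.

```yaml
# Hypothetical KEDA ScaledObject: scale the "llm-consumer" Deployment
# on the backlog of a RabbitMQ queue. All names are illustrative.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: llm-consumer-scaler
spec:
  scaleTargetRef:
    name: llm-consumer          # consumer Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq
      metadata:
        queueName: completions  # queue holding pending LLM requests
        mode: QueueLength       # scale on number of queued messages
        value: "10"             # target messages per replica
        hostFromEnv: RABBITMQ_URL  # AMQP connection string from env var
```

With this in place, KEDA adds consumer replicas as the queue backlog grows and scales back down (to minReplicaCount) when the queue drains.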
***********************************************************
PerfectScale makes it easy for DevOps and SRE professionals to govern, right-size and scale Kubernetes to continually meet customer demand.
By comparing usage patterns over time with resource configurations, we provide actionable recommendations that improve performance while reducing wasted compute resources by up to 60%.
Get the data-driven intelligence needed to ensure peak Kubernetes performance at the lowest possible cost with PerfectScale.
👉 Start your free trial today: www.perfectsca...
👉 Book a demo and let's talk: www.perfectsca...
***********************************************************
► PerfectScale is platform agnostic, supporting EKS, EKS Anywhere, GKE, AKS, KOPS, and other Kubernetes distributions.
► Trusted globally by DevOps, SRE, and Platform Engineering teams at leading companies like Rapyd and Paramount Pictures.
#kubernetes #k8s #devops #sre #EKS #platformengineering #AKS #GKE #genai
[Webinar] Scaling Out GenAI with Message Queues on Kubernetes with Jérôme Petazzoni.