In this demo, you'll explore how to leverage GPU workload profiles in Azure Container Apps (ACA) to run your own model backend, making it easy to switch between models, compare them, and speed up inference. You'll also see how to use LlamaIndex to ingest data on demand and how to host models with Ollama. Finally, you'll decompose the application into a set of Python microservices deployed on ACA.
#microsoftreactor #multillm #llms #azurecontainerapps #azure #chatapp
[eventID:22137]
Build a multi-LLM chat application with Azure Container Apps