"380% Lower Latency." A percentage above 100% in this context is incorrect because latency cannot be reduced by more than 100%. A latency reduction of 100% is a latency of 0 ms.
@sanesanyo
1 month ago
AI is making YouTubers dumber. That's the only explanation; otherwise I don't know how this could happen.
@vitalis
1 month ago
We have just identified the non-LLM entity here... unless you are Grog 10 "watermelon"
@avidlearner8117
23 days ago
Wow man, go out, do something, but this is a bad look 🤣
@thenoblerot
1 month ago
This changed my entire approach to a project. I wish they held the cache longer than 5 minutes! Even 10 or 15 would be nice, but how about an hour!? Love that it will cache images, too.
@epipolar4480
1 month ago
Script execution time is different from latency. Latency is effectively how long it takes to return the first token, and then the time to the next token, and so on. This will always be a very low number, with or without caching, so for short responses such as in your second example the script execution will always be fast. For longer responses such as the book summary, the latency makes a difference, as it accumulates for each token and there are many more tokens. I didn't look at your code and haven't used the Anthropic API, but I guess you weren't streaming the tokens, so you couldn't actually measure the latency by this method. Still, really appreciate the video, as I was curious about this caching and this explained a lot to me, thank you!
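For what it's worth, a minimal sketch of measuring time-to-first-token versus total time, assuming the Anthropic Python SDK's streaming helper (the model name and prompt are placeholders, not from the video):

```python
import time
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

start = time.perf_counter()
first_token_at = None

# Stream the response so we can observe when the first token arrives.
with client.messages.stream(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize chapter one."}],
) as stream:
    for text in stream.text_stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()

print(f"Time to first token: {first_token_at - start:.2f}s")
print(f"Total time: {time.perf_counter() - start:.2f}s")
```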
@Quitcool
1 month ago
The part you didn't understand happened because the sentence you added was semantically far from the rest of the book's tokens, so it picked it out pretty easily.
@pioggiadifuoco7522
1 month ago
It would be very useful if you showed us the total cost of your testing, thanks.
@modoulaminceesay9211
1 month ago
Your videos are very helpful. I don't know when we'll get the desktop app.
@DitDev-o9t
26 days ago
It would be really nice to see how you'd talk to a book that wasn't in the training dataset (I'm pretty sure Harry Potter was in it).
@ramp2011
27 days ago
Thank you for the video. I'm curious why, the second time you ask a question, you're passing the context again in the system prompt if it's already cached. Can we just ask the question without sending the context again?
@rcj1337
1 month ago
How is this replacing RAG?
@ahtoshkaa
1 month ago
Example: my AI companion uses facts about me when answering. Five facts are pulled based on the average vector of the latest input; this is done after each message. But I could dump all the facts into the cache and forgo this system entirely. Will it be better? Damn if I know. Probably? Requires a lot of testing. It would be awesome if Anthropic wasn't this censored; I'm not sure I can even use their models in my companion without it getting triggered. But it's definitely not a replacement for RAG... something different, but really cool.
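A rough sketch of that "dump all the facts into the cache" idea, assuming the beta prompt-caching header and block format in use at the time of the video (the facts, model name, and prompt are placeholders):

```python
from anthropic import Anthropic

client = Anthropic()

# Hypothetical user facts that would otherwise be retrieved per-message.
all_facts = "\n".join([
    "Likes hiking.",
    "Works as a nurse.",
    # ...hundreds more facts could go here...
])

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    # Beta header required for prompt caching at the time.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": f"You are a companion. Facts about the user:\n{all_facts}",
            # Everything up to and including this block is cached (~5 min).
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "How was my day, do you think?"}],
)
print(response.content[0].text)
```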
@radu-mirceasirbu2159
23 days ago
Do you write the code yourself, or do you generate it with an LLM?
@kamelirzouni
1 month ago
Thanks, Kris!
@PunitaOjha01
18 days ago
We are using Anthropic's Claude 3.5 Sonnet on Amazon Bedrock. Since the prompt caching feature is in beta, I wanted to clarify whether it is available on Bedrock. I tried reaching out to Anthropic support about this but could not get through. It would be great if someone could answer this for me.
@micbab-vg2mu
1 month ago
Thanks for the update :)
@StayPolishThinkEnglish
1 month ago
Sorry for posting a question randomly, but do you have any tutorial for an AI voicebot for Discord?
@JNET_Reloaded
1 month ago
Where's the link to the code you used?
@perschistence2651
1 month ago
I don't understand why they don't just cache everything that hasn't changed when you set this flag... why the cache points?
@newfrontiers5673
1 month ago
Interesting, but not a replacement for RAG, I don't think.
@j0hnc0nn0r-sec
1 month ago
I'm thinking of trying the Anthropic cache with a local pgvector store or Neo4j. Might make things better... or weird. Kris could do it better. Is this a good idea?
@j0hnc0nn0r-sec
1 month ago
You can cache the “500 page book” context you find in Claude projects, btw
@hemanthkumar-tj4hs
1 month ago
What if I ask another question after caching the entire book?
@JNET_Reloaded
1 month ago
Also, I recommend putting timing in the script!
@luisfelipe6368
1 month ago
Nice, but still expensive; $15 per MTok output is rough. Hopefully we'll see this decrease in the future, especially since OpenAI probably has something similar in the works.
@ahtoshkaa
1 month ago
Damn, I completely forgot about Google's caching. Looked at the prices. It seems like Google's cached tokens are 4 times cheaper than normal. In contrast, Anthropic's are 10 times cheaper, BUT creating the cache costs 25% more... So I have no idea what the math is here; someone help me out.
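A back-of-the-envelope version of that math, using the multipliers mentioned above (cache writes at 1.25x the base input price, cache reads at 0.1x; treat the exact numbers as assumptions):

```python
# Cost multipliers relative to the base input-token price (per the comment above).
WRITE = 1.25  # first request: creating the cache costs 25% more
READ = 0.10   # subsequent cache hits cost 10% of the base price

def cached_cost(n_requests: int) -> float:
    """Relative cost of n requests over the same prompt, with caching."""
    return WRITE + READ * (n_requests - 1)

def uncached_cost(n_requests: int) -> float:
    """Relative cost of n requests without caching."""
    return 1.0 * n_requests

for n in (1, 2, 5, 10):
    print(n, round(cached_cost(n), 2), "vs", uncached_cost(n))

# Caching already pays for itself on the second request:
# 1.25 + 0.10 = 1.35 < 2.00
```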
@ginocote
1 month ago
5 minutes is very short, and worse when you are programming: you can easily have more than 5 minutes between two prompts. It should be 30 minimum; I hope they update this.
@Alex29196
1 month ago
5 minutes is very short
@norlesh
1 month ago
FYI Google Gemini has been doing prompt caching for some time now.
@Solo2121
1 month ago
He mentions that at 13:33
@ronaldronald8819
28 days ago
Claude is getting stupid (it's being quantized). Too bad.
@yellowboat8773
1 month ago
Who actually uses RAG? I've found it so unreliable.
@DESX312
1 month ago
It's as good as your implementation of it is. Use crappy embedding models and crappy text organization, and get crappy output. The inverse is true as well.
Comments: 35