- added support for XTTSv2 and wav streaming.
- added lip movement from a video via wav2lip-streaming.
- reduced latency.
- English, Russian and other languages.
- support for multiple characters.
- stopping generation when speech is detected.
- commands: Google, stop, regenerate, delete everything, call.
Under the hood
- STT: whisper.cpp medium
- LLM: Mistral-7B-v0.2-Q5_0.gguf
- TTS: XTTSv2 wav-streaming
- lips: wav2lip streaming
- Google: langchain google-serp
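How the stages above could chain together, as a minimal sketch and not the project's actual code: the LLM reply is cut into sentences as it streams, and each sentence goes to TTS and then to lip sync right away, which is roughly why streaming keeps the delay low. The loop also drops the rest of the reply when speech is detected. All function names here (`split_sentences`, `run_turn`, the `fake_*` stages) are made up for illustration; the real stages would be the Mistral model, XTTSv2 and wav2lip.

```python
from typing import Callable, Iterator, List


def split_sentences(tokens: Iterator[str]) -> Iterator[str]:
    """Group streamed LLM tokens into sentences so TTS can start early."""
    buf = ""
    for tok in tokens:
        buf += tok
        if buf.rstrip().endswith((".", "!", "?")):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # flush a trailing fragment without punctuation


def run_turn(
    user_text: str,
    llm: Callable[[str], Iterator[str]],
    tts: Callable[[str], bytes],
    lips: Callable[[bytes], str],
    speech_detected: Callable[[], bool],
) -> List[str]:
    """Run one reply: LLM tokens -> sentences -> audio -> video chunks.

    Stops early if the user starts speaking again.
    """
    played = []
    for sentence in split_sentences(llm(user_text)):
        if speech_detected():
            break  # user interrupted: drop the rest of the reply
        audio = tts(sentence)
        played.append(lips(audio))
    return played


# Stand-in stages (hypothetical placeholders, not real model calls).
def fake_llm(prompt: str) -> Iterator[str]:
    yield from ["Hi", " there", ".", " How", " are", " you", "?"]


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_lips(audio: bytes) -> str:
    return f"video[{len(audio)} bytes]"


if __name__ == "__main__":
    print(run_turn("hello", fake_llm, fake_tts, fake_lips, lambda: False))
```

With the stand-in stages, the first video chunk is ready as soon as the first sentence is complete, without waiting for the whole reply.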
I had to add distortion to this video so it won't be considered impersonation.
Runs on a 3060 12 GB.
An Nvidia card with 8 GB is also OK with some tweaks.
"Talking heads" also work with SillyTavern. The final delay from voice command to video response is just 1.5 seconds!
Code, exe, manual:
github.com/Mozer/talk-llama-fast
reddit.com/u/tensorbanana2
t.me/tensorbanana
talk-llama-fast v0.1.3 - informal video assistant [en]