Self-Hosted AI That's Actually Useful

89,033 views

Techno Tim

Comments: 174

@TechnoTim
1 month ago
Hey everyone! Thanks for watching and asking for the tutorial! I've just posted it on my new channel! Enjoy! kzitem.info/news/bejne/2qWwy2V_sZZzl4I
@JoeyDee86
1 month ago
Are you going to release any how-tos for this? Preferably with you explaining what each step does rather than just going down a list of steps.
@TechnoTim
1 month ago
Yes, coming soon on my Techno Tim Tinkers channel! Subscribe there to know when it's available!
@traxeonic3600
1 month ago
@@TechnoTim I'm surprised it will end up on Tinkers, given these videos would seem to hit your core main channel. Interesting.
@iamkiber
1 month ago
If not for reading this comment, I would never have known about Tinkers.
@espressomatic
1 month ago
@@TechnoTim Any chance that you might post your AI rig's hardware composition in the description before finishing up the more detailed video on the other channel?
@TechnoTim
1 month ago
I can understand that but this tutorial might be close to 40 minutes long (or longer) 😅. Videos that long do not perform well and ultimately hurt the channel.
@prathamshenoy9840
1 month ago
One of the most useful tech videos of this year, unlike some other channels that post so many videos, 95% of which are useless.
@VillSid
1 month ago
There are two features that I am especially looking forward to:
a) Video text search: I have security cameras running Frigate NVR, which uses AI image recognition to trigger when a person enters an area, and an audio AI model that listens for a fire alarm or breaking glass. They are working on implementing text search for video clips, so you could search for clips with a guy in a red jacket.
b) Local audio transcription: I tested Whisper large models for transcribing non-English call recordings, and it works, but it is sloooow. I ran out of time on Google Colaboratory. I saw that there is an optimized Whisper version that I can run locally on a Google Coral without a GPU, so I still need to test that one out. I would love to be able to search my calls.
@2ndtlmining
1 month ago
Man, this looks interesting. You gotta show us how you set this all up!
@showmequick2245
1 month ago
I second this
@droneforfun5384
1 month ago
I approve this
@questionablecommands9423
1 month ago
I'm all about self-hosting these technologies. Ever since DALL-E hit the scene, I've been thinking that artists should train a model on their own art so that if they get creatively stuck, they can "ask themselves" for inspiration.
@TechnoTim
1 month ago
That's an awesome idea. I really wish I knew more about training. Maybe soon!
@Breeegz
1 month ago
@@TechnoTim If you can train a dog... actually that's nothing like training an A.I.
@TechnoTim
1 month ago
On top of that one of my dogs still bites me!
@nicoschroder8692
1 month ago
Great video as always :) Would love to see a video about the hardware setup and requirements, and some guidelines for which models to choose for different hardware configs.
@llortaton2834
1 month ago
Glasses off? It's about to get serious!
@raul17533
1 month ago
Yeah. Nice try, Tim AI.
@TheMrDrMs
1 month ago
9:51 Nah, I work for a company where we've been doing this since before "AI" was mainstream, and the end-to-end (e2e) models have not only helped accuracy but improved performance, even with our CPU workloads. It's been incredible to be working on this and seeing the sudden rapid development.
@Drkayb
1 month ago
You can also set up Piper as a server and just feed it text by curl (local or remote). Then it generates audio files super quick. IIRC, it can also pipe to stdout if you don't need the files.
@TechnoTim
1 month ago
Thank you! I will look into how to connect this to HASS!
@Drkayb
1 month ago
@@TechnoTim I think the problem is that there's a ton of overhead every time you run the executable, so by keeping a server running, the .exe is effectively "running" all the time.
@andrewbennett5733
1 month ago
I've watched a few videos on people setting up AI like this, but this one has the perfect blend of information AND instruction. Your 230K followers should be more like 2.3M. Thanks for sharing so much good stuff!
@TechnoTim
1 month ago
Thank you so much! If you can believe it, it's actually more difficult to say less. I had to constantly remind myself to not ramble or go on side quests 😅. Thanks for noticing and a full tutorial will be coming soon on my other channel, @technotimtinkers
@andrewbennett5733
1 month ago
@@TechnoTim I get that! I used to be an educator and it's hard not to tell everyone you meet all of the facts you know, especially when it's stuff that excites you. For the record I would happily listen to all of the side quests haha. And how did I not know about your other channel??? HERE I GO
@TechnoTim
1 month ago
@@andrewbennett5733 Sometimes side quests are more fun than the main quest!
@andrewbennett5733
1 month ago
I need you to go the @JeffGeerling route and start a third channel for side quests 🤣
@TechnoTim
1 month ago
That's what Techno Tim Tinkers is for ;)
@Nextrix
1 month ago
I wonder how well these tools work in an offline or no-internet VLAN. Most still tend to connect to third-party domains/servers, and we have no clue what data is being sent when they do. I'm not ready to trust these yet. It would make a good video to showcase the endpoints they try to connect to.
@ewenchan1239
1 month ago
I've played with Ollama, the open-webui, a different open-webui, and Automatic1111. One of the models ended up needing about 40 GB of VRAM, so I had to use two 3090s to have enough VRAM for the model. Pretty nifty though. Not perfect, but still fun to play with.
@Nastalas
1 month ago
There is a HACS version of Ollama support with which you can already control your devices in Home Assistant.
@jwr6796
1 month ago
What are the GPU requirements for all this? Are we talking a recent-enough gaming GPU like a 3060, or do you have to shell out for those enterprise cards with no video output?
@TechnoTim
1 month ago
A 3060 should work fine. Smaller models should fit fine!
@jwr6796
1 month ago
@@TechnoTim good to hear!
@Rodent007
1 month ago
Thank you, great video. I wish you would run through what hardware you run this on.
@TechnoTim
1 month ago
Thanks for the feedback. I have a video on it, it's my new All in One HomeLab server. More to come!
@CharafEddineCHERAA
1 month ago
For anyone who's using Ollama, what's the minimum hardware needed to run a 70b model?
@krurschak2653
1 month ago
I would say an RTX 4090, but with a poor performance experience. For a GPT-like experience you will need something like 4x RTX 4090. But then you could deploy Mixtral 8x7B, which is a GPT-4 class LLM with good performance and context window.
@antaishizuku
1 month ago
I'd say 2 4090s, or a 4090 plus another Nvidia card like a 4060 or 3060. You will need about 40 GB of VRAM for decent quantization, but if you are willing to give up some response quality, go for about 30-ish GB. Just steer clear of Q2_K quantization. Q3_K is OK, with Q4_K being the standard. Q8 is about the same as the full FP16 model but needs huge amounts of VRAM. Anyway, more VRAM/CUDA = better.
@antaishizuku
1 month ago
Phi-3 14B 128k is really good, and I heard good things about Gemma 2 27B, though overall I'm still a fan of Llama 3.
@brandonmansfield4328
1 month ago
It varies, since you can adjust the quantization to fit. For the big models (70B) I would suggest > 40 GB if you can swing it, and > 70 GB if you want to run 120B models. A pair of P40s off eBay isn't too bad to buy. Probably the best budget path presently.
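The VRAM figures in this thread come from simple arithmetic. Weight memory is roughly parameters × bits per weight ÷ 8 bytes, plus headroom for the KV cache and runtime buffers. Below is a rough Python sketch. The 20% `overhead` factor is an assumption. Real quantization formats such as Q4_K usually store slightly more than their nominal bit width, so treat the results as ballpark lower bounds.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Ballpark VRAM (GB) needed to run a model.

    Weights take params * bits / 8 bytes; with params given in billions this is
    directly in GB. `overhead` is an assumed multiplier for KV cache and buffers.
    """
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead


if __name__ == "__main__":
    for label, params, bits in [
        ("8B at 4-bit", 8, 4),
        ("70B at 4-bit", 70, 4),
        ("70B at 8-bit", 70, 8),
        ("70B at 16-bit", 70, 16),
    ]:
        print(f"{label}: ~{estimate_vram_gb(params, bits):.0f} GB")
```

Under these assumptions, a 70B model at 4 bits lands around 42 GB, consistent with the "> 40 GB" suggestion. An 8B model at 4 bits needs about 5 GB.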
1 month ago
This is 12 minutes of pure gold, thank you very much. 😊
@spaceco1
1 month ago
Awesome video. Would love to see a follow-up video where you go over the hardware for inferencing these models, and what kind of performance changes you noticed when playing around with different components.
@wyattarich
1 month ago
This seems to be covered in many other places, and it's almost entirely subject to the models you run. Hard to generalize such a thing. Google for llama.cpp benchmarks and INT8 performance for GPUs.
@abudi45
1 month ago
It was a great video, but you didn't show us how we can install it in our home lab 😢
@jgarfield
1 month ago
We need info on the hardware setup! Like are Nvidia GPUs the only option or can we use NPUs in the newer Intel processors?
@brandonmansfield4328
1 month ago
NPU performance is going to be bound by memory bandwidth, and DDR5 isn't where you want to be. Soldered LPDDR5X is going to have much better memory bandwidth, and that is when these chips will start to get reasonable performance. Lunar Lake and Zen 5 should both come in this configuration at some point.
@jason-budney7624
1 month ago
Really cool video, Tim! I've been wanting to play with some image-to-image "AI" stuff, but it's been hard to find much about it when self-hosting is involved. I'll be poking around with the tools you mentioned to see if I can find something.
@Unselfless
1 month ago
What hardware are you using to run this?
@Mishanw
1 month ago
What kind of GPU are you using? I have a Dell R730, and I wanted to try putting a GPU in it and running Ollama. I really wish there was a low-power AI processor that we could plug into any device with sufficient RAM and be able to run models effectively and efficiently at a relatively affordable cost.
@Solus-Regnator
1 month ago
This teaser was nice; where is the setup video? :D
@macthiswork3006
1 month ago
What is the project you use for the Whisper web UI called?
@FreedomToRoam86
1 month ago
Very cool idea, the private search AI!
@l0gic23
1 month ago
Been waiting for this one. Let's go!
@BCKammen
1 month ago
OK, Tim, where is the guide for how to set this all up? Especially the Home Assistant stuff...
@TechnoTim
1 month ago
Soon on my Techno Tim Tinkers channel!
@BCKammen
1 month ago
@@TechnoTim Standing by then......
@xythiera7255
1 month ago
If you don't have a really, really powerful GPU, it's not really possible in terms of usability. If you have to wait ages for something to happen, it's kind of pointless.
@TechnoTim
1 month ago
@xythiera7255 It really depends on the GPU, I will cover this in my tutorial!
@krurschak2653
1 month ago
@@xythiera7255 A 4090 is enough for Llama 3 8B. 4x 4090 or one A100 will work for the 70B version, or even for Mixtral 8x7B, which is nearly as good as GPT-4 and super fast :) But Phi-3 and Llama 3 8B are really not that bad. They are better than GPT-3.5, so I see them as a good starting point. I would recommend waiting for new hardware like LLM-specific GPUs, because they can be much cheaper, like 1/4 of the price.
@jacobnollette85
1 month ago
That Dances with Wolves bit earned my thumbs-up.
@tohur
1 month ago
I have pretty much been running local AI since the onset of all the open-source models and have run plenty of backends. Now I'm on Ollama and plan to stick with it, as it's the fastest backend I've run out of all of them, and on Linux it's so easy to run the models on AMD or Nvidia. I run 7B-13B models on my little ol' RX 6600 XT with ROCm, and tbh it runs great. IMO, when running locally, 7B-13B is about all anyone needs; just have specialty models at the ready for different tasks, which Ollama makes easy af haha. The best feature of Ollama to me is having it set up to auto-unload models when not in use.
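In Ollama, the `keep_alive` setting controls auto-unloading. You can set it per request in the REST API, or server-wide with the `OLLAMA_KEEP_ALIVE` environment variable. Below is a minimal standard-library Python sketch. It assumes a local Ollama server on its default port 11434 and an already pulled `llama3` model. The sketch only builds the request; the line that sends it is commented out.

```python
import json
import urllib.request

# Ollama's REST API listens on port 11434 by default.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, keep_alive="5m") -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/generate request.

    keep_alive controls how long the model stays loaded after the request:
    a duration string such as "5m", 0 to unload right away, or -1 to keep it loaded.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_generate_request("llama3", "Why is the sky blue?", keep_alive=0)
    # Sending requires a running Ollama server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read())["response"])
    print(req.get_method(), req.full_url)
```

Passing `keep_alive=0` asks Ollama to unload the model once the response is returned, which frees VRAM for other GPU workloads between requests.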
@lakshaynz
27 days ago
Thank you 😊
@alexjohansson328
1 month ago
Super awesome video, unique and cutting-edge. I can't wait to give it a go!
@angryox3102
1 month ago
You’ve just given me so many ideas. This is awesome.
@raymondx137
1 month ago
Do you have a parts list and/or a setup tutorial?
@Squirrel4Gir
1 month ago
Love the vid. Please also try to include a note encouraging people to help these free models, either via training or donations, to accelerate their further development.
@knutblaise9437
1 month ago
Curious if there is a self-hosted AI which could serve as a replacement for Grammarly? I recently noticed my Office 2016 had a new AI process running. From a privacy perspective I'd prefer not sharing my documents with organizations like MS/Google/Grammarly.
@brandonmansfield4328
1 month ago
You don't need a full AI for grammar. LanguageTool is self-hostable, and they have browser extensions you can configure to use your local copy.
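Here is a minimal Python sketch of talking to a self-hosted LanguageTool server over its HTTP API: a POST to `/v2/check` with `text` and `language` form fields. The local URL and port 8081 are assumptions about how the server was started. `build_check_request`, `summarize_matches` and `check` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: LanguageTool's HTTP server was started locally on port 8081.
LT_CHECK_URL = "http://localhost:8081/v2/check"


def build_check_request(text: str, language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a form-encoded POST to the /v2/check endpoint."""
    body = urllib.parse.urlencode({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(LT_CHECK_URL, data=body, method="POST")


def summarize_matches(response: dict) -> list[tuple[int, int, str]]:
    """Reduce a /v2/check JSON response to (offset, length, message) tuples."""
    return [(m["offset"], m["length"], m["message"]) for m in response.get("matches", [])]


def check(text: str, language: str = "en-US") -> list[tuple[int, int, str]]:
    """Send the text to the local server; requires it to be running."""
    with urllib.request.urlopen(build_check_request(text, language), timeout=10) as resp:
        return summarize_matches(json.load(resp))
```

With the server running, `check("This are a test.")` should return one tuple per issue the server flags. The offsets point into the original text. The text is only sent to localhost, which is the privacy point being made.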
@The_Mup
1 month ago
1:45 - Third option: let Surfshark snoop on you. VPN providers are no more trustworthy than your mobile ISP. VPNs are for getting around region blocks, NOT for privacy.
@TechnoTim
1 month ago
They both have data logging, selling, sharing, and trading policies... the ISP's policy is to do it; VPNs like this one have a policy not to.
@user-ic6xf
1 month ago
I was so ready for you to do a video on this.
@WMRamadan
1 month ago
I tried this a while back with an Nvidia RTX 3060 12GB, and of course bigger models wouldn't load. Would using two GPUs help load bigger models by giving a combined memory of 24GB? Also, do you know if mixing GPUs works, for example a 3060 12GB with a 4060 Ti 16GB to give a combined 28GB?
@xythiera7255
1 month ago
If you don't have at least a 3090, you are really limited. Yes, that exists, but you could also just buy a workstation card, which means insane costs. So if you really want to play with AI you need a 4090 because of the VRAM; it's the only real option other than going with an NVIDIA RTX 6000 for 6 grand and 48 GB of VRAM.
@WMRamadan
1 month ago
@@xythiera7255 I'm going for the cheapest option. If I can buy two 4060 Ti 16GB cards to have a combined 32GB of GPU memory, then I will do that!
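On the earlier question about pooling cards: llama.cpp-based runtimes such as Ollama can split a model's layers across several GPUs. Capacity therefore roughly adds up even with mismatched cards, though speed does not scale the same way. Below is a minimal Python sketch of the capacity check. The 0.5 GB per-card reserve is an assumed allowance for runtime buffers. The 16 GB card in the example is assumed to be an RTX 4060 Ti.

```python
def usable_vram_gb(gpu_vram_gb: list[float], reserve_per_gpu_gb: float = 0.5) -> float:
    """Total VRAM left for model layers when a runtime splits layers across GPUs.

    reserve_per_gpu_gb is an assumed per-card allowance for the CUDA context/buffers.
    """
    return sum(max(v - reserve_per_gpu_gb, 0.0) for v in gpu_vram_gb)


def fits(required_gb: float, gpu_vram_gb: list[float]) -> bool:
    """True if a model needing `required_gb` should fit across the given cards."""
    return usable_vram_gb(gpu_vram_gb) >= required_gb


if __name__ == "__main__":
    cards = [12, 16]  # e.g. an RTX 3060 12GB plus an RTX 4060 Ti 16GB
    print(usable_vram_gb(cards))  # → 27.0
    print(fits(24, cards), fits(42, cards))  # → True False
```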
@Techonsapevole
1 month ago
Well done, local LLMs are the future.
@_coderizon
26 days ago
What is the difference between Ollama with a web UI and LangChain for NLP tasks?
@DorZ1983
1 month ago
What is the UI that shows the app stack flow? Is it an actual app or just After Effects?
@Squirrel4Gir
1 month ago
Gonna need a video on Whisper. Also, any chance it can be integrated into Plex for drafting subtitles?
@santiago69
1 month ago
Hello, what is the name of the open-source, web-based version of Whisper that is mentioned, please?
@vaidkun
1 month ago
The thing with AI is that even if you are running it locally, you need to get the training data from somewhere, so someone still has to give up their privacy :)
@TechnoTim
1 month ago
touché
@Arthur-o2y
1 day ago
Which rack is that at 0:44?
@koevoet7288
1 month ago
You can run Home Assistant's faster-whisper on a GPU; I've been doing it for months. I've got a Dockerfile for this, lmk if you want it.
@TechnoTim
1 month ago
Thank you! I found a forked version of Wyoming Whisper, but it didn't seem to help. I figured I'd wait for the official one to get updated.
@koevoet7288
1 month ago
@@TechnoTim I'm also using someone's fork. I don't remember if I changed it in any way, but it's running perfectly on my Quadro P2000.
@tchesnokovn
1 month ago
What's the no-code-workflow-looking thing you are using?
@OvernightSuccess721
1 month ago
This is Tim’s evil twin brother NoTechTim. Insert Travolta meme looking for the tech.
@TechnoTim
1 month ago
TechNOTim 😂
@abhijithabhi58
1 month ago
What are the hardware requirements?
@eliaskallelindholm8339
1 month ago
This is the first time I have done something Techno Tim is showing before he showed it :D
@TechnoTim
1 month ago
Ha! It took a while for me to build, integrate, and actually evaluate all of these systems!
@eliaskallelindholm8339
1 month ago
Did you try the 70B Llama model? (I saw you only used the 8B model.) I read some stuff about running it on 2 RTX 4070s or an RTX 6000 Ada, but sadly I don't have the hardware to run that purely on graphics cards yet. The results should be even better than the paid ChatGPT stuff.
@eliaskallelindholm8339
1 month ago
I mean RTX 4090s with 24 GB of VRAM.
@droneforfun5384
1 month ago
Just subscribed for the upcoming guides on local AI 😃🥰😎
@TechnoTim
1 month ago
@@droneforfun5384 soon!!!
@ivlis32
1 month ago
HA Voice integration is, unfortunately, very strange. They insist on using HA "add-ons" for voice, which I really don't want, because I do not use HAOS but deploy HA like any other container.
@dragonhunter2475
1 month ago
The add-ons are just Docker containers; you can find them in the rhasspy Git repo.
@sree_nath
1 month ago
Love your videos, even though there are plenty of how to videos on these topics, I would love to hear it with your mesmerizing voice 😊
@TechnoTim
1 month ago
🥰 Thank you! Audio in this old wooden/plaster room is hard, so hopefully it sounds OK!
@gemargordon6885
1 month ago
I’m loving Gemini for sure! It’s a bit better than Llama or ChatGPT.
@TheJoaolyraaraujo
1 month ago
MacWhisper is amazing!
@TechnoTim
1 month ago
100% agree! I bought it for better models and they work even better for scripted talks (like this). It's so accurate!
@TeambitDK
1 month ago
This was really interesting; now I want to build it :D
@coletraintechgames2932
1 month ago
I'm ready for the how-to! I have messed with it and have something running, but these features look awesome!
@TechnoTim
1 month ago
Soon on my other channel!
@huseinnashr
1 month ago
You have another channel?
@FatalSkeptic
1 month ago
Haven't been able to get Home Assistant to give me any data back from AI agents, so frustrating.
@djstraylight
1 month ago
I see a future video of you building a dedicated AI server with multiple GPUs and benchmarking the tokens per second depending on the setup. It would get many views from r/LocalLLM or r/LocalLLaMA groups for sure.
@TechnoTim
1 month ago
Thanks! Sounds awesome! I am always hesitant to share my content on subreddits other than my own, but if you feel this is worthy of it feel free to!
@showmequick2245
1 month ago
Nice, welcome to Minnesota btw 😂
@SyedZainUlHasan
1 month ago
What are the system specs?
@truckerallikatuk
1 month ago
Why do so many services go with such odd names? Like SearXNG, which I'd pronounce "Sear XNG", not "search NG". That's how it's written, after all.
@benhillard919
1 month ago
I think in the region it comes from, the "x" makes a "ch" sound.
@TechnoTim
1 month ago
@@benhillard919 I think so too, and I totally guessed so I hope that's how it's pronounced! Also, now that I see it again, it might be "searching". 🤣
@itaco8066
1 month ago
Great video! ❤
@dhmybiker5034
1 month ago
How do you define a graphics card in Docker on Ubuntu?
@Rohinthas
1 month ago
I am generally skeptical of the AI hype but your way of going about it has piqued my interest. Hope more in-depth guides on setup and hardware are coming, subscribed ;)
@Rohinthas
1 month ago
Ah I just found your homelab video! That answers some questions!
@HenryBiglin
1 month ago
Damn, you just sent me down a rabbit hole.. lol
@dcoidua
1 month ago
Would this all run well on a 4090?
@brandonmansfield4328
1 month ago
The bigger models need more VRAM than a single 4090 provides. You can run the smaller models just fine. You will lose out on some of the performance the bigger models provide, but it runs!
@famousartguymeme
1 month ago
This is awesome!
@Mrtrunks
1 month ago
Glass off so we don’t see that DeskPi
@voodoochild420ai
1 month ago
Nice vid!
@OGH3294
1 month ago
Can I do these things with the 4060 Ti 16GB version?
@TechnoTim
1 month ago
Yes, just use smaller models.
@xythiera7255
1 month ago
You can, but it will be really slow.
@OGH3294
1 month ago
OK. Plan dropped. I will just keep watching TechnoTim 😁
@jensodotnet
1 month ago
I currently run two 1070s (8 GB each). While a little slow, it works fine, but for image generation you would need more VRAM; 8B LLMs work fine on a single 8 GB card. A 3090 is much faster, does images very well, and can run larger models. IMHO, integrating search had a bigger impact than using a larger model of the same type (I haven't tested 70B).
@Act1veSp1n
1 month ago
YEESS!!!!
@hamdibougattaya
1 month ago
That's awesome, I like your vids...
@llortaton2834
1 month ago
hi!
@TheRowie75
1 month ago
Surfshark privacy??? Open Source?
@yewbacca
1 month ago
What happened to Tim? Who is this imposter?
@TechnoTim
1 month ago
🤓
@romayojr
1 month ago
You forgot to mention the script for this video was made by AI 🤖
@TechnoTim
1 month ago
Ha! Nope, 100% me! Bad grammar, bad jokes, stutters were all compliments of HI (Human Intelligence)
@romayojr
1 month ago
@@TechnoTim I love AI, but HI will always win my heart. But seriously, thanks for this video; I've been waiting for this one. Now I need to integrate more stuff into my Open WebUI!
@alexey_sychev
1 month ago
Sure, electricity is free nowadays
@TechnoTim
1 month ago
It uses a lot less power than a gaming machine since you only use it in spurts, nothing new here, just shifting the workload that's using the card.
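Tim's point about bursty use is easy to check with a little arithmetic. Monthly energy is load watts × load hours plus idle watts × idle hours, summed over the days. Below is a minimal Python sketch. The 350 W load and 30 W idle figures are hypothetical, not measurements of Tim's server.

```python
def monthly_kwh(load_watts: float, load_hours_per_day: float, idle_watts: float, days: int = 30) -> float:
    """Energy per month (kWh) for a machine that idles most of the day and runs at load in bursts."""
    if not 0 <= load_hours_per_day <= 24:
        raise ValueError("load_hours_per_day must be between 0 and 24")
    idle_hours = 24 - load_hours_per_day
    wh_per_day = load_watts * load_hours_per_day + idle_watts * idle_hours
    return wh_per_day * days / 1000


if __name__ == "__main__":
    # Hypothetical figures: 350 W under inference load, 30 W idle.
    bursty = monthly_kwh(350, 1, 30)      # about 1 hour of load per day
    always_on = monthly_kwh(350, 24, 30)  # pinned at load around the clock
    print(f"{bursty:.1f} kWh vs {always_on:.1f} kWh")  # → 31.2 kWh vs 252.0 kWh
```

Multiplying the kWh figure by your local electricity price per kWh gives the monthly cost.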
@mavis-io
1 month ago
What hardware is used for this AI heaven?
@tsmot911
1 month ago
When AI can write an OS it will have arrived.
@TechnoTim
1 month ago
The singularity!
@Tr1pke
1 month ago
Surfshark has a no-logging policy, yeah right. A VPN seller with a no-logging policy will never exist. Don't lie, we like you too much.
@Kaleb-lf8kf
1 month ago
lol
@CYYB3RMISTER
1 month ago
You're a One Piece fan?
@ABTcorp
1 month ago
😀😀🥰🥰🥰🥰
@realneighborhoodP
1 month ago
My name is Inigo Montoya.
@GamingPenguinEnthusiast
1 month ago
AI models pre-trained by Meta, Google or Microsoft. Put a leftist inside your PC 😂 no thanks
@xythiera7255
1 month ago
Impressive, you managed to throw out your brain and make it about politics.
@wire9486
1 month ago
Leftist keyboard warrior above ☝️
@dhmybiker5034
1 month ago
Please use audio dubbing from English to Arabic in your videos
@pkt1213
1 month ago
Can I put this into Proxmox? "Proxmox, spin up an LXC container for Plex and pass my GPU through for hardware encoding."
@EvgenMo1111
1 month ago
Yes, it works without problems in an LXC container with a GPU; you just need to configure the network address.