A very good friend of mine did some recent work upgrading storage for the research division of a very large pharmaceutical corporation. Their security protocols were good but inflexible, creating motivation to work around restrictions that slowed the upgrade to a near standstill. The financial incentives, combined with a sense of hubris, resulted in several major security controls being temporarily bypassed in ways that weren't fully auditable. If an insider was waiting for the moment when exfiltration of very expensive proprietary data and software was possible, they got their chance. Security is always in tension with getting work done, and there's no such thing as perfect security.
@fxsrider
10 months ago
Even on my level, typing my password every time I wake my computer gets on my nerves. Encrypted files are fun as well. I have removed security numerous times, only to swing the other way worrying about malware, etc. This is on my personal PC. I worked for decades at an aerospace company that had sign-in and log-on requirements that were super annoying to repeat many times a day. Then, it seemed, I had to change my password all the time; everyone had to do it every 3 months or so. To the point that I had rolled through the entire alphabet as the last character and was well into the upper case when I retired.
@mikebarushok5361
10 months ago
@@fxsrider I know that same frustration with frequently having to change passwords at aerospace companies, having worked for a couple of them myself. It was an open secret at one of them that everyone left post-it notes with their most recent password under the keyboard.
@craigslist6988
10 months ago
As an engineer, I've never once seen a company that wasn't compromised by China. China has a lot of people trying, and small US companies are such easy fodder. People act like best security practices simply existing somewhere makes the tech world safe... but if you graphed population vs. competency in IT, it would look like wealth in the US: almost all of the high competency is in a very small number of people. The other 99% are abysmal. It's hard to be smart enough about security now; there are so many attack vectors, and corporations see security as an expensive cost against a low-probability, high-impact risk, so they justify not paying for it. And honestly, the amount of money it takes to compete for the few people who are actually very competent might not be worth it to the company.
@michaelpoblete1415
10 months ago
Llama 2 is now almost at the level of GPT-3.5 even without breaches, and Llama 3 might be at the level of GPT-4. In that case, since the Llama series is open source, the question of what would happen if GPT-4 were stolen might become moot and academic, since anyone could just download open-source Llama, which at some point in the near future might reach the level of GPT-4.
@ebx100
10 months ago
Well, Llama is only sort of open source. If you commercialize it, you pay.
@michaelpoblete1415
10 months ago
@@ebx100 This video's topic is the ramifications of GPT-4 getting stolen. With a stolen model, you don't even have the option to pay for it; you go straight to jail.
@96nico1
10 months ago
Yeah, I had the same thought.
@joaosturza
10 months ago
@@ebx100 It doesn't prevent people from commercializing it covertly. To prove it, you would have to prove a certain work was done by a specific AI, something we currently cannot do.
@emuevalrandomised9129
10 months ago
Honestly, it would be a very curious idea to see how the model would behave in the absence of all the limiting systems.
@100c0c
10 months ago
From what I've read, not as good as you'd assume. Just more erratic and wrong...
@quickknowledge4873
10 months ago
@@100c0c mind sharing what you read specifically? Very interested in coming up with my own conclusion on this.
@amandahugankiss4110
10 months ago
endless child porn that seems to be the goal of all of this
@nobafan7515
10 months ago
@@100c0c What's weird is I've been hearing the main one is already making more errors from users inputting incorrect data.
@obsidianjane4413
10 months ago
It will just do any dumb sht the meat puppets tell it to.
@nixietubes
10 months ago
Common Crawl doesn't provide data only for machine learning; it's for research of all sorts. And the 45 TB number is inaccurate: the dataset is measured in petabytes.
@Nik.leonard
10 months ago
This already happened in the image-generation space when the NovelAI model got leaked from a badly secured GitHub account, then downloaded and used as a (somewhat) foundational model for a lot of anime image-generation models.
@asdkant
10 months ago
Small correction: SSH is used for remotely operating (Unix and Linux) machines; for API and web traffic it's more common to use TLS (also colloquially called SSL, though technically SSL is the older protocol).
@nexusyang4832
10 months ago
It's just a matter of time until we see a "Folding@home"-equivalent project that can train a single model in a distributed, decentralized way. Then it isn't about theft, but about what can be done with such a tool...
@aniksamiurrahman6365
10 months ago
For LLMs to be truly embedded all around people's lives, they need to be open source. There are many important things that can be done with GPT-4, like automating corporate paperwork, aiding peer review of scientific research, summarizing and investigating documents, etc. What Microsoft is doing will never accomplish these. The closed-source nature also ensures that there can't be anything better than what they've got, essentially inhibiting any proper growth and application.
@LimabeanStudios
10 months ago
The effectiveness of generating training data from existing public models has been really impressive. The open-source community has been embracing it, for obvious reasons, with some real results. As of right now, fine-tuning on generated data is where it's most used.
@TheOwlGuy777
10 months ago
I work next door to a movie studio. Our own IT department monitors all traffic in the area and there are multiple mobile piracy attempts a week.
@magfal
10 months ago
0:44 I don't know how successful OpenAI would be in enforcing the proprietary nature of their model if it leaked. It's built upon mountains of stolen and misappropriated data after all.
@insom_anim
10 months ago
I think the AI companies are probably more afraid of an open source competitor that makes all of these protections irrelevant. There's no need to steal something built on publicly accessible information with enough time and effort.
@Charles-Darwin
10 months ago
I would think Quora is a massive source of conversational Q&A made available to the dataset, unfettered; Adam D'Angelo is basically a senior board member at both companies. Also, what OpenAI did in going live with such a simple interface was a 100% stroke of genius. I firmly believe this format allowed not only for training but for providing a very solid baseline of what humanity cares about within the data set; otherwise there is just way too much data to model on. This doubly bootstrapped a 'scope' to start from and trained errors out based on the acceptance of the result to a query. This is probably some of the secret sauce behind why they're able to iterate so fast: it's the end user.
@SalivatingSteve
10 months ago
Exactly, the project narrows the scope on its own as it trains out errors.
@aapje
10 months ago
Quora is extremely low quality data, though, for the most part.
@cbuchner1
10 months ago
A verbatim copy of those 1 TB of weights would not be valuable for very long. I'm sure OpenAI is continually updating and refining them, and I'm sure they already have the next big thing in the pipeline. It would just be a momentary snapshot with a fixed knowledge cutoff.
@joaosturza
10 months ago
The training data, however, is so precious it would warrant a massive ransom. Its public release would see every IP holder suing the company, especially since in several jurisdictions you are required to protect your copyright against violations, and not suing OpenAI might eventually be interpreted by a judge as not caring whether your work appears in any AI.
@moth.monster
10 months ago
What people think large language models are: Skynet, HAL 9000. What large language models really are: your keyboard's predictive text if it read the entirety of Reddit.
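In that spirit, a toy sketch of "predictive text": a bigram counter that always suggests the most frequently seen next word. (The corpus here is made up for illustration; real LLMs learn vastly richer statistics, but the flavor is the same.)

```python
from collections import Counter, defaultdict

# Count which word follows which in a tiny made-up corpus.
corpus = "the cat sat on the mat and the cat slept".split()

followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def predict(word):
    # Most common word seen after `word`, or None if we never saw it.
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict("the"))  # "cat" ("cat" follows "the" twice, "mat" once)
```

Scale the counts up to a trillion tokens and a transformer instead of a table, and you have the gist of what these models optimize.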
@SalivatingSteve
10 months ago
This x1000. The fear mongering over AI is way overblown. The models are useless without new human-created data to feed into the system. My CS professor pointed out that if people stop posting on Stackoverflow or Quora because now they’re using ChatGPT instead, then it will just regurgitate old info and get outdated very quickly. It turns into this weird bootstrap paradox feedback loop where “knowledge” effectively stagnates.
@guilhermealveslopes
9 months ago
The entirety of Reddit, plus lots of other sources.
@monad_tcp
10 months ago
I would say that if it happened, it would be an overall good thing. It's too powerful a thing to be in the hands of a few people. I don't believe anyone has the magical ethics to be able to decide for, or "protect," humanity from any bad outcome. Actually, it's the other way around: in trying to do good, but without the input of the rest of humanity, they are sure to end up doing evil.
@AlexDubois
10 months ago
Data at rest is only encrypted for the layers below the encryption process: if encryption is done by the OS, the client of the OS sees the data in the clear, so which layer does the encryption matters. For encryption of data in use, Intel SGX is a very common way to secure cloud payloads; however, an application vulnerability in the code running inside SGX negates the security properties of SGX. This is why languages such as Rust should be used, and the number of lines running inside the enclave needs to be limited as much as possible to shrink the attack surface. A man-in-the-process attack on such an enclave is very hard to detect.
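A minimal sketch of the layering point, with a toy XOR "cipher" standing in for real crypto (illustration only, never use this for actual encryption; the key placement is the thing to notice):

```python
import hashlib
from itertools import cycle

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # Toy XOR stream "cipher": XORing twice with the same keystream round-trips.
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ k for b, k in zip(data, cycle(stream)))

secret = b"model weights"

# OS-level full-disk encryption is transparent to clients of the OS:
# any process that can open the file sees plaintext.
seen_with_fde = secret

# Application-level encryption: the app holds the key, so a process that
# merely reads the file gets only ciphertext.
seen_with_app_crypto = toy_encrypt(secret, b"app-held key")

assert seen_with_fde == secret                                    # plaintext leaks
assert seen_with_app_crypto != secret                             # ciphertext only
assert toy_encrypt(seen_with_app_crypto, b"app-held key") == secret  # round-trip
```

The same logic is why SGX moves the boundary further down: even the OS becomes a "client" that only ever sees ciphertext.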
@sangomasmith
10 months ago
It is darkly hilarious to watch AI companies spend enormous effort and resources to fend off the theft of their models, when the models themselves were built off of stolen and public-domain data.
@makisekurisu4674
10 months ago
Hence stealing stolen goods is perfectly fair.
@relix3267
9 months ago
not exactly
@vidal9747
9 months ago
There is "public" in "public domain"... You can argue it is wrong to train on non-public-domain data.
@bbirda1287
10 months ago
You have to remember he mentions state actors many times during the presentation, so a lot of the hardware/software/resource limitations for anonymous hackers don't really apply. State actors can easily have servers to store petabytes of information and multiple high-speed connections for the download.
@aspzx
10 months ago
I think the reason data has to be exfiltrated slowly is that it probably sits behind hardware that limits the speed of any outgoing network connection.
@SalivatingSteve
10 months ago
@@aspzx It has to be done stealthily, with lots of connections masked to look like normal traffic, because trying to download a massive amount of data to a single user would raise red flags.
@aleattorium
10 months ago
9:30 - also worth researching the Okta and Microsoft Azure hacks of their ticketing and support systems.
@dingodog5677
10 months ago
If AI is based on what's on the internet, it's gonna be the dumbest thing around. Garbage in, garbage out. It'll probably become sentient and commit suicide from depression.
@AmericanDiscord
10 months ago
The data is available and there are open source models with close to equivalent performance. The problem is the cost curve for more advanced queries. The leaders in AI will likely be determined by access to efficient hardware, not anything else. Worrying about protecting weights, while it shouldn't be ignored, is the wrong direction.
@SalivatingSteve
10 months ago
This is why the USA has put restrictions on certain GPU & chip exports to China.
@AmericanDiscord
10 months ago
@@SalivatingSteve I don't think improvements to current hardware architectures are going to get AI past the coming hardware wall. You are going to be looking at something different.
@florianhofmann7553
10 months ago
So ChatGPT pulls all these answers out of only one TB of data? Sounds like the most efficient data compression we've ever created.
@tardonator
10 months ago
It's lossy.
@Greyboy666
10 months ago
1 TB of /parameters/, working on 45 TB of text. That's an absolutely staggering amount of information for what it can manage.
@dtibor5903
10 months ago
LLMs don't store the training data like a database; they remember it more the way humans do. It's lossy, it has gaps, it has mistakes.
@Geolaminar
10 months ago
That's because AIs don't store their answers. I don't know how many times it has to be explained that AIs are not lookup tables. They're not compression, lossy or otherwise. That's made up by the NoAI crowd to try to pretend a generative AI can't produce original work; it was literally never true. Compression doesn't let you retrieve something that wasn't in the original dataset.
@gorak9000
10 months ago
They must be using Hooli Nucleus
@obsidianjane4413
10 months ago
Meh. The LLM datasets are less important than the algorithms that build them. GPT is just a chatbot. A big, good training set is valuable for its functionality and the cost it took to build, but lots of datasets are being built these days. They are going to be like cryptos: the first one was valuable, but then everyone made one and the value of all of them dropped. Chatbots are good at "talking," in that they can predict what a human would say based on the keywords in the prompt input. But the model does not "know" or "think" anything; most of them are dumb. Their best utility is in making serendipitous connections between concepts and ideas from masses of data.
@isbestlizard
10 months ago
What do you think a human mind is, but lots of chatbots talking with each other, supervising each other's output, correcting, analysing, reviewing, rating, amending in a way that creates the epiphenomenon of intelligence?
@obsidianjane4413
10 months ago
@@isbestlizard That is not what the human mind is any more than it is a computer, or any other poor metaphor used before.
@damien2198
10 months ago
It's gonna be nice when we're able to run these huge models distributed, trained, and inferred on "Folding@home"-style systems, uncensored.
@dr.eldontyrell-rosen926
10 months ago
"Malicious capabilities?" please define.
@retard1582
6 months ago
Generation of spam so sophisticated that it will fool 90% of laymen. Help with the creation of fake bank-login landing pages and fake shopping sites. All kinds of stuff is possible: voice spoofing, fake-news generation, propaganda creation.
@RandomPerson-bv3ww
10 months ago
As usual with these questions, it's not if but when.
@joaosturza
10 months ago
The companies would immediately be massively sued if the training data leaked, as it gives every party with works in it grounds to sue. It's an unwinnable battle, as hundreds, potentially tens of thousands, of IP holders would sue ChatGPT and OpenAI.
@behindyou702
10 months ago
Love the way you present your research, can’t believe I wasn’t subscribed!
@johnmoore8599
10 months ago
Tavis Ormandy found Zenbleed where the CPU was exposing data from the system. I think hardware vulnerability security testing is in its infancy and he's one pioneer using software.
@SurmaSampo
10 months ago
Tavis is a rockstar in the field!
@honor9lite1337
10 months ago
@@SurmaSampo Is he still at Google?
@okman9684
10 months ago
Imagine downloading the full version of gpt4 from your internet
@florin604
10 months ago
😅
@romanowskis1at
10 months ago
Easy with fiber to the home; I think it should take only a few hours to fully save it to an SSD.
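Back-of-envelope for that, assuming roughly 1 TB of weights as mentioned in the video:

```python
# Time to move ~1 TB at typical fiber speeds: bytes / (bits-per-second / 8).
weights_bytes = 1e12
for gbps in (1, 10):
    hours = weights_bytes / (gbps * 1e9 / 8) / 3600
    print(f"{gbps:>2} Gbit/s: {hours:.1f} h")  # ~2.2 h at 1 Gbit/s
```

So "a few hours" checks out on gigabit fiber, assuming the attacker can sustain line rate without tripping any egress monitoring.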
@michaelpoblete1415
10 months ago
The problem is what hardware to run it on.
@lashlarue7924
10 months ago
8:45 Look, it isn't that we here in the US don't appreciate the contributions of Chinese nationals (and others too) to our infrastructure projects. We do. The issue is that if you have family, real estate, or other ties to China, or if you LIE about those ties, then you are susceptible to being manipulated, blackmailed, or otherwise vulnerable to coercion by regimes that can snap their fingers and send your parents or children into a gulag. That's why you guys get your clearances held up. It's not that we don't like you guys, it's that we have to face the cold hard facts about what happens when someone gets their arm twisted by the Ministry of State Security.
@EyesOfByes
10 months ago
13:13 Glad I'm not the only one thinking that was Sam
@Т1000-м1и
8 months ago
0:10 btw that Andrej guy is pretty controversial
@isbestlizard
10 months ago
What if someone steals the collective writing of humanity, every book, news article, and Reddit post ever written, and uses it to train a model they then consider a proprietary trade secret? Can you really 'steal' something that was already stolen and hoarded?
@dr.eldontyrell-rosen926
10 months ago
They hope to build these institutions, amass huge investment and valuations, and then cash out when regulations really hit.
@TwistersSK8
10 months ago
When you read a book and acquire new knowledge, are you stealing the knowledge from the author of the book?
@stevengill1736
10 months ago
Apparently the use of synthetic data is supposed to avoid DRM or copyright issues as well as speed up processing, but I had to look up synthetic data: en.wikipedia.org/wiki/Synthetic_data
@howwitty
10 months ago
@@TwistersSK8 Uhhh... not the same as a machine "reading" the book. Isn't that obvious? Pirates made a similar argument that copying digital files isn't theft because the owner still has the original copy. Maybe you should try stealing this book?
@EpitomeLocke
10 months ago
@@TwistersSK8 lmao are you seriously equating a human and an ai model
@jjj8317
10 months ago
The goal is to build things in America, Canada, Europe, etc. by said people. The thing is, Chinese Canadians are also Canadian, and Chinese Americans are also American. It is not possible to ignore the issues that arise from people who have links to, or are literally part of, the Chinese state in the aforementioned countries. Also, there is nothing wrong with being proud of your roots, or of having a direct association with the People's Republic of China. But you don't really want a Chinese nationalist actively managing a data center when there are other people who are perfectly capable. I think people who can't differentiate the PRC and Chinese people are an issue, just as it is true that companies dealing with critical tech should be aware of people who have links to other states.
@stefanstankovic4781
10 months ago
I'd rather not have any nationalist actively manage a data center, thank you very much. ...assuming we're using the term "nationalist" in a fanatical/irrational sense here.
@bruceli9094
10 months ago
I think the future is India, though. It currently has the world's biggest population.
@SalivatingSteve
10 months ago
I think tech companies who pull the H1-B visa scam to save a few bucks on payroll are especially at risk of IP theft from foreign actors.
@jjj8317
10 months ago
@@bruceli9094 A bit of the same issue: a huge nationalism problem that puts Indian or Sikh values over Canada or America. In Canada there are riots where these two groups beat the hell out of each other; there have been assassinations and terrorist attacks. You have to prioritize the needs of the country above everything. I can tell you as an immigrant that some of the people who move to North America are a testament to bad screening practices. In Canada there have been cases of Chinese nationals who were somehow allowed to work in defense programs and who took blueprints from frigates, along with signaling codes, and handed them to the Chinese state. In the case of the UK, a man who worked in their nuclear program stole blueprints and recreated the bomb in Pakistan. So the whole "it doesn't matter if a person is loyal to the country" argument is ridiculous.
@jjj8317
10 months ago
@@stefanstankovic4781 You want to ensure that your tech companies and data centers filter out people who have direct ties to foreign states. Canada has suffered a lot of security breakdowns due to a lack of oversight and security clearance. It is very simple: you don't have to like American or Western doctrine, but as long as you are Western, you will be targeted, so you don't want people whose entire goal is to disrupt the environment you work and live in to control your data.
@Steven_Edwards
10 months ago
There are so many open-source LLMs trained on public resources that it is a moot point. Proprietary will never be able to keep up with open in rate of improvement. When I last checked there were something like a dozen different LLMs, most of them coming out of China but plenty from other places in the world. They've all been trained on different data sets, and many are up to GPT-3.5 equivalence, reached exponentially faster than it took OpenAI to get to the same level. Honestly, the big bottleneck is the same for everyone, and that is inference: processing prompts is an expensive proposition. I've seen it tried with home systems of up to 1 PB of compute, with GPUs that still are not performant enough to be real-time. As of right now, only the largest online services and state actors can afford inference that performs reasonably; that is the only thing preventing true democratization of AI at this point.
@raylopez99
10 months ago
The biggest risk of GPT "theft" is simply an employee walking out the door with knowledge of GPT. In California you cannot stop an employee from using what they remember; you can stop them from taking files with them, however. It's a delicate balance, but in general "information wants to be free" and it's hard to keep stuff proprietary. At its core, GPT is matrix multiplication, which cannot be copyrighted per se.
@raylopez99
10 months ago
Also, non-compete agreements have to be reasonable, and in California they are generally not enforced except in specific circumstances.
@dtibor5903
10 months ago
Absolutely true, but to recreate the same training data costs a lot.
@vvvv4651
10 months ago
Nobody can remember 1 TB of data out the door, buddy 😂. True, though.
@dtibor5903
10 months ago
@@vvvv4651 What matters more is how the training data was organized, structured, and formatted, and the training methods. If that information were really so secret, other LLMs would be far, far behind.
@theobserver9131
10 months ago
@@vvvv4651 There are a few special people who remember absolutely everything they see. They're usually fairly challenged cognitively, but they can remember a whole phonebook just by reading it once. Have you ever heard of Rain Man?
@JoseLopez-hp5oo
10 months ago
Secure multi-party computation allows sensitive data to be processed in secret without revealing the plaintext; however, this is more for protecting things like medical data for research. To protect a language model or other complex business logic, it's best not to put the code in the hands of the attacker, and to use glovebox / API methods to interact with the sensitive IP without revealing it. Everything is so easy to hack; all your XPUs belong to me!
@binkwillans5138
10 months ago
Open the pod bay doors, HAL.
@Quast
10 months ago
8:25 Finally we know what John Doe looks like!
@svankensen
10 months ago
Great video as always, but... you didn't answer the main question in the title. You covered how it could happen, not what the consequences could be.
@lilhaxxor
10 months ago
TL;DR: databases with user and business information are far more valuable. I honestly doubt anything will happen. You need a whole infrastructure and competent staff to make use of these large models, so stealing them is fairly pointless. You can't even really do ransomware with them (although, as you mentioned, personal data might be used in the training set, there are ways to alter such data enough to remove personally identifiable information). There is honestly nothing to worry about here, in my opinion.
@nwalsh3
10 months ago
While I refuse to call things like ChatGPT "AI," I can't deny that the security and usage scenarios fascinate me to no small degree, partly because of my work background in security, but also because of how these text generators are being used with little regard for what people type into them. When companies actively have to go out on their internal communication channels and say "don't put personal or business data into [insert system here]," you know that access controls, usage policies, and filters on people are basically non-existent. Some years back MS did a video on how the various security layers in their datacentres are supposed to work (or was it AWS?). A good watch but, as with all things, a bit rosy. I worked at a company that had what they called a "secure facility." It was in fact so secure that when a cleaner went to clean one of the server rooms, they yanked out a cable to run their machine... and 3/4 of the servers just stopped responding. Very secure indeed.
@SurmaSampo
10 months ago
Cleaners are the natural predator of DCs.
@SalivatingSteve
10 months ago
The janitor unplugging a critical server sounds like my ISP Charter Spectrum.
@nwalsh3
10 months ago
@@SalivatingSteve It wasn't just one server... a whole section of server racks went down. :D And it was not an isolated incident, either.
@NATANOJ1
10 months ago
I worked in several IT offices; there was always someone with a similar story where a cleaner just pulled a plug to clean in the server room.
@jcdenton7914
10 months ago
Ignore this, I am doing research and my own comment will show at the top when I revisit this. 13:53 "Model Leeching: An Extraction Attack Targeting LLMs" attacked a small LLM. 14:39 "Membership Inference Attacks on Machine Learning: A Survey". 14:50 "Reconstructing Training Data from Trained Neural Networks"; goes into how extracting training data can lead to copyright lawsuits. Insider threats 16:10: "Two Former Twitter Employees and a Saudi National Charged as Acting as Illegal Agents of Saudi Arabia", URL not shown. 16:58 Verizon 2023 Data Breach Investigations Report; not sure if useful, but it's recent.
@hermannyung7689
10 months ago
The only way to prevent a model from being stolen is to keep pushing new and better models.
@VEC7ORlt
10 months ago
What will happen? Nothing, nothing at all. The world will not implode, the internet will be fine, and LLMs will give the same half-assed answers as before. Maybe some stock numbers will fall and a few poor CEO heads will roll, but I'm fine with that.
@ikuona
10 months ago
Just copy it on floppy disk and run away, easy.
@The_Conspiracy_Analyst
9 months ago
See, using their data (in the form of responses) to train your own AI isn't an "attack" per se; it's just doing what they did. And it shows culpability (IANAL, and this is not legal advice) that they see using third-party data to train an AI as wrong. I bet the measures they took to prevent it happening to them will be subpoenaed and used against them in lawsuits. I mean, if internally they call using data to train without permission an "attack," that's pretty damning!
@damien2198
10 months ago
That's why OpenAI is planning to have their own hardware: whoever controls the hardware controls the model (which would only be able to run on that specific hardware).
@nekogami87
10 months ago
Pretty sure they don't? The CEO started a new company and used the OpenAI name to sell it to investors, but I'm pretty sure that new entity has nothing to do with OpenAI (and is fully for-profit).
@sumansaha295
10 months ago
Unless they are running their models on quantum computers, it makes no difference. At the end of the day it's still just matrix multiplication in a specific order.
@dtibor5903
10 months ago
@@sumansaha295 Matrix multiplications do not need quantum computers.
@Urgelt
10 months ago
Purely open-source models are not far behind ChatGPT, and are advancing rapidly. We are approaching a tipping point: AI that is able to goal-seek and self-optimize, at which point curation of training data will no longer be much of an obstacle; AI will do it. The cat is almost out of the bag, and it's probably too late to contain it. One obstacle remains: compute cycles. Training requires a lot of them, but advances are coming there too, in more compact models and better, cheaper chips tailored for training. AI is moving at blinding speed now. Anything proprietary you could steal will soon be obsolete, and even open-source models will quickly surpass what was stolen. AI will fall into hands we might prefer it not reach; no security protocols could prevent it, I'm thinking. What happens next, I can't even begin to guess.
@szaszm_
10 months ago
I wonder whether NN model parameters fall under copyright law, or if not, whether anything protects them from copying. They're not really art, and it's not clear whether they're even a human creation.
@g00rb4u
10 months ago
Get that hacker @01:02 a space heater so he doesn't have to wear his hoodie indoors!
@whothefoxcares
10 months ago
Someone like 3lonMu$k could teach machines that greed is good.
@GungaLaGunga
9 months ago
Basically, as the compression gets better, all of human knowledge can be copy-pasted onto any device in seconds.
@marcfruchtman9473
10 months ago
A little bit misleading... obviously "nothing" happens. It's like asking what happens if Actor A steals the open-source script of a public play. There are so many open-source near-equivalents to GPT-4 now, and the data is simply out there to be scraped, without having to do any hacking at all.
@vvvv4651
10 months ago
Haha, this popped up on my feed right after I was fantasizing about possibly leaked no-limits GPT models. Well done.
@Manbemanbe
10 months ago
Good to see SBF taking that Home Ec class from prison there at 13:15 . You gotta stay busy, that's the key.
@ronaldmarcks1842
10 months ago
Yan Xu has created a somewhat misleading graphic. For both GPT-2 and GPT-3, the architecture doesn't involve separate *decoders* in the way that some other neural network architectures do (like the Transformer model, which has distinct encoder and decoder components). Instead, GPT-2 and GPT-3 are based on the Transformer architecture, but they use only the decoder part of the original Transformer model. What Yan probably refers to are not decoders but *layers*: GPT-2 has four versions with the largest having 48 layers. GPT-3 is much larger, with its largest version having 175 billion parameters across 96 layers.
@Dissimulate
10 months ago
The most humorous part of that deer picture was the word humorous in the caption.
@nneeerrrd
10 months ago
Yeah, "someone"... *cough cough* China, Iran, Russia.
@stachowi
10 months ago
This channel is unbelievably awesome
@av_oid
10 months ago
Steals? Isn’t it OPEN AI? Or should it be called ClosedAI?
@Lopson13
10 months ago
Excellent video; I would love to see more security videos from you!
@yeshwantpande2238
10 months ago
You mean to say it hasn't yet been stolen by traditional thieves? And will GPT-4 help steal itself?
@glennac
10 months ago
“Isn’t it ironic?” - Morissette
@MO_AIMUSIC
10 months ago
Well, considering how big the file is, stealing the parameters unnoticed would be nearly impossible. And even if it were possible, it would require physically moving the storage rather than transferring it over the internet.
@flioink
10 months ago
That's totally happening in the near future!
@redo1122
9 months ago
This sounds like you want to present a plan to someone
@thomasmuller6131
10 months ago
It sounds like sooner or later everyone will have their own personal LLM, and there will be no money to be made from providing the service itself.
@astk5214
10 months ago
I think I would love an open-source Unix Skynet.
@MyILoveMinecraft
10 months ago
Honestly, given the importance of AI and the significant advantage of those who have full access to AI over those who don't, NOTHING about AI should be proprietary. OpenAI especially still pisses me off: AI was promised to be open source, and now we are further from that than ever (despite much of the foundation actually being created as open-source code).
@adamgibbons4262
9 months ago
If all chips had a unique identifier value, couldn't you encode data to be executed only on a specific set of chips? Then you could simply forget about all the headaches of theft. Data would then be "secure once, execute multiple times" (on a set list of CPUs).
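Roughly that idea exists as "sealing" in TPMs and SGX, where the key lives in fused hardware. Here is a hypothetical software sketch of the scheme; the chip IDs, function names, and toy XOR "cipher" are all invented for illustration (a real design would use AES-GCM and a hardware key store, since an ID an attacker can read is a key an attacker can derive):

```python
import hashlib
from itertools import cycle

def chip_bound_key(chip_id: bytes, salt: bytes = b"model-v1") -> bytes:
    # Stretch the per-chip identifier into a 32-byte key with a KDF.
    return hashlib.pbkdf2_hmac("sha256", chip_id, salt, 100_000)

def seal(data: bytes, chip_id: bytes) -> bytes:
    # Toy XOR keystream, so sealing twice with the same ID round-trips.
    keystream = hashlib.sha256(chip_bound_key(chip_id)).digest()
    return bytes(b ^ k for b, k in zip(data, cycle(keystream)))

weights = b"proprietary weights"
sealed = seal(weights, chip_id=b"CPU-SERIAL-0042")

assert seal(sealed, b"CPU-SERIAL-0042") == weights  # allow-listed chip: decrypts
assert seal(sealed, b"CPU-SERIAL-9999") != weights  # any other chip: garbage
```

The hard part isn't the math; it's keeping the chip-bound key out of reach of anyone with physical access, which is exactly what the fused hardware buys you.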
@GavinM161
6 months ago
Hasn't IBM been doing the encryption at 'line speed' for years with their mainframes?
@benjaminlynch9958
10 months ago
I’m not terribly worried about any of these models being stolen or otherwise made non-proprietary by malicious actors. State of the art models only remain state of the art for a few months. We went from GPT 1 to GPT 4 in just 5 years. We went from DALL-E to DALL-E 3 in 33 months. Worst case scenario is that the stolen ‘foundational’ model becomes obsolete in 12-18 months, and likely much sooner unless it’s stolen immediately after being released. And that assumes that competing models don’t surpass it either.
@Valkyrie9000
10 months ago
Which is exactly why nobody steals Lamborghinis older than 6 months old. They'll just build a faster/better one. /s
@johnbrooks7350
10 months ago
It’s crazy to me that these models are so huge. I do wish many of these would be released entirely to the public. Even with the risks, I think open source and open development lead to the best long term production for everyone
@Fs3i
10 months ago
Llama-2 is the biggest open source model. It’s very mid.
@H0mework
10 months ago
@@Fs3i Goliath-120B is based on llama and I heard it's very good.
@magfal
10 months ago
@@Fs3i it's not open source, it's relatively permissively licensed.
@henrytep8884
10 months ago
Yes, let's give everyone nuclear weapons.... NO, WE DON'T DO THAT
@johnbrooks7350
10 months ago
@@henrytep8884 homie…. So only give private companies nuclear weapons??? What the hell is this ancap logic
@joelcarson4602
10 months ago
And your interface for the model is not going to parse the model's parameters using a Commodore 64 either. You will need some serious silicon to really make use of it.
@nahimgudfam
10 months ago
OpenAI's value is in their industry partnerships, not in their subpar LLM product.
@Narwaro
10 months ago
I have yet to see any positive impacts of any of this stuff. I'm kinda deep into the state of the art of research in this field and it's really not that impressive. The only thing I can see is that it replaces many stupid people in useless job positions. Whether that's positive or negative is yet to be seen.
@thebluegremlin
10 months ago
entertainment
@grizwoldphantasia5005
10 months ago
FWIW, I think the problem of stealing intellectual property is overblown, because if you rely on copying someone else's work, you have fewer resources to develop your own knowledge in the field, you are always one or two generations behind, and you don't know what to copy until the market decides what is successful. A business which relies on copying will never develop the institutional knowledge of all the hard work which is never published and can't be copied. A business which wants to do both has to put a lot more resources into the redundant efforts. A State-sponsored business might look like it has solved the money problem, but money is not resources, it is only access to resources, and States can only print money, not resources. The more inefficient a State-sponsored business is, the higher the opportunity cost, and the fewer other fields can be investigated or exploited. It's one reason I do not fear CCP expats stealing proprietary IP; it weakens the CCP overall. The more they focus on copying freer market leaders, the more fields they fall behind in.
@greatquux
10 months ago
This is a good point and one he has brought up in some other videos on computing history.
@bilalbaig8586
10 months ago
Copying is a viable strategy when you are significantly behind the market. It allows you to keep pace with fewer resources. It might not be something China would be satisfied with, but other players with fewer resources, like North Korea or Iran, would definitely find value in it.
@durschfalltv7505
10 months ago
IP is evil anyway.
@obsidianjane4413
10 months ago
Except most development is based upon prior work. When you have a bot that can churn through a million patents and papers, it can put A, B, and Z together better than any human, or even collection of humans, can. The intellectual theft problem isn't in the stealing of the LLM, it's the theft of the documents or works by the company that builds the training model. It's common to pay for research papers and for books etc. The claim is that they are scraping the internet for these documents without compensation or paying royalties. Yeah, the CCP being able to develop a 5th-gen fighter aircraft really weakened them. More insidious is that authoritarian states like the PRC have institutionalized IP theft. They do this by forcing expats to spy, extorting them with implied threats to family and themselves. Chinese nationals really are a security threat to other countries and companies. That isn't sinophobia, it's just reality.
@SpaghetteMan
10 months ago
@@obsidianjane4413 then you'd be stuck in the same quandary as the folks at the Manhattan Project when they were looking for "Jewish Communist Spies", and never suspected that the German-born Englishman Klaus Fuchs was the Soviet Spy after all. "Intellectual theft" is just a politician's word for "Corporate espionage" or "headhunting for skilled experts". Only idiots cut off their own nose to spite their face; there are plenty of ways businesses and industries insulate themselves from IP theft without kicking out highly capable workers from their potential hiring pool.
@Bluelagoonstudios
10 months ago
It happened already: they could extract training data from GPT by making it repeat a word 50x, and it spat out that data, even personal details of whoever wrote the text in the LLM. OpenAI has closed that door by now, declaring the trick against OpenAI's rules. But is that solid enough? A lot of research has to be done to close off that one.
@lobotomizedamericans
10 months ago
I'd fucking *love* to have a personal GPT4 or 5 with all BS ethical guard rails removed.
@SalivatingSteve
10 months ago
I would split up their model among machines based on subject areas of knowledge. Each server running its own “department” at what I’m dubbing ChatGPT University 🎓
@johnkraft7461
10 months ago
Remember what happened with the Bomb when only one guy had it? Strangely, the use of the Bomb stopped when the Other Guy got one too! Probably a good argument for open source from here on.
@nekoill
10 months ago
Whoever knows better please correct me, but I'm pretty sure the source code of the model, most likely alongside the dataset (but probably on different storage devices, both physically and virtually), is stored somewhere on a machine that isn't connected to the web at large, if connected to any kind of network at all. That doesn't eliminate the risk of data being stolen, but you'd need to be physically present at the storage site, fairly close to the computer (like *really* close), with a SATA cable shaped in a way that would allow it to serve as an antenna, or something like that. I expect OpenAI to take at least that kind of precaution, but who knows, dumb screwups happen in IT just as well.
@maht0x
10 months ago
There is no "source code" of the model; the model is the output of the training program, which takes PB of text as its input plus RLHF (reinforcement learning from human feedback) signals (this bit was missed out of the description and is arguably the hardest to replicate). Search for OpenAI's "Learning from human preferences" paper.
@nekoill
10 months ago
@@maht0x yeah, sounds like it. Thank you for the correction. My familiarity with ML/NNs is superficial; I know a couple of high-level concepts and a very coarse approximation of how it works under the hood.
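To illustrate the point in this thread: the valuable artifact is the array of learned numbers a training loop emits, not program source. A toy linear fit of y = 2x + 1 (a deliberately tiny stand-in, nothing like GPT-scale training) makes the distinction concrete:

```python
import json

# Toy "training": fit y = 2x + 1 by per-sample gradient descent.
w, b = 0.0, 0.0
data = [(x, 2 * x + 1) for x in range(-5, 6)]
for _ in range(2000):
    for x, y in data:
        err = (w * x + b) - y   # prediction error
        w -= 0.01 * err * x     # gradient step on the weight
        b -= 0.01 * err         # gradient step on the bias

# The "model" is just these learned numbers; this is what gets
# saved to disk, and what a thief would actually be stealing.
weights_json = json.dumps({"w": round(w, 4), "b": round(b, 4)})
```

Scale `w` and `b` up to 175 billion values and you have the shape of the problem: the weights file is the crown jewels, and the training code without the data, compute, and RLHF feedback won't reproduce it.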
@Kyzyl_Tuva
10 months ago
Fantastic video. Really appreciate your channel.
@buzzlightyear3715
10 months ago
"The time has come." It would be a surprise if a number of nation states haven't been stealing LLMs already 😂
@JordanLynn
10 months ago
I'm surprised Meta's (Facebook) LLaMA isn't mentioned: their model was literally leaked onto the internet, so starting with Llama 2 Meta just releases it to the public. It's all over Hugging Face.
@wrathofgrothendieck
10 months ago
Just don’t forget to steal the 40k computer chips that run the model…
@Excelray1
10 months ago
Waiting for "Dynamic Large Language Models" (DLLM) to be a thing /jk
@Game_Hero
10 months ago
8:26 Woah there! Did that AI successfully put text, actual meaningful correct text, in a generated image???
@Veylon
10 months ago
Dall-E actually does okay at that sometimes these days. Hands even have five fingers most of the time and are rarely backwards.
@MostlyPennyCat
10 months ago
I wonder if you could ask gpt to steal itself for you.
@scarvalho1
9 months ago
I love this video. Excellent and interesting title, and very good research.
@criticalgrower
8 months ago
The most interesting video on what AI really is, the many sides of AI, and how it's made.
@luxuriousturnip181
10 months ago
If it is theoretically cheaper to steal the data than to reproduce it or create something able to compete with it, then the security of the data is a matter of when, not if. We should all be asking when this will happen, and an even more troubling question is whether that "when" has already passed.
@SpiritsBB
10 months ago
Maybe not - I’d say everyone would rather build the model themselves than go through this hassle. If it’s 80% as good, that means it’s not good enough.
@simonreij6668
10 months ago
"just as chonk" i have a man crush on you
@MostlyPennyCat
10 months ago
Maybe it's cyber thieves complaining it's too slow so they _don't_ encrypt memory! 😮
@MrRaizada
10 months ago
TEEs etc. are nothing new, nor of 2019 provenance. They go back to around 2005, with a proposal from IBM called Trusted Computing and Intel's LaGrande project, which later became Intel TXT and, in spirit, SGX. To cut a long story short, it basically assumes that the consumer of the hardware cannot be trusted, so the hardware should be able to hide secrets. It has myriads of complexities, but when everything is said and done it involves this at its core: hide a key in silicon and hope no one is able to retrieve it. This is done by embedding a key in the processor or Trusted Platform Module and giving no option to retrieve it. These kinds of methods have long been studied and found to be failures.
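The hide-a-key-in-silicon idea can be sketched as sealing: a toy stand-in for a TPM where the fused secret is usable but never readable. This is illustrative only (the class and its methods are invented for the example); it binds data to one chip via an HMAC tag, whereas real TPM sealing also encrypts the payload and binds it to measured platform state.

```python
import hashlib
import hmac
import os

class ToyTPM:
    """Toy stand-in for a chip with a fused secret: usable, never readable."""

    def __init__(self):
        self._fused = os.urandom(32)  # unique per "chip"; no getter exposed

    def seal(self, data: bytes) -> bytes:
        # Append an HMAC tag only this chip's fused key can produce.
        tag = hmac.new(self._fused, data, hashlib.sha256).digest()
        return data + tag

    def unseal(self, blob: bytes) -> bytes:
        data, tag = blob[:-32], blob[-32:]
        expect = hmac.new(self._fused, data, hashlib.sha256).digest()
        if not hmac.compare_digest(expect, tag):
            raise ValueError("sealed by a different chip")
        return data

chip_a, chip_b = ToyTPM(), ToyTPM()
blob = chip_a.seal(b"secret config")  # only chip_a can unseal this
```

The attack surface the comment describes is exactly the `_fused` value: the whole scheme stands or falls on whether that one secret can be extracted from the silicon.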
@Sebastian-gf2fk
10 months ago
If it happens, modern internet will die, if it happens...
@l2azic
10 months ago
After hearing about the theft of ASML's EUV tech from S. Korea to China, tech definitely needs more safeguards.
@Zero11_ss
10 months ago
Everything the Chinese do is based on theft and unfair business practices.
Comments: 395