OUTLINE: 0:00 - Intro & Overview 2:30 - Sponsor: Weights & Biases 4:15 - Commonsense Knowledge Graphs 7:50 - ATOMIC dataset 10:00 - Generating the corpus from a model 13:00 - Prompting GPT-3 15:30 - Generating Events 18:40 - Generating Inferences 23:00 - Evaluating the created dataset 26:45 - Introducing the critic 31:25 - Using the critic to filter the data 36:30 - Training a student on the generated data 41:00 - Key Findings 44:45 - Comments & Conclusion
@samsungtelevision695
2 years ago
This channel is gold, man. Thanks for your work
@sampruden6684
2 years ago
Now I'm curious how well GPT-3 would be able to be its own critic if one prompted it with a few of those event inferences and always/often/never evaluations. Would it consistently rank its output highly because the generation and validation come from the same biased model, or would the evaluation task be sufficiently different that it might notice when it generated something that didn't make sense? If the language model is able to be its own critic, one might be able to move significantly further towards eliminating the humans from the loop. "Generate a thing for me. Good, now tell me how well you think you did." is a potentially powerful idea. I expect there are papers on this already.
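A runnable toy of the "generate, then tell me how well you think you did" loop this comment proposes. The `complete` function is a hypothetical stand-in for a real GPT-3 API call, returning canned strings here so the control flow works without any external service; all names are illustrative, not from the paper:

```python
# Sketch of a generate-then-self-critique loop. `complete` is a hypothetical
# stand-in for a language-model API call; it returns canned strings so the
# control flow runs without a service.
def complete(prompt: str) -> str:
    if "Plausible?" in prompt:
        # Model acting as its own critic: rate with always/often/never.
        return "often"
    # Model acting as generator: produce an event inference.
    return "As a result, X feels embarrassed."

def generate_and_self_critique(event: str, keep=("always", "often")):
    inference = complete(f"Event: {event}\nInference:")
    verdict = complete(
        f"Event: {event}\nInference: {inference}\n"
        "Plausible? Answer always, often, or never:"
    ).strip().lower()
    # Keep the sample only if the model rates its own output as plausible.
    return inference if verdict in keep else None

kept = generate_and_self_critique("X lost his shirt")
```

Whether this actually works hinges on the open question in the comment: if generation and validation share the same biases, the critic pass may simply rubber-stamp the generator's output.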
@amanvijayjindal5742
2 years ago
Yup, I completely agree and second the idea: if GPT-3 itself can be the critic, then we can completely remove the dependency on humans
@numbah16
2 years ago
So basically a GAN based off Left Brain/Right Brain (discriminator, generator). Throw in a stereo vision system on a body, and that's basically a conscious agent lol
@RaviAnnaswamy
2 years ago
Nice intuition. However, instead of saying GPT is its own critic, which causes self-referential doubts, we should think of the critic process as some inferences being cross-checked by another set of inferences, and hence a valid way of verifying them. Each inference from a prompt is really an extrapolation in a fast random walk along somewhat disciplined chains. The critic tries to see if it can be reached via other known facts.
@sampruden6684
2 years ago
@@amanvijayjindal5742 I think you would still need a human to evaluate how good the language model is at being a critic, but that's significantly less work than having the human do the critiquing.
@sampruden6684
2 years ago
@@numbah16 I don't think of it as being quite like a GAN, because the two parts don't feed back into and teach each other, but I do like that this is similar to how humans evaluate ideas - we generate lots and then reject most of them, sieving out the ones with potential.
@benibachmann9274
2 years ago
You should make a TEDx talk about the 47 million Lamborghinis in your garage up here in the Hollywood hills.
@MSDOS128
2 years ago
Is this an attack through attracting tax collecting agencies to the person? I genuinely can't tell, this is some kind of synthetic humour right here...
@danielalorbi
2 years ago
@@MSDOS128 I'm guessing it's a Tai Lopez joke based on the way Yannic is pronouncing Knowledge in the same way Tai did in his "Here in my garage... In the Hollywood Hills... You know what I value more than my Lamborghinis?" ad
@bukkiahgolden6043
8 months ago
Napoleon was born in Corsica. Good catch, always sharp Yannic. I love your work.
@Virsconte
2 years ago
35:13 "Happy families are all alike; every unhappy family is unhappy in its own way."
@amanvijayjindal5742
2 years ago
Makes sense, wow, makes perfect sense
@DamianReloaded
2 years ago
In a sense this shows transformers learn a lot of noise. Maybe human brains do too, but since our brains have plasticity, all the noise gets overtaken by more useful knowledge over time, leaving only the things we use (the reason I no longer know French :/). A sort of natural pruning/distillation.
@RaviAnnaswamy
2 years ago
That is one valid way of seeing the results. Another way is that humans learn noise too, and they learn internal critics too. In fact, initial learning has to be noise, because by definition learning is learning something new. New connections unfiltered by earlier knowledge have to be recorded and tried out, then filtered. Our dreams might be the filtering process, when the critic steps aside and all hypotheses are played out, with the critic labeling them silently. Often a student's doubts and questions deepen a teacher's understanding.
@probably_crater
2 years ago
"X lost his shirt, as a result X is wearing a shirt"
@lucca1820
2 years ago
prompt engineering feels like learning how to extract information from an alien which is a superposition of human knowledge
@mahdipourmirzaei1048
2 years ago
Pretty cool. I am wondering how many NLP tasks would be able to generate samples using this method.
@FourTwentyMagic
2 years ago
Notice me senpai
@hurktang
2 years ago
Well, a fire in a 2D bath would create turbulence and extinguish itself right ?
@oshri8
2 years ago
Very nice video, thanks. I wonder why they didn't show the accuracy of the COMET model trained on a filtered ATOMIC 2020 dataset.
@mahdipourmirzaei1048
2 years ago
Yannic, could you create a video about liquid neural networks, please?
@nikitastaf1996
2 years ago
The most amazing part of this paper is that it is relatively cheap.
@willfrank961
2 years ago
Looking forward to the world of GPT-3 and GPT-4 based apps. The GPT app-store will be dope.
@jawadmansoor6064
2 years ago
So GPT3 is like ENIAC and new models will be like PCs. :)
@titusfx
2 years ago
Yannic is an AI, 16:21 "a bunch of humans" 🤣
@smort123
2 years ago
X steals his grandfather's sword, so his grandfather punishes him severely.
@jawadmansoor6064
2 years ago
If you KNOW LEDGE you don't fall
@trinityblood5622
2 years ago
I am not sure what they meant by knowledge graph... These do not seem to be KGs; the examples or outputs they produced are more like RDF triples. Clearly these triples are isolated, and without integrating them you cannot call them a knowledge graph. This is a problem of the NLP community: they totally misunderstood the concept of a knowledge graph.
@G12GilbertProduction
2 years ago
There's no Steve Brunton in the paper credits? I guess he's working on the multilinear neural network models on your campus.
@herp_derpingson
2 years ago
0:30 Can we appreciate how the 175B model robot emoji is so large that it doesn't even fit in the diagram?

This research has pretty good business value. These are the general steps:
1. Get a large corpus of unstructured text on some subject matter
2. Have some expert humans generate prompts and a questionnaire for the corpus
3. Make the model generate more prompts
4. Run the prompts on the unstructured text
5. Evaluate a subset of the results and use it to teach a critic network
6. Filter the data and keep evaluating until the acceptance ratio improves

I think business process improvement firms are going to jump onto this technology.
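That generate-score-filter loop can be sketched as a runnable toy. Everything here is an illustrative stub (the names `generate_candidates`, `critic_score`, and `filter_corpus` are made up for this sketch, not from the paper): the generator and critic are replaced by deterministic dummies so only the control flow is demonstrated.

```python
# Toy version of a generate -> critic -> filter loop. All functions are
# illustrative stubs: the generator and critic are deterministic toys,
# standing in for a language model and a trained critic network.
def generate_candidates(n):
    # Stand-in for model generation: n candidate strings.
    return [f"candidate-{i}" for i in range(n)]

def critic_score(candidate):
    # Stand-in for a trained critic: a deterministic score in [0, 1).
    return (sum(map(ord, candidate)) % 100) / 100.0

def filter_corpus(candidates, threshold):
    # Keep only candidates the critic accepts; report the acceptance ratio.
    kept = [c for c in candidates if critic_score(c) >= threshold]
    return kept, len(kept) / len(candidates)

candidates = generate_candidates(1000)
kept, acceptance = filter_corpus(candidates, threshold=0.5)
```

In a real setup, step 5 above would fit the critic on human-evaluated samples, and the threshold would be tuned until the acceptance ratio on a held-out evaluation set is acceptable.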
@jawadmansoor6064
2 years ago
Can this model be used like a Hugging Face model (to perform zero-shot language tasks)?
@Ke_Mis
2 years ago
Do you know what I like more than GPT-3....KNAWLEDGE!
@waxwingvain
2 years ago
"just pay a bunch of humans" sounds like what a robot CEO would say if he needed something for his hip new robo startup
@cjhhong
2 years ago
The critic part seems like a good way of correcting biases in a model while maximizing its performance. Is it novel to this paper, or is it a commonly used method in this field?
@raunaquepatra3966
2 years ago
So now machines are better than humans at creating datasets. The one thing that still needed humans. 😱 I like where this is going
@ikoukas
2 years ago
Maybe he cannot find the shirt because he is wearing it, the way we often think we can't find our glasses because we are wearing them?
@holthuizenoemoet591
7 months ago
PersonX appears to be 3d
@LiaAnggraini1
2 years ago
Saved my day by helping me understand a paper in 45 minutes. Thanks a lot!
@paxdriver
2 years ago
I like when you step through the math. Great video, thanks for all your work!
@MyMattinthehat
2 years ago
PersonX appears to be 3 Dimensional. Ah yes, I see
@joywang8173
2 years ago
I love your idea that we need to include the irrelevant maths to make the paper accepted lolll
@adokoka
2 years ago
Great content! Keep going. The future is bright :)
@kokobertin
2 years ago
Can I have your email, dear Yannick? From the DRC
@erdna97
2 years ago
What do you use to annotate papers? iPad + pencil?
@tellmebaby183
2 years ago
Why you wear those glasses when there is no SUN or SON?
@quebono100
2 years ago
When vimrc? Come on, it takes you 5 minutes
@1PercentPure
1 year ago
Love u
@THEMATT222
2 years ago
Noice 👍
@nichevo
2 years ago
second!
@jawadmansoor6064
2 years ago
Skynet ... coming soon
@MrSchweppes
2 years ago
A quick question; help is much appreciated. Is it possible to extract knowledge from a model like GPT-6 once it is fine-tuned on a couple thousand examples? I want to build a Q&A app based on GPT-6, but before fine-tuning, I want to train it some more in order to add some copyrighted text (from various books). Will it be possible to find out which specific books were added to the training set once the model is fine-tuned? Many thanks for your answer!
@StephenRayner
2 years ago
Excellent channel!
@JTMoustache
2 years ago
Love it
@jabowery
2 years ago
Imagine my shock when confronted with the counterintuitive result that reducing the size of a model of a corpus could produce a better model. Maybe someone, like Musk or the X-Prize guys, should increase the purse of The Hutter Prize for Lossless Compression of Human Knowledge.
@TheRohr
2 years ago
The main point is about reducing the noise in the dataset. The shrinking is only an effect of this. This approach is more towards data-centric machine learning. A larger dataset with the same amount of reduced noise might still produce better models.
@RaviAnnaswamy
2 years ago
The original model has a lot more potential hypotheses about various things, while the distilled one is a selection for the purpose of commonsense reasoning. It does so by removing useless and potentially wrong inferences. It is better at ONE task, even though that task is a general and useful one, bringing explainability to GPT's thought process. The original model is still the richer and more creative one, the genius with far fewer labels and constraints on its creativity, and hence untamed.
@jabowery
2 years ago
@@RaviAnnaswamy Yes, and thanks to both you and @TheRohr for correcting me. My misinterpretation arose from a research direction that occurred to me while listening to this presentation: interpretable model creation through "distillation" to remove the "noise", in such a way as to enable the model to create a high-quality corpus, could be a much better approach to inducing the algorithmic information of the original, large model. One way of looking at algorithmic information in terms of distillation and noise is to consider an algorithm that outputs the original corpus as being composed of two parts: executable code (knowledge) and incompressible literal data (noise). "The original corpus" here is the one used to train the large language model, hence not a practical test target; it would be most interesting to take a much smaller corpus and run it through a similar process. The Hutter Prize's enwik9 corpus, being only a billion characters, may be _too_ small, but its relatively high-quality content may somewhat compensate for that. Moreover, if the entirety of Wikipedia were included in the corpus (including other languages), the results could be very interesting indeed, and accomplished with far fewer resources than those required to generate GPT-3. If Solomonoff's proof regarding induction holds, then the resulting distilled model that outputs the composite Wikipedia corpus should be not only much more _interpretable_ than the much larger model that outputs the same corpus, but its behavior should be far _richer_ and more _creative_ than the much larger model.
@RaviAnnaswamy
2 years ago
@@jabowery Very nice elaboration; thanks for taking the time to articulate it further. It gives more context and new relationships. The behavior of the derived and verified model may be more reliable and useful, but not necessarily more creative or 'richer'. Here is what I mean by that. The more interpretable or comprehensible something is, the less comprehensive it is: the completeness versus provability tradeoff (Gödel). I think this is because, when we interpret something, we are simply restating it in terms of other facts or ideas. We are describing it with respect to THOSE facts; that is, we are re-presenting it in a coordinate system provided by those other known facts. While this process of interpretation helps us in using the new fact, it is a reduction of the fact in terms of a subset of facts, and hence it reduces the expressivity of its full meaning. This is what I meant by saying the learned model is a REDUCTION, though probably far more manageable and useful. GPT-3 is pre-science; COMET distilled is science or technology. That was my original thought. But after reading your discussion, I see how a verified subset of valid facts can be used as a basis to unearth previously hidden facts - obfuscated by noise and unverified hypotheses. Thanks a lot.
@textjoint
2 years ago
Paper authors: Here is a bunch of techniques we used to generate corpus to train commonsense model. Commenters: Yeah! GPT-4 is going to be dope! Thanks for the video, Yannic! I would never digest that paper on my own.
Comments: 64