Check out the Newsletter/Podcast with summaries of all the papers I kept:
open.substack.com/pub/evintun...
Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo!
/ tunadorable
account.venmo.com/u/tunadorable
Discuss this stuff with other Tunadorks on Discord
/ discord
All my other links
linktr.ee/tunadorable
Timestamps:
0:00 Intro
0:52 Accelerated Grokking by Amplifying Slow Gradients arxiv.org/abs/2405.20233
2:54 Standard Language Ideology in AI-Generated Language arxiv.org/abs/2406.08726
5:16 Optimizing Large Model Training through Overlapped Activation Recomputation arxiv.org/abs/2406.08756
6:01 Zoom and Shift are All You Need arxiv.org/abs/2406.08866
7:32 Diffusion - An Elementary Tutorial arxiv.org/abs/2406.08929
9:12 A Memory-Efficient Expert Switching Framework for LLMs arxiv.org/abs/2406.09041
11:17 Chain of Preference Optimization arxiv.org/abs/2406.09136
12:16 Scalable Functional Encryption in Federated Learning through Weight Clustering and Probabilistic Filters arxiv.org/abs/2406.09152
13:17 Towards Bidirectional Human-AI Alignment - A Systematic Review arxiv.org/abs/2406.09264
14:37 Analysing Neurons Across Languages and Tasks in LLMs arxiv.org/abs/2406.09265
15:42 Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding arxiv.org/abs/2406.09297
17:24 Why Warmup the Learning Rate? arxiv.org/abs/2406.09405
18:44 Interpreting the Weight Space of Customized Diffusion Models arxiv.org/abs/2406.09413
19:14 Explore the Limits of Omni-modal Pretraining at Scale arxiv.org/abs/2406.09412
21:14 Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? arxiv.org/abs/2406.04391
23:08 Scaling Speech Decoding With Self-Supervised Learning arxiv.org/abs/2406.04328
24:36 A Systematic Survey of Prompting Techniques arxiv.org/abs/2406.06608
27:48 When Swarm Learning meets energy series data arxiv.org/abs/2406.04743
29:47 Your Language Agents Already Know How to Achieve High-level Goals arxiv.org/abs/2406.04784
31:12 Zero, Finite, and Infinite Belief History of Theory of Mind Reasoning in LLMs arxiv.org/abs/2406.04800
31:51 LLMs emulate certain cognitive profiles arxiv.org/abs/2406.04988
32:49 Compositional Generalization with Grounded LLMs arxiv.org/abs/2406.04989
34:25 Large Generative Graph Models arxiv.org/abs/2406.05109
36:07 The Factorization Curse arxiv.org/abs/2406.05183
38:14 How to Strategize Human Content Creation in the Era of GenAI? arxiv.org/abs/2406.05187
40:41 Information Geometry of Evolution of NN Params While Training arxiv.org/abs/2406.05295
42:07 Concept Formation and Alignment in LLMs arxiv.org/abs/2406.05315
43:11 Critical Phase Transition in a LLM arxiv.org/abs/2406.05335
44:16 Natural Language-Oriented Programming arxiv.org/abs/2406.05409
48:10 Generalist Multimodal AI - A Review arxiv.org/abs/2406.05496
49:29 Automata Extraction from Transformers arxiv.org/abs/2406.05564
50:31 The Price of Debiasing Language Models arxiv.org/abs/2406.05587
51:22 Attention as a Hypernetwork arxiv.org/abs/2406.05816
52:52 LLM-powered Personalized Agent for Long-term Dialogue arxiv.org/abs/2406.05925
54:14 Recurrent Context Compression arxiv.org/abs/2406.06110
55:42 LLMs Resist Alignment arxiv.org/abs/2406.06144
56:46 Lifelong Learning of LLMs - A Survey arxiv.org/abs/2406.06391
57:25 What's in an embedding? arxiv.org/abs/2406.06870
58:43 Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot arxiv.org/abs/2406.06893
59:06 Effectively Compress KV Heads for LLM arxiv.org/abs/2406.07056
60:55 Teaching LLMs to Self-Improve by Learning from Language Feedback arxiv.org/abs/2406.07168
61:24 Ternarized LLM arxiv.org/abs/2406.07177
62:52 Needle In A Multimodal Haystack arxiv.org/abs/2406.07230
63:19 Limited Out-of-Context Knowledge Reasoning in LLMs arxiv.org/abs/2406.07393
63:41 Hybrid State Space Models for Efficient Unlimited Context Language Modeling arxiv.org/abs/2406.07522
66:31 Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy arxiv.org/abs/2406.07735
68:15 Are LLMs Good Statisticians? arxiv.org/abs/2406.07815
70:53 An Empirical Study of Mamba-based LLMs arxiv.org/abs/2406.07887
74:45 LLMs Must Be Taught to Know What They Don't Know arxiv.org/abs/2406.08391
75:25 Scaling Laws in Linear Regression arxiv.org/abs/2406.08466
76:19 Outro
Hella Brand New AI Papers - June 15, 2024