Abstract: In this talk, I will present some of our recent attempts at bridging the theory-practice divide between optimization theory and machine learning practice. While we have witnessed rapid advances in optimization theory and its application to machine learning (optimal algorithms, minimax rates, tight complexity bounds), many of these advances have failed to translate into machine learning practice. Many tables of upper and lower bounds and finite-time rates are now complete, yet theoretically optimal algorithms do not seem to be widely used in practice, and many practical algorithms seem to have theoretical issues. I will report on the attempts in my group to address a few of these gaps. Specifically, I will discuss a few cases where empirical studies can help extend existing theories to better explain the state of the art in practice. As a first example, I will discuss how standard assumptions about Lipschitz smoothness of the gradient can be relaxed to better match empirical observations. Using a relaxed smoothness assumption, one can explain why adaptively scaled gradient methods and "gradient clipping" algorithms work well in practice. Next, I will show how a lack of differentiability complicates the definition of approximate stationary points in nonconvex, non-differentiable optimization, and I will present a modified definition and an algorithm with tight rates that is reminiscent of the algorithms used in practice. As another case study, I will focus on the use of online learning to exploit variability in the stochastic gradient noise across samples. Finally, I will conclude the talk by arguing for a paradigm shift in the analysis of large-scale optimization algorithms as used in training deep neural networks. Using some empirical examples, I will discuss the puzzle of the training loss converging without the iterates converging to a stationary point, and advocate for a different kind of analysis of optimization algorithms that focuses on convergence of empirical measures to a distribution, as opposed to convergence of iterates to a point. This talk is mostly based on the work of my PhD student Jingzhao Zhang, together with my colleague, Professor Suvrit Sra.
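The abstract itself contains no code; as a rough illustration of the kind of practical algorithm the relaxed-smoothness discussion refers to, here is a minimal sketch of gradient descent with norm clipping. The function names, the clipping threshold, and the toy objective are illustrative assumptions, not the speaker's implementation or results.

```python
import numpy as np

def clipped_gradient_step(x, grad_fn, lr=0.1, clip_threshold=1.0):
    """One step of gradient descent with gradient-norm clipping.

    The intuition from relaxed (non-Lipschitz) smoothness assumptions is
    that local smoothness may grow with the gradient norm, so capping the
    effective step size keeps the update stable where gradients are large.
    """
    g = grad_fn(x)
    g_norm = np.linalg.norm(g)
    # Shrink the step when the gradient is large; leave it untouched otherwise.
    scale = min(1.0, clip_threshold / (g_norm + 1e-12))
    return x - lr * scale * g

# Toy usage on a quartic, whose curvature grows with the gradient norm.
grad = lambda x: x ** 3          # gradient of f(x) = 0.25 * sum(x**4)
x = np.array([3.0, -2.0])
for _ in range(200):
    x = clipped_gradient_step(x, grad, lr=0.1, clip_threshold=1.0)
print(x)  # approaches the stationary point at the origin
```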
Ali Jadbabaie | Optimization Theory and Machine Learning Practice: Minding the Gap