Abstract: In this talk, I will present some of our recent attempts at bridging the theory-practice divide between optimization theory and machine learning practice. While we have witnessed rapid advances in optimization theory and its application to machine learning (optimal algorithms, minimax rates, tight complexity bounds), many of these advances have failed to translate into machine learning practice. Many tables of upper and lower bounds and finite-time rates are now complete, yet theoretically optimal algorithms do not seem to be widely used in practice, and many practical algorithms seem to have theoretical issues. I will report on the attempts in my group to address a few of these gaps. Specifically, I will discuss a few cases where empirical studies can help extend existing theories to better explain the state of the art in practice. As a first example, I will discuss how standard assumptions about Lipschitz smoothness of the gradient can be relaxed to better match empirical observations. Using a relaxed smoothness assumption, one can explain why adaptively scaled gradient methods and "gradient clipping" algorithms work well in practice. Next, I will show how a lack of differentiability complicates the definition of approximate stationary points in nonconvex, non-differentiable optimization, and I will present a modified definition and an algorithm with tight rates that is reminiscent of the algorithms used in practice. As another case study, I will focus on the use of online learning to exploit variability in the stochastic gradient noise across samples. Finally, I will conclude the talk by arguing for a paradigm shift in the analysis of large-scale optimization algorithms as used in training deep neural networks. Using some empirical examples, I will discuss the puzzle of the training loss converging without the iterates converging to a stationary point, and advocate for a different kind of analysis of optimization algorithms that focuses on convergence of empirical measures to a distribution, as opposed to convergence of iterates to a point. This talk is mostly based on the work of my PhD student Jingzhao Zhang, together with my colleague, Professor Suvrit Sra.
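The abstract itself contains no code; as a rough illustration of the kind of practical algorithm the relaxed-smoothness discussion refers to, here is a minimal sketch of gradient descent with norm clipping. The function names, the clipping threshold, and the toy objective are illustrative assumptions, not the speaker's implementation or results.

```python
import numpy as np

def clipped_gradient_step(x, grad_fn, lr=0.1, clip_threshold=1.0):
    """One step of gradient descent with gradient-norm clipping.

    The intuition from relaxed (non-Lipschitz) smoothness assumptions is
    that local smoothness may grow with the gradient norm, so capping the
    effective step size keeps the update stable where gradients are large.
    """
    g = grad_fn(x)
    g_norm = np.linalg.norm(g)
    # Shrink the step when the gradient is large; leave it untouched otherwise.
    scale = min(1.0, clip_threshold / (g_norm + 1e-12))
    return x - lr * scale * g

# Toy usage on a quartic, whose curvature grows with the gradient norm.
grad = lambda x: x ** 3          # gradient of f(x) = 0.25 * sum(x**4)
x = np.array([3.0, -2.0])
for _ in range(200):
    x = clipped_gradient_step(x, grad, lr=0.1, clip_threshold=1.0)
print(x)  # approaches the stationary point at the origin
```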
Ali Jadbabaie | Optimization Theory and Machine Learning Practice: Minding the Gap