Title: Implicit bias of one large step of gradient descent
Abstract: In this talk I consider shallow neural networks and present recent work on the implicit bias of taking a single large step of gradient descent. We explicitly characterize the learning direction and its effect on the generalization of the network. We further show that the choice of activation function and of regularization can change the learned features, both qualitatively and quantitatively.
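Below is a minimal numerical sketch of the kind of setting the abstract describes; it is not the speaker's code. It assumes a single-index teacher y = tanh(⟨beta, x⟩), a two-layer ReLU student with only the first layer trained, and an illustrative large step-size scaling. It takes one gradient step and checks that the first-layer update is nearly rank one, with its top singular direction aligned to the teacher direction (one plausible reading of the "learning direction"). All dimensions, scalings, and the data model are assumptions.

```python
# Hedged sketch: one large gradient step on a shallow ReLU network.
# Data model, widths, and step-size scaling are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 100, 200, 2000                     # input dim, width, samples (assumed)
beta = rng.normal(size=d)
beta /= np.linalg.norm(beta)

# Assumed teacher: single-index model y = tanh(<beta, x>)
X = rng.normal(size=(n, d))
y = np.tanh(X @ beta)

# Student: f(x) = a^T relu(W x), second layer a frozen at +-1/sqrt(m)
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

pred = np.maximum(X @ W.T, 0.0) @ a          # network outputs, shape (n,)
resid = pred - y                             # squared-loss residuals

# Gradient of (1/2n) * sum_i (f(x_i) - y_i)^2 with respect to W:
# dL/dw_j = (1/n) * sum_i resid_i * a_j * 1[w_j . x_i > 0] * x_i
grad = ((resid[:, None] * (X @ W.T > 0)) * a).T @ X / n   # shape (m, d)

eta = 10.0 * np.sqrt(m)                      # "large" step (assumed scaling)
W1 = W - eta * grad

# If the update is close to rank one, the top right singular vector of
# W1 - W is the learned direction; compare it with the teacher direction.
_, s, Vt = np.linalg.svd(W1 - W)
print("spectral gap s1/s2:   ", s[0] / s[1])
print("alignment |<v1,beta>|:", abs(Vt[0] @ beta))
```

Running this, one would expect a large spectral gap and an alignment well above the ~1/sqrt(d) level of a random direction, illustrating how a single large step can already pick out a meaningful feature direction; the talk's results characterize this phenomenon precisely, which the sketch does not attempt.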