On Monday 12 November at 11:00 am, Dr. Julien Mairal (Inria, Grenoble) will give a talk titled: Invariance and Stability to Deformations of Deep Convolutional Representations.
At 2:00 pm on the same day, Ms. Magda Gregorová will defend her thesis: Sparse Learning for variable selection with structures and nonlinearities.
Both presentations will take place at the Battelle Campus, building B, 3rd floor, room B3.08.
Invariance and Stability to Deformations of Deep Convolutional Representations
Abstract: The success of deep convolutional architectures is often attributed in part to their ability to learn multiscale and invariant representations of natural signals. However, a precise study of these properties and of how they affect learning guarantees is still missing. In this work, we consider deep convolutional representations of signals; we study their invariance to translations and to more general groups of transformations, their stability to the action of diffeomorphisms, and their ability to preserve signal information. This analysis is carried out by introducing a multilayer kernel based on convolutional kernel networks and by studying the geometry induced by the kernel mapping. We then characterize the corresponding reproducing kernel Hilbert space (RKHS), showing that it contains a large class of convolutional neural networks with homogeneous activation functions. This analysis allows us to separate data representation from learning and to provide a canonical measure of model complexity, the RKHS norm, which controls both the stability and the generalization of any learned model. In addition to models in the constructed RKHS, our stability analysis also applies to convolutional networks with generic activations such as rectified linear units, and we discuss its relationship with recent generalization bounds based on spectral norms. This is joint work with Alberto Bietti.
Sparse Learning for variable selection with structures and nonlinearities
Abstract: In this thesis we discuss machine learning methods that perform automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in predictive models. By relying on a limited set of input variables, the models naturally counteract the overfitting problem ubiquitous in learning from finite sets of training points. Sparse models are cheaper to use for predictions: they usually require fewer computational resources and, by relying on smaller sets of inputs, can reduce the costs of data collection and storage. Sparse models can also contribute to a better understanding of the investigated phenomena, as they are easier to interpret than full models.
We are specifically interested in problems with non-trivial sparse relationships amongst the data, in particular problems where the dependencies exhibit sparse patterns that can be exploited in the modelling, but for which prior understanding is not sufficient to formulate explicit constraints that could be hard-wired into the model. We build on the ideas of learning with structured sparsity to factor such patterns into the models.
Furthermore, as the relationships may be too complex to be satisfactorily captured by simple linear functions, we allow the methods to operate over a broader space of nonlinear functions. For this we rely on the theory of regularised learning in reproducing kernel Hilbert spaces (RKHSs) and extend it towards sparse learning in nonlinear, non-additive models. Throughout the thesis we propose multiple new methods for sparse learning over reduced sets of input variables. We initially concentrate on the problem of multivariate time series forecasting and develop methods that learn forecasting models while discovering the Granger causality dependencies amongst the series.
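As an illustration of what discovering Granger causality dependencies while fitting a forecaster can look like, the following Python sketch uses a standard linear formulation that is assumed here and is not the thesis's method: a vector autoregression (VAR) whose coefficients are penalised by a group lasso over the lags, so that series j is read as Granger-causing series i when the group of coefficients from j to i is nonzero. The helper names (`lagged_design`, `fit_group_lasso_var`, `granger_graph`), the simulated data, the lag order `p`, the penalty `lam` and the threshold `tol` are all hypothetical choices.

```python
import numpy as np

def lagged_design(Y, p):
    # Y has shape (T, k). Returns X of shape (T-p, k*p) with columns
    # [Y_{t-1}, ..., Y_{t-p}] (lag-major) and the targets Y_t for t = p..T-1.
    T = Y.shape[0]
    X = np.hstack([Y[p - l - 1:T - l - 1] for l in range(p)])
    return X, Y[p:]

def fit_group_lasso_var(Y, p, lam, n_iter=1000):
    # Minimises (1/2n)||Z - X B||_F^2 + lam * sum_{j,i} ||B[all lags, j -> i]||_2
    # by proximal gradient descent (ISTA) with a fixed step 1/L.
    X, Z = lagged_design(Y, p)
    n, k = Z.shape
    B = np.zeros((k * p, k))
    step = n / np.linalg.norm(X, 2) ** 2               # 1/L for the squared loss
    for _ in range(n_iter):
        B = B - step * (X.T @ (X @ B - Z)) / n          # gradient step
        G = B.reshape(p, k, k)                          # G[l, j, i]: lag l+1 effect of j on i
        norms = np.linalg.norm(G, axis=0)               # (k, k) group norms over the lags
        scale = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = (G * scale[None, :, :]).reshape(k * p, k)   # group soft-thresholding
    return B.reshape(p, k, k)

def granger_graph(G, tol=0.1):
    # edges[j, i] is True when series j is estimated to Granger-cause series i
    return np.linalg.norm(G, axis=0) > tol

# Hypothetical simulated VAR(1) with links 0 -> 1 and 2 -> 3 plus self-dependence
rng = np.random.default_rng(0)
k, T = 4, 1000
A = 0.5 * np.eye(k)
A[1, 0], A[3, 2] = 0.6, -0.6          # A[i, j]: effect of series j on series i
Y = np.zeros((T, k))
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + rng.normal(size=k)

G_hat = fit_group_lasso_var(Y, p=2, lam=0.1)
edges = granger_graph(G_hat)
```

The group penalty is what turns forecasting into structure discovery: it zeroes out whole blocks of lag coefficients at once, so the surviving blocks can be read directly as a directed dependency graph.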
In the second half of the manuscript we focus on the more general problem of learning sparse nonlinear regression functions. Drawing parallels with linear modelling, we formulate new regularisers based on the partial derivatives of the function to promote structured sparsity in the nonlinear model. We show how these can be incorporated into the kernel regression problem and reformulated into a problem that can be solved in practice by an iterative algorithm derived from the alternating direction method of multipliers (ADMM).
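To make the idea of a derivative-based regulariser concrete, the sketch below evaluates one common form of such a penalty, assumed here rather than taken from the thesis: the sum over input variables of the empirical L2 norms of the partial derivatives of a Gaussian-kernel regression function f(x) = sum_i alpha_i k(x_i, x). Like the group lasso in linear models, the outer sum encourages entire partial derivatives, and hence entire variables, to vanish. It only computes the penalty on a plain kernel ridge fit; it does not implement the ADMM-based solver mentioned above. All function names, the synthetic data and the values of `gamma` and `lam` are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma):
    # k(x, y) = exp(-gamma * ||x - y||^2), computed for all pairs of rows
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def partial_derivatives(alpha, X, gamma):
    # For f(x) = sum_i alpha_i k(x_i, x) with the Gaussian kernel,
    # df/dx_a (x_j) = -2 gamma * sum_i alpha_i k(x_i, x_j) (x_{j,a} - x_{i,a}).
    K = gaussian_kernel(X, X, gamma)              # K[i, j] = k(x_i, x_j)
    diff = X[None, :, :] - X[:, None, :]          # diff[i, j, a] = x_{j,a} - x_{i,a}
    return -2.0 * gamma * np.einsum("i,ij,ija->ja", alpha, K, diff)  # shape (n, d)

def derivative_penalty(alpha, X, gamma):
    # Omega(f) = sum_a sqrt(mean_j (df/dx_a (x_j))^2): an l1 sum over variables
    # of empirical l2 norms, which pushes whole partial derivatives to zero.
    norms = np.sqrt(np.mean(partial_derivatives(alpha, X, gamma) ** 2, axis=0))
    return norms.sum(), norms

rng = np.random.default_rng(1)
n, d, gamma, lam = 200, 6, 0.1, 1e-2
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + X[:, 3] + 0.05 * rng.normal(size=n)   # only x_0 and x_3 matter

# Plain kernel ridge regression (no sparsity penalty): alpha = (K + n lam I)^{-1} y
alpha = np.linalg.solve(gaussian_kernel(X, X, gamma) + n * lam * np.eye(n), y)
omega, norms = derivative_penalty(alpha, X, gamma)
```

On this synthetic example, the variables that actually enter the target (x_0 and x_3) are expected to carry clearly larger derivative norms than the irrelevant ones; a sparsity-promoting solver would add `omega` to the training objective to push the small norms to exactly zero.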
Finally, we address the scalability issues of sparse learning with kernel methods. We use random Fourier features to approximate the kernel function and shift the sparsity search from the original function space into the space of the random features. This significantly reduces the dimensionality of the search space, and therefore the computational complexity, even when working with large datasets containing thousands of data instances.
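The random Fourier feature approximation can be sketched as follows, assuming a Gaussian kernel k(x, y) = exp(-gamma ||x - y||^2) and the standard construction z(x) = sqrt(2/D) cos(W^T x + b), with the columns of W drawn from N(0, 2 gamma I) and b uniform on [0, 2 pi). It shows only the kernel approximation and a ridge regression in the D-dimensional feature space; the sparse variable selection the thesis performs in that space is not reproduced, and the names, data and parameter values are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma):
    # Exact kernel k(x, y) = exp(-gamma * ||x - y||^2) for all pairs of rows
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def sample_rff(d, D, gamma, rng):
    # Frequencies from the kernel's spectral density N(0, 2*gamma*I);
    # phases uniform on [0, 2*pi)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return W, b

def rff_map(X, W, b):
    # z(x) = sqrt(2/D) * cos(W^T x + b), so that z(x) . z(y) ~= k(x, y)
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
n, d, D, gamma, lam = 1000, 5, 500, 0.2, 1e-2
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)         # hypothetical target

W, b = sample_rff(d, D, gamma, rng)
Z = rff_map(X, W, b)                                    # explicit n x D feature matrix
kernel_err = np.abs(Z @ Z.T - gaussian_kernel(X, X, gamma)).mean()

# Ridge regression on the features: a D x D system instead of an n x n kernel system
beta = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
train_mse = np.mean((Z @ beta - y) ** 2)
```

The design choice is the usual trade-off: the n x n kernel system is replaced by a D x D one (here D = 500 for n = 1000 points), at the price of a Monte Carlo error in the kernel that shrinks roughly like 1/sqrt(D).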