Adjointe scientifique HES (postdoc), magda.gregorova@hesge.ch
Bio
I am a member of the data mining and machine learning team of Alexandros Kalousis in the Department of Business Informatics of the University of Applied Sciences and Arts Western Switzerland (HES-SO), Geneva, where I work as adjointe scientifique (postdoc). I graduated with a Master's degree in statistics from the University of Economics in Prague in 2001. After more than 12 years of work experience as an applied statistician tackling real-life problems (including in the statistics departments of the Czech National Bank, the European Central Bank, and EUROCONTROL, the European Organisation for the Safety of Air Navigation), I returned to academia and in 2018 defended a PhD in machine learning at the Department of Computer Science of the University of Geneva.
Research
During my PhD I focused primarily on structured sparsity and kernel methods. I am now exploring generative modelling, lifelong learning, and Bayesian inference.
Teaching
I teach data mining and machine learning courses at the Bachelor's and Master's levels (course material).
Publications
2021
Gregorova, Magda; Desaules, Marc; Kalousis, Alexandros. Learned transform compression with optimized entropy encoding. In: Neural Compression: From Information Theory to Applications, Workshop @ ICLR 2021, 2021. https://openreview.net/pdf?id=SmV8N_RbB_
Abstract: We consider the problem of learned transform compression where we learn both the transform and the probability distribution over the discrete codes. We utilize a soft relaxation of the quantization operation to allow for back-propagation of gradients and employ vector (rather than scalar) quantization of the latent codes. Furthermore, we apply a similar relaxation in the code probability assignments, enabling direct optimization of the code entropy. To the best of our knowledge, this approach is completely novel. We conduct a set of proof-of-concept experiments confirming the potency of our approaches.
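To make the mechanism concrete, below is a minimal PyTorch sketch, written from the abstract alone, of soft-relaxed vector quantization with a differentiable entropy term; the class name, the temperature tau, and the loss weighting are my own illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

class SoftVectorQuantizer(torch.nn.Module):
    """Soft relaxation of vector quantization (illustrative sketch)."""

    def __init__(self, num_codes: int = 64, code_dim: int = 16, tau: float = 1.0):
        super().__init__()
        self.codebook = torch.nn.Parameter(torch.randn(num_codes, code_dim))
        self.tau = tau  # softmax temperature controlling the relaxation

    def forward(self, z: torch.Tensor):
        # z: (batch, code_dim) latents from a learned encoder (the transform).
        d2 = torch.cdist(z, self.codebook).pow(2)        # (batch, K) distances
        soft_assign = F.softmax(-d2 / self.tau, dim=-1)  # differentiable assignment
        z_q = soft_assign @ self.codebook                # soft-quantized latents
        # Averaging the soft assignments gives a differentiable estimate of
        # the code-usage distribution, so its entropy (a proxy for the
        # bit-rate after entropy coding) can be optimized directly.
        p = soft_assign.mean(dim=0)
        entropy = -(p * (p + 1e-9).log()).sum()
        return z_q, entropy

vq = SoftVectorQuantizer()
z = torch.randn(8, 16)
z_q, H = vq(z)
loss = F.mse_loss(z_q, z) + 0.1 * H  # toy rate-distortion trade-off
loss.backward()                      # gradients flow through the quantization
```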
2020
Lavda, Frantzeska; Gregorova, Magda; Kalousis, Alexandros. Data-Dependent Conditional Priors for Unsupervised Learning of Multimodal Data. Entropy, 22(8), 888, 2020. https://doi.org/10.3390/e22080888 | Code: https://bitbucket.org/dmmlgeneva/cp-vae/src/master/
Abstract: One of the major shortcomings of variational autoencoders is the inability to produce generations from the individual modalities of data originating from mixture distributions. This is primarily due to the use of a simple isotropic Gaussian as the prior for the latent code in the ancestral sampling procedure for data generations. In this paper, we propose a novel formulation of variational autoencoders, conditional prior VAE (CP-VAE), with a two-level generative process for the observed data in which a continuous variable z and a discrete variable c are introduced in addition to the observed variables x. By learning data-dependent conditional priors, the new variational objective naturally encourages a better match between the posterior and prior conditionals, and the learning of the latent categories encoding the major source of variation of the original data in an unsupervised manner. Through sampling the continuous latent code from the data-dependent conditional priors, we are able to generate new samples from the individual mixture components corresponding to the multimodal structure of the original data. Moreover, we unify and analyse our objective under different independence assumptions for the joint distribution of the continuous and discrete latent variables. We provide an empirical evaluation on one synthetic dataset and three image datasets, FashionMNIST, MNIST, and Omniglot, illustrating the generative performance of our new model compared to multiple baselines.
Lavda, Frantzeska; Gregorová, Magda; Kalousis, Alexandros. Improving VAE generations of multimodal data through data-dependent conditional priors. In: 24th European Conference on Artificial Intelligence (ECAI 2020), IOS Press, vol. 325, pp. 1254-1261, 2020. DOI: 10.3233/FAIA200226 | http://ebooks.iospress.nl/volumearticle/55021 | Code: https://bitbucket.org/dmmlgeneva/cp-vae/src/master/
Abstract: One of the major shortcomings of variational autoencoders is the inability to produce generations from the individual modalities of data originating from mixture distributions. This is primarily due to the use of a simple isotropic Gaussian as the prior for the latent code in the ancestral sampling procedure for the data generations. We propose a novel formulation of variational autoencoders, conditional prior VAE (CP-VAE), which learns to differentiate between the individual mixture components and therefore allows for generations from the distributional data clusters. We assume a two-level generative process with a continuous (Gaussian) latent variable sampled conditionally on a discrete (categorical) latent component. The new variational objective naturally couples the learning of the posterior and prior conditionals, and the learning of the latent categories encoding the multimodality of the original data in an unsupervised manner. The data-dependent conditional priors are then used to sample the continuous latent code when generating new samples from the individual mixture components corresponding to the multimodal structure of the original data. Our experimental results illustrate the generative performance of our new model compared to multiple baselines.
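The two CP-VAE papers above share the same ancestral sampling scheme; the following self-contained sketch (my own illustrative network sizes and names, not the released code) shows the two-level generative process: draw a discrete component c, obtain the data-dependent conditional prior p(z | c), sample the continuous code z, and decode.

```python
import torch

num_components, latent_dim, data_dim = 10, 8, 784

# Learned conditional prior: maps a one-hot component to Gaussian parameters.
prior_net = torch.nn.Linear(num_components, 2 * latent_dim)
decoder = torch.nn.Sequential(
    torch.nn.Linear(latent_dim, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, data_dim), torch.nn.Sigmoid(),
)

def generate(n: int) -> torch.Tensor:
    # 1) discrete latent: which mixture component ("modality") to generate from
    c = torch.randint(num_components, (n,))
    c_onehot = torch.nn.functional.one_hot(c, num_components).float()
    # 2) continuous latent from the data-dependent conditional prior p(z | c)
    mu, logvar = prior_net(c_onehot).chunk(2, dim=-1)
    z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)
    # 3) decode the latent code into data space
    return decoder(z)

samples = generate(16)  # fixing c to one value instead would generate
                        # from a single mixture component only
```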
Ramapuram, Jason; Gregorova, Magda; Kalousis, Alexandros. Lifelong generative modeling. Neurocomputing, vol. 404, pp. 381-400, 2020, ISSN 0925-2312. https://doi.org/10.1016/j.neucom.2020.02.115 | Code: https://bitbucket.org/dmmlgeneva/lifelonggenerativemodeling
Abstract: Lifelong learning is the problem of learning multiple consecutive tasks in a sequential manner, where knowledge gained from previous tasks is retained and used to aid future learning over the lifetime of the learner. It is essential for the development of intelligent machines that can adapt to their surroundings. In this work we focus on a lifelong learning approach to unsupervised generative modeling, where we continuously incorporate newly observed distributions into a learned model. We do so through a student-teacher Variational Autoencoder architecture which allows us to learn and preserve all the distributions seen so far, without the need to retain the past data or the past models. Through the introduction of a novel cross-model regularizer, inspired by a Bayesian update rule, the student model leverages the information learned by the teacher, which acts as a probabilistic knowledge store. The regularizer reduces the effect of catastrophic interference that appears when we learn over sequences of distributions. We validate our model's performance on sequential variants of MNIST, FashionMNIST, PermutedMNIST, SVHN and Celeb-A and demonstrate that our model mitigates the effects of catastrophic interference faced by neural networks in sequential learning scenarios.
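The heart of the approach is the cross-model regularizer pulling the student's posterior towards the frozen teacher's on replayed data; the sketch below is a simplification under my own assumptions (diagonal Gaussian posteriors, toy tensors standing in for the encoder outputs), not the paper's exact Bayesian-update formulation.

```python
import torch

def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(dim=-1).mean()

# Toy stand-ins for the two encoders evaluated on data replayed (generated)
# by the frozen teacher; the penalty keeps the student consistent with the
# teacher's posterior there, mitigating catastrophic interference while the
# student also trains on the newly observed distribution.
mu_t, logvar_t = torch.zeros(32, 8), torch.zeros(32, 8)  # teacher q(z|x)
mu_s = torch.randn(32, 8, requires_grad=True)            # student q(z|x)
logvar_s = torch.zeros(32, 8, requires_grad=True)
reg = kl_gaussians(mu_s, logvar_s, mu_t, logvar_t)
reg.backward()  # gradients flow into the student parameters only
```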
2018
Gregorova, Magda. Sparse learning for variable selection with structures and nonlinearities. PhD thesis, University of Geneva, 2018 (ID: unige:115678). https://archive-ouverte.unige.ch/unige:115678
Abstract: In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in predictive models. By relying on a limited set of input variables, such models naturally counteract the overfitting problem ubiquitous in learning from finite sets of training points. Sparse models are cheaper to use for predictions, they usually require lower computational resources, and by relying on smaller sets of inputs they can reduce the costs of data collection and storage. Sparse models can also contribute to a better understanding of the investigated phenomena, as they are easier to interpret than full models.
Lavda, Frantzeska; Ramapuram, Jason; Gregorova, Magda; Kalousis, Alexandros. Continual Classification Learning Using Generative Models. In: Continual Learning Workshop, NeurIPS 2018, 2018. http://arxiv.org/abs/1810.10612
Gregorová, Magda; Ramapuram, Jason; Kalousis, Alexandros; Marchand-Maillet, Stéphane. Large-Scale Nonlinear Variable Selection via Kernel Random Features. In: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2018, Dublin, Ireland, Proceedings, Part II, pp. 177-192, 2018. https://doi.org/10.1007/978-3-030-10928-8_11
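No abstract accompanies this entry, so the sketch below only illustrates the textbook ingredient the title names: random Fourier features approximating a Gaussian kernel, combined with a per-variable scaling vector whose zero entries switch input variables off. The paper's actual selection and optimization scheme is not reproduced; all names and sizes here are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 200, 10, 500       # samples, input variables, random features
X = rng.normal(size=(n, d))

theta = np.ones(d)           # per-variable relevance weights; driving an
theta[5:] = 0.0              # entry to zero deselects that variable

W = rng.normal(size=(d, D))                # spectral samples (Gaussian kernel)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)  # random phases

def features(X: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Random Fourier features of the variable-rescaled inputs."""
    return np.sqrt(2.0 / D) * np.cos((X * theta) @ W + b)

Phi = features(X, theta)  # (n, D); Phi @ Phi.T approximates the Gaussian
                          # kernel computed on the selected variables only
```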
Gregorová, Magda; Kalousis, Alexandros; Marchand-Maillet, Stéphane. Structured nonlinear variable selection. In: Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, Monterey, California, USA, pp. 23-32, 2018. http://auai.org/uai2018/proceedings/papers/17.pdf
2017
Gregorová, Magda; Kalousis, Alexandros; Marchand-Maillet, Stéphane. Learning Predictive Leading Indicators for Forecasting Time Series Systems with Unknown Clusters of Forecast Tasks. In: Proceedings of The 9th Asian Conference on Machine Learning, ACML 2017, Seoul, Korea, pp. 161-176, 2017. http://proceedings.mlr.press/v77/gregorova17a/gregorova17a.pdf | Supplement: http://proceedings.mlr.press/v77/gregorova17a/gregorova17a-supp.pdf | Code: https://bitbucket.org/dmmlgeneva/var-leading-indicators
Abstract: We present a new method for forecasting systems of multiple interrelated time series. The method learns the forecast models together with discovering, from within the system, leading indicators that serve as good predictors improving the forecast accuracy, and a cluster structure of the predictive tasks around these. The method is based on the classical linear vector autoregressive model (VAR) and links the discovery of the leading indicators to inferring sparse graphs of Granger causality. We formulate a new constrained optimisation problem to promote the desired sparse structures across the models and the sharing of information amongst the learning tasks in a multi-task manner. We propose an algorithm for solving the problem and document, on a battery of synthetic and real-data experiments, the advantages of our new method over baseline VAR models as well as state-of-the-art sparse VAR learning methods.
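As background to the method, here is a minimal sketch of the baseline it builds on: a VAR(1) fitted per series with an l1 penalty, where the sparsity pattern of the coefficient matrix reads as a graph of Granger-causal links (a series with many outgoing links acting as a leading indicator). The paper's coupled, cluster-aware penalty is not reproduced; the data and hyper-parameters are my own toy choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
T, k = 300, 5                               # time steps, number of series
Y = rng.normal(size=(T, k)).cumsum(axis=0)  # toy multivariate series

X_past, X_next = Y[:-1], Y[1:]              # VAR(1): predict y_t from y_{t-1}
A = np.zeros((k, k))                        # A[i, j] != 0  <=>  series j
for i in range(k):                          # Granger-causes series i
    A[i] = Lasso(alpha=0.1).fit(X_past, X_next[:, i]).coef_

# A series whose column carries the most weight influences many others,
# i.e. it is a candidate leading indicator of the system.
leading = np.abs(A).sum(axis=0).argmax()
print(f"candidate leading indicator: series {leading}")
```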
Gregorová, Magda; Kalousis, Alexandros; Marchand-Maillet, Stéphane. Forecasting and Granger Modelling with Non-linear Dynamical Dependencies. In: Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2017, Skopje, Macedonia, Proceedings, Part II, pp. 544-558, 2017. DOI: 10.1007/978-3-319-71246-8_33 | PDF: https://hesso.tind.io/record/2097/files/Gregorova_Kalousis_2017_forecasting_and_granger.pdf | Code: https://bitbucket.org/dmmlgeneva/nonlinear-granger
Abstract: Traditional linear methods for forecasting multivariate time series are not able to satisfactorily model the non-linear dependencies that may exist in non-Gaussian series. We build on the theory of learning vector-valued functions in the reproducing kernel Hilbert space and develop a method for learning prediction functions that accommodate such non-linearities. The method not only learns the predictive function but also the matrix-valued kernel underlying the function search space directly from the data. Our approach is based on learning multiple matrix-valued kernels, each of those composed of a set of input kernels and a set of output kernels learned in the cone of positive semi-definite matrices. In addition to superior predictive performance in the presence of strong non-linearities, our method also recovers the hidden dynamic relationships between the series and thus is a new alternative to existing graphical Granger techniques.
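The separable matrix-valued kernel is the central building block here; as a hedged illustration (my own assumptions, with the input kernel and output matrix fixed rather than learned as in the paper), vector-valued kernel ridge regression with K(x, x') = k(x, x') B has the closed form below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 50, 4, 3                       # samples, input dim, output dim
X, Y = rng.normal(size=(n, d)), rng.normal(size=(n, p))

def gaussian_gram(X1, X2, sigma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

B = np.eye(p)                            # output kernel in the PSD cone
K = np.kron(gaussian_gram(X, X), B)      # (n*p, n*p) matrix-valued Gram
alpha = np.linalg.solve(K + 1e-2 * np.eye(n * p), Y.reshape(-1))

def predict(X_new):
    Kx = np.kron(gaussian_gram(X_new, X), B)    # cross Gram blocks
    return (Kx @ alpha).reshape(len(X_new), p)  # vector-valued outputs

Y_hat = predict(X[:5])  # a non-identity B would couple the output series,
                        # which is what the learned output kernels capture
```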
Ramapuram, Jason; Gregorova, Magda; Kalousis, Alexandros. Lifelong Generative Modeling. arXiv preprint arXiv:1705.09847, 2017 (published in Neurocomputing, 2020; see above). http://arxiv.org/abs/1705.09847
2015
Gregorova, Magda; Kalousis, Alexandros; Dinuzzo, Francesco. Functional learning of time-series models preserving Granger-causality structures. In: Proceedings of the Time Series Workshop of the 29th Neural Information Processing Systems conference (NIPS 2015), 11 December 2015. http://hesso.tind.io/record/1650
Abstract: We develop a functional learning approach to modelling systems of time series which preserves the ability of standard linear time-series models (VARs) to uncover the Granger-causality links between the series of the system while allowing for richer functional relationships. We propose a framework for learning multiple output-kernels associated with multiple input-kernels over a structured input space and outline an algorithm for simultaneous learning of the kernels with the model parameters, with various forms of regularization including non-smooth sparsity-inducing norms. We present results of synthetic experiments illustrating the benefits of the described approach.
Gregorova, Magda; Kalousis, Alexandros; Marchand-Maillet, Stéphane. Learning coherent Granger-causality in panel vector autoregressive models. In: Proceedings of the Demand Forecasting Workshop of the 32nd International Conference on Machine Learning, ICML 2015. http://hesso.tind.io/record/1077
Abstract: We consider the problem of forecasting multiple time series across multiple cross-sections based solely on the past observations of the series. We propose to use a panel vector autoregressive model to capture the inter-dependencies on the past values of the multiple series. We restrict the panel vector autoregressive model to exclude the cross-sectional relationships and propose a method to learn models with sparse Granger-causality structures coherent across the panel sections. The method extends the concepts of group variable selection and support union recovery into the panel setting by extending the group lasso penalty (Yuan & Lin, 2006) into a matrix-output regression setting with a 3d tensor of model parameters.
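To illustrate the penalty the abstract describes (with my own toy dimensions, not the paper's algorithm): the panel VAR coefficients form a 3d tensor indexed by (section, target series, input series), and a group lasso norm whose groups collect all coefficients attached to one input series zeroes that series out coherently across every panel section.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sections, n_targets, n_inputs = 4, 3, 6
W = rng.normal(size=(n_sections, n_targets, n_inputs))  # panel VAR params

def coherent_group_lasso(W: np.ndarray) -> float:
    """Sum over input series of the l2 norm of all coefficients using it."""
    # One group per input variable: the whole slice W[:, :, j], spanning
    # all panel sections and all target series.
    return float(sum(np.linalg.norm(W[:, :, j]) for j in range(W.shape[2])))

penalty = coherent_group_lasso(W)
# Added to the squared forecasting error and minimized, this drives entire
# slices W[:, :, j] to zero, i.e. removes the same Granger-causality links
# from the models of every panel section at once.
```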