# IDA Machine Learning Seminars - Spring 2018

### Wednesday, February 28, 3.15 pm, 2018

**Conditionally Independent Multiresolution Gaussian Processes**

Jalil Taghia, Dept. of Information Technology, Uppsala University

*Abstract*: Multiresolution Gaussian processes (GPs) based on hierarchical application of predictive processes assume full independence among GPs across resolutions. The full independence assumption results in models which are inherently susceptible to overfitting, and approximations which are non-smooth at the boundaries. Here, we consider a model variant which assumes conditional independence among GPs across resolutions. We characterize each GP using a particular representation of the Karhunen-Loève expansion where each basis vector of the representation consists of an axis and a scale factor, referred to as the basis axis and the basis-axis scale. The basis axes have unique characteristics: they are zero-mean by construction and lie on the unit sphere. The axes are modeled using Bingham distributions, a natural choice for modeling axial data. Given the axes, all GPs across resolutions are independent, in direct contrast to the common assumption of full independence between GPs. More specifically, all GPs are tied to the same set of axes, but the basis-axis scales of each GP are specific to the resolution on which they are defined. Relaxing the full independence assumption helps reduce overfitting, which can be a problem in an otherwise identical model architecture with the full independence assumption. We consider a Bayesian treatment of the model using variational inference.

Location: Ada Lovelace (Visionen)

Organizer: Mattias Villani

### Wednesday, March 28, 3.15 pm, 2018

**Data-Driven Text Simplification**

Sanja Štajner, Data and Web Science Group, University of Mannheim

*Abstract*: Syntactically and lexically complex texts and sentences pose difficulties both for humans (especially people with various reading or cognitive impairments, or non-native speakers) and for natural language processing systems (e.g. information extraction, machine translation, summarization, semantic role labeling). In the last 30 years, many systems have been proposed that attempt to automatically simplify the vocabulary and sentence structure of complex sentences. This talk will present the existing resources for data-driven text simplification and the latest data-driven approaches to text simplification, based on the use of word embeddings and neural machine translation architectures. The emphasis will be on comparative evaluation of those systems and discussion of possible avenues to improve them.

Location: Ada Lovelace (Visionen)

Organizer: Arne Jönsson

### Wednesday, April 25, 3.15 pm, 2018

**Invariant Causal Prediction**

Jonas Peters, Dept. of Mathematical Sciences, University of Copenhagen

*Abstract*: Why are we interested in the causal structure of a process? In classical prediction tasks such as regression, it seems that no causal knowledge is required. In many situations, however, we want to understand how a system reacts under interventions, e.g., in gene knock-out experiments. Here, causal models become important because they are usually considered invariant under those changes. A causal prediction uses only direct causes of the target variable as predictors; it remains valid even if we intervene on predictor variables or change the whole experimental setting. In this talk, we show how we can exploit this invariance principle to estimate causal structure from data. We apply the methodology to data sets from biology, epidemiology, and finance. The talk does not require any knowledge about causal concepts.

Location: Ada Lovelace (Visionen)

Organizer: Jose M Pena

### Wednesday, May 23, 3.15 pm, 2018

**Transformation Forests**

Torsten Hothorn, Epidemiology, Biostatistics and Prevention Institute, University of Zurich

*Abstract*: Regression models for supervised learning problems with a continuous response are commonly understood as models for the conditional mean of the response given predictors. This notion is simple and therefore appealing for interpretation and visualisation. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models, for example the computation of prediction intervals. Several random forest-type algorithms aim at estimating conditional distributions, most prominently quantile regression forests (Meinshausen, 2006, JMLR). We propose a novel approach based on a parametric family of distributions characterised by their transformation function. A dedicated novel 'transformation tree' algorithm able to detect distributional changes is developed. Based on these transformation trees, we introduce 'transformation forests' as an adaptive local likelihood estimator of conditional distribution functions. The resulting predictive distributions are fully parametric yet very general and allow inference procedures, such as likelihood-based variable importances, to be applied in a straightforward way. The procedure allows general transformation models to be estimated without the necessity of a priori specifying the dependency structure of parameters. Applications include the computation of probabilistic forecasts, modelling differential treatment effects, or the derivation of counterfactual distributions for all types of response variables.

Technical Report available from arXiv.

Location: Ada Lovelace (Visionen)

Organizer: Oleg Sysoev

Page responsible: Fredrik Lindsten

Last updated: 2018-09-03