### UAI 2010 Tutorials

Sanjoy Dasgupta
**Linear-algebraic methods for learning latent variable models**

Manfred Jaeger
**Learning and Reasoning With Incomplete Data: Foundations and Algorithms**

Shohei Shimizu and Yoshinobu Kawahara
**Non-Gaussian methods for learning linear structural equation models**

#### Abstracts

Sanjoy Dasgupta **Linear-algebraic methods for learning latent variable models**

The past decade has seen steady and significant progress in efficient algorithms for learning mixtures of Gaussians, hidden Markov models, and various other probabilistic models with latent variables. In contrast with typical local search methods (e.g., EM), these algorithms come with strong guarantees about recovering (globally) good models.

The common theme of all this work is that it heavily exploits linear projections: random projections, principal component analysis, and canonical correlation analysis. In this tutorial, I will give a broad overview of these results: algorithms, theory, experimental findings, and open problems. No particular background is required, other than a familiarity with common latent variable models.
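As an illustrative sketch of the projection idea (not code from the tutorial; the dimensions, separation, and sample size are invented for the demo), a single PCA projection can cleanly separate a well-separated two-component spherical Gaussian mixture in high dimension:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 2000
mu = np.zeros(d)
mu[0] = 6.0                                         # component means differ by 6 along one axis
labels = rng.integers(0, 2, n)
X = rng.normal(size=(n, d)) + labels[:, None] * mu  # mixture of two spherical Gaussians

Xc = X - X.mean(axis=0)                             # center the data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
z = Xc @ Vt[0]                                      # project onto the top principal direction

pred = (z > 0).astype(int)                          # a 1-D threshold now separates the components
acc = max(np.mean(pred == labels), np.mean(pred != labels))  # labels only recovered up to a swap
print(acc)
```

The top principal direction aligns with the line joining the component means (between-component variance dominates the within-component noise), so the hard d-dimensional clustering problem collapses to thresholding a single coordinate.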

Manfred Jaeger **Learning and Reasoning With Incomplete Data: Foundations and Algorithms**

The EM algorithm is one of the cornerstones of machine learning. It is well-known that the use of this algorithm needs to be justified by the Missing at Random (MAR) assumption. But what exactly is the MAR assumption, how do we find out whether it can realistically be made, and what should we do if we can't? In this tutorial I will give an overview of recent results that have improved our understanding of these issues. The tutorial consists of three parts:

Part 1 introduces the Missing at Random assumption and the crucial role it plays in learning from incomplete data and in making probabilistic inferences by conditioning. Using the more general concept of Coarsened at Random (CAR), it is shown that this modeling assumption comes in subtly different versions, and that these versions can have significantly different implications for statistical learning and probabilistic reasoning.

Part 2 is concerned with the question of whether (and how) one can construct canonical generative models for data coarsening processes that satisfy the CAR assumption. Such models are interesting because they can help decide whether the CAR (or MAR) assumption is realistic for a given dataset: one checks whether the process that caused the data to be incomplete can be matched against one of the canonical models. Recent results have provided a fairly comprehensive picture of the existence and nature of canonical CAR models.

Part 3 is devoted to alternative learning methods for when the MAR/CAR assumption appears unreasonable. It is shown that an algorithm similar to the EM algorithm (and in special cases equivalent to it) can be used for incomplete data that is not CAR. An implementation of this algorithm (christened AI&M by the speaker, but sometimes also less fancifully called Alternating Kullback-Leibler Minimisation) for parameter learning in Bayesian networks shows that on non-CAR data it obtains better accuracy than the EM algorithm. Furthermore, likelihood ratios of the AI&M- and EM-computed parameters can be used for statistical tests of the CAR assumption relative to a given parametric model.
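As a minimal sketch of the alternating scheme, consider learning a Bernoulli parameter from coin flips where some outcomes were recorded only as "heads or tails". This toy sits in the CAR special case, where the alternation coincides with EM; the counts are invented for illustration, and nothing here is the speaker's actual AI&M implementation.

```python
# Bernoulli parameter learning from coarsened observations by alternating
# minimisation (CAR special case, where the scheme reduces to EM).
n_heads, n_tails, n_coarse = 30, 50, 20   # 20 flips recorded only as "heads or tails"

theta = 0.5
for _ in range(50):
    # Step 1: complete the coarse observations using the current model --
    # the completion consistent with the data that is KL-closest to the model.
    expected_heads = n_heads + n_coarse * theta
    # Step 2: maximum-likelihood fit to the completed data.
    theta = expected_heads / (n_heads + n_tails + n_coarse)

print(theta)   # converges to n_heads / (n_heads + n_tails) = 0.375
```

Each pass contracts toward the fixed point theta = 30/80 = 0.375, i.e. here the coarse observations carry no information about theta; the interesting non-CAR cases are precisely those where the two steps no longer reduce to this EM update.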

Shohei Shimizu and Yoshinobu Kawahara **Non-Gaussian methods for learning linear structural equation models**

Linear structural equation models (linear SEMs) are widely applied in empirical studies across the social sciences, neuroinformatics, and bioinformatics. Estimation of linear SEMs for continuous variables typically uses the covariance structure of the data alone, which poses serious identifiability problems: many important models, including path analysis models, are indistinguishable without prior knowledge of the structures. A linear acyclic model, a special case of path analysis models, is typically used to learn data-generating processes. Covariance information alone is not sufficient to uniquely estimate such a linear acyclic model and in most cases cannot identify its full structure (path coefficients and path directions).

Bentler (1983, Psychometrika) proposed that non-Gaussian structure in the data could be used to overcome these identifiability problems of covariance-based estimation of SEMs. Recently it was shown that exploiting non-Gaussianity allows the full structure of a linear acyclic model to be identified without pre-specifying any path directions between the variables (Shimizu et al., 2006, JMLR). The method is based on a fairly recent statistical technique called independent component analysis (Hyvarinen et al., 2001, Wiley). This non-Gaussian framework has since been extended in various directions to learn a wider variety of SEMs. In this tutorial, we will first present an overview of non-Gaussian methods for learning SEMs and then turn to some recent advances.