Research Statement


My field of research can be characterized broadly as multivariate statistical machine learning (I'm a multivariatician). In the past I focussed mostly on the setting in which the number of observations (n) exceeds the number of features (p). Today, high-throughput technology and data-stream mining algorithms produce data characterized by small samples and massive numbers of features. I thus focus on developing methods for the analysis of such high-dimensional data, for which p > n, and especially on methods that can automatically infer a possible data-generating mechanism. In doing so, I try to balance theory and computation for effective application. My approach is non-dogmatic, in the sense that I may rely on Bayesianism, Frequentism, or a hybrid, depending on the nature of the problem at hand. Specific research themes and programmes are listed below in order of activity.

Research Focus


High-Dimensional Multivariate Statistics & Statistical Machine Learning: This theme focusses on the connections between precision matrices, conditional independence properties of (directed) networks, and the automated learning of (data-generating) model structure from complex p > n data. The first strand uses the support of an ℓ2 (ridge)-regularized precision matrix to infer the topology of undirected networks. The second strand combines the first with previous work in latent variable modeling to learn the topology of large-scale (latent variable) path diagrams. Both approaches may be used to extract mechanistic information from high-dimensional data of mixed origin or to support prediction and classification exercises.
Programmes:
▬  High-Dimensional Structured Precision Modeling & Psychometric Learning Theory
▬  High-Dimensional Covariance and Precision Matrix Estimation & Markov Random Fields
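
As an illustration of the first strand, the following Python sketch estimates a precision matrix with a ridge penalty when p > n and then reads off an undirected network from its support. It is a minimal sketch under stated assumptions, not the estimator or selection rule used in this research: the simple ridge-type estimate (S + λI)^{-1}, the hard threshold on partial correlations, and all function names and parameter values are chosen for illustration only. A ridge estimate contains no exact zeros, so some form of support determination (here, thresholding) is needed before a topology can be inferred.

```python
import numpy as np


def ridge_precision(X, lam):
    """Ridge-type precision estimate (S + lam * I)^{-1}.

    Illustrative assumption: S is the sample covariance of the rows of X.
    S is singular when p > n, but S + lam * I is invertible for lam > 0.
    """
    Xc = X - X.mean(axis=0)
    n, p = Xc.shape
    S = Xc.T @ Xc / n
    return np.linalg.inv(S + lam * np.eye(p))


def partial_correlations(omega):
    """Scale a precision matrix to partial correlations: -w_ij / sqrt(w_ii * w_jj)."""
    d = np.sqrt(np.diag(omega))
    pc = -omega / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


def infer_edges(X, lam=0.1, threshold=0.2):
    """Return edges (i, j), i < j, whose absolute partial correlation
    reaches the (hypothetical) hard threshold."""
    pc = partial_correlations(ridge_precision(X, lam))
    p = pc.shape[0]
    return [(i, j)
            for i in range(p)
            for j in range(i + 1, p)
            if abs(pc[i, j]) >= threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((10, 30))  # n = 10 observations, p = 30 features
    edges = infer_edges(X, lam=0.1, threshold=0.2)
    print(f"{len(edges)} edges retained out of {30 * 29 // 2} possible")
```

Both the penalty and the threshold govern how sparse the inferred network is. In practice they would be chosen in a data-driven way (for example by cross-validation and a probabilistic edge-selection rule) rather than fixed as above.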

Low-Dimensional Multivariate Statistics & Factor Analysis: This theme focusses on Bayesian constrained-model selection in the n > p setting. There are two main foci, both viewed from the perspective of functional constraints: (I) variable selection, viewed as the selection of the dimension of a model or of exclusion constraints in the design matrix; and (II) the selection of appropriate truncations of the parameter space, which essentially entails placing (approximate) inequality constraints on the model parameters of interest. I especially consider the factor analytic model, as it poses multiple challenges due to its many indeterminacies.
Programme:
▬  Specificity in Factor Analytic Modeling

(Sociometrics of) Scientific & Public Integrity: This theme focusses on assessing (non-)compliance with relevant (codified) norms and rules in the public and scientific spheres. For the public sphere, emphasis is placed on the development and application of survey methodology for probing behaviors that are deemed sensitive. The methodology is intended to elicit more truthful responses when evasive response behavior is expected in standard survey settings. For the scientific sphere, the emphasis lies on case studies evaluating the veracity of reported data.
Programmes:
▬  Surveying Sensitive Topics
▬  Evaluating Evidence for Low Data Veracity from Reported Summary Measures