Research Statement
My field of research can be broadly characterized as multivariate statistical machine learning (I'm a multivariatician). In the past I focussed mostly on the setting in which the number of observations (n) exceeded the number of features (p). Today, high-throughput technologies and data-stream mining algorithms produce data characterized by small samples and massive numbers of features. I therefore focus on developing methods for the analysis of such high-dimensional data, for which p > n, especially methods that can automatically infer a possible data-generating mechanism. In doing so, I try to balance theory and computation with an eye to effective application. My approach is non-dogmatic: I may rely on Bayesian, frequentist, or hybrid reasoning, depending on the nature of the problem at hand. Specific research themes and programmes are listed below in order of activity.
Research Focus
High-Dimensional Multivariate Statistics & Statistical Machine Learning:
This theme focusses on the connections between precision matrices,
conditional independence properties of (directed) networks, and the
automated learning of (data-generating) model structure
from complex p > n data.
One strand uses the support of an ℓ2-regularized (ridge) precision
matrix estimate to infer the topology of undirected networks; as ridge
estimates are not sparse, this support is determined in a separate step.
A second strand combines the first with earlier work in latent
variable modeling to learn the topology of large-scale (latent
variable) path diagrams.
Both approaches may be used to extract mechanistic information from
high-dimensional data of mixed origin or to support prediction and
classification exercises.
Programmes:
▬ High-Dimensional Structured Precision Modeling & Psychometric Learning Theory
▬ High-Dimensional Covariance and Precision Matrix Estimation & Markov Random Fields
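As a rough illustration of the first strand of this theme, the sketch below computes a ridge-type precision estimate in closed form and then reads a graph off the partial correlations. It assumes the penalized log-likelihood log|Ω| − tr(SΩ) − (λ/2)‖Ω − T‖²_F with a symmetric target matrix T (zero by default), whose maximizer is the inverse of [λI + (S − λT)²/4]^(1/2) + (S − λT)/2. The function names and the simple absolute-threshold rule for the support are hypothetical choices made for illustration; they are not the support-selection procedure used in the programme.

```python
import numpy as np


def ridge_precision(S, lam, T=None):
    """Closed-form maximizer of log|Omega| - tr(S Omega) - (lam/2) * ||Omega - T||_F^2.

    Assumes S is a symmetric (possibly singular, p > n) sample covariance
    matrix, lam > 0, and T is a symmetric target matrix (zero by default).
    """
    p = S.shape[0]
    if T is None:
        T = np.zeros((p, p))
    D = S - lam * T
    # Symmetric square root of lam*I + D^2/4 via an eigendecomposition;
    # its eigenvalues are at least lam > 0, so the root is well defined.
    M = lam * np.eye(p) + D @ D / 4.0
    w, V = np.linalg.eigh(M)
    sqrt_M = (V * np.sqrt(w)) @ V.T
    # sqrt(M) + D/2 is the implied covariance estimate; invert it for the precision.
    sigma_hat = sqrt_M + D / 2.0
    return np.linalg.inv(sigma_hat)


def partial_correlations(P):
    """Partial correlations -P_ij / sqrt(P_ii * P_jj) from a precision matrix."""
    P = (P + P.T) / 2.0  # remove floating-point asymmetry from the inversion
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R


def edge_support(P, threshold=0.1):
    """Hypothetical support rule: keep an edge when |partial correlation| > threshold."""
    A = np.abs(partial_correlations(P)) > threshold
    np.fill_diagonal(A, False)
    return A


# Usage on simulated p > n data (hypothetical sizes and penalty).
rng = np.random.default_rng(0)
n, p = 20, 50                      # the sample covariance is singular here
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)
P = ridge_precision(S, lam=1.0)    # still positive definite despite p > n
A = edge_support(P, threshold=0.1)
print("number of edges:", int(A.sum()) // 2)
```

The closed form is what makes the ℓ2 penalty attractive in this sketch: no iterative solver is needed, and the estimate is positive definite for any λ > 0 even when S is singular. Because the estimate is dense, a graph only emerges after a separate support-determination step, here the crude threshold.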
Low-Dimensional Multivariate Statistics & Factor Analysis:
This theme focusses on Bayesian constrained-model selection in the
n > p setting. There are two main foci, both framed in terms of
functional constraints: (I) variable selection, viewed as selecting
the dimension of a model or selecting exclusion constraints in the
design matrix; and (II) the selection of appropriate truncations of
the parameter space, which essentially entails placing (approximate)
inequality constraints on the model parameters of interest.
I pay particular attention to the factor-analytic model, as its many
indeterminacies pose multiple challenges.
Programme:
▬ Specificity in Factor Analytic Modeling
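For focus (II) of the theme above, one widely used way to compare an inequality-constrained model with its unconstrained counterpart is the encompassing-prior approach: the Bayes factor equals the posterior proportion of draws that satisfy the constraints (fit) divided by the prior proportion (complexity). The sketch below illustrates that general technique and is not necessarily the method developed in this programme. It assumes three group means with known standard errors, an exchangeable normal prior, and exact (not approximate) ordering constraints; all function names and numbers are hypothetical.

```python
import numpy as np


def posterior_draws(ybar, se, prior_sd, n_draws, rng):
    """Draws from the independent conjugate normal posteriors of the group means.

    Assumes each observed mean ybar[j] ~ N(mu[j], se[j]^2) with se known, and an
    exchangeable N(0, prior_sd^2) encompassing prior on every mu[j].
    """
    ybar = np.asarray(ybar, dtype=float)
    se = np.broadcast_to(np.asarray(se, dtype=float), ybar.shape)
    post_prec = 1.0 / prior_sd**2 + 1.0 / se**2
    post_mean = (ybar / se**2) / post_prec
    post_sd = np.sqrt(1.0 / post_prec)
    return rng.normal(post_mean, post_sd, size=(n_draws, ybar.size))


def proportion_decreasing(draws):
    """Share of draws satisfying mu[0] > mu[1] > ... > mu[k-1]."""
    return float(np.mean(np.all(np.diff(draws, axis=1) < 0, axis=1)))


def encompassing_bf(ybar, se, prior_sd=10.0, n_draws=200_000, seed=0):
    """Bayes factor of the order-constrained model versus the unconstrained one.

    BF = fit / complexity, with both proportions estimated by Monte Carlo.
    """
    rng = np.random.default_rng(seed)
    k = len(ybar)
    complexity = proportion_decreasing(rng.normal(0.0, prior_sd, size=(n_draws, k)))
    fit = proportion_decreasing(posterior_draws(ybar, se, prior_sd, n_draws, rng))
    return fit / complexity


# Hypothetical data: sample means that do, or do not, follow the hypothesized order.
print(encompassing_bf([3.0, 2.0, 1.0], se=0.1))  # near 6, the maximum for 3 exchangeable means
print(encompassing_bf([1.0, 2.0, 3.0], se=0.1))  # near 0
```

With an exchangeable prior, the complexity of a full ordering of k means is 1/k!, so the Bayes factor for three means is bounded by 6; truncating the parameter space is rewarded only to the extent that the data concentrate inside the truncated region.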
(Sociometrics of) Scientific & Public Integrity:
This theme focusses on assessing (non-)compliance with relevant
(codified) norms and rules in the public and scientific spheres.
For the public sphere, emphasis is placed on the development and
application of survey methodology for probing behaviors that are
deemed sensitive.
This methodology aims to elicit more truthful responses when evasive
responding is expected in standard survey settings.
For the scientific sphere, the emphasis lies on case studies evaluating
the veracity of reported data.
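To illustrate the kind of survey methodology meant for the public sphere, the sketch below uses a forced-response randomized-response design, a classic technique in this area that is not necessarily the design used in the programme. Each respondent privately runs a randomizing device: with probability p_truth they answer truthfully, otherwise the device forces a "yes" with probability p_forced_yes and a "no" otherwise. Because the interviewer never sees the device outcome, an individual "yes" is not incriminating, yet the prevalence remains estimable from P(yes) = p_truth·π + (1 − p_truth)·p_forced_yes. The parameter values and function names are hypothetical.

```python
import math
import random


def simulate_forced_response(true_status, p_truth, p_forced_yes, rng):
    """Observed answers under a forced-response design (device outcome stays hidden)."""
    answers = []
    for status in true_status:
        if rng.random() < p_truth:
            answers.append(status)                      # truthful answer
        else:
            answers.append(rng.random() < p_forced_yes)  # forced "yes" or forced "no"
    return answers


def forced_response_estimate(yes_count, n, p_truth, p_forced_yes):
    """Moment estimator of the prevalence pi and its approximate standard error.

    Note: pi_hat is not restricted to [0, 1] in small samples.
    """
    lam_hat = yes_count / n
    pi_hat = (lam_hat - (1.0 - p_truth) * p_forced_yes) / p_truth
    se = math.sqrt(lam_hat * (1.0 - lam_hat) / n) / p_truth
    return pi_hat, se


rng = random.Random(1)
true_status = [rng.random() < 0.3 for _ in range(200_000)]  # hypothetical 30% prevalence
answers = simulate_forced_response(true_status, p_truth=0.8, p_forced_yes=0.5, rng=rng)
pi_hat, se = forced_response_estimate(sum(answers), len(answers), 0.8, 0.5)
print(f"estimated prevalence {pi_hat:.3f} (SE {se:.4f})")
```

The price of the added privacy is visible in the standard error, which is inflated by the factor 1 / p_truth relative to direct questioning; choosing the design probabilities trades respondent protection against statistical efficiency.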
Programmes:
▬ Surveying Sensitive Topics
▬ Evaluating Evidence for Low Data Veracity from Reported Summary Measures
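As one simple example of checking reported summary measures, the sketch below implements a GRIM-style (granularity-related inconsistency of means) check. This is a published consistency test swapped in for illustration, not the evidential approach used in the programme's case studies. It assumes integer-valued responses (for example Likert items), asks whether any integer total over n × items responses yields a mean that rounds to the reported value, and accepts either round-half-up or round-half-even reporting. The function name and example values are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int, items: int = 1) -> bool:
    """Can a mean of n * items integer responses round to reported_mean?

    The mean is passed as a string so that trailing zeros (e.g. "5.10") keep
    their information about the reported precision.
    """
    m = Decimal(reported_mean)
    quantum = Decimal(1).scaleb(m.as_tuple().exponent)  # e.g. 0.01 for "5.19"
    half = quantum / 2
    total = n * items
    # Every integer sum whose mean could round to m lies in [lo, hi].
    lo = int(((m - half) * total).to_integral_value(rounding=ROUND_FLOOR))
    hi = int(((m + half) * total).to_integral_value(rounding=ROUND_CEILING))
    for s in range(lo, hi + 1):
        mean = Decimal(s) / Decimal(total)
        for mode in (ROUND_HALF_UP, ROUND_HALF_EVEN):
            if mean.quantize(quantum, rounding=mode) == m:
                return True
    return False


# With n = 28, attainable sums near 5.19 * 28 are 145 (mean 5.1786) and 146 (mean 5.2143).
print(grim_consistent("5.19", 28))  # False: no integer total gives a mean of 5.19
print(grim_consistent("5.18", 28))  # True: 145 / 28 rounds to 5.18
```

Such a check can only flag an inconsistency, not explain it (typos and unreported exclusions are common innocent causes), and it loses all power once n × items reaches 10 to the power of the number of reported decimals, since every mean is then attainable.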