Biometric twin modeling using twinflex

3 minute read


Hi! Thanks you stopped by! This is the first post of my blog series about twinflex, a R function I wrote to make biometric twin modeling easier. While you can find a shorter description of the function here, the purpose of the blog series is to dig deeper into the statistics behind the function in order to let you know what is actually going on “under the hood”. Everything you can do with twinflex, you can do it with OpenMx as well, which is a very flexible package for Structural Equation Modeling (SEM) in R. The reason for that is that twinflex is a simple wrapper for OpenMx. So why twinflex you may ask?

The basic idea of SEM

The basic idea of every SEM model is very simple: You estimate the expected covariance matrix as a function of the model parameters:


Here, f2 (sigma) is the population covariance matrix, while f3 (theta) is a vector containing the model parameters and f4 is the covariance matrix as a function of the model parameters.

The inclusion of the means vector in SEM is optional, but e.g. if you handle missing data with the Full Information Maximum Likelihood Estimator (FIML) you have to include the mean vector into the model. But here, the same idea applies: Estimate the expected means vector as a function of the model parameters:


Here, the idea behing the formula is the same as with the covariance matrix, only that f6 (mu) refers to the means vector. Well, you might think now, but what is the exact function that links the parameters with the covariances or the means?

The RAM Notation: General formulas

There are several options: You can use the path rules developed by Wright, you can use covariance algebra or you can use a matrix notation system developed in the SEM framework like LISREL or RAM (short for Reticular Action Model, see McArdle 2005, McArdle and Boker 1990, McArdle and McDonald 1984 and Cheung 2015 or this video). twinflex uses the RAM notation, which defines the function for the expected covariances as:


and the function for the expected means as:


So, there are four matrices that are the determine the elements of the expected matrices: F, I, A, S and M. What’s the point with these matrices? Before we discuss the matrices in detail, we have to make some assumptions regarding the notation: In the following, m refers to the number of manifest variables in the model, while l refers to the number of latent variables in the model. Finally, t is the number of variables in total that are part of the model, i.e. t=m+l. So let’s start with the matrices.

The RAM Notation: Matrices

F is a “filter” matrix (f9) that makes the computer know which of the matrices are manifest and which of them are latent. The number of rows of the F matrix equals the number of manifest variables, while the number of columns equal the number of the t variables in total. The purpose of the filter matrix is to separate the manifest variables from the latent variables, since only the manifest variables are part of the covariance matrix.

You can have many headings

Aren’t headings cool?