Simulate sample covariances or datasets — createS • rags2ridges

Simulate data from a p-dimensional (zero-mean) gaussian graphical model (GGM) with a specified (or random) topology and return the sample covariance matrix or matrices. Can also return the original simulated data or underlying precision matrix.

createS(
  n,
  p,
  topology = "identity",
  dataset = FALSE,
  precision = FALSE,
  nonzero = 0.25,
  m = 1L,
  banded.n = 2L,
  invwishart = FALSE,
  nu = p + 1,
  Plist
)

Arguments

n: A numeric vector giving number of samples. If the length is larger than 1, the covariance matrices are returned as a list.
p: A numeric of length 1 giving the dimension of the samples/covariance.
topology: character. The topology to use for the simulations. See the details.
dataset: A logical value specifying whether the sample covariance or the simulated data itself should be returned.
precision: A logical value. If TRUE the constructed precision matrix is returned.
nonzero: A numeric of length 1 giving the value of the nonzero entries used in some topologies.
m: A integer giving the number of blocks (i.e. conditionally independent components) to create. If m is greater than 1, then the given topology is used on m blocks of approximately equal size.
banded.n: A integer of length one giving the number of bands. Only used if topology is one of "banded", "small-world", or "Watts-Strogatz".
invwishart: logical. If TRUE the constructed precision matrix is used as the scale matrix of an inverse Wishart distribution and class covariance matrices are drawn from this distribution.
nu: numeric greater than p + 1 giving the degrees of freedom in the inverse Wishart distribution. A large nu implies high class homogeneity. A small nu near p + 1 implies high class heterogeneity.
Plist: An optional list of numeric matrices giving the precision matrices to simulate from. Useful when random matrices have already been generated by setting precision = TRUE.

Value

The returned type is dependent on n and covariance. The function generally returns a list of numeric matrices with the same length as n. If covariance is FALSE the simulated datasets with size n[i] by p are given in the i entry of the output. If covariance is TRUE the p by p sample covariances of the datasets are given. When n has length 1 the list structure is dropped and the matrix is returned.

Details

The data is simulated from a zero-mean p-dimensional multivariate gaussian distribution with some precision matrix determined by the argument topology which defines the GGM. If precision is TRUE the population precision matrix is returned. This is useful to see what the actual would-be-used precision matrices are. The available values of topology are described below. Unless otherwise stated the diagonal entries are always one. If m is 2 or greater block diagonal precision matrices are constructed and used.

"identity": uses the identity matrix (diag(p)) as precision matrix. Corresponds to no conditional dependencies.
"star": simulate from a star topology. Within each block the first node is selected as the "hub". The off-diagonal entries \((1,j)\) and \((j,1)\) values taper off with the value \(1/(j + 1)\).
"clique": simulate from clique topology where each block is a complete graph with off-diagonal elements equal to nonzero.
"complete": alias for (and identical to) "clique".
"chain": simulate from a chain topology where the precision matrix is a tridiagonal matrix with off-diagonal elements (in each block) given by argument nonzero.
"banded": precision elements (i,j) are given by \(1/(|i-j|+1)\) if \(|i-j|\) is less than or equal to banded.n and zero otherwise.
"scale-free": The non-zero pattern of each block is generated by a Barabassi random graph. Non-zero off-diagonal values are given by nonzero. Gives are very "hubby" network.
"Barabassi": alias for "scale-free".
"small-world": The non-zero pattern of each block is generated by a 1-dimensional Watts-Strogatz random graph with banded.n starting neighbors and \(5\%\) probability of rewiring. Non-zero off-diagonal values are given by nonzero. Gives are very "bandy" network.
"Watts-Strogatz": alias for "small-world"
"random-graph": The non-zero pattern of each block is generated by a Erdos-Renyi random graph where each edge is present with probability \(1/p\). Non-zero off-diagonal values are given by nonzero.
"Erdos-Renyi": alias for "random-graph"

When n has length greater than 1, the datasets are generated i.i.d. given the topology and number of blocks.

Arguments invwishart and nu allows for introducing class homogeneity. Large values of nu imply high class homogeneity. nu must be greater than p + 1. More precisely, if invwishart == TRUE then the constructed precision matrix is used as the scale parameter in an inverse Wishart distribution with nu degrees of freedom. Each class covariance is distributed according to this inverse Wishart and independent.

Author

Anders E. Bilgrau, Carel F.W. Peeters <carel.peeters@wur.nl>, Wessel N. van Wieringen

Examples

## Generate some simple sample covariance matrices
createS(n = 10, p = 3)
#>              A            B          C
#> A  0.714338666 -0.008603045 -0.4495777
#> B -0.008603045  0.848783685 -0.4070938
#> C -0.449577686 -0.407093834  1.6648221
createS(n = c(3, 4, 5), p = 3)
#> $class1
#>             A           B           C
#> A  0.23541993 -0.06808345 -0.09331714
#> B -0.06808345  0.02188217  0.02633239
#> C -0.09331714  0.02633239  0.03718524
#> 
#> $class2
#>           A           B           C
#> A 0.4048721  0.18554770  0.15656767
#> B 0.1855477  0.11035641 -0.03908228
#> C 0.1565677 -0.03908228  0.59124579
#> 
#> $class3
#>            A          B           C
#> A  0.9965404 0.26490592 -0.18057305
#> B  0.2649059 0.43924828  0.03874123
#> C -0.1805730 0.03874123  1.94661771
#> 
createS(n = c(32, 55), p = 7)
#> $class1
#>             A           B           C           D           E           F
#> A  0.63178208  0.06467505  0.19670069  0.06289789  0.04188822 -0.19040730
#> B  0.06467505  0.93135667  0.08449495  0.02381718 -0.22198857 -0.02349797
#> C  0.19670069  0.08449495  0.93995769 -0.34139993  0.14346997  0.33424291
#> D  0.06289789  0.02381718 -0.34139993  0.73176859 -0.17331770 -0.21105145
#> E  0.04188822 -0.22198857  0.14346997 -0.17331770  1.36921587  0.11591828
#> F -0.19040730 -0.02349797  0.33424291 -0.21105145  0.11591828  0.81224374
#> G -0.01227575 -0.03793069  0.10206489  0.18693117 -0.20170543  0.02724070
#>             G
#> A -0.01227575
#> B -0.03793069
#> C  0.10206489
#> D  0.18693117
#> E -0.20170543
#> F  0.02724070
#> G  0.56993517
#> 
#> $class2
#>             A           B           C           D           E           F
#> A  1.04034367 -0.23721483 -0.16969116  0.04907348 -0.07136711 -0.11950913
#> B -0.23721483  0.96099733 -0.13451093 -0.14600484 -0.07506789  0.18548682
#> C -0.16969116 -0.13451093  1.12555789 -0.18544152 -0.05878663  0.11898834
#> D  0.04907348 -0.14600484 -0.18544152  0.93703353  0.11422203 -0.11126085
#> E -0.07136711 -0.07506789 -0.05878663  0.11422203  0.82855277  0.02398115
#> F -0.11950913  0.18548682  0.11898834 -0.11126085  0.02398115  0.99802168
#> G -0.06728589  0.11392344  0.13948464  0.17997138  0.04343981 -0.28291270
#>             G
#> A -0.06728589
#> B  0.11392344
#> C  0.13948464
#> D  0.17997138
#> E  0.04343981
#> F -0.28291270
#> G  1.00373644
#> 

## Generate some datasets and not sample covariance matrices
createS(c(3, 4), p = 6, dataset = TRUE)
#> $class1
#>                A          B          C          D          E          F
#> [1,]  0.90493393  0.5602515 1.79739015  1.1985082  0.3703148 -1.0219351
#> [2,] -0.06181858 -0.8432687 0.60311808  0.1489683 -1.9410410  0.5510601
#> [3,] -0.81701756 -0.1890840 0.05481809 -0.2684883 -1.8716184 -1.1819590
#> 
#> $class2
#>               A          B          C          D           E           F
#> [1,] -1.9289744 -1.1150966  0.5732693 -0.9779805  0.93539734 -2.52598044
#> [2,]  1.9132912 -1.6585732 -1.9060648 -1.5551617 -0.47828835  0.77125979
#> [3,] -0.3644607 -0.3126735 -0.4776542  0.6235130 -0.35469283 -0.23236273
#> [4,] -1.7555148  2.8456476  0.6439718 -0.2382009 -0.06902261  0.02795027
#> 

## Generate sample covariance matrices from other topologies:
A <- createS(2000, p = 4, topology = "star")
round(solve(A), 3)
#>       A     B     C     D
#> A 0.958 0.464 0.334 0.270
#> B 0.464 0.996 0.012 0.017
#> C 0.334 0.012 0.955 0.016
#> D 0.270 0.017 0.016 0.988
B <- createS(2000, p = 4, topology = "banded", banded.n = 2)
round(solve(B), 3)
#>       A     B     C     D
#> A 0.993 0.492 0.344 0.036
#> B 0.492 0.990 0.482 0.323
#> C 0.344 0.482 1.024 0.541
#> D 0.036 0.323 0.541 1.024
C <- createS(2000, p = 4, topology = "clique")  # The complete graph (as m = 1)
round(solve(C), 3)
#>       A     B     C     D
#> A 1.009 0.265 0.252 0.235
#> B 0.265 1.015 0.251 0.234
#> C 0.252 0.251 0.996 0.295
#> D 0.235 0.234 0.295 0.975
D <- createS(2000, p = 4, topology = "chain")
round(solve(D), 3)
#>        A     B      C     D
#> A  1.042 0.261 -0.005 0.029
#> B  0.261 0.963  0.255 0.044
#> C -0.005 0.255  1.002 0.253
#> D  0.029 0.044  0.253 0.983

## Generate smaple covariance matrices from block topologies:
C3 <- createS(2000, p = 10, topology = "clique", m = 3)
round(solve(C3), 1)
#>     A   B   C   D   E   F   G   H   I   J
#> A 1.0 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
#> B 0.3 1.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
#> C 0.3 0.3 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
#> D 0.0 0.0 0.0 1.0 0.3 0.3 0.0 0.0 0.0 0.0
#> E 0.0 0.0 0.0 0.3 1.0 0.2 0.0 0.0 0.0 0.0
#> F 0.0 0.0 0.0 0.3 0.2 1.0 0.0 0.0 0.0 0.0
#> G 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.3 0.2 0.3
#> H 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 0.2 0.3
#> I 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.2 1.0 0.3
#> J 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.3 0.3 1.0
C5 <- createS(2000, p = 10, topology = "clique", m = 5)
round(solve(C5), 1)
#>     A   B   C    D   E   F   G    H   I   J
#> A 1.0 0.3 0.0  0.0 0.0 0.0 0.0  0.0 0.0 0.0
#> B 0.3 1.1 0.0  0.0 0.0 0.0 0.0  0.0 0.0 0.0
#> C 0.0 0.0 1.0  0.2 0.0 0.0 0.0  0.0 0.0 0.0
#> D 0.0 0.0 0.2  1.0 0.0 0.0 0.0 -0.1 0.0 0.0
#> E 0.0 0.0 0.0  0.0 1.0 0.2 0.0  0.0 0.0 0.0
#> F 0.0 0.0 0.0  0.0 0.2 1.0 0.0  0.0 0.0 0.0
#> G 0.0 0.0 0.0  0.0 0.0 0.0 1.0  0.3 0.0 0.0
#> H 0.0 0.0 0.0 -0.1 0.0 0.0 0.3  1.0 0.0 0.0
#> I 0.0 0.0 0.0  0.0 0.0 0.0 0.0  0.0 1.0 0.2
#> J 0.0 0.0 0.0  0.0 0.0 0.0 0.0  0.0 0.2 1.0

## Can also return the precision matrix to see what happens
## m = 2 blocks, each "banded" with 4 off-diagonal bands
round(createS(1, 12, "banded", m = 2, banded.n = 4, precision = TRUE), 2)
#>      A    B    C    D    E    F    G    H    I    J    K    L
#> A 1.00 0.50 0.33 0.25 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00
#> B 0.50 1.00 0.50 0.33 0.25 0.20 0.00 0.00 0.00 0.00 0.00 0.00
#> C 0.33 0.50 1.00 0.50 0.33 0.25 0.00 0.00 0.00 0.00 0.00 0.00
#> D 0.25 0.33 0.50 1.00 0.50 0.33 0.00 0.00 0.00 0.00 0.00 0.00
#> E 0.20 0.25 0.33 0.50 1.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00
#> F 0.00 0.20 0.25 0.33 0.50 1.00 0.00 0.00 0.00 0.00 0.00 0.00
#> G 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.50 0.33 0.25 0.20 0.00
#> H 0.00 0.00 0.00 0.00 0.00 0.00 0.50 1.00 0.50 0.33 0.25 0.20
#> I 0.00 0.00 0.00 0.00 0.00 0.00 0.33 0.50 1.00 0.50 0.33 0.25
#> J 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.33 0.50 1.00 0.50 0.33
#> K 0.00 0.00 0.00 0.00 0.00 0.00 0.20 0.25 0.33 0.50 1.00 0.50
#> L 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.20 0.25 0.33 0.50 1.00

## Simulation using graph-games
round(createS(1, 10, "small-world", precision = TRUE), 2)
#>      A    B    C    D    E    F    G    H    I    J
#> A 1.00 0.25 0.25 0.00 0.00 0.00 0.00 0.00 0.25 0.25
#> B 0.25 1.00 0.25 0.25 0.00 0.00 0.00 0.00 0.00 0.25
#> C 0.25 0.25 1.00 0.25 0.25 0.00 0.00 0.00 0.00 0.00
#> D 0.00 0.25 0.25 1.00 0.25 0.25 0.00 0.00 0.00 0.00
#> E 0.00 0.00 0.25 0.25 1.00 0.25 0.25 0.00 0.00 0.00
#> F 0.00 0.00 0.00 0.25 0.25 1.00 0.25 0.25 0.00 0.00
#> G 0.00 0.00 0.00 0.00 0.25 0.25 1.00 0.25 0.25 0.00
#> H 0.00 0.00 0.00 0.00 0.00 0.25 0.25 1.00 0.25 0.25
#> I 0.25 0.00 0.00 0.00 0.00 0.00 0.25 0.25 1.00 0.25
#> J 0.25 0.25 0.00 0.00 0.00 0.00 0.00 0.25 0.25 1.00
round(createS(1, 5, "scale-free", precision = TRUE), 2)
#>      A    B    C    D    E
#> A 1.00 0.25 0.25 0.00 0.00
#> B 0.25 1.00 0.00 0.00 0.25
#> C 0.25 0.00 1.00 0.25 0.00
#> D 0.00 0.00 0.25 1.00 0.00
#> E 0.00 0.25 0.00 0.00 1.00
round(createS(1, 5, "random-graph", precision = TRUE), 2)
#>      A B    C    D    E
#> A 1.00 0 0.00 0.25 0.00
#> B 0.00 1 0.00 0.00 0.00
#> C 0.00 0 1.00 0.00 0.25
#> D 0.25 0 0.00 1.00 0.00
#> E 0.00 0 0.25 0.00 1.00

## Simulation using inverse Wishart distributed class covariance
## Low class homogeneity
createS(n = c(10,10), p = 5, "banded", invwishart = TRUE, nu = 10)
#> $class1
#>            A           B            C           D            E
#> A  0.3407703 -0.15905885 -0.252470276  0.28600211 -0.468624961
#> B -0.1590589  0.73069749 -0.348215530  0.06845775  1.042960473
#> C -0.2524703 -0.34821553  0.942682156 -0.61086235 -0.007764689
#> D  0.2860021  0.06845775 -0.610862347  1.10894996 -0.887920766
#> E -0.4686250  1.04296047 -0.007764689 -0.88792077  2.846779236
#> 
#> $class2
#>            A          B          C          D          E
#> A  0.8862115  0.3589084 -0.1860808  0.4253942  0.4074453
#> B  0.3589084  0.9585110 -0.1128333 -0.1451391  0.2661174
#> C -0.1860808 -0.1128333  0.5340195 -0.2696945 -0.5983405
#> D  0.4253942 -0.1451391 -0.2696945  0.7112546  0.4634702
#> E  0.4074453  0.2661174 -0.5983405  0.4634702  0.9329733
#> 
## Extremely high class homogeneity
createS(n = c(10,10), p = 5, "banded", invwishart = TRUE, nu = 1e10)
#> $class1
#>             A          B           C          D           E
#> A  0.95869737 -0.3809540  0.02415766  0.1725129 -0.54093975
#> B -0.38095403  1.3124689  0.29900232 -1.2067459  0.57232835
#> C  0.02415766  0.2990023  2.04928194 -0.3400434 -0.07280502
#> D  0.17251292 -1.2067459 -0.34004344  2.1347086 -1.04191852
#> E -0.54093975  0.5723283 -0.07280502 -1.0419185  1.57782754
#> 
#> $class2
#>             A           B          C            D            E
#> A  0.53744832 -0.03487138  0.1114366  0.256137126  0.118536460
#> B -0.03487138  1.50640587 -1.2125845 -0.432946266  0.775990246
#> C  0.11143658 -1.21258451  2.5358652 -0.116654635 -1.278033424
#> D  0.25613713 -0.43294627 -0.1166546  0.633726286  0.001914981
#> E  0.11853646  0.77599025 -1.2780334  0.001914981  1.631697726
#> 

# The precision argument can again be used to see the actual realised class
# precision matrices used when invwishart = TRUE.

# The Plist argument is used to reuse old precision matrices or
# user-generated ones
P <- createS(n = 1, p = 5, "banded", precision = TRUE)
lapply(createS(n = c(1e5, 1e5), p = 5, Plist = list(P, P+1)), solve)
#> $class1
#>              A            B         C            D            E
#> A  1.003376159  0.503060389 0.3339784 -0.002151475 -0.001472794
#> B  0.503060389  0.994602675 0.4989885  0.323371776 -0.002491459
#> C  0.333978381  0.498988513 1.0010289  0.496668132  0.333095351
#> D -0.002151475  0.323371776 0.4966681  0.995563778  0.502545053
#> E -0.001472794 -0.002491459 0.3330954  0.502545053  1.003209189
#> 
#> $class2
#>          A        B        C        D        E
#> A 1.995161 1.495985 1.323942 1.000188 1.002387
#> B 1.495985 1.996956 1.490973 1.334418 0.997494
#> C 1.323942 1.490973 1.982567 1.495479 1.329760
#> D 1.000188 1.334418 1.495479 2.005061 1.500588
#> E 1.002387 0.997494 1.329760 1.500588 2.000471
#>