Simulate data from a p-dimensional (zero-mean) gaussian graphical model (GGM) with a specified (or random) topology and return the sample covariance matrix or matrices. Can also return the original simulated data or underlying precision matrix.
createS(
n,
p,
topology = "identity",
dataset = FALSE,
precision = FALSE,
nonzero = 0.25,
m = 1L,
banded.n = 2L,
invwishart = FALSE,
nu = p + 1,
Plist
)
A numeric
vector giving number of samples. If the length is
larger than 1, the covariance matrices are returned as a list.
A numeric
of length 1 giving the dimension of the
samples/covariance.
character. The topology to use for the simulations. See the details.
A logical
value specifying whether the sample
covariance or the simulated data itself should be returned.
A logical
value. If TRUE
the constructed
precision matrix is returned.
A numeric
of length 1 giving the value of the nonzero
entries used in some topologies.
A integer
giving the number of blocks (i.e. conditionally
independent components) to create. If m
is greater than 1, then the
given topology
is used on m
blocks of approximately equal
size.
A integer
of length one giving the number of bands.
Only used if topology
is one of "banded"
,
"small-world"
, or "Watts-Strogatz"
.
logical
. If TRUE
the constructed precision
matrix is used as the scale matrix of an inverse Wishart distribution and
class covariance matrices are drawn from this distribution.
numeric
greater than p + 1
giving the degrees of
freedom in the inverse Wishart distribution. A large nu
implies
high class homogeneity. A small nu
near p + 1
implies high
class heterogeneity.
An optional list
of numeric
matrices giving the
precision matrices to simulate from. Useful when random matrices have
already been generated by setting precision = TRUE
.
The returned type is dependent on n
and covariance
. The
function generally returns a list
of numeric
matrices with
the same length as n
. If covariance
is FALSE
the
simulated datasets with size n[i]
by p
are given in the
i
entry of the output. If covariance
is TRUE
the
p
by p
sample covariances of the datasets are given. When
n
has length 1 the list
structure is dropped and the matrix
is returned.
The data is simulated from a zero-mean p
-dimensional multivariate
gaussian distribution with some precision matrix determined by the argument
topology
which defines the GGM. If precision
is TRUE
the
population precision matrix is returned. This is useful to see what the
actual would-be-used precision matrices are. The available values of
topology
are described below. Unless otherwise stated the diagonal
entries are always one. If m
is 2 or greater block diagonal precision
matrices are constructed and used.
"identity"
: uses
the identity matrix (diag(p)
) as precision matrix. Corresponds to no
conditional dependencies.
"star"
: simulate from a star
topology. Within each block the first node is selected as the "hub". The
off-diagonal entries \((1,j)\) and \((j,1)\) values taper off with the
value \(1/(j + 1)\).
"clique"
: simulate from clique topology
where each block is a complete graph with off-diagonal elements equal to
nonzero
.
"complete"
: alias for (and identical to)
"clique"
.
"chain"
: simulate from a chain topology where
the precision matrix is a tridiagonal matrix with off-diagonal elements (in
each block) given by argument nonzero
.
"banded"
:
precision elements (i,j)
are given by \(1/(|i-j|+1)\) if \(|i-j|\)
is less than or equal to banded.n
and zero otherwise.
"scale-free"
: The non-zero pattern of each block is generated by a
Barabassi random graph. Non-zero off-diagonal values are given by
nonzero
. Gives are very "hubby" network.
"Barabassi"
:
alias for "scale-free"
.
"small-world"
: The non-zero
pattern of each block is generated by a 1-dimensional Watts-Strogatz random
graph with banded.n
starting neighbors and \(5\%\) probability of
rewiring. Non-zero off-diagonal values are given by nonzero
. Gives
are very "bandy" network.
"Watts-Strogatz"
: alias for
"small-world"
"random-graph"
: The non-zero pattern of
each block is generated by a Erdos-Renyi random graph where each edge is
present with probability \(1/p\). Non-zero off-diagonal values are given
by nonzero
.
"Erdos-Renyi"
: alias for
"random-graph"
When n
has length greater than 1, the datasets
are generated i.i.d. given the topology and number of blocks.
Arguments invwishart
and nu
allows for introducing class
homogeneity. Large values of nu
imply high class homogeneity.
nu
must be greater than p + 1
. More precisely, if
invwishart == TRUE
then the constructed precision matrix is used as
the scale parameter in an inverse Wishart distribution with nu
degrees
of freedom. Each class covariance is distributed according to this inverse
Wishart and independent.
## Generate some simple sample covariance matrices
createS(n = 10, p = 3)
#> A B C
#> A 0.714338666 -0.008603045 -0.4495777
#> B -0.008603045 0.848783685 -0.4070938
#> C -0.449577686 -0.407093834 1.6648221
createS(n = c(3, 4, 5), p = 3)
#> $class1
#> A B C
#> A 0.23541993 -0.06808345 -0.09331714
#> B -0.06808345 0.02188217 0.02633239
#> C -0.09331714 0.02633239 0.03718524
#>
#> $class2
#> A B C
#> A 0.4048721 0.18554770 0.15656767
#> B 0.1855477 0.11035641 -0.03908228
#> C 0.1565677 -0.03908228 0.59124579
#>
#> $class3
#> A B C
#> A 0.9965404 0.26490592 -0.18057305
#> B 0.2649059 0.43924828 0.03874123
#> C -0.1805730 0.03874123 1.94661771
#>
createS(n = c(32, 55), p = 7)
#> $class1
#> A B C D E F
#> A 0.63178208 0.06467505 0.19670069 0.06289789 0.04188822 -0.19040730
#> B 0.06467505 0.93135667 0.08449495 0.02381718 -0.22198857 -0.02349797
#> C 0.19670069 0.08449495 0.93995769 -0.34139993 0.14346997 0.33424291
#> D 0.06289789 0.02381718 -0.34139993 0.73176859 -0.17331770 -0.21105145
#> E 0.04188822 -0.22198857 0.14346997 -0.17331770 1.36921587 0.11591828
#> F -0.19040730 -0.02349797 0.33424291 -0.21105145 0.11591828 0.81224374
#> G -0.01227575 -0.03793069 0.10206489 0.18693117 -0.20170543 0.02724070
#> G
#> A -0.01227575
#> B -0.03793069
#> C 0.10206489
#> D 0.18693117
#> E -0.20170543
#> F 0.02724070
#> G 0.56993517
#>
#> $class2
#> A B C D E F
#> A 1.04034367 -0.23721483 -0.16969116 0.04907348 -0.07136711 -0.11950913
#> B -0.23721483 0.96099733 -0.13451093 -0.14600484 -0.07506789 0.18548682
#> C -0.16969116 -0.13451093 1.12555789 -0.18544152 -0.05878663 0.11898834
#> D 0.04907348 -0.14600484 -0.18544152 0.93703353 0.11422203 -0.11126085
#> E -0.07136711 -0.07506789 -0.05878663 0.11422203 0.82855277 0.02398115
#> F -0.11950913 0.18548682 0.11898834 -0.11126085 0.02398115 0.99802168
#> G -0.06728589 0.11392344 0.13948464 0.17997138 0.04343981 -0.28291270
#> G
#> A -0.06728589
#> B 0.11392344
#> C 0.13948464
#> D 0.17997138
#> E 0.04343981
#> F -0.28291270
#> G 1.00373644
#>
## Generate some datasets and not sample covariance matrices
createS(c(3, 4), p = 6, dataset = TRUE)
#> $class1
#> A B C D E F
#> [1,] 0.90493393 0.5602515 1.79739015 1.1985082 0.3703148 -1.0219351
#> [2,] -0.06181858 -0.8432687 0.60311808 0.1489683 -1.9410410 0.5510601
#> [3,] -0.81701756 -0.1890840 0.05481809 -0.2684883 -1.8716184 -1.1819590
#>
#> $class2
#> A B C D E F
#> [1,] -1.9289744 -1.1150966 0.5732693 -0.9779805 0.93539734 -2.52598044
#> [2,] 1.9132912 -1.6585732 -1.9060648 -1.5551617 -0.47828835 0.77125979
#> [3,] -0.3644607 -0.3126735 -0.4776542 0.6235130 -0.35469283 -0.23236273
#> [4,] -1.7555148 2.8456476 0.6439718 -0.2382009 -0.06902261 0.02795027
#>
## Generate sample covariance matrices from other topologies:
A <- createS(2000, p = 4, topology = "star")
round(solve(A), 3)
#> A B C D
#> A 0.958 0.464 0.334 0.270
#> B 0.464 0.996 0.012 0.017
#> C 0.334 0.012 0.955 0.016
#> D 0.270 0.017 0.016 0.988
B <- createS(2000, p = 4, topology = "banded", banded.n = 2)
round(solve(B), 3)
#> A B C D
#> A 0.993 0.492 0.344 0.036
#> B 0.492 0.990 0.482 0.323
#> C 0.344 0.482 1.024 0.541
#> D 0.036 0.323 0.541 1.024
C <- createS(2000, p = 4, topology = "clique") # The complete graph (as m = 1)
round(solve(C), 3)
#> A B C D
#> A 1.009 0.265 0.252 0.235
#> B 0.265 1.015 0.251 0.234
#> C 0.252 0.251 0.996 0.295
#> D 0.235 0.234 0.295 0.975
D <- createS(2000, p = 4, topology = "chain")
round(solve(D), 3)
#> A B C D
#> A 1.042 0.261 -0.005 0.029
#> B 0.261 0.963 0.255 0.044
#> C -0.005 0.255 1.002 0.253
#> D 0.029 0.044 0.253 0.983
## Generate smaple covariance matrices from block topologies:
C3 <- createS(2000, p = 10, topology = "clique", m = 3)
round(solve(C3), 1)
#> A B C D E F G H I J
#> A 1.0 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
#> B 0.3 1.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
#> C 0.3 0.3 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
#> D 0.0 0.0 0.0 1.0 0.3 0.3 0.0 0.0 0.0 0.0
#> E 0.0 0.0 0.0 0.3 1.0 0.2 0.0 0.0 0.0 0.0
#> F 0.0 0.0 0.0 0.3 0.2 1.0 0.0 0.0 0.0 0.0
#> G 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.3 0.2 0.3
#> H 0.0 0.0 0.0 0.0 0.0 0.0 0.3 1.0 0.2 0.3
#> I 0.0 0.0 0.0 0.0 0.0 0.0 0.2 0.2 1.0 0.3
#> J 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.3 0.3 1.0
C5 <- createS(2000, p = 10, topology = "clique", m = 5)
round(solve(C5), 1)
#> A B C D E F G H I J
#> A 1.0 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
#> B 0.3 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
#> C 0.0 0.0 1.0 0.2 0.0 0.0 0.0 0.0 0.0 0.0
#> D 0.0 0.0 0.2 1.0 0.0 0.0 0.0 -0.1 0.0 0.0
#> E 0.0 0.0 0.0 0.0 1.0 0.2 0.0 0.0 0.0 0.0
#> F 0.0 0.0 0.0 0.0 0.2 1.0 0.0 0.0 0.0 0.0
#> G 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.3 0.0 0.0
#> H 0.0 0.0 0.0 -0.1 0.0 0.0 0.3 1.0 0.0 0.0
#> I 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.2
#> J 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.2 1.0
## Can also return the precision matrix to see what happens
## m = 2 blocks, each "banded" with 4 off-diagonal bands
round(createS(1, 12, "banded", m = 2, banded.n = 4, precision = TRUE), 2)
#> A B C D E F G H I J K L
#> A 1.00 0.50 0.33 0.25 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00
#> B 0.50 1.00 0.50 0.33 0.25 0.20 0.00 0.00 0.00 0.00 0.00 0.00
#> C 0.33 0.50 1.00 0.50 0.33 0.25 0.00 0.00 0.00 0.00 0.00 0.00
#> D 0.25 0.33 0.50 1.00 0.50 0.33 0.00 0.00 0.00 0.00 0.00 0.00
#> E 0.20 0.25 0.33 0.50 1.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00
#> F 0.00 0.20 0.25 0.33 0.50 1.00 0.00 0.00 0.00 0.00 0.00 0.00
#> G 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.50 0.33 0.25 0.20 0.00
#> H 0.00 0.00 0.00 0.00 0.00 0.00 0.50 1.00 0.50 0.33 0.25 0.20
#> I 0.00 0.00 0.00 0.00 0.00 0.00 0.33 0.50 1.00 0.50 0.33 0.25
#> J 0.00 0.00 0.00 0.00 0.00 0.00 0.25 0.33 0.50 1.00 0.50 0.33
#> K 0.00 0.00 0.00 0.00 0.00 0.00 0.20 0.25 0.33 0.50 1.00 0.50
#> L 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.20 0.25 0.33 0.50 1.00
## Simulation using graph-games
round(createS(1, 10, "small-world", precision = TRUE), 2)
#> A B C D E F G H I J
#> A 1.00 0.25 0.25 0.00 0.00 0.00 0.00 0.00 0.25 0.25
#> B 0.25 1.00 0.25 0.25 0.00 0.00 0.00 0.00 0.00 0.25
#> C 0.25 0.25 1.00 0.25 0.25 0.00 0.00 0.00 0.00 0.00
#> D 0.00 0.25 0.25 1.00 0.25 0.25 0.00 0.00 0.00 0.00
#> E 0.00 0.00 0.25 0.25 1.00 0.25 0.25 0.00 0.00 0.00
#> F 0.00 0.00 0.00 0.25 0.25 1.00 0.25 0.25 0.00 0.00
#> G 0.00 0.00 0.00 0.00 0.25 0.25 1.00 0.25 0.25 0.00
#> H 0.00 0.00 0.00 0.00 0.00 0.25 0.25 1.00 0.25 0.25
#> I 0.25 0.00 0.00 0.00 0.00 0.00 0.25 0.25 1.00 0.25
#> J 0.25 0.25 0.00 0.00 0.00 0.00 0.00 0.25 0.25 1.00
round(createS(1, 5, "scale-free", precision = TRUE), 2)
#> A B C D E
#> A 1.00 0.25 0.25 0.00 0.00
#> B 0.25 1.00 0.00 0.00 0.25
#> C 0.25 0.00 1.00 0.25 0.00
#> D 0.00 0.00 0.25 1.00 0.00
#> E 0.00 0.25 0.00 0.00 1.00
round(createS(1, 5, "random-graph", precision = TRUE), 2)
#> A B C D E
#> A 1.00 0 0.00 0.25 0.00
#> B 0.00 1 0.00 0.00 0.00
#> C 0.00 0 1.00 0.00 0.25
#> D 0.25 0 0.00 1.00 0.00
#> E 0.00 0 0.25 0.00 1.00
## Simulation using inverse Wishart distributed class covariance
## Low class homogeneity
createS(n = c(10,10), p = 5, "banded", invwishart = TRUE, nu = 10)
#> $class1
#> A B C D E
#> A 0.3407703 -0.15905885 -0.252470276 0.28600211 -0.468624961
#> B -0.1590589 0.73069749 -0.348215530 0.06845775 1.042960473
#> C -0.2524703 -0.34821553 0.942682156 -0.61086235 -0.007764689
#> D 0.2860021 0.06845775 -0.610862347 1.10894996 -0.887920766
#> E -0.4686250 1.04296047 -0.007764689 -0.88792077 2.846779236
#>
#> $class2
#> A B C D E
#> A 0.8862115 0.3589084 -0.1860808 0.4253942 0.4074453
#> B 0.3589084 0.9585110 -0.1128333 -0.1451391 0.2661174
#> C -0.1860808 -0.1128333 0.5340195 -0.2696945 -0.5983405
#> D 0.4253942 -0.1451391 -0.2696945 0.7112546 0.4634702
#> E 0.4074453 0.2661174 -0.5983405 0.4634702 0.9329733
#>
## Extremely high class homogeneity
createS(n = c(10,10), p = 5, "banded", invwishart = TRUE, nu = 1e10)
#> $class1
#> A B C D E
#> A 0.95869737 -0.3809540 0.02415766 0.1725129 -0.54093975
#> B -0.38095403 1.3124689 0.29900232 -1.2067459 0.57232835
#> C 0.02415766 0.2990023 2.04928194 -0.3400434 -0.07280502
#> D 0.17251292 -1.2067459 -0.34004344 2.1347086 -1.04191852
#> E -0.54093975 0.5723283 -0.07280502 -1.0419185 1.57782754
#>
#> $class2
#> A B C D E
#> A 0.53744832 -0.03487138 0.1114366 0.256137126 0.118536460
#> B -0.03487138 1.50640587 -1.2125845 -0.432946266 0.775990246
#> C 0.11143658 -1.21258451 2.5358652 -0.116654635 -1.278033424
#> D 0.25613713 -0.43294627 -0.1166546 0.633726286 0.001914981
#> E 0.11853646 0.77599025 -1.2780334 0.001914981 1.631697726
#>
# The precision argument can again be used to see the actual realised class
# precision matrices used when invwishart = TRUE.
# The Plist argument is used to reuse old precision matrices or
# user-generated ones
P <- createS(n = 1, p = 5, "banded", precision = TRUE)
lapply(createS(n = c(1e5, 1e5), p = 5, Plist = list(P, P+1)), solve)
#> $class1
#> A B C D E
#> A 1.003376159 0.503060389 0.3339784 -0.002151475 -0.001472794
#> B 0.503060389 0.994602675 0.4989885 0.323371776 -0.002491459
#> C 0.333978381 0.498988513 1.0010289 0.496668132 0.333095351
#> D -0.002151475 0.323371776 0.4966681 0.995563778 0.502545053
#> E -0.001472794 -0.002491459 0.3330954 0.502545053 1.003209189
#>
#> $class2
#> A B C D E
#> A 1.995161 1.495985 1.323942 1.000188 1.002387
#> B 1.495985 1.996956 1.490973 1.334418 0.997494
#> C 1.323942 1.490973 1.982567 1.495479 1.329760
#> D 1.000188 1.334418 1.495479 2.005061 1.500588
#> E 1.002387 0.997494 1.329760 1.500588 2.000471
#>