Multi‐subject stochastic blockmodels with mixed effects for adaptive analysis of individual differences in human brain network cluster structure

Recently, there has been a renewed interest in the class of stochastic blockmodels (SBM) and their applications to multi‐subject brain networks. In our most recent work, we have considered an extension of the classical SBM, termed heterogeneous SBM (Het‐SBM), that models subject variability in the cluster‐connectivity profiles through the addition of a logistic regression model with subject‐specific covariates on the level of each block. Although this model has proved to be useful in both the clustering and inference aspects of multi‐subject brain network data, including fleshing out differences in connectivity between patients and controls, it does not account for dependencies that may exist within subjects. To overcome this limitation, we propose an extension of Het‐SBM, termed Het‐Mixed‐SBM, in which we model the within‐subject dependencies by adding subject‐ and block‐level random intercepts in the embedded logistic regression model. Using synthetic data, we investigate the accuracy of the partitions estimated by our proposed model as well as the validity of inference procedures based on the Wald and permutation tests. Finally, we illustrate the model by analyzing the resting‐state fMRI networks of 99 healthy volunteers from the Human Connectome Project (HCP) using covariates like age, gender, and IQ to explain the clustering patterns observed in the data.


INTRODUCTION
Network analysis and clustering methods have emerged as powerful platforms for investigating how some units of interest (i.e., nodes) mutually interact to facilitate the functioning of a system as a whole. As illustrated in Figure 1, data from resting-state fMRI can be abstracted to network like objects so that the anatomically defined regions of interest (ROIs) represent n number of nodes and the binary-valued links between them represent edges (with {1, 0} denoting the presence and absence of an edge, respectively).
The underlying topology of a brain network, which we refer to as the cluster structure, is revealed through its decomposition into Q number of clusters, which flesh out the most prominent connectivity profiles. Generally, this topology is estimated using one of the many existing clustering methods, the majority of which are based on the optimization of an objective function called modularity (e.g., the Newman spectral and fast Louvain algorithms; Achard, Salvador, Whitcher, Suckling, & Bullmore, 2006;Bassett & Bullmore, 2009;Bullmore & Sporns, 2009;Hilgetag, Burns, O'Neill, Scannell, & Young, 2000;Pan, Chatterjee, & Sinha, 2010;Rubinov & Sporns, 2010;Sporns, Chialvo, Kaiser, & Hilgetag, 2004). Nevertheless, one of the limitations of methods based on modularity is that they are confined to the identification of a limited range of patterns of connectivity, which typically fall into the framework of the "modular organization" (in which, the within-cluster connectivities are much higher than the between-cluster connectivities). In an extensive set of simulations (Pavlovic et al., 2020), we have benchmarked these methods on modular topologies but also on topologies with a core-modular structure, which diverge from a modular architecture by combining a subset of clusters with a modular like organization and a subset of strongly interconnected clusters. On these simulations, the modular algorithms showed a very high degree of inaccuracy in retrieving the correct cluster labels when the structure was heterogeneous modular but especially when the topology was nonmodular. A more accurate alternative to these methods is contained to the class of probabilistic network clustering methods called stochastic blockmodel (SBM), which were developed in the work of Snijders and Nowicki (Nowicki & Snijders, 2001;Snijders & Nowicki, 1997). The SBMs use a framework of mixture models to explain variations in the distribution of edges on a single network. Depending on the edge values, one would select a unique family of probability density (e.g., the Bernoulli density), which would parametrize the connection probabilities within each cluster and between each cluster pair. In the Bernoulli density example, these parameters represent the mean values in the cluster structure , which are given as a Q × Q matrix of probabilities. The participation of each Bernoulli cluster in the overall mixture of densities is controlled by the parameter , which F I G U R E 1 Illustration of the main steps involved in the derivation of multi-subject binary networks from resting-state fMRI data. The nodes, labeled as V i and V j , represent n regions of interest usually based on some predefined brain parcellations or anatomical atlases. The time-series data in each such region is represented by the mean time series across its constituent voxels. There are typically four ways to generate binary networks from these time series. (i) The first way is to compute Pearson's correlations between the nodal time courses, then transform them using Fisher's z scores and finally binarize them based on the series of tests {H 0 : r ijk = 0; H 1 : r ijk > 0} in which the significant edges take value one and the nonsignificant edges take value zero. To carry out these tests, we have used the "xDF" method recently proposed by Afyouni, Smith, and Nichols (2019), which accounts for the autocorrelation in regional time series, including the instantaneous and lagged cross-correlations, and computes more accurate variances for each edge. (ii) Another way is to directly threshold the correlation matrices according to some predefined correlation value (e.g., 0.8), which treats only the strongest correlations as edges. (iii) The third way is to decompose the nodal time series into discrete wavelets and compute the correlations between the wavelet coefficients within a low-frequency scale for which fMRI dynamics of neuronal origin are the most prominent (e.g., < 0.1 Hz). (iv) The fourth way is to use partial correlation, which takes each nodal time series pair and regresses out the other n − 2 nodal time courses from them. At the end of any of the four pipelines, each subject is awarded a binary symmetric adjacency matrix x k , and subject-specific covariates are collected alongside the resting-state fMRI data is a vector of length Q. By parameterizing each element in a cluster structure, the SBM model captures a wide range of topologies, and it is not limited to the stringent assumptions of modular organizations. Our SBM application on the C. elegans network (Pavlovic, Vértes, Bullmore, Schafer, & Nichols, 2014) captured its locomotion circuit and delineated clusters of sensory neurons maintaining the forward and backward locomotion, suggesting that the model can be used to identify clusters with functionally similar nodes.
Among this literature, we especially highlight the work of Mariadassou et al. (2010) who developed the generalized SBMs for the analysis of a single network. This work extends the variational fitting procedure of SBMs for the exponential family of densities and uses generalized linear models with edge-based covariate values to explain the network cluster structure . The interplay between the cluster structure and covariates can be such that the effects of covariates are the same across the entire cluster structure (i.e., homogeneous effects) or affect each element of the cluster structure with different intensity (i.e., heterogeneous effects). Building upon the work of Mariadassou et al. (2010), we have proposed three multi-subject SBMs (MS-SBMs) among which we especially highlight the heterogeneous-effects SBM (Het-SBM; Pavlovic et al., 2020), which serves as a basis for the development of the new model in this work.
Het-SBM assumes that the cluster labels are the same for all subjects but uses subject-specific covariates like age or gender in a logistic regression model to explain between-subject variations in the cluster structure. To intuitively understand this model, in Figure 2, we showcase a straightforward example in which the age effect of three subjects acts differently on the connectivity in each block. For example, there is a decrease of connectivity in Block (1, 1) with increasing age, and the opposite directional effect is observed in Block (1, 2). By tying each covariate effect to a block cell, the model can capture a wide range of cluster structures. For example, the first subject has a modular topology as the within-block connections are higher than those between-block connections. By contrast, the third subject has a topology that resembles a disassortative mixing as the bulk of connections in the profile of Block 1 goes to maintaining ties with Blocks 2 and 3. The estimation of model parameters related to the logistic regression is based on Firth-type estimators (Firth, 1993) which belong to a class of bias preventative procedures. At first glance, the assumption of common cluster labels across subjects may appear too strong and potentially discount the use of covariates in the estimation of cluster labels. However, since the effects of the covariates are tied to each element of the cluster structure, it is possible to encounter examples in which the effects of covariates iron out the differences in the connectivity profiles across the clusters and this information can be used toward a more accurate estimation of cluster labels. Thus, it is not advisable to separate the cluster label estimation from the logistic regression. In addition to its clustering abilities, the Het-SBM model also provides inference tools to detect group differences in the connectivity rates or, more generally, effects of covariates.
However, although the Het-SBM model is useful for the analysis of independent multi-subject networks, it may not be appropriate in cases where some form of dependence exists in the data. This may be generally attributed to two sources. First, if the covariates do not fully explain the intersubject differences in connectivity rates, there may still be some dependence within the individual elements of a block structure. For example, the prevalence of edges in one subject's block (e.g., the block (q, l)) may be consistently overestimated and, therefore, this lack of fit may be randomly distributed over subjects. Second, the data may comprise more than one network per each subject (i.e., visits) inducing repeated measures correlations. The latter type of data typically occurs in studies with multiple visits of subjects, who are either scanned at different time points or after being exposed to different experimental conditions, such as different drug effects (e.g., a crossover design). Other examples of such a type of data may include the combined outcomes of F I G U R E 2 Illustration of heterogeneous SBM (Het-SBM). Simulated data are available for three subjects aged 20, 40, and 90 years, and each subject's network is represented as a reorganized adjacency matrix comprised of three blocks, labeled numerically (1-3). In Het-SBM, the subject-level covariates can affect the connectivity rates specifically within each block or between each pair of blocks. The effect of age is seen as a locally heterogeneous increase or decrease of connectivity in each block. For example, the intrablock connectivity in Block (1, 1) decreases as a function of age, whereas the interblock connectivity in Block (1, 2) increases with age different imaging modalities or data that combine different classes of connectivity (e.g., structural and functional), where the goal would be to answer questions regarding the similarities of their networks' organizations.
In the present work, we continue the characterization of the human brain network topology by extending previous works in several ways. First, we derive the Het-Mixed-SBM model (see Pavlovic, 2015, chapter 5), which uses the heterogeneous fixed effects from the Het-SBM model and poses block-and subject-level random effects. Second, we outline its estimation strategy and consider the parametric Wald test and permutation test based on the Wald statistics. Third, we validate the model on a range of simulation scenarios where we benchmark this model against Het-SBM and showcase instances where Het-Mixed-SBM can provide more accurate clustering. In another set of simulations, we show the control of false positives and, finally, we consider an application to a resting-state fMRI study obtained from the Human Connectome Project (Van Essen et al., 2012) with 99 healthy subjects and 114 nodes. This data set has only one visit per each subject and, among the wealth of information available for these subjects, we focus on covariates such as age, gender, and IQ score, which are often linked to between-subject variations.

METHOD
Hereafter, we will be using the classical convention of roman capital letters to denote random variables and lower case letters to denote their observed realizations. Scalar and nonscalar values will be denoted by light and boldface fonts, respectively.

Heterogeneous mixed-effects stochastic blockmodel (Het-Mixed-SBM)
Let the set of nodes, labeled as {V 1 , … , V n }, be divided into Q unknown (latent) blocks or clusters. For a fixed number of blocks, Q, the indices q, l ∈ {1, … , Q} are used to label the individual blocks. The block membership of a particular node V i will be indicated by a random vector Z i = (Z i1 , … , Z iQ ), whose elements Z iq take the value 1 if V i ∈ qth group and 0 otherwise. For example, fixing Q = 3, the estimatedẑ i = (0, 1, 0) indicates that the node V i is in the second cluster. Pooling this information across nodes, the n × Q random matrix Z can be defined such that the vectors Z i are mutually independent and follow a categorical density with Q possible outcomes where is a 1 × Q dimensional vector of success probabilities = ( 1 , … , Q ) with a specific q being the probability that a randomly selected node falls into the qth block, and ∑ Q q q = 1. Here, it is important to note that the choice of categorical (or equivalently single-trial multinomial) density implies that the fitted blocks form a partition of all nodes, in which, each node belongs to only one block (i.e., disjoint blocks). This is formally noted as ∑ Q q=1 z iq = 1. Writing Z = ((Z iq )) 1≤i≤n,1≤q≤Q for the n × Q matrix, the probability mass function is given as (2) The random variable X ijkt represents the possibility of an edge between the nodes V i and V j in the kth subject and its tth measurement. Hence, x ijkt denotes a binary realization of X ijkt with 1 being an edge and 0 no edge. Therefore, for the kth subject and tth measurement, X kt = ((X ijkt )) 1≤i≠j≤n denotes an n × n random, symmetric adjacency matrix with elements X ijkt , and, for a total of K subjects and T measurements, X denotes the set of random matrices X = {X kt ∶ k ∈ {1, … K}, t ∈ {1, … , T}}. The individual matrices x kt = ((x ijkt )) 1≤i≠j≤n are assumed to be undirected, without self-connected nodes and without multiple edges between the nodes. Hence, they are binary and symmetric matrices with 0s on their principal diagonal and a total of n(n − 1)∕2 data points. Conditional on the cluster labels, the edges follow a Bernoulli density where R qlk represents a subject-specific random intercept for the block (q, l), d kt is a P × 1 vector of covariates associated with the kth subject and its tth measurement, and ql is a P × 1 vector of regression coefficients for the block (q, l). The total of Q(Q + 1)∕2 individual regression vectors ql is collectively noted as . We note that 2 ql is the random effect variance of each block or block-to-block regression and it is collectively noted as 2 = (( 2 ql )) 1≤q,l≤Q . In particular, the probability mass function of x given z can be written as

Estimation
The model parameters are estimated by optimizing the variational bound  (f * (z; ); , , 2 ) defined as where f * (z; ) is the variational density with parameter (n × Q matrix of cluster probabilities), and it denotes the parametric family that is the closest (in the Kullback-Leibler sense) to f (z|x; , 2 , ). The evaluation of Equation (7) requires taking the expectation of Z with respect to its variational density f * (z; ), which is taken to be a product of the individual densities of Z i . Each density is categorical with block-specific probabilities that are independent in each node, where In particular, iq is the strength of evidence that a node V i is a member of block q having observed the data. Let us suppose an example in which the node V i has a vector̂i = (0.1, 0.01, 0.89), expressing the likelihood of its membership in three clusters. Since this node has the strongest affiliation to the third block, itsẑ i = (0, 0, 1). For simplicity of notation, we take advantage of the fact that ql = lq and define ijql as According to this, the variational bound in Equation (7) is given as For a fixed Q, we want to maximize the variational bound defined by Equation (10) with respect to the variational parameter as well as the parameters , , and 2 . The optimal variational parameter satisfies the fixed point relationŝ where we use the notation I qlk1 and I iqlk to denote the following integral expressions The estimate of each element of is given aŝ We next turn our attention to the optimization of the variational bound for and 2 . The estimates of ql and 2 ql can be found with the Newton-Raphson formula so that, for the (m)th step, we have ( The derivation details of the Fisher information matrix and the score can be found in Appendix A1.
For some initial values for 0 , we iteratively update the model parameters according to the two steps procedure ] .
In the first step, is updated according to Equation (14) while and 2 are updated according to the Newton-Raphson method outlined in Equation (15). In the second step, is updated according to Equation (12). The algorithm cycles through these two steps until convergence is obtained. The convergence is measured by the relative changes of the parameter estimates and the improvement of the variational bound. The Newton-Raphson algorithm uses a naïve estimate of̂q l as starting values for the intercept INT ql (i.e.,̂I NT ql = log(̂q l ∕(1 −̂q l )). In particular, for a given block element (i.e., Block (q, l)),̂q l is the ratio of the sum of its observed edges across the K subjects and T measurements and the total number of possible edges. The starting values for 2 are based on a strategy outlined in Demidenko (2004) whose objective is to estimate the random effects r qlk and their sample variance, from which, we finally obtain the initial estimate of 2 . The full details of this procedure can be found in Appendix B1 and further implementation details can be found in Appendix C1.

Model selection
The integrated classification likelihood (ICL) criterion (Biernacki et al., 1998) is one of the most common procedures for determining the optimal number of clusters in mixture models, where the main goal is to cluster network data (Daudin et al., 2008;Mariadassou et al., 2010;Pavlovic et al., 2014). The benefit of this approach is that it favors the models with well-separated clusters, and it penalizes for model complexity, which ensures parsimony. Using m Q to denote a candidate model with Q clusters, we derive the ICL criterion from log f (x, z|m Q ), which can be viewed as a sum of log f (x|z, m Q ) and log f (z|m Q ). The individual components log f (x|z, m Q ) and log f (z|m Q ) are each approximated with the Schwarz criterion (Schwarz, 1978). In the first component, log f (x|z, m Q ), the total number of parameters is Q(Q+1) 2 P in and Q(Q+1) 2 in 2 and the total number of data points in x is n(n−1) 2 KT. In the second component, log f (z|m Q ), the total number of parameters in is Q − 1, while the total number of data points is n. With this, the ICL criterion of Het-Mixed-SBM is defined as

Inference
Het-Mixed SBM offers the possibility to estimate a common cluster structure across subjects that can serve as a common ground for making comparisons between them. Since it poses a mixed effect logistic regression model on each element of the block structure, its methodological framework can be used to estimate differences between groups of subjects or effects of covariates on the connectivity rate at each block. More precisely, we can statistically test if these quantities are different from 0 or, in a more general sense, if linear combinations of these quantities are different from a specified constant. In the continuation of this section, we consider the Wald test to make such a kind of hypothesis testing and assume it to be conditional onẑ.

Wald test.
The null hypothesis is given as  0 ∶ L ql ql = b ql0 and the Wald statistic can then take the form where L ql is a matrix (or a vector) defining the combination of the parameters (contrast) tested and c ql is the rank of L ql . Asymptotically, W ql follows a 2 c ql distribution. If L ql is a vector, then asymptotically follows a standard normal distribution. The standard errors of the model parameters are estimated with the Fisher information matrix (see Appendix A1, Equation (A9)).

Multiple testing.
Inference procedures for the block level parameters comprise a multiple testing problem as we are effectively making Q(Q + 1)∕2 individual tests. To control the familywise error rate (FWE), defined as the probability of making at least one Type I error, we use the Bonferroni correction (Holm, 1979). This correction is valid for any dependence structure and is easy to apply: instead of using a nominal 0 significance value (e.g., 0.05), 0 ∕n T is used instead, where n T is the number of tests (here, n T = Q(Q + 1)∕2).

Permutation testing.
In addition to the Wald test, which depends on asymptotic sampling distributions, permutation tests are also considered (Good, 2000). Permutation tests are based on the premise that, under the null hypothesis, the data can be exchanged without altering its distribution. This implies that the null distribution of any test statistic can be found empirically through repeated evaluations of randomly rearranged (or permuted) data. In the context of Het-Mixed-SBM, we use the permutation strategy proposed by Potter (2005). In this approach, the covariate of interest is regressed on the remaining nuisance covariates (using a linear regression model), and the residuals from this model are used in place of the original covariate. These residuals are then permuted M times and, for each permutation b = 1, … , M, the logistic mixed-effect model is refitted yielding a permuted Wald test statistic w qlb . The p-value of the original Wald test statistic w ql0 (recomputed by refitting the logistic mixed-effect model with the residual covariate) is then computed as where I(⋅) is the indicator function.
Improved multiple testing procedures with permutation.
The Bonferroni correction for controlling the FWE is known to be conservative in the presence of dependence across tests. Conveniently, the use of permutation can provide a less conservative control of the FWE by comparing the original statistics to the null distribution of the maximum statistic across blocks (Westfall & Young, 1993). For the original Wald statistic w ql0 , the FWE corrected p-value based on this procedure is given as

Heterogeneous SBM (Het-SBM)
Het-SBM can be seen as Het-Mixed-SBM without random effects. Formally, for this model, we have where qlk is the connectivity of block (q, l) associated with subject k, d ⊤ k is the 1 × P vector of covariates associated with subject k (typically the first element will be 1, representing the intercept), and ql is a P × 1 vector of regression parameters. Its estimation details and inference strategies can be found in Pavlovic et al. (2020).

SIMULATION STUDY
This section first details the methodology of each simulation setting, labeled Simulation I and Simulation II, and then it proceeds to describe their results. Given that there is an overlap in the methodology of Simulation I and Simulation II, they are discussed jointly (see Section 3.1), while their results are treated separately (see Sections 3.2 and 3.3). Figure 3 illustrates a bird's eye view of each simulation pipeline. Each simulation pipeline is broken down into four components: goals, input parameters, models, and evaluation strategies.

Goals.
The goal of Simulation I is to investigate the accuracy of Het-SBM and Het-Mixed-SBM in estimating the correct cluster labels under the presence of random effects in the data, and to gauge how the random effects can influence this accuracy. The goal of Simulation II is to investigate, for both Het-SBM and Het-Mixed-SBM, the accuracy of the Wald parametric test and the nonparametric test (permutation test based on the Wald statistics) to control the Type I error rate and to give some recommendations for real data applications.

Input parameters.
The input parameters in both simulations are the cluster structure, the fixed-effects sizes, the random-effects variances, the proportion design (i.e., the cluster sizes), the network size, the number of subjects, and the number of visits. In Simulation I, the cluster structures considered are the heterogeneous modular (Het-Modular), core-modular, and weak structures shown in Figure 3. The heterogeneous modular structure is similar to the commonly assumed perfectly modular structure in the sense that the within-cluster connectivities are larger than the between-cluster connectivities. However, it allows the connectivities to vary within and between clusters while the perfectly modular structure imposes them to be constant. The core-modular structure is a hybrid structure where some parts of the network are modular, while the remaining parts contain clusters with high mutual connectivity patterns that comprise, therefore, a densely integrated core of clusters. For example, Blocks 1 and 2 are core clusters because they are densely connected, while Blocks 2 and 3 are modular because their within-cluster connectivities are much higher than their between-cluster connectivities. These two structures were chosen because of the extensive evidence that brain networks demonstrate at least some degree of modularity (Meunier, F I G U R E 3 Simulation pipelines: goals, input parameters, models, and evaluation methods. The input parameters consist of a cluster structure (capturing the underlying network topology), covariate effects (how a cluster structure varies across subjects as a function of their covariates), random effects (introducing dependencies within-blocks and within-subjects), a network size (i.e., the total number of nodes n), a proportion design (i.e., the individual cluster sizes), a total number of visits (T), and a total number of subjects (K) Lambiotte, Fornito, Ersche, & Bullmore, 2009) across many different species, such as the C. elegans, macaque and human brain networks (Achard et al., 2006;Bassett & Bullmore, 2009;Bullmore & Sporns, 2009;Hilgetag et al., 2000;Pan et al., 2010;Rubinov & Sporns, 2010;Sporns et al., 2004). However, there is also strong evidence that the cluster structure of a brain consists of a core of rich club nodes. For example, our previous work on the stochastic blockmodel analysis of the C. elegans brain network (Pavlovic et al., 2014) revealed a hybrid structure or "core-on-modules" structure, which deviated from a purely modular organization by the inclusion of densely inter-and intraconnected core blocks, which governed the worm's forward and backward motion. This kind of structure is hypothetically represented in the simulation by the core-modular topology. Finally, the weak cluster structure is an example of a topology where, on average, there is weak evidence for a three cluster solution as the connection profiles of Blocks 1 and 2 are identical. Such pathological cases may occur in noisy data sets. In Simulation II, however, we consider only the nonambiguous cluster structures with strong clustering evidence like the Het-Modular and core-modular structures so that there could be a fair comparison in the hypothesis testing procedures between Het-SBM and Het-Mixed-SBM. The rest of the simulation input parameters are self-explanatory in Figure 3. For example, in both simulations, the effect of the age covariate is set to zero, and the actual values for the covariates are randomly sampled. The random effect variances vary according to three values ( 2 ∈ {0 . 5, 1, 2}), which are assumed to be constant across the entire cluster structure. The proportion designs specify the cluster sizes, and, among these, we consider three cases: balanced (clusters with similar sizes), mildly unbalanced (M. Unbalanced; clusters with a mild decrease in sizes), and unbalanced (clusters with a strong decrease in sizes). The exact number of nodes in each cluster for each proportion design is given in Table 1. The network sizes are n ∈ {50, 100}, the total number of subjects is K ∈ {20, 40, 80}, and the total number of visits is T ∈ {1, 3} (with the exception of Simulation II where T ∈ {1}). For each such simulation scenario, we generate 100 Monte Carlo samples.
For both simulations, we fit Het-SBM and Het-Mixed-SBM. In Simulation I, the models are fitted on the range of different cluster valuesQ ∈ {2, 3, 4} using the same initializations, while, in Simulation II, only the fitted parameters are used to benchmark the accuracy of the tests. For the data sets with multiple visits, we submit multiple visits to Het-SBM as independent subjects.

Evaluation strategies.
The evaluation of the results in Simulation I is based on the accurate estimation of the optimal number of clusters and cluster labels. The comparison between the estimated clustering and the actual partition is carried out using the Adjusted Rand Index (ARI; Hubert & Arabie, 1985), which indicates the overall agreement between the selected best fits and the real clustering (with 1 denoting a perfect agreement). In Simulation II, we evaluate how well the parametric (Wald) and nonparametric (permutation) inference procedures control the false positive rate (FPR) at a level of significance of 5%. The p-values of the Monte Carlo permutation test statistic are obtained by computing 1,000 permutations of the age covariate for each of the realizations.

Simulation I: Results
For the data sets with the Het-Modular and core-modular cluster structures, Het-SBM and Het-Mixed-SBM both prove to be very accurate in the cluster label estimation and in selecting the optimal number of clusters (ARI = 1). This trend is noted for all the network sizes, subject totals, and random effect sizes (see Supplementary Information Figures S1-S8). This seems to indicate that, even under the presence of random effects, when the cluster profiles are well delineated, the choice between the fixed-effect or mixed-effect versions of the model is inconsequential in terms of clustering. However, as noted in Figure 4, this is not the case in the weak cluster structure data sets where the Het-SBM fits tend to be less accurate especially in the scenarios with small random effect variances (i.e., 2 = 0.5). For such values, Het-SBM struggles to pick on the between-subject variations in the connectivity profiles of Blocks 1 and 2, and it tends to underestimate the optimal number of clusters by one. It is interesting to point out that such effects are less pronounced when the network size is doubled, that is, when the individual clusters contain more nodes (see Figure 5). It is also worth pointing out that, in the cases of larger random effect variances, Het-SBM seems to be able to pick more accurately on the between-subject variations and F I G U R E 4 Box plots of ARI scores between the estimated and true cluster labels for the weak cluster structure, 50 nodes, and one visit. The x-axis shows two fitted models, Het-SBM and Het-Mixed-SBM, while the y-axis shows the distribution of their ARI scores. The rows show the subject totals, while the columns indicate the proportion designs and random effect variances F I G U R E 5 Box plots of ARI scores between the estimated and true cluster labels for the weak cluster structure, 100 nodes, and one visit. The x-axis shows two fitted models, Het-SBM and Het-Mixed-SBM, while the y-axis shows the distribution of their ARI scores. The rows show the subject totals, while the columns indicate the proportion designs and random effect variances to retrieve the correct clustering. By contrast, Het-Mixed-SBM is accurate in all situations. This is because the model is able to use the block-level variance of random effect to guide the clustering.

Simulation II: Results
The purpose of simulating the age effect under the null setting (i.e., H 0 : AGE ql = 0) is to evaluate the effectiveness of the Wald test and the permutation test based on the Wald statistics to control the Type I error rate at 5% significance. For convenience, we will hereafter refer to these procedures as the Wald test and permutation test, and we will base our discussion only on the simulation settings with the random-effect variances 2 = 1 as the trends observed are consistent with those observed in the other settings (whose results are given in Supplementary Information Appendix B). Figures 6 and 7 show the observed false positive rates (FPRs) at a level of significance of 5% obtained from the tests on each block-specific age coefficient and how these results contrast across Het-SBM and Het-Mixed-SBM. Both figures display the results in the settings with 50 nodes, one visit and the variance of random effects 2 = 1 but with different cluster structures, namely, the Het-Modular cluster structure in Figure 6 and the core-modular cluster structure in Figure 7. In the case of the Het-Modular structure, we may expect a moderate bias on the age coefficients as some cluster-connection probabilities may be very close to zero or one. The Firth-type correction used in Het-SBM accounts for such a type of bias, but, for Het-Mixed-SBM, as this kind of bias prevention correction was not utilized, we may expect some degree of liberal behavior for the parametric test. As noted in Figure 6, the Wald test for Het-Mixed-SBM in Block (1, 2) displays F I G U R E 6 Observed false positive rates (FPRs) at a level of significance of 5% in the Het-Modular network with 50 nodes, one visit, and random effect variance 2 = 1. The columns of the plot show three proportion designs, while its rows show three different subject totals (K ∈ {20, 40, 80}). The x-axis shows the fitted cluster-level regression coefficients submitted to hypothesis testing for the effect of age (i.e., H 0 : AGE ql = 0) and the y-axis shows their corresponding observed FPR. The observed FPR are computed for the Wald test (solid line) and the permutation test based on the Wald statistics (dashed line). The results are shown for Het-SBM (blue line with square symbols) and Het-Mixed-SBM (red line with triangle symbols). The shaded green band represents the 95% confidence interval based on a normal approximation exactly that behavior, which is not surprising as the input connection probability for that block was 0.01. Coupling such a small probability with a moderate number of subjects (K = 20) and a moderate number of edges introduces samples dominated by an excessive amount of zero-edges. With larger pulls of data (more edges and subjects), such effects seem to be ameliorated, resulting in better control of the FPR. For the core-modular structure in Figure 7, such effects seem to be absent and the Wald test appears to be valid in all scenarios. Similar trends are noted for other simulation settings (see Supplementary Information Figures S11-S14). In stark contrast to the results of Het-Mixed-SBM, the observed FPRs for the Wald test of Het-SBM are extremely inflated, suggesting complete unreliability of the test in the presence of random effects. One potential cause for this behavior lies in the assumption of edge independence within a block, which is no longer valid in the presence of random effects. In this particular setting, the fixed effects fail to explain the between-subject variation in the cluster-connectivity rates fully, and miss out on the dependence within the individual elements of a block structure introduced by the random effects. Notably, this trend is prevalent in all simulation settings and seems to moderately decrease only with a F I G U R E 7 Observed false positive rates (FPRs) at a level of significance of 5% in the core-modular network with 50 nodes, one visit, and random effect variance 2 = 1. The columns of the plot show three proportion designs, while its rows show three different subject totals (K ∈ {20, 40, 80}). The x-axis shows the fitted cluster-level regression coefficients submitted to hypothesis testing for the effect of age (i.e., H 0 ∶ AGE ql = 0) and the y-axis shows their corresponding observed FPR. The observed FPRs are computed for the Wald test (solid line) and the permutation test based on the Wald statistics (dashed line). The results are shown for Het-SBM (blue line with square symbols) and Het-Mixed-SBM (red line with triangle symbols). The shaded green band represents the 95% confidence interval based on a normal approximation smaller variance of the random effects (see Supplementary Information Figures S11 and S13 for 2 = 0 . 5). Finally, regarding the permutation test, it seems accurate in all simulation settings and for both models.

Data
In this work, we consider a resting-state functional magnetic resonance imaging (fMRI) data set obtained from the Human Connectome Project (HCP), containing 100 unrelated and healthy subjects. Each subject fMRI scan has been fully preprocessed to account for potential confounding factors such as head motion and slice timing errors (Smith et al., 2013). Due to excessive head motion and loss of signal in the frontal lobe, one subject has been removed from the study using the exclusion criterion recommended by Afyouni and Nichols (2018), thus yielding 99 subjects (K = 99). Next, each subject image has been further parcellated into 114 nonoverlapping regions of interests (i.e., n = 114) utilizing the methodological framework outlined in Thomas Yeo et al. (2011). Given that each region of interest contains several voxel-level time series, we use their mean time series for summarizing the functional activity in each node. However, even at this stage, the signal can still be contaminated by respiratory and cardiac confounds, which are also typically contained in the grand average time series (i.e., the global brain signal). Using the recommendations of Murphy and Fox (2017) and Li et al. (2019), the global mean signal is regressed out from each of the nodal time series. As noted in Figure 1, a functional connectivity matrix is obtained by correlating the nodal data in a pairwise fashion, and a binary network is derived from it by testing if each sample correlation is different from zero (i.e., H 0 ∶ r ijkt = 0 vs. H 1 ∶ r ijkt > 0). The negative correlation values bear very little neuroscientific information and, as such, are not considered in the analysis (Fornito, Zalesky, & Bullmore, 2016). The weakness of this approach, however, is that the fMRI time series are strongly autocorrelated (Woolrich, Ripley, Brady, & Smith, 2001), which produces a very poor estimate of the variance for Fisher Z scores used for the hypothesis tests, yielding strongly inflated false positives. In order to account for autocorrelation and cross-correlation across time series pairs, the approach termed "xDF" has been utilized (Afyouni et al., 2019). As shown in Afyouni et al. (2019), the xDF method produced much more accurate estimates than a naive approach. Using the xDF method, adjusted Z scores have been obtained, and the resulting p-values have been adjusted for multiple comparison error using a false discovery rate (Benjamini & Yekutieli, 2001). This part of the analysis has been completed using the Matlab xDF Toolbox (https://github.com/asoroosh/xDF, accessed on September 15, 2019; Afyouni et al., 2019). Finally, to obtain the binary adjacency matrix, any edge with an adjusted p-value lower than the nominal level (5%) has been set to one and otherwise to zero. Finally, for this data set, we only had one visit per subject, so T = 1 and all the subject covariates were not temporal. In this data set, each subject is linked to the following covariates: age (29.20±3.62), gender (+1 for a female with 54 subjects and −1 for a male with 45 subjects), and fluid intelligence score (IQ) (16.21±4.62) given in terms of their means and standard deviations, respectively. Fluid intelligence has been evaluated using the Penn Progressive Matrices (PMAT) test in which the subjects have been tasked with predicting a correct pattern given a prior set of sequential patterns. The PMAT test consists of 24 items and the problems are arranged in order of increasing difficulty. The design matrix has been set up to contain a global intercept and the three covariates mentioned above (with age and IQ standardized). The three covariates have been chosen based on the fact that primary subject covariates like age and gender are known to have high explanatory power, especially for capturing differences between subjects (Dosenbach et al., 2010). Similar evidence has also been given in support of fluid intelligence (Dubois, Galdi, Paul, & Adolphs, 2018). It is worth mentioning, however, that even though the HCP data contain a wealth of behavioral scores, we have decided to select only the most standard covariates since there is very little empirical evidence that the available behavioral measures are informative of specific functional processes in the brain. Even less evidence exists about how informative these are to clustering. While we do not discount the potential research value of linking behavioral data to the functional connectivity data, we view this question as secondary to this work and beyond the scope of this article. Another possibility would be to use task-based fMRI data as a subject-specific covariate and to see how these relate to the clustering. This would allow to study more directly the link between specific tasks and resting-state data. However, given that this model is one of the first steps toward linking covariates to clustering, more sophisticated analyses seem beyond the scope of this work.

Clustering results
To this data, we fit Het-Mixed-SBM and Het-SBM using initializations based on the k-means clustering from the R package amap (Lucas, 2014) and the hierarchical clustering from the R package stats (R Core Team, 2017). For the k-means clustering, we perform 1,000 initializations for each subject and the grand average matrix while, for the hierarchical clustering, we perform only one initialization per each subject and the grand average since this method is deterministic. For demonstration purposes, the analysis only considers 10 clusters, and the ICL criterion in both models is used to select the best fits across initializations. The best fits for both models converge to an identical clustering solution, suggesting that the data do not contain noise levels, which would iron out connectivity profiles in some of the clusters as we have observed in our simulations. Therefore, the choice between the fixed-effect and mixed-effect modeling does not seem to impact the selected clustering fit. Although we do not observe differences in the clustering between Het-Mixed-SBM and Het-SBM in our real-data analysis, it is worth mentioning that this does not mean that this situation generalizes to all data sets. For example, it is possible to have similar mean connectivity profiles in some clusters but very different random effects and this would obscure the clustering with fixed effects (Het-SBM).
To understand the fit, in Figure 8, we list the individual nodes in each of the ten fitted blocks. The label of each node is sourced from the 17-network parcellation found in Thomas Yeo et al. (2011), which depicts typical neural activations while a subject is at rest. These labels are the central visual, control (A, B, and C), default (A, B, and C), dorsal attention (A and B), limbic (A and B), peripheral visual, salience/ventral (A and B), somatomotor (A and B), and temporal-parietal networks. The central visual network is engaged in conscious processing of visual stimuli, and it supports a depth of visual perception and conscious awareness of visual stimuli. The network comprises nodes from the striate and extrastriate cortex. The control network is supporting cognitive processes related to working memory, attention and decision making. In the parcellation, this network is split along three components labeled A, B, and C; the control A component consists of nodes in the temporal cortex, intraparietal sulcus, dorsal, lateral and lateral-ventral prefrontal cortex, and midcingulate cortex; the control B component consists of nodes in the temporal cortex, inferior parietal lobule, dorsal, lateral, lateral-ventral, medial-posterior, lateral-dorsal and lateral-ventral prefrontal cortex, and inferior parietal lobule; the control C component consists of nodes in the precuneus and posterior cingulate cortex. The default mode network is active during tasks such as daydreaming, future planning, accessing memories, and thinking about others or themselves. The network is split into three components: A, B, and C; the default A component consists of nodes in the inferior parietal lobule, dorsal prefrontal cortex, precuneus posterior cingulate cortex, medial prefrontal cortex, and temporal cortex; the default B component consists of nodes in the temporal cortex, anterior temporal lobe, inferior parietal lobule, dorsal, lateral, and ventral prefrontal cortex; the default C component consists of nodes in the precuneus, inferior parietal lobule, retrosplenial, and parahippocampal cortex. The dorsal attention network is linked to voluntary orienting of attention to external tasks. It is divided into two components: A and B; component A consists of nodes in the temporal and parietal occipital cortex and superior parietal lobule; component B consists of nodes in the temporal occipital cortex, postcentral cortex, frontal The peripheral visual network is supporting peripheral vision. It consists of nodes in the striate cortex, extrastriate inferior and extrastriate superior cortex. The salience/ventral network is linked to attention such as redirecting our attention to sudden stimuli as well as for deciding which such stimuli are worthy of attention. It is divided into components A and B; component A consists of nodes in the parietal operculum, frontal operculum, insula, parietal medial cortex, frontal medial cortex, and precentral cortex; component B consists of nodes in the cingulate anterior cortex, inferior parietal lobule, lateral prefrontal cortex, medial posterior prefrontal cortex, and dorsal prefrontal cortex. The somatomotor network is linked to the coordination of motor tasks. It consists of components A and B; the component A consists of nodes in the primary somatosensory cortex; the component B consists of nodes in the insula, central cortex, secondary somatosensory cortex, and auditory cortex. The temporal-parietal network contains regions in the temporal parietal cortex.
Next, we summarize each block according to its functional specification (see Figure 9). Block 1 contains regions from five networks: the central visual, somatomotor B, salience/ventral attention B, control A and C networks. Given the diversity of its network associations, it is not easy to attach a unique functional label. Nevertheless, each region in this block seems to be very weakly connected with the other regions in this block and the regions in the other blocks, which can F I G U R E 9 Average connectivity matrix (for mean age, gender, and IQ) thresholded at 10% connectivity rate. The clusters are plotted along a circle, and their relative sizes correspond to their total number of nodes. Within-cluster connections are given in percent, and between-cluster links are provided in terms of their edges whose thicknesses indicate their relative strength of association. Each block is also awarded an approximate functional label explain why they have been clustered together. For this reason, we refer to this block as "weakly connected." By contrast, the remaining blocks are more straightforward to classify functionally. Indeed, Block 2 can be categorized as "visual" as it contains elements from the central and peripheral visual networks, Block 3 as "somatomotor and salience/ventral attention," Block 4 as "dorsal attention," Block 5 as "salience/ventral attention B," Block 6 as "limbic and temporal-parietal," Block 7 as "default mode A-B," Block 8 as "control A," Block 9 as "control B," and Block 10 as "default mode C." Figure 10 shows the average connectivity matrix for mean age, gender, and IQ. The within-cluster connectivity of Block 1 is fragile and weaker than several of its connections with other blocks (e.g., the connectivity between Block 1 and Block 5), suggesting a so-called "disassortative mixing" pattern. The nodes in this block appear to be classified together based on their substantial dissimilarity in connectivity profiles to the nodes in the rest of the networks. This departure from the main clustering patterns may potentially explain its functional heterogeneity, which was identified earlier. It is also interesting to point out that Blocks 2 (visual), 3 (somatomotor and salience/ventral attention), and 4 (dorsal attention) exhibit relatively high within-cluster connectivities but also relatively strong connections between one another. The high within-cluster connectivity is generally expected since regions in close anatomical proximity are involved in local processes. The relatively high connectivity between these clusters suggests an interplay between visual, somatomotor, and attention tasks. Although the components of salience/ventral attention network are split between Blocks 3 and 5 due to their differing connectivity profiles (e.g., Block (3,9) and Block (5,9)), these blocks maintain relatively high mutual connections, which are expected within the same functional network. Blocks 6 F I G U R E 10 Average connectivity matrix for mean age, gender, and IQ. The emerging cluster structure seems to diverge from a classical modular organization and shows evidence of high connectivity patterns between some blocks (e.g., between Blocks 2, 3, and 4, between Blocks 6 and 7, and between Blocks 8 and 10). The blocks are defined according to their connectivity profile and even though Blocks 3 and 4 appear to have similar within-and between-cluster probabilities, they are split into separate blocks on the grounds of their different connections to Blocks 5 and 8. An exception from the overall structural pattern in the cluster structure is Block 1, which has more connections to the other blocks in the network than within its cluster. This pattern is generally known as a disassortative mixing (limbic and temporal-parietal) and 7 (default mode A-B) also exhibit strong mutual connections, while the component C of the default mode network in Block 10 is not as connected to Block 6 as Block 7. Finally, the components A and B of the control network are split into Blocks 8 and 9, respectively.

Inference results
In Figure 11, we show the uncorrected p-values (in −log 10 scale) thresholded at 5% significance for the null hypothesis of no effect for each covariate (age, gender, and IQ). The results are shown for four testing procedures based on the combination of the two models (Het-SBM and Het-Mixed-SBM) and two tests (Wald and permutation tests). The Wald test for Het-SBM exhibits a much higher number of significant blocks than the three other testing procedures. This behavior seems to be consistent with the results shown in Simulation II (see Section 3.3), where we have observed a very strong liberal control of the FPR for the Wald test of Het-SBM. In general, the results of the three other testing procedures seem relatively similar.
In Figure 12, we show the familywise error rate corrected p-values (in −log 10 scale) thresholded at 5% significance for the null hypothesis of no effect for each covariate (age, gender, and IQ) and each testing procedure. As with the uncorrected p-values, the Wald test for Het-SBM F I G U R E 11 Uncorrected significant p-values for the null hypothesis of no effect for each covariate (age, gender, and IQ). The results are given for Het-SBM and Het-Mixed-SBM and in terms of the Wald and the permutation tests. The p-values are stated in terms of −log 10 (p-values) (e.g., the legend score of 36 corresponds to a p-value of 10 −36 ) and the nonsignificant p-values at 5% significance are given in gray exhibits a much higher number of significant blocks than the three other testing procedures. The permutation test for Het-SBM declared two significant results that occur for age and gender in Block (2,3) (between the visual and somatomotor and salience/ventral attention blocks). In contrast to this, the Wald and permutation tests for Het-Mixed-SBM both consistently declared a single significant result that occurs for gender in Block (1,5) (between the weakly connected and salience/ventral attention blocks). As shown in Nichols and Hayasaka (2003), one potential explanation for this discrepancy between the two models is that the permutation test based on the maximum statistic is sensitive to situations where the statistics are not pivotal. As the statistics in Het-SBM do not account for random effects, their null distributions are likely to vary across blocks, especially if the variance of the random effects is changing across blocks. In such cases, F I G U R E 12 Familywise error rate corrected significant p-values for the null hypothesis of no effect for each covariate (age, gender, and IQ). The results are given for Het-SBM and Het-Mixed-SBM and in terms of the Wald and the permutation tests. The p-values are stated in terms of −log 10 (p-values) (e.g., the legend score of 35 corresponds to a p-value of 10 −35 ) and the nonsignificant p-values at 5% significance are given in gray. For the Wald test, the familywise error rate corrected p-values were obtained using a Bonferroni correction. For the permutation test, the familywise error rate corrected p-values were obtained based on the null distribution of the maximum statistics the null distribution of the maximum statistic is likely to be driven by the blocks that bear the most familywise error risks. As a consequence of that, some blocks will be more penalized than others. On the contrary, as Het-Mixed-SBM accounts for some random effects, we may expect the statistics to be more pivotal. This means that the contribution of each block to the null distribution of the maximum statistic is likely to be more homogeneous, penalizing blocks more equally. It is worth mentioning that, based on this explanation, both results are valid even if they do not exhibit the same significant blocks because they are both controlling the familywise error rate.

DISCUSSION
In this article, we proposed a novel multi-subject model, termed Het-Mixed-SBM, which can be used for the analysis of multi-subject binary network data, allowing for random effects per subjects and blocks.
In Simulation I, we validated the clustering ability of the model on a range of different cluster structures, and we have compared its results with the Het-SBM fits. In the instances of clearly delineated cluster structures in which the blocks show starkly segregated connectivity profiles (i.e., the Het-Modular and Core-Modular structures), both models converged to the correct clustering solutions. Nevertheless, in the cluster structure with merged connectivity profiles, such as Blocks 1 and 2 in the weak topology, the variance of the random effects can be used by Het-Mixed-SBM to correctly separate Blocks 1 and 2. Interestingly, for small variations of the random effects, it seemed hard for Het-SBM to split Blocks 1 and 2 across the subjects, and the model tended to underestimate the optimal number of clusters. However, when the random effect variances were larger, Het-SBM yielded an accurate clustering. The overall conclusion of Simulation I is that, for cluster structures with strong clustering evidence, we can expect a similar performance between the two models. Still, for cluster structures with weaker evidence, it might be possible to get erroneous solutions with the fixed-effects model (Het-SBM). In combination with the conclusions of Pavlovic et al. (2020) where it was shown that the fixed-effect covariates might be informative of the clustering, it seems that the random effects can also be useful in guiding the model toward identifying the correct clustering solution in multi-subject networks. As such, it may not be appropriate to do the clustering separately from the logistic regression or logistic-mixed regression.
Having established in Simulation I the influence of random effects on the clustering, in Simulation II, we focused on validating the inference procedures based on the parametric (Wald) and nonparametric (permutation) tests in Het-SBM and Het-Mixed-SBM. As previously anticipated, the Wald test in Het-SBM showed highly inflated FPRs in the presence of random effects in the data, and such results seem to improve with the permutation testing. By contrast, the Wald and permutation testing in Het-Mixed-SBM showed more consistent results. The only departure from this trend appears in the particular block of the Het-Modular structure whose low connectivity rate introduced samples with hugely disproportionate ratios of zeros to ones, which are known to yield highly biased estimates and inflated FPRs for the Wald test. To remove this type of bias, one solution would be to use Firth-type estimators like in Het-SBM. Nevertheless, this is not straightforward due to the presence of random effects in the model. From a purely practical viewpoint, in such circumstances, it may be advisable to resort to a more straightforward alternative like permutation testing. Nevertheless, for probability rates that are away from the zero/one boundaries, it may be simpler to use the Wald test.
In the analysis of the HCP data set, we obtained identical clustering fits for Het-SBM and Het-Mixed-SBM. The clustering fit highlighted functionally meaningful blocks. The inference results based on the Wald and permutation testing in Het-SBM showed widely different results. These results echo the conclusions of Simulation II, which suggested a liberal behavior of the Wald test with Het-SBM. By contrast, there seems to be a high degree of agreement between the Wald and the permutation testing in Het-Mixed-SBM. These results seem to reinforce the notion that it is important to account for random effects to better model the data. A shortcoming of our real-data analysis is that we fixed the total number of clusters to 10 to ensure that the blocks are reasonably sized for realistic inferences and to avoid potential small sample issues. Further analyses using higher numbers of clusters would be useful to uncover finer-grained fits that might be more informative.
One shortcoming of Het-Mixed-SBM is that it only considers a random intercept per block as random effects. While this is an improvement upon the classic Het-SBM which does not model any form of within-subject dependencies, it assumes that, for a given block, the within-subject dependencies are the same between all the within-subject connections. This assumption may be too restrictive for repeated measures data as we may expect the dependencies to be more complicated. For example, in the case of longitudinal networks, we may expect the dependencies to decrease with time, which would not be modeled with our current version of Het-Mixed-SBM. One potential remedy in modeling more complex types of within-subject dependencies is to include additional random effects to the model. For example, in the case of longitudinal networks, adding a random effect of time would allow to model dependencies that also vary between time points. Similar considerations might be given to data that combined different imaging modalities (e.g., a combination of diffusion tensor imaging and resting-state fMRI).
Finally, as part of our future work, we will also consider a potentially useful simplification of our model that may speed up computation while accounting for subject-level randomness. Specifically, we intend to consider a model that combines "homogeneous random effects" (per-subject random intercepts common to the entire network) with "heterogeneous fixed effects" regression for each block element. We expect this model to have a similar clustering potential as the Het-Mixed-SBM but with a much reduced computational burden.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.

APPENDIX A. PARAMETER ESTIMATION IN HET-MIXED-SBM
For the clarity of the subsequent discussions and easy referencing, we provide a list of integrals I qlk1 − I qlk6 that will be used in the estimating equations of and 2 . Thus, we have Taking the partial derivatives of the variational bound  (f * (z; ); , , 2 ) (Equation (10)) with respect to ql and 2 ql , for an individual block (q, l), we can form a score vector U( ql , 2 ql ) as a (P + 1) × 1 vector of first derivatives U( ql , 2 ql ) = (U( ql ) ⊤ , U( 2 ql )) ⊤ whose equations are given as and finally the estimate of a starting point iŝ

APPENDIX C. REPARAMETRIZATION OF THE INTEGRALS IN THE HET-MIXED-SBM
At each step of the Newton-Raphson algorithm, we require a numerical approximation of the six integrals (Equations (A1)-(A6)). In practice, such integrals can be reasonably well estimated with the adaptive Gauss-Hermite quadrature approximation (Lesaffre & Spiessens, 2001;Liu & Pierce, 1994) whose implementation in R is available via the function integrate (Piessens, Doncker-Kapenga, Überhuber, & Kahaner, 1983;R Core Team, 2015). Although this computational strategy offers relatively quick and accurate approximations, in a few examples, we encountered some instabilities and numerical issues. First, some integrals were evaluated as zeros, which caused the variational bound to diverge to −∞. The main reason for this was that the maximal value attained by the function h qlk was very small (e.g., −1,000), making the integrand numerically equal to zero over the whole domain of integration. To treat this point of numerical instability, we can add an offset value c qlk to the exponent h qlk . In particular, this offset value can be chosen to be the negative maximum of h qlk , taken with respect to r qlk , while iterative values of Using this, we can now write the integral ∫ +∞ −∞ e h qlk dr qlk = e −c qlk ∫ +∞ −∞ e h qlk +c qlk dr qlk and the edge based component of the variational bound is given as (C3)