Tutorial: Matlab Libraries for Civet

This page provides a brief tutorial on the use of Andrew Reid's Matlab libraries for loading, processing, analyzing, and outputting Civet data. Updated 22/05/2013.

Introduction

Below I will outline step-by-step how to load Civet data into Matlab and perform some basic processing and analysis, including statistics using the SurfStat library of Keith Worsley, and the construction of ROI-wise correlation matrices.

For this tutorial, you will need the following Matlab code archives*:

http://resources.modelgui.org/mgui/wiki/neuro/tutorials/adni_areid_040512.zip
http://resources.modelgui.org/mgui/wiki/neuro/tutorials/fdr_bky.zip
http://resources.modelgui.org/mgui/wiki/neuro/tutorials/gretna.zip
http://resources.modelgui.org/mgui/wiki/neuro/tutorials/lib_areid.zip

* Note: these are experimental scripts and are in no way guaranteed by the author. They are also not part of the ModelGUI Project. That said, if you'd like to alert me to bugs, please email Andrew at ten.yllacipyt|werdna#ten.yllacipyt|werdna.

Loading Civet output

The Civet pipeline (see here) produces its output as a folder containing a single subfolder for each processed subject, with that subject's unique identifier (UID) as the folder name. Each subject folder contains a set of subfolders containing specific output of the pipeline. The folders are organized as follows:

classify
The results of the tissue classification: i.e., partial volume estimates (PVEs) for grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF); as Minc format volume files.
final
Minc volume files, in stereotaxic space (final), and transformed by linear (tal) and nonlinear (nl) transformations.
logs
Log files for the Civet pipeline; these are useful, for instance, if the pipeline fails
mask
Brain, skull, and cortex masks; Minc format volume files
native
The native image, and the nonuniformity-corrected (NUC) image
surfaces
All output surfaces, including raw, resampled, and calibrated surfaces
temp
Intermediate files produced during processing
thickness
Computed cortical thickness (CT; in native space), both raw and resampled
transforms
The actual linear and non-linear transforms which warp from native to stereotaxic space, as well as the surface registration map
verify
Images to verify that the output is valid
tutorial_civet_dir_window.png

Above: typical Civet output for a number of subjects.

Having obtained some Civet output for our subjects, we now want to load it into Matlab. This can be done easily using the load_civet_data function. To load the above output into a variable called "Y", I did the following:

>> help load_civet_data
 LOAD_CIVET_DATA Loads vertex-wise CT data from a Civet output 
                 directory, for a list of subjects

  PARAMETERS:

  civet_dir         The Civet output directory
  subjects          A list of subject ids corresponding to subject
                    directories in civet_dir; pass an empty array [] to
                    process all available subjects
  prefix            The prefix for the data files (part preceding the
                    subject id in the filename)
  suffix            The suffix for the data files (part following the
                    subject id and preceding the '_left' or '_right' in the
                    filename)
  hemi              Hemisphere; one of 'left', 'right', 'both' (default)
  subdir            Data subdirectory; default is 'thickness' 
  ext               File extension (include dot); default is '.txt'

 AUTHOR: Andrew Reid, BIC, MNI 2011

>> Y=load_civet_data('C:\Users\Andrew\Documents\Projects\ADNI\civet_data',[],'adni','native_rms_rsl_tlink_20mm');
8 x 2 files to read, % remaining: 100 Done
>>

This results in an 8x81924 matrix, with 8 subjects and 40962 vertices per hemisphere. Clearly 8 subjects is insufficient to obtain any meaningful statistics, but you will likely have more subjects than this. To learn how to analyze these data, I cannot improve upon Keith's page.
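
As a quick sanity check on the dimensions:

>> size(Y)   % 8 subjects x 81924 vertices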

Nuisance variables

Suppose you have some nuisance variable whose variance you'd like to remove from the raw CT measures. This could include sex, age, or mean CT (if you are only interested in relative correlations; think hard about this last one however). To do this, you'll need subject-wise values converted to SurfStat terms. Supposing you have vectors age (double), sex (cell strings), and mean_ct (double), do the following to obtain residuals:

>> Age=term(age);
>> Sex=term(sex);
>> MT=term(mean_ct);
>> M=1+Age+Sex+Age*Sex+MT;
>> slm=SurfStatLinMod2(Y,M)

slm = 

        X: [8x7 double]
       df: 3
     coef: [7x81924 double]
      SSE: [1x81924 double]
    resid: [8x81924 double]

>> Y_resid=single(slm.resid);
>>

You can now operate with these residuals as you would with your raw values, having removed the variance attributable to your nuisance variables.
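
As a sanity check, the residuals are simply the data minus the fitted nuisance model. A minimal sketch, assuming (as the output above suggests) that SurfStatLinMod2 stores the design matrix and coefficients in slm.X and slm.coef:

>> Y_hat = slm.X*slm.coef;                      % fitted nuisance model, subjects x vertices
>> max(max(abs(Y - Y_hat - double(Y_resid))))   % should be near zero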

Regions of interest

For some purposes it is useful to parcellate the cortical surface into anatomical regions-of-interest (ROIs). The Automated anatomical labelling (AAL) atlas is a popular cortical parcellation [1]. To do this we need a vector of integers labelling each vertex as belonging to a particular ROI. Download:

aal_civet.mat
The AAL template; vertex-wise values, index/name map, and ROI centroids; see also readme and license files.

…and load it into your Matlab session.
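
Assuming the file is on your Matlab path, this is just (the variable names below are inferred from their usage later in this tutorial):

>> load aal_civet.mat   % provides the vertex labels (aal_labels) and ROI id list (aal_list)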

We can compute mean thickness for ROIs using the get_regional_means function:

>> help get_regional_means

  get_regional_means(Y, labels, label_list)

  Computes a mean for each ROI in label_list (an R-by-1 vector of integer 
  label ids) from labels (an N-by-1 vector of labels) and Y (an N-by-M 
  matrix of values).

  Returns: regional_means, an M-by-R matrix of mean values

  Author: Andrew Reid, BIC, MNI 2011

>> roi_means=get_regional_means(Y',aal_labels,aal_list);
>> roi_means_resid=get_regional_means(Y_resid',aal_labels,aal_list);
>>

This will give you an 8x74 matrix, with ROI means (raw CT or residuals) for each subject.
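
If you are curious what this does internally, it amounts to averaging within each label. A hypothetical re-implementation (a sketch, not the library code):

function regional_means = regional_means_sketch(Y, labels, label_list)
% Y: N x M (vertices x subjects); labels: N x 1; label_list: R x 1
regional_means = zeros(size(Y,2), numel(label_list));
for r = 1:numel(label_list)
    idx = labels == label_list(r);              % vertices belonging to ROI r
    regional_means(:,r) = mean(Y(idx,:), 1)';   % mean over those vertices, per subject
end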

Correlative networks

One application of ROI-wise means is to create a correlative network; i.e., compute Pearson coefficients between each pair of ROIs (i, j). To do this, use the function get_correlation_matrix2:

 help get_correlation_matrix2
  GET_CORRELATION_MATRIX() Computes a correlation matrix C from the N x S
  input matrix Y, and threshold using p and (optionally) FDR q values. Also
  optionally returns the effect sizes (betas)

  PARAMETERS:
    Y           An N x S matrix, where N is the number of subjects, and 
                S is the number of values per subject
    threshold   [optional] Threshold for p and q values (default = 0.05)
    do_fdr      [optional] If true, apply an FDR threshold (default =
                false)

    do_regress  [optional] If true, also compute regression + residuals; if
                false, the B and R values will be empty matrices

  RETURNS:
    C           An S x S matrix of correlation coefficients, thresholded by
                p- and optionally q-values

    P           The corresponding p-values

    B           S x S x 2 matrix of corresponding effect sizes (betas)

    R           S x S x N matrix of residuals for each pairwise regression

  AUTHOR: Andrew Reid, MNI, 2011

>> C=get_correlation_matrix2(roi_means);
>>

This will give you a 74x74 correlation matrix, thresholded by p = 0.05, but uncorrected for multiple comparisons. To view the matrix, type:

>> figure
>> imagesc(C)

Uncorrected p-values are not exactly kosher; we can apply a false discovery rate (FDR) correction with the command:

>> C_corr=get_correlation_matrix2(roi_means,0.05,true);
>> figure
>> imagesc(C_corr)
>>

The resulting matrices, for my data, are:

tutorial_civet_corr_both.png

Z comparisons (Fisher transformation)

Suppose you have two groups (say, Alzheimer's patients, AD, and normal controls, NC) and you want to compare their correlation matrices. Differences in ROI-wise correlations may convey some information about the specific connections affected in AD, for instance. These matrices, computed from residuals as above, are called R_ad and R_nc. Note: a major difference is that, for a Z comparison, you should not apply any threshold to the correlation coefficients, as this can falsely inflate the differences. Thus, the matrices should be obtained by calling get_correlation_matrix2(roi_means, 1.0, false) (thanks to Nick Foster for pointing this out).

Correlation coefficients can be compared by first converting them to z-scores using the Fisher transformation and comparing the difference to the normal distribution to obtain a p-value. The following obtains a matrix Z_ad_nc_sig, which contains the z-differences for our two matrices, thresholded at p=0.05.

>> help z_comparison
  Compares correlation matrices R1 and R2, given a sample size of N. Does
  this by converting the R values to Z values and comparing the difference
  to the normal distribution.

  PARAMETERS:

  R1, R2        - The correlation matrices
  N             - The sample size

  OUTPUT:

  D             - The difference as a z-score
  P             - The p-values associated with D

  AUTHOR: Andrew Reid, MNI, 2012

>> [D, P]=z_comparison(R_ad, R_nc, 8);
>> Z_ad_nc_sig=D.*(P<0.05);
>>
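
For reference, a hypothetical sketch of the underlying computation (assuming equal group sizes N; this is not the library code):

>> N  = 8;                     % sample size per group
>> Z1 = atanh(R_ad);           % Fisher r-to-z transformation
>> Z2 = atanh(R_nc);           % (the diagonal, r = 1, maps to Inf and should be ignored)
>> se = sqrt(2/(N - 3));       % standard error of the z difference
>> D  = (Z1 - Z2)./se;         % difference as a z-score
>> P  = 2*normcdf(-abs(D));    % two-tailed p-values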

This gives you a matrix of differences, but again they haven't been corrected for multiple comparisons. To do this you can apply an FDR correction using the fdr_bky function, as follows:

>> help fdr_bky

  fdr_bky() - Executes the "two-stage" Benjamini, Krieger, & Yekutieli (2006)
              procedure for controlling the false discovery rate (FDR) of a
              family of hypothesis tests. FDR is the expected proportion of
              rejected hypotheses that are mistakenly rejected (i.e., the null
              hypothesis is actually true for those tests). FDR is a
              somewhat less conservative/more powerful method for correcting
              for multiple comparisons than procedures like Bonferroni
              correction that provide strong control of the family-wise
              error rate (i.e., the probability that one or more null
              hypotheses are mistakenly rejected).
                The procedure implemented by this function is more powerful
              than the original Benjamini & Hochberg (1995) procedure when 
              a considerable percentage of the hypotheses in the family are
              false. To the best of my knowledge, this procedure is only
              guaranteed to control FDR if the tests are independent.
              However, simulations suggest that it can control FDR even
              when the tests are positively correlated (Benjamini et al., 
              2006).

  Usage:
   >> [h, crit_p]=fdr_bky(pvals,q,report);

  Required Input:
    pvals - A vector or matrix (two dimensions or more) containing the
            p-value of each individual test in a family of tests.

  Optional Inputs:
    q       - The desired false discovery rate. {default: 0.05}
    report  - ['yes' or 'no'] If 'yes', a brief summary of FDR results is
              output to the MATLAB command line {default: 'no'}

  Outputs:
    h       - A binary vector or matrix of the same size as the input "pvals."
              If the ith element of h is 1, then the test that produced the 
              ith p-value in pvals is significant (i.e., the null hypothesis
              of the test is rejected).
    crit_p  - All p-values less than or equal to crit_p are significant
              (i.e., their null hypotheses are rejected).  If no p-values are
              significant, crit_p=0.

  References:
    Benjamini, Y., Krieger, A.M., & Yekutieli, D. (2006) Adaptive linear 
      step-up procedures that control the false discovery rate. Biometrika.
      93(3), 491-507.

    Benjamini, Y. & Hochberg, Y. (1995) Controlling the false discovery
      rate: A practical and powerful approach to multiple testing. Journal
      of the Royal Statistical Society, Series B (Methodological). 57(1),
      289-300.

  Example:
    [dummy p_null]=ttest(randn(12,15)); %15 tests where the null hypothesis
                                        %is true
    [dummy p_effect]=ttest(randn(12,5)+1); %5 tests where the null
                                           %hypothesis is false
    [h crit_p]=fdr_bky([p_null p_effect],.05,'yes');

  Author:
  David M. Groppe
  Kutaslab
  Dept. of Cognitive Science
  University of California, San Diego
  March 25, 2010

>> fdr=fdr_bky(P,0.05);
>> Z_ad_nc_sig=Z_ad_nc_sig.*fdr;

However, you may notice that your correlation matrix is symmetric: the lower and upper triangular parts are identical. To apply the correction only to the independent values, you want just the upper triangle. Two convenience functions allow you to generate FDR values in this way:

>> P_top=get_top(P);
>> fdr_top=fdr_bky(P_top);
>> fdr=put_bottom_back_on(fdr_top,74);
>>
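
A hypothetical sketch of what these helpers likely amount to (the actual implementations may differ):

>> mask      = triu(true(size(P)), 1);   % upper triangle, excluding the diagonal
>> P_top     = P(mask);                  % independent p-values as a vector
>> h_top     = fdr_bky(P_top, 0.05);     % FDR over the independent tests only
>> fdr       = zeros(74);
>> fdr(mask) = h_top;
>> fdr       = fdr + fdr';               % mirror back to a symmetric matrix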

Binary graph analysis

Another way of analyzing ROI-wise correlation matrices is by extracting graph theoretic measures from them. This approach makes some assumptions about the nature of the Pearson coefficients, namely that they represent connectivity between two regions (e.g., via mutually trophic influence). To extract graph measures (local, global, regional efficiency; see [2]), and compare these measures across groups, we need to use a bootstrapping (with replacement) approach, from which we can obtain two distributions that can be compared.
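
To make the bootstrap idea concrete, a single iteration might look like this (a hypothetical sketch, not the library code):

>> N    = size(roi_means, 1);   % number of subjects
>> idx  = randi(N, 100, 1);     % resample subjects with replacement (sample size boot_m = 100)
>> C_bs = get_correlation_matrix2(roi_means(idx,:), 1.0, false);   % correlation matrix for one sample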

This section covers binarized graphs, in which edges are determined by thresholding: either by the Pearson coefficients directly, or by the "sparsity" of the graph. The latter approach ensures that the graphs being compared have the same number of edges, but may also force an arbitrary configuration on a graph by setting different Pearson thresholds. This can be appreciated from the following plot of sparsity vs. Pearson coefficient for two groups with different overall correlation strengths:

tutorial_civet_sparsity_pearson.png
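
Concretely, thresholding by sparsity means keeping the strongest fraction s of all possible edges, whatever Pearson threshold that implies. A hypothetical sketch for a single correlation matrix C:

>> C(1:size(C,1)+1:end) = 0;          % zero the diagonal
>> mask = triu(true(size(C)), 1);     % upper-triangle (independent) values
>> vals = sort(C(mask), 'descend');
>> k    = round(0.2*numel(vals));     % e.g., sparsity s = 0.2
>> A    = (C >= vals(k)) & (C > 0);   % binary adjacency at this sparsity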

In this case we will threshold by sparsity, across a range of sparsity values, and compare the integrals (areas under the curve) of the graph measures over this range. To begin, we need to specify parameters, using the analyze_corr_net_binary_params function, which simply produces a struct of default values:

>> params=analyze_corr_net_binary_params

params = 

          thr_min: 0.1000
          thr_max: 0.6000
         thr_step: 0.1000
           boot_n: 20
           boot_m: 100
        int_start: 0.1000
          int_end: 0.6000
    integral_type: 'trap'
       remove_neg: 1
       show_plots: 1
       keep_boots: 0
     threshold_by: 'sparsity'
        fdr_thres: 0.0500
         fdr_type: 'normal'

>>

These parameters define:

thr_min, thr_max, thr_step
The range of threshold values to use, and the increment between them
boot_n
The number of bootstrap samples to generate; this should be sufficiently high to avoid spurious results, but is the main determinant of processing time (per threshold increment)
boot_m
The size of each bootstrap sample
int_start, int_end
The range over which to compute the integrals for comparison
integral_type
The type of integration to perform; one of 'trap' or 'fit'
remove_neg
Indicates whether negative correlations are to be removed prior to processing
show_plots
If true, will produce a set of plots illustrating the results
keep_boots
Whether to retain the bootstrap samples in the result
threshold_by
The type of threshold; one of 'sparsity' or 'pearson'

Next, we want to perform the group-wise analysis. For this purpose you will need your mean thickness values or residuals (roi_means or roi_means_resid from above), a list of group assignments, one per subject, and a list of groups. For example, if you want to compare AD and NC groups, your groups variable will be populated with 'AD' or 'NC' assignments, and your group_list variable will be [{'AD'},{'NC'}]. In the example below, I use a large(r) data set (101 subjects) and three groups (Normal, MCI, and AD; where MCI = mild cognitive impairment):

>> help analyze_corr_net_binary
 Analyzes regional thickness values by generating bootstrap samples (with
  replacement) and computing graph efficiency measures on each sample. This
  method generates binary graphs using a range of threshold values,
  specified by the params argument. Only positive values are used for graph
  construction.

  Thresholds can be 'sparsity' or 'pearson' coefficients.

  PARAMETERS:

   X                        ROI-wise thickness values (or residuals); R x N
                            where N = no. subjects, R = no. ROIs
   groups                   N x 1 list of group assignments, as cell
                            strings
   group_list               G x 1 list of groups to analyze
   params                   Parameters defining the analysis; see
                            analyze_corr_net_binary_params

  RETURNS:
    A result structure containing all bootstrap results, as well as mean,
    standard deviation, and integrals calculated across a threshold range
    (this is the entire range unless specified by the params argument).

  Andrew Reid
  Montreal Neurological Institute
  2011

>> group_list=[{'Normal'},{'MCI'},{'AD'}];
>> result = analyze_corr_net_binary(roi_means',groups,group_list,params)

Analyzing Normal ......OK.
Analyzing MCI ......OK.
Analyzing AD ......OK.
Computing stats and integrals...
Done.

result = 

        Normal: [1x1 struct]
    thresholds: [0.1000 0.2000 0.3000 0.4000 0.5000 0.6000]
           MCI: [1x1 struct]
            AD: [1x1 struct]
         ttest: [1x1 struct]

NOTE: The roi_means matrix must be transposed before being passed to analyze_corr_net_binary. Failing to do so may not produce an error, and may even yield apparently significant results purely due to the structure of the data, so this is important to check.
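
A quick sanity check before calling the function:

>> size(roi_means')   % should be R x N; here, 74 ROIs x 101 subjects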

The function will also output four figures, as shown:

tutorial_civet_graph_eff.png

These aren't particularly impressive results, but this may be due to a number of factors: a smallish N (101; I know, I said it was large, but neuroimaging datasets are getting larger by the day), a small number of bootstraps (20), and a large threshold increment (0.1). Ideally you will want a larger N if possible, 500-1000 bootstrap samples, and an increment of 0.05 or even 0.025 if you have time to let it run (this makes the curves smoother and better defines the trend).
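
For example, to rerun with settings closer to those recommendations:

>> params.boot_n   = 500;      % more bootstrap samples
>> params.thr_step = 0.025;    % finer threshold increments for smoother curves
>> result = analyze_corr_net_binary(roi_means', groups, group_list, params);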

Weighted graph analysis

Binary graph analysis is conceptually problematic, in that distance is not accounted for, despite the obvious observation that metabolic cost (for signal transmission, spatial requirements, and tissue maintenance) will be proportional to distance. We can start to address this issue if we have a good approximation of interregional connection distances. Euclidean distance between regional centroids is the easiest approximation, but it is a very poor one, because it ignores the convoluted shape of cortical white matter, particularly for extreme anterior or posterior contralateral ROIs.
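
If you do fall back on the Euclidean approximation, the aal_civet.mat file above includes ROI centroids; assuming these are in an R x 3 matrix called aal_centroids (a hypothetical name), the distance matrix is one line:

>> D_euc = squareform(pdist(aal_centroids));   % R x R Euclidean centroid distances; could be assigned to params.distances below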

Perhaps the best available method for approximating interregional distances, at present, is via diffusion-weighted imaging (DWI) tractography. Most tractography tools (e.g., FSL's probtrackx) will provide an average length for each pair of seed/target ROIs. If no DWI data are available, the next best option is to compute shortest geometric paths through the white matter, as described in Bojak et al., 2010 [3]*.

Weighted analysis is very similar to the binary analysis described above. The parameters are obtained via analyze_corr_net_weighted_params.m:

>> params=analyze_corr_net_weighted_params;
>> params.distances=aal_dist_76

params = 

          thr_min: 0.1000
          thr_max: 0.6000
         thr_step: 0.1000
           boot_n: 20
           boot_m: 100
        int_start: 0.1000
          int_end: 0.6000
    integral_type: 'trap'
       remove_neg: 1
       show_plots: 1
       keep_boots: 0
     threshold_by: 'sparsity'
        fdr_thres: 0.0500
         fdr_type: 'normal'
      window_size: [600 500]
        distances: [76x76 double]

The only change is that we now must set the distances field; in this case I have a distance variable called aal_dist_76, which I assign to params.distances. Next we pass our data in the same way as before, to the function analyze_corr_net_weighted:

results=analyze_corr_net_weighted(roi_means',groups,group_list,params);

You then get output similar to before, only distance-weighted.

* If you are interested in obtaining code for the shortest-distance-via-WM method, please contact ac.lligcm.inm.cib|diera#ac.lligcm.inm.cib|diera.

References

1. Tzourio-Mazoyer N et al. Automated anatomical labelling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 2002; 15: 273-289. [PubMed]
2. Latora V, Marchiori M. Efficient behavior of small-world networks. Phys. Rev. Lett. 2001; 87: 198701. [PubMed]
3. Bojak I, Oostendorp TF, Reid AT, Kötter R. Towards a model-based integration of co-registered electroencephalography/functional magnetic resonance imaging data with realistic neural population meshes. Phil. Trans. Roy. Soc. A 2010; 369: 1952. [PubMed]

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License