This page provides a brief tutorial on using Andrew Reid's Matlab libraries to load, process, analyze, and output Civet data. Updated 22/05/2013.
Introduction
Below I will outline step-by-step how to load Civet data into Matlab and perform some basic processing and analysis, including statistics using the SurfStat library of Keith Worsley, and the construction of ROI-wise correlation matrices.
For this tutorial, you will need the following Matlab code*:
http://resources.modelgui.org/mgui/wiki/neuro/tutorials/adni_areid_040512.zip
http://resources.modelgui.org/mgui/wiki/neuro/tutorials/fdr_bky.zip
http://resources.modelgui.org/mgui/wiki/neuro/tutorials/gretna.zip
http://resources.modelgui.org/mgui/wiki/neuro/tutorials/lib_areid.zip
* Note: these are experimental scripts and are in no way guaranteed by the author. They are also not part of the ModelGUI Project. That said, if you'd like to alert me to bugs, please email Andrew at ten.yllacipyt|werdna#ten.yllacipyt|werdna.
Loading Civet output
The Civet pipeline (see here) produces its output as a folder containing a single subfolder for each processed subject, with that subject's unique identifier (UID) as the folder name. Each subject folder contains a set of subfolders containing specific output of the pipeline. The folders are organized as follows:
- classify
- The results of the tissue classification: i.e., partial volume estimates (PVEs) for grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF); as Minc format volume files.
- final
- Minc volume files, in stereotaxic space (final), and transformed by linear (tal) and nonlinear (nl) transformations.
- logs
- Logging files for the Civet pipeline; this is useful, for instance, if the pipeline fails
- mask
- Brain, skull, and cortex masks; Minc format volume files
- native
- The native image, and the nonuniformity-corrected (NUC) image
- surfaces
- All output surfaces, including raw, resampled, and calibrated surfaces
- temp
- Intermediate files produced during processing
- thickness
- Computed cortical thickness (CT; in native space), both raw and resampled
- transforms
- The actual linear and non-linear transforms which warp from native to stereotaxic space, as well as the surface registration map
- verify
- Images to verify that the output is valid
Above: typical Civet output for a number of subjects.
Having obtained some Civet output for our subjects, we now want to load it into Matlab. This can be done easily using the load_civet_data function. To load the above output into a variable called "Y", I did the following:
>> help load_civet_data
LOAD_CIVET_DATA Loads vertex-wise CT data from a Civet output
directory, for a list of subjects
PARAMETERS:
civet_dir The Civet output directory
subjects A list of subject ids corresponding to subject
directories in civet_dir; pass an empty array [] to
process all available subjects
prefix The prefix for the data files (part preceding the
subject id in the filename)
suffix The suffix for the data files (part following the
subject id and preceding the '_left' or '_right' in the
filename)
hemi Hemisphere; one of 'left', 'right', 'both' (default)
subdir Data subdirectory; default is 'thickness'
ext File extension (include dot); default is '.txt'
AUTHOR: Andrew Reid, BIC, MNI 2011
>> Y=load_civet_data('C:\Users\Andrew\Documents\Projects\ADNI\civet_data',[],'adni','native_rms_rsl_tlink_20mm');
8 x 2 files to read, % remaining: 100 Done
>>
This results in an 8x81924 matrix, with 8 subjects and 40962 vertices per hemisphere. Clearly 8 subjects is insufficient to obtain any meaningful statistics, but you will likely have more subjects than this. To learn how to analyze these data, I cannot improve upon Keith's page.
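Note that I passed an empty array as the second argument, which processes all available subjects. If you instead want an explicit subject list, the folder names themselves are the UIDs, so you can gather them with dir; a minimal sketch:
>> d=dir('C:\Users\Andrew\Documents\Projects\ADNI\civet_data');
>> subjects={d([d.isdir] & ~ismember({d.name},{'.','..'})).name}';   % drop '.' and '..'
The resulting cell array can then be passed to load_civet_data in place of [].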
Nuisance variables
Suppose you have some nuisance variable whose variance you'd like to remove from the raw CT measures. This could include sex, age, or mean CT (if you are only interested in relative correlations; think hard about this last one, however). To do this, you'll need subject-wise values converted to SurfStat terms. Supposing you have vectors age (double), sex (cell strings), and mean_ct (double), do the following to obtain residuals:
>> Age=term(age);
>> Sex=term(sex);
>> MT=term(mean_ct);
>> M=1+Age+Sex+Age*Sex+MT;
>> slm=SurfStatLinMod2(Y,M)
slm =
X: [8x7 double]
df: 3
coef: [7x81924 double]
SSE: [1x81924 double]
resid: [8x81924 double]
>> Y_resid=single(slm.resid);
>>
You can now operate with these residuals as you would with your raw values, having removed the effect of your nuisance variables from the variance.
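As an aside, the mean_ct vector used above is, to a first approximation (ignoring masked or midline vertices), just the row mean of the vertex-wise data:
>> mean_ct=mean(Y,2);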
Regions of interest
For some purposes it is useful to parcellate the cortical surface into anatomical regions-of-interest (ROIs). The Automated anatomical labelling (AAL) atlas is a popular cortical parcellation [1]. To do this we need a vector of integers labelling each vertex as belonging to a particular ROI. Download:
- aal_civet.mat
- The AAL template; vertex-wise values, index/name map, and ROI centroids; see also readme and license files.
…and load it into your Matlab session.
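For example:
>> load aal_civet.mat
This should define, among other things, the vertex-wise label vector (aal_labels) and the ROI id list (aal_list) used in the calls below.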
We can compute mean thickness for ROIs using the get_regional_means function:
>> help get_regional_means
get_regional_means(Y, labels, label_list)
Computes a mean for each ROI in label_list (an R-by-1 vector of integer
label ids) from labels (an N-by-1 vector of labels) and Y (an N-by-M
vector of values).
Returns: regional_means, an M-by-R matrix of mean values
Author: Andrew Reid, BIC, MNI 2011
>> roi_means=get_regional_means(Y',aal_labels,aal_list);
>> roi_means_resid=get_regional_means(Y_resid',aal_labels,aal_list);
>>
This will give you an 8x74 matrix, with ROI means (raw CT or residuals) for each subject.
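For reference, the averaging itself is straightforward; here is a minimal sketch with the same input/output convention as the help text above (an illustration, not the actual library code):
function means = regional_means_sketch(Y, labels, label_list)
% Y:          N x M matrix (N vertices, M subjects)
% labels:     N x 1 vector of integer ROI labels, one per vertex
% label_list: R x 1 vector of ROI ids to average over
% means:      M x R matrix of per-subject ROI means
R = numel(label_list);
means = zeros(size(Y,2), R);
for r = 1:R
    idx = (labels == label_list(r));   % vertices belonging to this ROI
    means(:,r) = mean(Y(idx,:), 1)';   % average over those vertices
end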
Correlative networks
One application of ROI-wise means is to create a correlative network; i.e., compute a Pearson coefficient between each pair of ROIs (ROIi, ROIj). To do this, use the function get_correlation_matrix2:
>> help get_correlation_matrix2
GET_CORRELATION_MATRIX() Computes a correlation matrix C from the N x S
input matrix Y, and threshold using p and (optionally) FDR q values. Also
optionally returns the effect sizes (betas)
PARAMETERS:
Y An N x S matrix, where N is the number of subjects, and
S is the number of values per subject
threshold [optional] Threshold for p and q values (default = 0.05)
do_fdr [optional] If true, apply an FDR threshold (default =
false)
do_regress [optional] If true, also compute regression + residuals; if
false, the B and R values will be empty matrices
RETURNS:
C An S x S matrix of correlation coefficients, thresholded by
p- and optionally q-values
P The corresponding p-values
B S x S x 2 matrix of corresponding effect sizes (betas)
R S x S x N matrix of residuals for each pairwise regression
AUTHOR: Andrew Reid, MNI, 2011
>> C=get_correlation_matrix2(roi_means);
>>
This will give you a 74x74 correlation matrix, thresholded by p = 0.05, but uncorrected for multiple comparisons. To view the matrix, type:
>> figure
>> imagesc(C)
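Incidentally, the unthresholded core of this computation is just a Pearson correlation across subjects, so you can sanity-check the result with MATLAB built-ins; a minimal sketch (not the library code):
>> [C0,P0]=corrcoef(roi_means);    % roi_means is subjects x ROIs
>> C_sketch=C0.*(P0<0.05);         % zero out non-significant entries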
Uncorrected p-values are not exactly kosher; we can apply a false discovery rate (FDR) correction with the command:
>> C_corr=get_correlation_matrix2(roi_means,0.05,true);
>> figure
>> imagesc(C_corr)
>>
The resulting matrices, for my data, are:
Z comparisons (Fisher transformation)
Suppose you have two groups (say, Alzheimer's patients, AD, and normal controls, NC) and you want to compare their correlation matrices. Differences in ROI-wise correlations may convey some information about the specific connections affected in AD, for instance. These matrices, computed from residuals as above, are called R_ad and R_nc. Note: a major difference is that, for a Z comparison, you should not apply any threshold to the correlation coefficients, as this can falsely inflate the differences. Thus, the matrices should be obtained by calling get_correlation_matrix2(roi_means, 1.0, false) (thanks to Nick Foster for pointing this out).
Correlation coefficients can be compared by first converting them to z-scores using the Fisher transformation and comparing the difference to the normal distribution to obtain a p-value. The following obtains a matrix Z_ad_nc_sig, which contains the z-differences for our two matrices, thresholded at p=0.05.
>> help z_comparison
Compares correlation matrices R1 and R2, given a sample size of N. Does
this by converting the R values to Z values and comparing the difference
to the normal distribution.
PARAMETERS:
R1, R2 - The correlation matrices
N - The sample size
OUTPUT:
D - The difference as a z-score
P - The p-values associated with D
AUTHOR: Andrew Reid, MNI, 2012
>> [D, P]=z_comparison(R_ad, R_nc, 8);
>> Z_ad_nc_sig=D.*(P<0.05);
>>
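For reference, the Fisher procedure that z_comparison describes can be sketched with built-ins, given the two correlation matrices and the sample size N (8 above). This assumes equal group sizes and is an illustration only; the library implementation may differ in detail:
>> Z1=atanh(R_ad);                 % Fisher transformation, z = atanh(r)
>> Z2=atanh(R_nc);
>> se=sqrt(2/(N-3));               % standard error of z1-z2 for equal N
>> D_sketch=(Z1-Z2)/se;            % difference as a z-score
>> P_sketch=2*(1-normcdf(abs(D_sketch)));   % two-tailed p-values
Note that the diagonal entries (r = 1) transform to Inf and should be ignored.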
This gives you a matrix of differences, but again they haven't been corrected for multiple comparisons. To do this you can apply an FDR correction using the fdr_bky function, as follows:
>> help fdr_bky
fdr_bky() - Executes the "two-stage" Benjamini, Krieger, & Yekutieli (2006)
procedure for controlling the false discovery rate (FDR) of a
family of hypothesis tests. FDR is the expected proportion of
rejected hypotheses that are mistakenly rejected (i.e., the null
hypothesis is actually true for those tests). FDR is a
somewhat less conservative/more powerful method for correcting
for multiple comparisons than procedures like Bonferroni
correction that provide strong control of the family-wise
error rate (i.e., the probability that one or more null
hypotheses are mistakenly rejected).
The procedure implemented by this function is more powerful
than the original Benjamini & Hochberg (1995) procedure when
a considerable percentage of the hypotheses in the family are
false. To the best of my knowledge, this procedure is only
guaranteed to control FDR if the tests are independent.
However, simulations suggest that it can control FDR even
when the tests are positively correlated (Benjamini et al.,
2006).
Usage:
>> [h, crit_p]=fdr_bky(pvals,q,report);
Required Input:
pvals - A vector or matrix (two dimensions or more) containing the
p-value of each individual test in a family of tests.
Optional Inputs:
q - The desired false discovery rate. {default: 0.05}
report - ['yes' or 'no'] If 'yes', a brief summary of FDR results are
output to the MATLAB command line {default: 'no'}
Outputs:
h - A binary vector or matrix of the same size as the input "pvals."
If the ith element of h is 1, then the test that produced the
ith p-value in pvals is significant (i.e., the null hypothesis
of the test is rejected).
crit_p - All p-values less than or equal to crit_p are significant
(i.e., their null hypotheses are rejected). If no p-values are
significant, crit_p=0.
References:
Benjamini, Y., Krieger, A.M., & Yekutieli, D. (2006) Adaptive linear
step-up procedures that control the false discovery rate. Biometrika.
93(3), 491-507.
Benjamini, Y. & Hochberg, Y. (1995) Controlling the false discovery
rate: A practical and powerful approach to multiple testing. Journal
of the Royal Statistical Society, Series B (Methodological). 57(1),
289-300.
Example:
[dummy p_null]=ttest(randn(12,15)); %15 tests where the null hypothesis
%is true
[dummy p_effect]=ttest(randn(12,5)+1); %5 tests where the null
%hypothesis is false
[h crit_p]=fdr_bky([p_null p_effect],.05,'yes');
Author:
David M. Groppe
Kutaslab
Dept. of Cognitive Science
University of California, San Diego
March 25, 2010
>> fdr=fdr_bky(P,0.05);
>> Z_ad_nc_sig=Z_ad_nc_sig.*fdr;
However, you may notice that your correlation matrix is symmetric: the bottom and top triangular parts are identical. To apply a correction only to the independent values, you want only the top triangle. Two convenience functions allow you to generate FDR values in this way:
>> P_top=get_top(P);
>> fdr_top=fdr_bky(P_top);
>> fdr=put_bottom_back_on(fdr_top,74);
>>
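If you don't have these helpers, the same thing can be done with logical indexing on the upper triangle; a minimal sketch (not the library code):
>> S=size(P,1);                % number of ROIs, e.g. 74
>> iu=find(triu(true(S),1));   % indices of the strictly upper triangle
>> h=fdr_bky(P(iu),0.05);      % FDR over the unique tests only
>> fdr=zeros(S); fdr(iu)=h;    % rebuild the full matrix...
>> fdr=fdr+fdr';               % ...mirroring onto the lower triangle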
Binary graph analysis
Another way of analyzing ROI-wise correlation matrices is to extract graph-theoretic measures from them. This approach makes some assumptions about the nature of the Pearson coefficients, namely that they represent connectivity between two regions (e.g., via mutually trophic influence). To extract graph measures (local, global, and regional efficiency; see [2]) and compare them across groups, we use a bootstrap (sampling with replacement) approach, from which we obtain group-wise distributions that can be compared.
This section covers binarized graphs, in which edges are determined by thresholding: either by the Pearson coefficients directly, or by the "sparsity" of the graph. The latter approach ensures that the graphs being compared have the same number of edges, but may also force an arbitrary configuration on a graph by setting different Pearson thresholds for each group (a code sketch of sparsity thresholding follows the parameter list below). This can be appreciated from the following plot of sparsity vs. Pearson coefficient for two groups with different overall correlation strengths:
In this case we will threshold by sparsity, across a range of sparsity values, and compare the integrals (areas under the curve) of the graph measures over this range. To begin, we need to specify parameters using the analyze_corr_net_binary_params function, which simply produces a struct of default values:
>> params=analyze_corr_net_binary_params
params =
thr_min: 0.1000
thr_max: 0.6000
thr_step: 0.1000
boot_n: 20
boot_m: 100
int_start: 0.1000
int_end: 0.6000
integral_type: 'trap'
remove_neg: 1
show_plots: 1
keep_boots: 0
threshold_by: 'sparsity'
fdr_thres: 0.0500
fdr_type: 'normal'
>>
These parameters define:
- thr_min, thr_max, thr_step
- The range of threshold values to use, and the increment between them
- boot_n
- The number of bootstrap samples to generate; this should be sufficiently high to avoid spurious results, but is the main determinant of processing time (per threshold increment)
- boot_m
- The size of each bootstrap sample
- int_start, int_end
- The range over which to compute the integrals for comparison
- integral_type
- The type of integration to perform; one of 'trap' or 'fit'
- remove_neg
- Indicates whether negative correlations are to be removed prior to processing
- show_plots
- If true, will produce a set of plots illustrating the results
- keep_boots
- Whether to retain the bootstrap samples in the result
- threshold_by
- The type of threshold; one of 'sparsity' or 'pearson'
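To make the 'sparsity' option concrete: binarizing at sparsity s means keeping the strongest s-fraction of all possible edges. A minimal sketch of this thresholding (an illustration only, not the toolbox code):
>> s=0.2;                          % keep the strongest 20% of possible edges
>> S=size(C,1);
>> iu=find(triu(true(S),1));       % unique (upper-triangle) edges
>> r_sorted=sort(C(iu),'descend');
>> k=round(s*numel(iu));           % number of edges to keep
>> A=double(C>=r_sorted(k) & triu(true(S),1));
>> A=A+A';                         % symmetric binary adjacency matrix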
Next, we want to perform the group-wise analysis. For this purpose you will need your mean thickness values or residuals (roi_means or roi_means_resid from above), a list of group assignments, one per subject, and a list of groups. For example, if you want to compare AD and NC groups, your groups variable will be populated with 'AD' or 'NC' assignments, and your group_list variable will be [{'AD'},{'NC'}]. In the example below, I use a large(r) data set (101 subjects) and three groups (Normal, MCI, and AD; where MCI = mild cognitive impairment):
>> help analyze_corr_net_binary
Analyzes regional thickness values by generating bootstrap samples (with
replacement) and computing graph efficiency measures on each sample. This
method generates binary graphs using a range of threshold values,
specified by the params argument. Only positive values are used for graph
construction.
Thresholds can be 'sparsity' or 'pearson' coefficients.
PARAMETERS:
X ROI-wise thickness values (or residuals); R x N
where N = no. subjects, R = no. ROIs
groups N x 1 list of group assignments, as cell
strings
group_list G x 1 list of groups to analyze
params Parameters defining the analysis; see
analyze_corr_net_binary_params
RETURNS:
A result structure containing all bootstrap results, as well as mean,
standard deviation, and integrals calculated across a threshold range
(this is the entire range unless specified by the params argument).
Andrew Reid
Montreal Neurological Institute
2011
>> group_list=[{'Normal'},{'MCI'},{'AD'}];
>> result = analyze_corr_net_binary(roi_means',groups,group_list,params)
Analyzing Normal ......OK.
Analyzing MCI ......OK.
Analyzing AD ......OK.
Computing stats and integrals...
Done.
result =
Normal: [1x1 struct]
thresholds: [0.1000 0.2000 0.3000 0.4000 0.5000 0.6000]
MCI: [1x1 struct]
AD: [1x1 struct]
ttest: [1x1 struct]
NOTE: The roi_means matrix must be transposed before being passed to analyze_corr_net_binary. Failing to do so may not produce an error, and may even produce spuriously significant results due to the structure of the data, so this is important to check.
The function will also output four figures, as shown:
These aren't particularly impressive results, but this may be due to a number of factors: a smallish N (101; I know, I said it was large, but neuroimaging datasets are getting larger by the day), a small number of bootstraps (20), and a large threshold increment (0.1). Ideally you will want a larger N if possible, 500-1000 bootstrap samples, and an increment of 0.05 or even 0.025 if you have time to let it run (this makes the curves smoother and better defines the trend).
Weighted graph analysis
Binary graph analysis is conceptually problematic, in that distance is not accounted for, despite the obvious observation that metabolic cost (for signal transmission, spatial requirements, and tissue maintenance) will be proportional to distance. We can start to address this issue if we have a good approximation of interregional connection distances. Euclidean distance between regional centroids is the easiest way to approximate this, but it is a very poor one, because it ignores the convoluted shape of cortical white matter, particularly for extreme anterior or posterior contralateral ROIs.
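Euclidean centroid distances are at least trivial to compute; for example, given an R x 3 matrix of centroid coordinates (such as the ROI centroids included with aal_civet.mat; the variable name centroids is assumed here):
>> D_euc=squareform(pdist(centroids));   % R x R symmetric distance matrix
(pdist and squareform are in the Statistics Toolbox.)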
Perhaps the best available method for approximating interregional distances, at present, is via diffusion-weighted imaging (DWI) tractography. Most tractography tools (e.g., FSL's probtrackx) will provide an average length for each pair of seed/target ROIs. If no DWI data are available, the next best option is to compute shortest geometric paths through the white matter, as described in Bojak et al., 2010 [3]*.
Weighted analysis is very similar to the binary analysis described above. The parameters are obtained via analyze_corr_net_weighted_params.m:
>> params=analyze_corr_net_weighted_params;
>> params.distances=aal_dist_76
params =
thr_min: 0.1000
thr_max: 0.6000
thr_step: 0.1000
boot_n: 20
boot_m: 100
int_start: 0.1000
int_end: 0.6000
integral_type: 'trap'
remove_neg: 1
show_plots: 1
keep_boots: 0
threshold_by: 'sparsity'
fdr_thres: 0.0500
fdr_type: 'normal'
window_size: [600 500]
distances: [76x76 double]
The only change is that we now must set the distances field; in this case I have a distance variable called aal_dist_76, which I assign to params.distances. Next we pass our data in the same way as before, to the function analyze_corr_net_weighted:
>> results=analyze_corr_net_weighted(roi_means',groups,group_list,params);
This gives you output similar to the binary analysis, only distance-weighted.
* If you are interested in obtaining code for the shortest-distance-via-WM method, please contact ac.lligcm.inm.cib|diera#ac.lligcm.inm.cib|diera.
References
1. Tzourio-Mazoyer N et al. Automated anatomical labelling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 2002; 15: 273-289. [PubMed]
2. Latora V, Marchiori M. Efficient behavior of small-world networks. Phys. Rev. Lett. 2001; 87, 198701 [PubMed]
3. Bojak I, Oostendorp TF, Reid AT, Kötter R. Towards a model-based integration of co-registered electroencephalography/functional magnetic resonance imaging data with realistic neural population meshes. Phil. Trans. Roy. Soc. A 2010; 369: 1952. [PubMed]