Updated for release version: 1.0.8-alpha
This page introduces the concept of a "project" in modelGUI, explains how it is organized, and lists its XML schema.
NB. Since mgui is in an alpha phase, the concepts and implementations of projects are subject to change. What follows is more of a tentative plan for projects, rather than the finished product.
Introduction
In modelGUI, a Project is a means to represent hierarchically organized persistent data in a useful meta-structure, which allows for easy access to specific types of data from project members, or "instances" (such as subjects in a study, land holdings of a geological exploration company, etc.). Projects also allow a user to perform parallel computations on an entire set of instances, organize instances into groups, perform statistics, browse through instances, or output instance data in a standard format. Since this concept is best understood through an example, the next section introduces one.
Example
Imagine you have just performed a neuroimaging experiment with a decent-sized sample of 100 subjects. For each subject, you've obtained:
- T1-weighted magnetic resonance images (sMRI), which show the structure of the brain
- Diffusion-weighted MRI (DWI), which shows how water diffuses in the brain (and can be used to study the white matter that connects brain regions)
- Task-related functional MRI (fMRI), which shows how the brain's activity changes during and following some specific tasks
- A psychological battery of tests, yielding scores such as depression, memory, and executive performance
- Demographics about the population (age, sex, education, etc.)
This is a nice data set, but before you can begin testing any hypotheses, you will need to pre-process the imaging data to get it into a suitable format for statistical analysis (typically this involves aligning the images to one another, and to some template space in which they can be compared). You would also like to extract cortical surfaces from the structural MRI, which can provide interesting information about the brain's morphology, such as the thickness of the cortex.
This type of data set is perfectly designed for an mgui project. Each subject is a project "instance", and each has the same set of data, which can be organized into subdirectories. Our project can look like this:
The project has three children, which are explained next:
- Instances
- This is a list of units in a project; in this case, the units are subjects in a study.
- Project Data
- These are top-level subdirectories which contain population data that is better represented as a whole, rather than divided into instance parts. This includes the psychological scores and demographics, which can be best represented as tables, rather than being split into individual one-line text files. Relational databases might also be stored here if applicable.
- Instance Data
- These specify the directory structure underlying each instance. These directories are used for storing data that consist of one or more large binary files, or more generally any data that cannot be represented as a top-level table or relational database. This includes the MR imaging files, but may also include mesh files used to represent cortical thickness (which will be generated later on).
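The three children described above can be pictured as a small data model. The following is only a sketch: the class ProjectSketch, its fields, and the on-disk layout (population data directly under the project root, instance data under an "instances" folder) are assumptions made for illustration and do not reflect mgui's actual classes or directory conventions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a project's three children; not mgui's actual classes. */
final class ProjectSketch {
    final Path root;
    final List<String> instances = new ArrayList<>();        // e.g. subjects in a study
    final List<String> projectDataDirs = new ArrayList<>();  // population-level data
    final List<String> instanceDataDirs = new ArrayList<>(); // per-instance data

    ProjectSketch(String root) {
        this.root = Paths.get(root);
    }

    /** Directory for population-level data (assumed layout: root/subdir). */
    Path projectDataDir(String subdir) {
        require(projectDataDirs, subdir);
        return root.resolve(subdir);
    }

    /** Directory for one instance's data (assumed layout: root/instances/instance/subdir). */
    Path instanceDataDir(String instance, String subdir) {
        require(instances, instance);
        require(instanceDataDirs, subdir);
        return root.resolve("instances").resolve(instance).resolve(subdir);
    }

    /** Rejects names that were never registered with the project. */
    private static void require(List<String> known, String name) {
        if (!known.contains(name)) {
            throw new IllegalArgumentException("Unknown name: " + name);
        }
    }
}
```

Because every instance shares the same list of instance data directories, adding a subdirectory once makes it resolvable for all subjects, which is the property the example study relies on.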
Handling Project Data
In order for a project to perform I/O functions it will need to also specify the type of data it describes. A project can handle any arbitrary set of data, provided there are loader and/or writer classes (or a JDBC driver) available for them. In mgui, loaders and writers are implementations of the abstract FileLoader and FileWriter classes, respectively. Some common implementations are listed here.
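To illustrate the loader/writer pairing, here is a minimal sketch using stand-in base classes. SketchLoader, SketchWriter, and the two text-line implementations are hypothetical names invented for this example; the signatures of mgui's real FileLoader and FileWriter are not shown on this page and may well differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a loader base class; mgui's FileLoader may differ. */
abstract class SketchLoader {
    /** Loads a file and returns its content as a plain Object. */
    abstract Object load(Path file) throws IOException;
}

/** Hypothetical stand-in for a writer base class; mgui's FileWriter may differ. */
abstract class SketchWriter {
    /** Writes an object to persistent storage, rejecting unsupported types. */
    abstract void write(Path file, Object data) throws IOException;
}

/** Loads a UTF-8 text file as a List of lines. */
final class TextLinesLoader extends SketchLoader {
    @Override
    Object load(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }
}

/** Writes a List as one UTF-8 text line per element. */
final class TextLinesWriter extends SketchWriter {
    @Override
    void write(Path file, Object data) throws IOException {
        if (!(data instanceof List<?>)) {
            throw new IllegalArgumentException("TextLinesWriter expects a List of lines");
        }
        List<String> lines = new ArrayList<>();
        for (Object item : (List<?>) data) {
            lines.add(String.valueOf(item));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```

The loader returns a bare Object, so the caller has to cast it to the type it expects, and the writer checks the class of what it receives before writing anything.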
The information necessary for loading and writing data is contained in a "Project Data Item". Each leaf in the project tree must have an associated data item specification, which specifies (where applicable):
- File Name Form
- A way to recognize a file by its filename; for example, raw files in the DWI subdirectory above would be specified as "raw_dwi_image_{instance}_{series}.nii", where {instance} specifies the instance name and {series} specifies a series identifier, if the data are serial (such as an fMRI time series, or a DWI gradient series). Thus, for the instance named "subject_1" and gradient 2, the filename would be "raw_dwi_image_subject_1_2.nii". Note that the series tag can also specify a number format, e.g. "prefix_{series000}.ext" would yield "prefix_001.ext" for series 1.
- Data Loader
- The loader to use for loading this file into memory. The project will handle a load request by loading the data with the loader and returning it as a Java Object; the calling function must know what sort of data is being served.
- Data Writer
- The writer to use for writing an appropriate object into persistent storage. The project will handle a write request by receiving a Java Object, which must have an appropriate class inheritance to be handled by the writer.
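The File Name Form described above can be expanded with a few lines of code. This is a minimal sketch, not mgui's implementation: the class FileNameForm and its expand method are hypothetical, and it assumes that the zeros after "series" give the minimum digit width, which is what the "prefix_{series000}.ext" example suggests.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for expanding a file name form; not part of the mgui API. */
final class FileNameForm {

    // Matches {instance}, {series}, or {series000}; group 2 captures the run of zeros.
    private static final Pattern TAG = Pattern.compile("\\{(instance|series(0*))\\}");

    /** Replaces each tag in the form with the instance name or the series number. */
    static String expand(String form, String instance, int series) {
        Matcher m = TAG.matcher(form);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous tag and this one.
            out.append(form, last, m.start());
            if (m.group(1).equals("instance")) {
                out.append(instance);
            } else {
                // Assumption: the number of zeros is the minimum digit width.
                int width = m.group(2).length();
                out.append(width == 0
                        ? Integer.toString(series)
                        : String.format(Locale.ROOT, "%0" + width + "d", series));
            }
            last = m.end();
        }
        // Copy whatever follows the last tag.
        out.append(form, last, form.length());
        return out.toString();
    }
}
```

With this sketch, expanding "raw_dwi_image_{instance}_{series}.nii" for instance "subject_1" and series 2 gives "raw_dwi_image_subject_1_2.nii", and expanding "prefix_{series000}.ext" for series 1 gives "prefix_001.ext", matching the two examples in the text.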