User Tutorial - Working with Data Sources

updated for version mgui-1.0.10-alpha

This tutorial will introduce data sources and get you started with establishing new sources, displaying their data in tables, creating SQL queries, and creating data links with other interface objects. This assumes you have already set up ModelGUI with a JDBC driver to a database server such as PostgreSQL or MySQL. If you haven't, try this tutorial first.

Introduction

A Data Source is a connection to a JDBC driver, and via this driver a relational database. ModelGUI provides (or will provide in its beta release) a number of useful database functions through which information can be related to shape or dynamic model objects. For instance, integer or name-mapped vertex data stored in a ShapeInterface object can be used to index a field in a Data Table, allowing all other fields in the table to be assigned to individual vertices. Through JDBC, ModelGUI also supports SQL queries, which means you can select subsets of data, combine data from separate tables, etc.

In this tutorial, we will start from the start and build our database as we go. Consider the following scenario: we have collected raw data from a group of 20 subjects, over three visits spaced six months apart; including:

  • Structural brain scans (T1-weighted magnetic resonance images, or MRI)
  • Age, sex, height, and weight information
  • A test of intelligence, the Wechsler Adult Intelligence Scale, or WAIS

For the structural data, we have done a bunch of pre-processing to get the images aligned, etc., and computed volume estimates for a set of 12 brain regions of interest (or ROIs) - 6 per brain hemisphere. In this processing step, 7 of the 60 images failed at some point, leaving us with 53 good images. From all these (fictional) data we have compiled two tables, which you can download by right-clicking:

Given all this, what we now want to do is get our study data into a database which we can use in ModelGUI.

Creating a Data Source

We need to define a data connection that will allow us to communicate with the database server (for this tutorial I will be using PostgreSQL, but the process will work for other supported servers such as MySQL). From the Datasource Panel, expand the "Sources" tab. With "<- NEW SOURCE ->" selected in the combo box, click the "Define" button:

create_new_datasource_1.png

You will see the "Define Data Source" dialog box. On the left, select "New", and ensure that the desired driver is selected. Enter "tutorial4" for both the name and the source name, and enter the login and password for the server (you will have specified these when installing the server, or have been given them by your system administrator). The dialog should look like this:

create_new_datasource_2.png

Click "OK" and after a moment you should get a success dialog. If not, check the console for any errors. You may have gotten the authentication information wrong.

Back on the Database Panel, click "Create" and then "Connect". You should now be connected to your new data source.

Importing Data

At this point, we are ready to import our study data into our new data source. Expand the "I/O" section and click "Import data..". You will see the "Import Data to Table" dialog box. Since we have comma-separated data, we need to change the "Delimited by:" value to ",". Everything else is fine; now click "Choose files.." and select the two csv files you've downloaded. The dialog box should look like this:

import_data_1.png

Before we import, we need to specify a "primary key" for each table, which is essentially a uniquely-valued column that is used by the database server for efficiency. For this purpose, your csv files have a "unique id" column called "uid". Select the "Key?" column for the uid field; the "Uniq?" and "Req?" boxes will automatically be checked too, since a key field must be unique and it must have a value for each record. Now select "subject_data" from the combo box and do the same for its uid field. Now you are ready to import, so click "OK".

Displaying a Data Table

Expanding the "Tables" and "Fields" sections will allow you to view your new tables and their fields. Next we'd like to see what the data look like. For this we need a Data Table window: right-click on the current 3D window and select "Change Type > Data Table". Now, in the "Tables" sections, the "Data Table" window should be visible in the "Show in window" box. With the "image_data" table selected, click "Show", and the data should appear in the window:

show_image_data.png

Working with SQL Queries

Now that we have our data we probably want to do something interesting with it. That where SQL queries come in. Let's say we want to select only those records in the subject_data table which corresponds to images which have been successfully processed (recall that 7 images failed). We do this with a SELECT query. In the "Queries" section, click "Add New", and you will see the "Define SQL" dialog box. Enter the SQL command as displayed:

SELECT subject_data.* FROM subject_data
INNER JOIN image_data
ON subject_data.uid = image_data.uid;
define_subject_query_1.png

Click "OK" and then click "Show". You should see a new set of data, with the same fields as the subject_data table, but only 53 records rather than 60. You may want to make this into its own table, instead of having to run a query to obtain it. To do this, create a new query "create_processed_subject_data" with the following command:

CREATE TABLE processed_subject_data AS 
(SELECT subject_data.* FROM subject_data
INNER JOIN image_data
ON subject_data.uid = image_data.uid);

Now, instead of clicking "Show", click "Execute". You should now have a new table called "processed_subject_data".

Say we wanted to do a calculation with our data. Body mass index (BMI), for instance, is computed with the formula BMI = weight / mass2. We can obtain this value with the query:

SELECT *,
weight / (height * height) AS bmi
FROM subject_data;

…and clicking "Show" — which will add a new "bmi" column to our result table.

You can even perform some simple statistics using SQL. To demonstrate, try the following command:

SELECT
avg(height) AS avr_height,
avg(weight) AS avr_weight,
corr(wais,vol_roi1_left) AS corr_wais_roi1_left
FROM subject_data
INNER JOIN image_data
ON subject_data.uid = image_data.uid;

This computes the average of columns height and weight, and the correlation between the WAIS score from subject_data and the volume of the left ROI 1 from image_data, and outputs them as three values (by clicking "Show"):

stats_output.png

SQL is very flexible tool; we've only scratched the surface here. Check out the PostgreSQL documentation to expand your SQL horizons… A list of available statistics functions is here.

Saving/Loading a Data Source

The data you imported is now safely controlled by the database server; any changes you make to it will be immediately reflected in the underlying data source. However, the structure of the tables and queries you have created are not yet saved in ModelGUI format. The queries, moreover, are not saved in the server. While you could redo the above steps every time you start a new ModelGUI session, it is more desirable to save the data source structure such that it can be easily reloaded. You can save your data source by clicking "Save" in the "Sources" section. Select an appropriate place to store the data source file, where it can be reloaded using the "Load" button.

Useful hint: when mgui is initiating, it will search the "data_sources" subdirectory of the mgui installation folder by default, and automatically load any data sources it finds there. Thus, if you would like your data source to be loaded each time the application runs, you can save it to this folder.

Exporting Data

Often you'll want to work with your data using programs other than ModelGUI (shocking as it may seem). This is fairly straightforward to do in one of two ways. Firstly, you can often simply use the copy-and-paste functionality, by selecting some or all records in a table. Try this by clicking somewhere in the table you want to copy and pressing Ctrl-A to select all records, followed by Ctrl-C to copy the data to the system clipboard (use the equivalent hot-keys for Linux or Mac, of course). You can then paste from the clipboard into your favourite text editor, or directly into a tool like Matlab using the "Paste to workspace" function; here I've pasted into Notepad++, and it appears as tab-delimited values as in the table:

copy_paste_both.png

The second option, which offers more control over the output and allows you to export multiple tables and/or queries at once, is the Export Dialog. You can access this by clicking "Export data.." in the "I/O" section. You first need to select your data source "tutorial4" from the combo box (if it's not already). This will show you a list of all tables and queries in the data source. You can select which ones you want to export, and what to call the output. Let's select the BMI query we created, and call it "bmi_scores.csv". This refers to a comma-delimited file, so we should change the delimiter to ",". You probably want a header line, so ensure that this box is checked. Finally, you can select the directory where the output will be written using the "Browse.." button. The dialog should look something like this:

export_dialog.png

Now just click "OK" and your data will be written as a csv file. This can be opened in a text editor or a spreadsheet program like Excel, or imported into a stats program such as SPSS or a math program such as Matlab.

Creating a Data Link

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License