User Manual: Data Sources

Updated for release version: 1.0.8-alpha

This page introduces the concept of data sources: how they are defined in ModelGUI, how JDBC drivers are used to connect to a database server, and how individual databases, along with tables and queries, are stored.

Introduction

Data sources provide relational database support for ModelGUI, which in essence means that any data that is stored as a relational database is directly accessible in your mgui session. For our purposes, a relational database is comprised of a set of tables, which are in turned comprised of a set of fields. A field specifies the type of data to be stored (e.g., integers, decimals, dates, booleans, or character strings of a specific length). ModelGUI communicates to specific database servers (third-party software which provides access to underlying data) using JDBC, a standard interface included with the standard Java Runtime Library.

Java Database Connectivity (JDBC)

JDBC is a standard protocol for communication between Java applications and third-party database servers. This communication is implemented by a JDBC driver, which is a special Java class that inherits the JDBC interface. For you, the end user, this simply means that, as long as a JDBC driver exists for a database server you want to connect to from within a ModelGUI session, this connectivity is (in theory) already available. In practice, however, there are always a few minor differences between both the servers and the drivers, such that it is typically necessary to define an intermediate class (an instance of DataSourceDriver) in the mgui package, which handles these small differences. At present, drivers are defined for:

  • The JDBC-ODBC bridge, allowing communication with databases via the ODBC standard
  • The MySQL server
  • The PostgreSQL server

(All of which are freely available, well supported, open source solutions). In order to utilize the database connectivity functionality of ModelGUI, you will need to have access to one of these server packages (e.g., install one on your system, if you are interested in creating/using local databases). We recommend PostgreSQL, which is most commonly used for ModelGUI development.

Data source drivers

Data source drivers are specified in the data_sources subdirectory of the mgui installation, with the extension ".driver". These files consist of a single line of text, space-delimited, which defines the essential parameters for a JDBC connection. For instance, the default driver for PostgreSQL (postgresql.driver) is specified by the line:

mgui.datasources.postgresql.PostgreSQLDriver jdbc:postgresql://localhost PostgreSQL <login> <encrypted-password>

This specifies:

  1. The mgui driver class, which is specialized to communicate with PostgreSQL: mgui.datasources.postgresql.PostgreSQLDriver
  2. The URL which accesses the default database of the server; in the case of PostgreSQL, this will default to a database with the same name as the login: jdbc:postgresql://localhost
  3. The name by which the driver with be referenced in the mgui session: PostgreSQL
  4. The login
  5. The encrypted password; this is defined in the "Admin" dialog box (link to this)

Data source drivers can be viewed and modified via the Data Source Panel, under the Admin section. The dialog box is shown here:

data_source_drivers_dialog.png

Data sources

A Data Source in ModelGUI refers to a connection to a single database via JDBC. It is defined by a data source driver, which specifies the database server to interact with, along with information specifying the database to connect to, and the appropriate credentials (login and password). All loaded data sources can be viewed in the Object Tree, by expanding the "Data Sources" node. The tree node displays the current connection status, the connection parameters, and a list of tables and queries contained by the database, which can be themselves expanded to identify fields (in the case of tables) and SQL statements (in the case of queries). The expanded node appears as shown in the example below:

data_sources_tree_node.png

You can also use the object tree to interact with or modify the data source. For example, right-clicking on the main node allows you to connect to (or disconnect from) the server. In the case of fields and queries, it is typically more convenient to use the Data Source Panel for this purpose, however.

Data connection

The connection to a data source is provided via a Data Connection object, which is a set of parameters used to establish a link to a JDBC driver, which acts as a bridge into the database. The connection is specified by:

  1. The data source driver
  2. The login
  3. The encrypted password; this is defined in the "Admin" dialog box (link to this)
  4. The database URL specifying the database from which to access data

You'll notice that this is similar to the components of the data source driver, listed above. The major difference is that a driver defines the JDBC driver class which does the work of connecting to the database server. The login and password may be (and typically are) the same; however, in some cases you may want to access a particular datasource as a user other than the one used for general administration. The database URL is different solely because it specifies a specific database, whereas the driver URL results in the default database being accessed (this allows, for instance, access to a list of available databases).

Data tables

Data tables form the core of any relational database; being the means of precisely defining the data it contains. A table definition is comprised of a set of Data Fields, each of which specify a Data Type of a fixed (or sometimes variable) length. The data in a table are organized as a set of Data Records, each of which contains a single value for each of the data fields. This arrangement is best understood in terms of a graphical table, in which the fields are columns, and the records are rows. This, indeed, is how a table's data is displayed in ModelGUI. This simple but powerful arrangement is the basis of all relational databases, and — as shown below — can be used quite effectively when combined with SQL statements that allow the user to select subsets of records, or combine fields from different tables. These concepts will be explored my thoroughly below.

Data types

A data type is a (usually pre-defined) specification for data storage; it tells the database what sort of data it is to store, and how much memory is required to store a single instance. Data comes in many forms; commonly used types are (see also the PostgreSQL types or the MySQL types):

  • Byte
  • Small integer (2-byte integer value)
  • Integer (4-byte integer value)
  • Large integer (8-byte integer value)
  • Float (4-byte floating-point value)
  • Double (8-byte floating-point value)
  • String (a string of a specific number of characters)
  • Long String (a string of indeterminate size, up to 32,700 characters)
  • Blob (Large binary object)
  • Timestamp (Date and time combination)
  • Date (Calendar date)
  • Boolean (1-bit value representing true/false)

A complete list of data types can be found here.

Note that these type names are not necessarily those used in SQL queries (see below); they refer to the names by which they are referred within the ModelGUI environment.

A typical request to a data source will return a Record Set which is a set of rows representing all the data in a given table. Each row will consist of a set of single values specified by the table's fields. Thus a table can be represented in terms of rows and columns, as below:

data-table-example.png

Data fields

A data field is defined by a name (typically, but not necessarily, lower-case), a data type, and a length (if necessary). The choice of data type should be dictated by finding the most efficient form which has sufficient range to accommodate the expected data. For instance, a Double should not be used to store integer data, since it will require twice as much memory as the more appropriate Integer data type. That's a pretty obvious example, but you may find yourself deciding between Double or Float, Integer or Short Integer, String or Long String, etc. In these cases, I suggest following one of the links above to get a better idea of which field types to use.

In the above table, there are four fields (columns) and four rows (individuals with corresponding data). The fields are specified as below:

data-fields-example.png

Data fields have a few additional attributes. They can be (1.) unique, indicating that no two records can have the same value for that field; and (2.) required, meaning that no record can contain a value of NULL for that field. A field can also be designated a a key field, which by definition must be unique and required; key fields are used by the database engine and by ModelGUI to perform efficient index-based searches, and to ensure that each record in a table is distinct (i.e., has at least one unique field).

Data queries and SQL

One big advantage of storing data in data sources is the ability to use SQL to query your data in a number of different ways; these allow you, for instance, to select subsets according to specific criteria, or combine fields from separate tables to form new ones. SQL stands for "structured query language", and was designed as a standardized approach to interrogating and modifying databases using simple intuitive lines of text. ModelGUI communicates to database servers via JDBC entirely through this language. The basic transaction in this communication involves:

  1. User request: mgui receives a request from the user to execute some SQL statement. As a simple example, to return all records for a table called "my_table", the request is structured as the following SQL statement: SELECT * FROM my_table;
  2. ModelGUI request: mgui relays this request to the JDBC driver set up for this connection
  3. Server response: The server executes the SQL statement and compiles a result set, which is returned via the JDBC driver to mgui
  4. ModelGUI output: mgui can now access the requested data, and it uses these to populate a Data Table window with the data contained in "my_table" for the user to browse

Given the power of SQL queries, it is recommended that the user who wants to utilize this functionality familiarize him/herself with the language and how it can be best leveraged to give the desired results. While SQL is generally a standardized language, it is useful to realize that there are always slightly different ways in which different database servers handle SQL commands. Thus, it is important to understand the subtleties of the particular software you want to use. The two supported platforms at present also have excellent online documentation for this: PostgreSQL's is here, MySQL's is here.

Data links

Tutorials

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License