Jigsaw is a set of tools that help researchers conduct observational studies using electronic health data. These tools are designed to work with data stored in a variety of common data models (CDMs), so that tools do not have to be redesigned for each CDM. Jigsaw is also designed to work with its own simple data model. Most importantly, our goal is to make Jigsaw easy for others to adopt by making it openly available. Below are descriptions of the key aspects of Jigsaw. We have prototypes for many of the features listed below, while others are simply on the roadmap.

The “Jigsaw” name was chosen because a jigsaw is a tool for cutting precisely crafted pieces. For most projects with observational health data, researchers use precise algorithms to identify the specific conditions, treatments, procedures, measurements, or visits they are studying. Hence, we chose the word “Jigsaw” to reflect the process of cutting data for research using specific algorithms.

Simple structure

All tables focused around clinical codes
No vocabulary re-mapping
Relationships among data elements retained

Clear provenance

The path from the raw data to the storage data model is well-documented
Necessary source data details are retained

Automated process

Clean user interface to specify ETL requirements
No need to write custom ETL code
Specifications are used to automatically perform ETL


Data can be readily transformed into OMOP, PCORnet, or Mini-Sentinel
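To make the specification-driven ETL concrete, here is a minimal sketch in Python. The specification format, field names, and table names below are illustrative assumptions, not Jigsaw's actual interface; the point is that a declarative mapping replaces hand-written ETL code.

```python
# Hypothetical ETL specification: a declarative column mapping from a
# source claims table into a simple clinical-codes table. All names here
# are illustrative, not Jigsaw's real schema.
etl_spec = {
    "source_table": "dx_claims",
    "target_table": "clinical_codes",
    "columns": {
        "patient_id": "person_id",
        "dx_code": "code",
        "dx_code_type": "vocabulary_id",
        "service_date": "start_date",
    },
}

def apply_spec(row, spec):
    """Rename source columns per the specification; no custom ETL code."""
    return {target: row[source] for source, target in spec["columns"].items()}

row = {"patient_id": 42, "dx_code": "410.00",
       "dx_code_type": "ICD-9-CM", "service_date": "2009-03-14"}
transformed = apply_spec(row, etl_spec)
```

Because the mapping is data rather than code, the same machinery can emit rows shaped for OMOP, PCORnet, or Mini-Sentinel by swapping in a different specification.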

Stores algorithms used for research

Includes validated, commonly used, and custom algorithms
Includes common code sets


Statements stored as JSON files in the open-source ConceptQL language designed specifically for this purpose
Visualized as graphics
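As a simplified illustration of a JSON-serialized statement, the snippet below round-trips a nested, ConceptQL-style expression through Python's `json` module. The operator and vocabulary names (`first`, `union`, `icd9`) are illustrative; consult the ConceptQL documentation for the real grammar.

```python
import json

# A ConceptQL-style statement expressed as nested arrays: take each
# person's first occurrence of either ICD-9 code. Operator names are
# illustrative, not a guaranteed match for ConceptQL's actual syntax.
statement = ["first", ["union", ["icd9", "410.00"], ["icd9", "410.01"]]]

serialized = json.dumps(statement)   # stored as a JSON file
restored = json.loads(serialized)    # loaded back for execution or display
```

Storing statements as plain JSON is what makes them easy to share, version, and render as graphics.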

Includes algorithm authoring tool (Jigsaw Algorithm Maker)

Construct algorithms using an intuitive user interface

Works across data storage systems

Statements compiled into queries at run-time for many systems including PostgreSQL, Oracle, SQL Server, and Impala
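A toy compiler sketches how a statement tree can be turned into SQL at run-time. The table and column names, and the (unsanitized) string interpolation, are simplifying assumptions for illustration only; a real compiler would parameterize values and vary quoting per dialect.

```python
# Minimal sketch: recursively compile a statement tree into SQL.
# Schema names ("clinical_codes", "vocabulary_id") are hypothetical.
def compile_statement(stmt, dialect="postgresql"):
    op = stmt[0]
    if op == "icd9":
        code = stmt[1]
        return ("SELECT * FROM clinical_codes "
                f"WHERE vocabulary_id = 'ICD-9-CM' AND code = '{code}'")
    if op == "union":
        return " UNION ".join(compile_statement(s, dialect) for s in stmt[1:])
    raise ValueError(f"unknown operator: {op}")

sql = compile_statement(["union", ["icd9", "410.00"], ["icd9", "412"]])
```

Targeting a new storage system then means adding a dialect, not rewriting any stored algorithms.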

Works across data models

Algorithms operate against different data models with no changes required to the algorithms themselves
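One way to picture this is a per-data-model mapping that resolves an algorithm's logical fields to physical tables and columns. The OMOP and PCORnet names below are drawn from those models' public schemas, but the mapping structure itself is an illustrative sketch, not Jigsaw's implementation.

```python
# Sketch: the same algorithm runs against different data models because
# logical fields resolve through a per-model mapping at query time.
MODEL_MAPS = {
    "omop": {"table": "condition_occurrence",
             "person": "person_id", "code": "condition_source_value"},
    "pcornet": {"table": "diagnosis",
                "person": "patid", "code": "dx"},
}

def codes_query(code, model):
    """Build a person-lookup query for one clinical code, per data model."""
    m = MODEL_MAPS[model]
    return f"SELECT {m['person']} FROM {m['table']} WHERE {m['code']} = '{code}'"
```

The algorithm itself (here, "find people with code 412") never changes; only the mapping does.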

Build datasets for entire study

Lookback period
Inclusion / exclusion criteria
Baseline variables
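A lookback period is typically enforced as a continuous-enrollment check before each person's index date. The sketch below assumes a single enrollment span per person and a hypothetical 365-day default; real studies handle gaps and multiple spans.

```python
from datetime import date, timedelta

# Sketch of a lookback-period inclusion criterion: require enrollment
# covering the N days before the index date. Parameter names are illustrative.
def has_lookback(index_date, enroll_start, lookback_days=365):
    return enroll_start <= index_date - timedelta(days=lookback_days)

ok = has_lookback(date(2010, 6, 1), date(2009, 1, 15))       # enough history
too_short = has_lookback(date(2010, 6, 1), date(2010, 1, 15))  # not enough
```

Inclusion/exclusion criteria and baseline variables follow the same pattern: each is an algorithm evaluated relative to the index date.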

Output all relevant records

One-record-per-person cohort file
Multiple-records-per-person events file
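The two output files can be sketched from the same event records: the events file keeps every row, while the cohort file collapses to one row per person (here, taking the earliest event date as the index date — an illustrative choice, not a fixed rule).

```python
# Sketch: derive a one-row-per-person cohort file from per-person events.
# Field names and the "earliest event = index date" rule are illustrative.
events = [
    {"person_id": 1, "code": "410.00", "date": "2009-03-14"},
    {"person_id": 1, "code": "412",    "date": "2010-01-02"},
    {"person_id": 2, "code": "410.01", "date": "2011-07-09"},
]

cohort = {}
for e in events:  # keep the earliest event date per person
    pid = e["person_id"]
    if pid not in cohort or e["date"] < cohort[pid]["index_date"]:
        cohort[pid] = {"person_id": pid, "index_date": e["date"]}

cohort_rows = sorted(cohort.values(), key=lambda r: r["person_id"])
```

ISO-formatted date strings are used so that string comparison matches chronological order.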

Output study meta-data

Data dictionary
Details of all algorithms used
Complete study design
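Study meta-data can be exported alongside the datasets as a single machine-readable document. The keys below are illustrative assumptions, not a documented Jigsaw schema.

```python
import json

# Sketch of a study meta-data export bundling the data dictionary,
# the algorithms used, and the design parameters. Keys are hypothetical.
metadata = {
    "data_dictionary": {
        "person_id": "Unique person identifier",
        "index_date": "Date of first qualifying event",
    },
    "algorithms": [{"name": "acute_mi", "statement": ["icd9", "410.00"]}],
    "design": {"lookback_days": 365},
}

serialized = json.dumps(metadata, indent=2)
```

Shipping the design and algorithms with the data makes the study self-documenting and easier to reproduce.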