Dropping patient_details

We made a change to the Generalized Data Model (GDM). Usually, when a data model changes, tables or columns are added. We’re doing the opposite: we’re dropping a table from the GDM, namely the patient_details table.

Originally, we intended the patient_details table to store facts that were more demographic than clinical. For example, we used the patient_details table to store information from the SEER-Medicare linked data, such as marital status, urban/rural status, and census tract-based socio-economic data.

As we began to consider how to create algorithms based on information in the patient_details table using ConceptQL, we found ourselves questioning the purpose of the table. When we looked at the clinical_codes table and associated tables like the measurement_details table, we realized that the clinical_codes and other existing tables were well-suited to store all observations about a patient, not just those made in a clinical setting.

At that point, we were faced with either 1) expanding the patient_details table along the lines of our existing clinical_codes and measurement_details tables, or 2) moving the information from patient_details into the clinical_codes and associated tables.

We went with the latter for two reasons:

1. We could not identify a compelling use case for distinguishing between “patient” details and “clinical” details in the context of extracting data to create an analysis-ready dataset.
   - To the extent that this distinction is important, it could be captured in ways that do not require separate tables.
   - As a side note, the use of tables to partition data into separate semantic groups is an important distinction among various data models. This will be the subject of a future blog post.
2. Storing all observations for a patient in a single table facilitates more powerful algorithms and simpler queries.
   - Creating algorithms that combine personal and clinical constructs is easier to do and to explain.
   - Although this was not a requirement, it turns out that, because ConceptQL is already designed to query the clinical_codes table, no additional work was required for implementation.
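
To make the second reason concrete, here is a minimal sketch in Python using an in-memory SQLite database. It is not the actual GDM schema and it is not ConceptQL: the column names (`vocabulary`, `code`, `detail_vocab`, `detail_code`), the `MARITAL_STATUS` vocabulary label, the sample rows, and the `patients_with` helper are all assumptions made up for illustration. The sketch copies the rows from a stand-in patient_details table into a stand-in clinical_codes table, drops patient_details, and then combines a personal construct with a clinical one using the same lookup for both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified stand-ins for the GDM tables.
# The real tables have more columns and different names.
conn.executescript("""
    CREATE TABLE patient_details (patient_id INTEGER, detail_vocab TEXT, detail_code TEXT);
    CREATE TABLE clinical_codes (patient_id INTEGER, vocabulary TEXT, code TEXT);
""")

# Made-up sample data: demographic facts in one table, clinical facts in the other.
conn.executemany("INSERT INTO patient_details VALUES (?, ?, ?)", [
    (1, "MARITAL_STATUS", "married"),
    (2, "MARITAL_STATUS", "single"),
    (3, "MARITAL_STATUS", "married"),
])
conn.executemany("INSERT INTO clinical_codes VALUES (?, ?, ?)", [
    (1, "ICD10CM", "C50.911"),  # example diagnosis code
    (2, "ICD10CM", "C50.911"),
    (3, "ICD10CM", "I10"),
])

# Option 2 from the post: move the patient_details rows into clinical_codes,
# then drop patient_details.
conn.execute("""
    INSERT INTO clinical_codes (patient_id, vocabulary, code)
    SELECT patient_id, detail_vocab, detail_code FROM patient_details
""")
conn.execute("DROP TABLE patient_details")


def patients_with(conn, vocabulary, code):
    """Return the set of patient IDs that have the given observation."""
    rows = conn.execute(
        "SELECT DISTINCT patient_id FROM clinical_codes"
        " WHERE vocabulary = ? AND code = ?",
        (vocabulary, code),
    )
    return {row[0] for row in rows}


# One lookup serves both the personal and the clinical construct,
# so combining them is a plain set intersection.
married_with_diagnosis = (
    patients_with(conn, "MARITAL_STATUS", "married")
    & patients_with(conn, "ICD10CM", "C50.911")
)
print(sorted(married_with_diagnosis))  # prints [1]
```

In this toy version, no second code path is needed for demographic facts. That mirrors the point above that a tool already designed to query clinical_codes needs no additional work to reach the migrated rows.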

It would be appropriate to rename clinical_codes to observations or something similar, but we have a lot of software that is hard-coded to use the name clinical_codes, so we’ll continue to use clinical_codes for the time being.

Interestingly, using a single table for all observations is not a new idea. After we decided on this direction for clinical_codes, we realized this is similar to how I2B2 structures its data as a star schema.

We’re very excited about this change to the GDM. It isn’t often one gets to remove something from a system and gain features as a result.

You can read the original paper on GDM here: GDM Paper