We made a change to the Generalized Data Model (GDM). Often, when a data model changes, new tables or columns are added. We’re doing the opposite. We’re dropping a table from the GDM, namely, the patient_details table.
Originally, we intended the patient_details table to store facts that were more demographic than clinical. For example, we used the patient_details table to store information from the SEER-Medicare linked data, such as marital status, urban/rural status, and census tract-based socio-economic data.
As we began to consider how to create algorithms based on information in the patient_details table using ConceptQL, we found ourselves questioning the purpose of the table. When we looked at the clinical_codes table and associated tables like the measurement_details table, we realized that clinical_codes and the other existing tables were well suited to store all observations about a patient, not just those made in a clinical setting.
At that point, we were faced with either 1) expanding the patient_details table along the lines of our existing measurement_details table, or 2) moving the information from patient_details into the clinical_codes and associated tables.
We went with the latter for two reasons:
- We could not identify a compelling use case for distinguishing between “patient” details and “clinical” details in the context of extracting data to create an analysis-ready dataset.
  - To the extent that this distinction is important, it could be captured in ways that do not require separate tables.
  - As a side note, the use of tables to partition data into separate semantic groups is an important distinction among various data models. This will be the subject of a future blog post.
- Storing all observations for a patient in a single table facilitates more powerful algorithms and simpler queries.
  - Creating algorithms that combine personal and clinical constructs is easier to do and to explain.
  - Although this was not a requirement, it turns out that, because ConceptQL is already designed to query the clinical_codes table, no additional work was required for implementation.
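To make the second reason concrete, here is a minimal sketch in Python with SQLite. It is not ConceptQL and not the actual GDM schema: the column names (vocabulary, code, value, detail_name, detail_value), the 'demographic' vocabulary label, and the helper functions are all assumptions made for illustration. It copies rows from a simplified patient_details table into a simplified clinical_codes table, then finds patients who are both married and have a given diagnosis code using that one table.

```python
import sqlite3

# Hypothetical, simplified schemas; the real GDM tables have more columns
# and different names. Sample rows are invented for illustration.
SCHEMA = """
CREATE TABLE patient_details (patient_id INTEGER, detail_name TEXT, detail_value TEXT);
CREATE TABLE clinical_codes (patient_id INTEGER, vocabulary TEXT, code TEXT, value TEXT);
INSERT INTO patient_details VALUES
  (1, 'marital_status', 'married'),
  (2, 'marital_status', 'single'),
  (3, 'marital_status', 'married');
INSERT INTO clinical_codes VALUES
  (1, 'ICD10CM', 'C50.911', NULL),
  (2, 'ICD10CM', 'C50.911', NULL);
"""


def migrate_patient_details(conn):
    """Copy every patient_details row into clinical_codes, then drop the old table."""
    conn.execute(
        "INSERT INTO clinical_codes (patient_id, vocabulary, code, value) "
        "SELECT patient_id, 'demographic', detail_name, detail_value "
        "FROM patient_details"
    )
    conn.execute("DROP TABLE patient_details")
    conn.commit()


def patients_with(conn, vocabulary, code, value=None):
    """Return the set of patient ids that have a matching row in clinical_codes."""
    sql = (
        "SELECT DISTINCT patient_id FROM clinical_codes "
        "WHERE vocabulary = ? AND code = ?"
    )
    params = [vocabulary, code]
    if value is not None:
        sql += " AND value = ?"
        params.append(value)
    return {row[0] for row in conn.execute(sql, params)}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
migrate_patient_details(conn)

# One table now answers both the "clinical" and the "personal" question.
diagnosed = patients_with(conn, "ICD10CM", "C50.911")
married = patients_with(conn, "demographic", "marital_status", "married")
married_and_diagnosed = diagnosed & married
print(sorted(married_and_diagnosed))  # [1]
```

Because both kinds of facts live in the same table, the same lookup helper serves both, and combining them is a plain set intersection. With a separate patient_details table, the demographic half would need its own query shape.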
It would be appropriate to rename clinical_codes to observations or something similar, but we have a lot of software that is hard-coded to use the name clinical_codes, so we’ll continue to use clinical_codes for the time being.
Interestingly, using a single table for all observations is not a new idea. After we decided on this direction for clinical_codes, we realized this is similar to how I2B2 structures its data as a star schema.
We’re very excited about this change to the GDM. It isn’t often one gets to remove something from a system and gain features as a result.
You can read the original paper on GDM here: