CBDB Research Schema - Table and Column Documentation
Overview
This document provides detailed column information for all tables in the research schema, designed for analysis planning and code development. The research schema contains deidentified views with dates converted to ages for privacy protection.
Note: This documentation excludes adrc_values table as that documentation has been provided separately.
Table 1: research.person
Purpose: Demographics, cohort information, and insurance data Row Level: One record per person
Column List:
| Column Name | Data Type | Description |
|---|---|---|
person_id |
integer | Unique identifier for each person |
gender_concept_id |
integer | OMOP concept ID for gender |
year_of_birth |
integer | Year of birth for age calculations |
race_concept_id |
integer | OMOP concept ID for race |
ethnicity_concept_id |
integer | OMOP concept ID for ethnicity |
provider_id |
integer | Associated provider ID |
care_site_id |
integer | Associated care site ID |
gender_source_value |
varchar | Source system gender value |
gender_source_concept_id |
integer | Source system gender concept |
race_source_value |
varchar | Source system race value |
race_source_concept_id |
integer | Source system race concept |
ethnicity_source_value |
varchar | Source system ethnicity value |
ethnicity_source_concept_id |
integer | Source system ethnicity concept |
adrc_cohort |
integer | ADRC cohort flag (1=yes, 0=no) |
insurance |
varchar | Insurance information from EPIC (concatenated financial class names) |
first_condition_year_month |
varchar | First condition date as YYYY-MM format only |
source_system |
varchar | Data source system identifier |
source_table |
varchar | Source table name |
source_recordid |
varchar | Source record identifier |
Key Analysis Notes:
- Year of birth included for precise age calculations and birth cohort analysis
- Use
adrc_cohort = 1to filter for main research population - Insurance data contains concatenated financial class names from EPIC (e.g., "Medicare", "Commercial Insurance; Medicare")
first_condition_year_monthprovides timeline context without exact dates- Insurance information is available for Clarity patients who have coverage data in EPIC
Table 2: research.condition_occurrence
Purpose: Diagnosis records with age-based temporal data Row Level: One record per condition occurrence
Column List:
| Column Name | Data Type | Description |
|---|---|---|
condition_occurrence_id |
bigint | Unique identifier for each condition occurrence |
person_id |
integer | Reference to person table |
condition_concept_id |
integer | OMOP concept ID for the condition |
age_at_condition_start |
numeric | Age when condition was first diagnosed |
age_at_condition_end |
numeric | Age when condition ended (if applicable) |
condition_type_concept_id |
integer | Type of condition record (EHR, claim, etc.) |
stop_reason |
varchar | Reason condition was stopped |
provider_id |
integer | Provider who recorded the condition |
visit_occurrence_id |
bigint | Associated visit |
visit_detail_id |
bigint | Associated visit detail |
condition_source_value |
varchar | Original condition code from source |
condition_source_concept_id |
integer | Source concept for the condition |
condition_status_source_value |
varchar | Source value for condition status |
condition_status_concept_id |
integer | Concept for condition status |
facility_id |
integer | Facility where condition was recorded |
source_system |
varchar | Data source system identifier |
source_table |
varchar | Source table name |
source_recordid |
varchar | Source record identifier |
Key Analysis Notes:
- Age fields replace all date fields for deidentification
- Link to
research.conceptusingcondition_concept_idfor condition names condition_source_valuecontains original ICD codes (type:value format)
Table 3: research.drug_exposure
Purpose: Medication records with age-based temporal data Row Level: One record per drug exposure
Column List:
| Column Name | Data Type | Description |
|---|---|---|
drug_exposure_id |
bigint | Unique identifier for each drug exposure |
person_id |
integer | Reference to person table |
drug_concept_id |
integer | OMOP concept ID for the drug |
age_at_drug_start |
numeric | Age when drug exposure started |
age_at_drug_end |
numeric | Age when drug exposure ended |
age_at_verbatim_end |
numeric | Age at verbatim end date |
drug_type_concept_id |
integer | Type of drug record (prescription, OTC, etc.) |
stop_reason |
varchar | Reason drug was stopped |
refills |
integer | Number of refills |
quantity |
numeric | Quantity dispensed |
days_supply |
integer | Days supply |
sig |
varchar | Prescription instructions |
route_concept_id |
integer | Route of administration concept |
lot_number |
varchar | Drug lot number |
provider_id |
integer | Prescribing provider |
visit_occurrence_id |
bigint | Associated visit |
visit_detail_id |
bigint | Associated visit detail |
drug_source_value |
varchar | Original drug code from source |
drug_source_concept_id |
integer | Source concept for the drug |
route_source_value |
varchar | Source value for route |
dose_unit_source_value |
varchar | Source value for dose unit |
description |
varchar | Drug description |
facility_id |
integer | Facility where drug was prescribed |
source_system |
varchar | Data source system identifier |
source_table |
varchar | Source table name |
source_recordid |
varchar | Source record identifier |
Key Analysis Notes:
- Three age fields provide comprehensive temporal information
quantity,days_supply, andrefillsenable dosage analysisdrug_source_valuecontains original NDC/RxNorm codes
Table 4: research.procedure_occurrence
Purpose: Procedure records with age-based temporal data Row Level: One record per procedure
Column List:
| Column Name | Data Type | Description |
|---|---|---|
procedure_occurrence_id |
bigint | Unique identifier for each procedure |
person_id |
integer | Reference to person table |
procedure_concept_id |
integer | OMOP concept ID for the procedure |
age_at_procedure |
numeric | Age when procedure was performed |
procedure_type_concept_id |
integer | Type of procedure record |
modifier_concept_id |
integer | Procedure modifier concept |
quantity |
integer | Quantity of procedures performed |
provider_id |
integer | Provider who performed procedure |
visit_occurrence_id |
bigint | Associated visit |
visit_detail_id |
bigint | Associated visit detail |
procedure_source_value |
varchar | Original procedure code from source |
procedure_source_concept_id |
integer | Source concept for procedure |
modifier_source_value |
varchar | Source value for modifier |
facility_id |
integer | Facility where procedure was performed |
source_system |
varchar | Data source system identifier |
source_table |
varchar | Source table name |
source_recordid |
varchar | Source record identifier |
Key Analysis Notes:
procedure_source_valuecontains original CPT codes (type:value format)quantityfield enables counting multiple procedures- Link to concept table for procedure descriptions
Table 5: research.measurement
Purpose: Lab results and clinical measurements with age-based temporal data Row Level: One record per measurement
Column List:
| Column Name | Data Type | Description |
|---|---|---|
measurement_id |
bigint | Unique identifier for each measurement |
person_id |
integer | Reference to person table |
measurement_concept_id |
integer | OMOP concept ID for the measurement |
age_at_measurement |
numeric | Age when measurement was taken |
measurement_time |
varchar | Time of measurement (time only, no date) |
measurement_type_concept_id |
integer | Type of measurement record |
operator_concept_id |
integer | Operator concept (=, <, >, etc.) |
value_as_number |
numeric | Numeric measurement value |
value_as_concept_id |
integer | Categorical measurement value |
unit_concept_id |
integer | Unit of measurement concept |
range_low |
numeric | Normal range low value |
range_high |
numeric | Normal range high value |
provider_id |
integer | Provider who ordered measurement |
visit_occurrence_id |
bigint | Associated visit |
visit_detail_id |
bigint | Associated visit detail |
measurement_source_value |
varchar | Original measurement code from source |
measurement_source_concept_id |
integer | Source concept for measurement |
unit_source_value |
varchar | Source value for unit |
unit_source_concept_id |
integer | Source concept for unit |
value_source_value |
varchar | Source value as recorded |
measurement_event_id |
bigint | Associated measurement event |
meas_event_field_concept_id |
integer | Measurement event field concept |
description |
varchar | Measurement description |
facility_id |
integer | Facility where measurement was taken |
source_system |
varchar | Data source system identifier |
source_table |
varchar | Source table name |
source_recordid |
varchar | Source record identifier |
Key Analysis Notes:
- Rich measurement data with values, units, and normal ranges
value_as_numberfor quantitative labs,value_as_concept_idfor categoricalrange_low/range_highenable abnormal value detection
Table 6: research.observation
Purpose: Clinical observations and notes with age-based temporal data Row Level: One record per observation
Column List:
| Column Name | Data Type | Description |
|---|---|---|
observation_id |
bigint | Unique identifier for each observation |
person_id |
integer | Reference to person table |
observation_concept_id |
integer | OMOP concept ID for the observation |
age_at_observation |
numeric | Age when observation was made |
observation_type_concept_id |
integer | Type of observation record |
value_as_number |
numeric | Numeric observation value |
value_as_string |
varchar | Text observation value |
value_as_concept_id |
integer | Categorical observation value |
qualifier_concept_id |
integer | Qualifier concept for observation |
unit_concept_id |
integer | Unit concept for numeric values |
provider_id |
integer | Provider who made observation |
visit_occurrence_id |
bigint | Associated visit |
visit_detail_id |
bigint | Associated visit detail |
observation_source_value |
varchar | Original observation code from source |
observation_source_concept_id |
integer | Source concept for observation |
unit_source_value |
varchar | Source value for unit |
qualifier_source_value |
varchar | Source value for qualifier |
value_source_value |
varchar | Source value as recorded |
observation_event_id |
bigint | Associated observation event |
obs_event_field_concept_id |
integer | Observation event field concept |
description |
varchar | Observation description |
facility_id |
integer | Facility where observation was made |
source_system |
varchar | Data source system identifier |
source_table |
varchar | Source table name |
source_recordid |
varchar | Source record identifier |
Key Analysis Notes:
- Multiple value formats: numeric, string, and concept-based
- Includes clinical assessments, vital signs, and survey responses
qualifier_concept_idprovides additional context
Table 7: research.visit_occurrence
Purpose: Healthcare visit records with age-based temporal data Row Level: One record per visit
Column List:
| Column Name | Data Type | Description |
|---|---|---|
visit_occurrence_id |
bigint | Unique identifier for each visit |
person_id |
integer | Reference to person table |
visit_concept_id |
integer | OMOP concept ID for visit type |
age_at_visit_start |
numeric | Age at start of visit |
age_at_visit_end |
numeric | Age at end of visit |
visit_type_concept_id |
integer | Type of visit record |
provider_id |
integer | Primary provider for visit |
care_site_id |
integer | Care site where visit occurred |
visit_source_value |
varchar | Original visit code from source |
visit_source_concept_id |
integer | Source concept for visit |
admitted_from_concept_id |
integer | Where patient was admitted from |
admitted_from_source_value |
varchar | Source value for admission source |
discharged_to_concept_id |
integer | Where patient was discharged to |
discharged_to_source_value |
varchar | Source value for discharge destination |
preceding_visit_occurrence_id |
bigint | Previous visit in sequence |
facility_id |
integer | Facility where visit occurred |
source_system |
varchar | Data source system identifier |
source_table |
varchar | Source table name |
source_recordid |
varchar | Source record identifier |
Key Analysis Notes:
- Visit duration can be calculated from start/end ages
visit_concept_iddistinguishes inpatient, outpatient, emergency, etc.preceding_visit_occurrence_idenables visit sequence analysis
Table 8: research.death
Purpose: Death records with age-based temporal data Row Level: One record per person (if deceased)
Column List:
| Column Name | Data Type | Description |
|---|---|---|
person_id |
integer | Reference to person table |
age_at_death |
numeric | Age at time of death |
death_type_concept_id |
integer | Type of death record |
cause_concept_id |
integer | Primary cause of death concept |
cause_source_value |
varchar | Original cause of death code |
cause_source_concept_id |
integer | Source concept for cause of death |
source_system |
varchar | Data source system identifier |
source_table |
varchar | Source table name |
source_recordid |
varchar | Source record identifier |
Key Analysis Notes:
- Mortality endpoint for survival analysis
cause_concept_idprovides standardized cause of death- Link to person table to identify deceased patients
Table 9: research.neuro_clinic_notes
Purpose: Clinical notes from Columbia Neurology Aging and Dementia department Row Level: One record per clinical note
Column List:
| Column Name | Data Type | Description |
|---|---|---|
person_id |
bigint | Reference to person table |
pat_enc_csn_id |
bigint | EPIC-specific encounter identifier (for internal tracking only) |
contact_date |
timestamp | Date and time of the clinical note/encounter |
note_id |
bigint | EPIC-specific note identifier (for internal tracking only) |
note_text |
text | Full text of the clinical note |
Key Analysis Notes:
- Contains unstructured clinical text from neurology encounters
- Focus on patients with aging and dementia concerns
note_textfield can be analyzed for symptoms, diagnoses, treatment plans- Useful for natural language processing (NLP) and text mining
- Link to
research.personviaperson_idfor patient demographics contact_dateprovides exact temporal context (not converted to age)- Note:
pat_enc_csn_idandnote_idare EPIC system identifiers; cannot be linked to OMOP tables
Clinical Note Content May Include:
- Patient symptoms and complaints
- Clinical assessments and findings
- Medication prescriptions and changes
- Treatment recommendations
- Follow-up instructions
- Provider observations
Table 10: research.cbdb_notes
Purpose: Clinical notes from all CBDB cohorts (ADRD, pre-AD, control) Row Level: One record per clinical note
Column List:
| Column Name | Data Type | Description |
|---|---|---|
person_id |
bigint | Reference to person table |
pat_enc_csn_id |
bigint | EPIC-specific encounter identifier (for internal tracking only) |
age_at_contact |
numeric | Age at the time of the clinical note (replaces contact_date for deidentification) |
note_id |
bigint | EPIC-specific note identifier (for internal tracking only) |
note_text |
text | Full text of the clinical note |
Key Analysis Notes:
- Contains unstructured clinical text from all three CBDB cohorts (ADRD, pre-AD, control)
- Unlike
neuro_clinic_notes, this table covers all departments, not just neurology age_at_contactreplacescontact_datefor deidentification (consistent with other research tables)note_textfield can be analyzed for symptoms, diagnoses, treatment plans- Useful for natural language processing (NLP) and text mining across the full cohort
- Link to
research.personviaperson_idfor patient demographics - Note:
pat_enc_csn_idandnote_idare EPIC system identifiers; cannot be linked to OMOP tables
Clinical Note Content May Include:
- Patient symptoms and complaints
- Clinical assessments and findings
- Medication prescriptions and changes
- Treatment recommendations
- Follow-up instructions
- Provider observations
Table 11: research.concept_relationship
Purpose: OMOP vocabulary relationships between concepts (e.g., Maps to, Subsumes, Has ingredient) Row Level: One record per concept relationship
Column List:
| Column Name | Data Type | Description |
|---|---|---|
concept_id_1 |
integer | Source concept ID in the relationship |
concept_id_2 |
integer | Target concept ID in the relationship |
relationship_id |
varchar | Type of relationship (e.g., 'Maps to', 'Subsumes', 'RxNorm has ing') |
valid_start_date |
date | When relationship became valid |
valid_end_date |
date | When relationship became invalid |
invalid_reason |
varchar | Reason relationship was invalidated |
Key Analysis Notes:
- Essential for mapping between vocabularies (e.g., ICD10CM to SNOMED)
- Common relationship types:
'Maps to': Maps source concept to standard concept'Subsumes': Hierarchical parent-child relationship'RxNorm has ing': Drug to ingredient mapping'Has asso proc': Condition to associated procedure- Use to find standard concepts from source codes
- Enables concept hierarchy navigation
- Critical for drug ingredient analysis (rolling up specific drugs to ingredients)
Common Use Cases:
-- Map ICD10 codes to standard SNOMED concepts
SELECT
c1.concept_code as icd10_code,
c1.concept_name as icd10_name,
c2.concept_code as snomed_code,
c2.concept_name as snomed_name
FROM research.concept_relationship cr
JOIN research.concept c1 ON cr.concept_id_1 = c1.concept_id
JOIN research.concept c2 ON cr.concept_id_2 = c2.concept_id
WHERE c1.vocabulary_id = 'ICD10CM'
AND c2.vocabulary_id = 'SNOMED'
AND cr.relationship_id = 'Maps to';
-- Find all ingredients in a drug
SELECT
c1.concept_name as drug_name,
c2.concept_name as ingredient_name
FROM research.concept_relationship cr
JOIN research.concept c1 ON cr.concept_id_1 = c1.concept_id
JOIN research.concept c2 ON cr.concept_id_2 = c2.concept_id
WHERE cr.relationship_id = 'RxNorm has ing';
Table 12: research.concept
Purpose: OMOP vocabulary lookup table (no deidentification needed) Row Level: One record per concept
Column List:
| Column Name | Data Type | Description |
|---|---|---|
concept_id |
integer | Unique concept identifier |
concept_name |
varchar | Human-readable concept name |
domain_id |
varchar | Domain (Condition, Drug, Procedure, etc.) |
vocabulary_id |
varchar | Vocabulary system (ICD10CM, RxNorm, CPT4, etc.) |
concept_class_id |
varchar | Classification within vocabulary |
standard_concept |
varchar | Standard concept flag (S=standard, C=classification) |
concept_code |
varchar | Original code from vocabulary |
valid_start_date |
date | When concept became valid |
valid_end_date |
date | When concept became invalid |
invalid_reason |
varchar | Reason concept was invalidated |
Key Analysis Notes:
- Essential lookup table for translating concept_ids to readable names
- Filter by
domain_idto find concepts for specific domains - Use
vocabulary_idto identify code systems (ICD10CM, CPT4, RxNorm, etc.) standard_concept = 'S'identifies preferred standard concepts
Cross-Table Analysis Patterns
Common Joins:
-- All clinical events for ADRC cohort with insurance data
SELECT p.person_id, p.adrc_cohort, p.insurance, c.age_at_condition_start
FROM research.person p
JOIN research.condition_occurrence c ON p.person_id = c.person_id
WHERE p.adrc_cohort = 1;
-- Analyze conditions by insurance type
SELECT
p.insurance,
COUNT(DISTINCT p.person_id) as patient_count,
COUNT(c.condition_occurrence_id) as condition_count
FROM research.person p
JOIN research.condition_occurrence c ON p.person_id = c.person_id
WHERE p.adrc_cohort = 1 AND p.insurance IS NOT NULL
GROUP BY p.insurance
ORDER BY patient_count DESC;
-- Translate concepts to names
SELECT co.person_id, co.age_at_condition_start, c.concept_name
FROM research.condition_occurrence co
JOIN research.concept c ON co.condition_concept_id = c.concept_id;
-- Map source codes to standard concepts using concept_relationship
SELECT
co.person_id,
co.condition_source_value,
c_source.concept_name as source_condition_name,
c_standard.concept_name as standard_condition_name
FROM research.condition_occurrence co
JOIN research.concept c_source ON co.condition_source_concept_id = c_source.concept_id
JOIN research.concept_relationship cr ON c_source.concept_id = cr.concept_id_1
JOIN research.concept c_standard ON cr.concept_id_2 = c_standard.concept_id
WHERE cr.relationship_id = 'Maps to';
-- Neurology clinic notes with patient demographics
SELECT
n.person_id,
n.contact_date,
p.year_of_birth,
p.gender_concept_id,
p.adrc_cohort,
n.note_text
FROM research.neuro_clinic_notes n
JOIN research.person p ON n.person_id = p.person_id
WHERE p.adrc_cohort = 1;
-- Count notes per patient in neurology clinic
SELECT
p.person_id,
p.year_of_birth,
COUNT(n.note_id) as note_count,
MIN(n.contact_date) as first_note_date,
MAX(n.contact_date) as last_note_date
FROM research.person p
JOIN research.neuro_clinic_notes n ON p.person_id = n.person_id
GROUP BY p.person_id, p.year_of_birth
ORDER BY note_count DESC;
Key Analysis Variables:
- Temporal: All
age_at_*fields for timeline analysis;contact_datefor neurology notes - Clinical Values:
value_as_number,value_as_concept_idin measurements/observations - Unstructured Text:
note_textin neuro_clinic_notes and cbdb_notes for NLP and text mining - Cohort:
adrc_cohortflag in person table - Insurance:
insurancefield for healthcare access and utilization analysis - Birth Cohort:
year_of_birthfor generational and age-based analysis - Mortality:
age_at_deathfor survival endpoints
Source documentation:
- The OMOP Common Data Model, official OHDSI Common Data Model documentation.
- For search of concepts, use Athena's data dictionary