Skip to content

CBDB Research Schema - Table and Column Documentation

Overview

This document provides detailed column information for all tables in the research schema, designed for analysis planning and code development. The research schema contains deidentified views with dates converted to ages for privacy protection.

Note: This documentation excludes adrc_values table as that documentation has been provided separately.


Table 1: research.person

Purpose: Demographics, cohort information, and insurance data Row Level: One record per person

Column List:

Column Name Data Type Description
person_id integer Unique identifier for each person
gender_concept_id integer OMOP concept ID for gender
year_of_birth integer Year of birth for age calculations
race_concept_id integer OMOP concept ID for race
ethnicity_concept_id integer OMOP concept ID for ethnicity
provider_id integer Associated provider ID
care_site_id integer Associated care site ID
gender_source_value varchar Source system gender value
gender_source_concept_id integer Source system gender concept
race_source_value varchar Source system race value
race_source_concept_id integer Source system race concept
ethnicity_source_value varchar Source system ethnicity value
ethnicity_source_concept_id integer Source system ethnicity concept
adrc_cohort integer ADRC cohort flag (1=yes, 0=no)
insurance varchar Insurance information from EPIC (concatenated financial class names)
first_condition_year_month varchar First condition date as YYYY-MM format only
source_system varchar Data source system identifier
source_table varchar Source table name
source_recordid varchar Source record identifier

Key Analysis Notes:

  • Year of birth included for precise age calculations and birth cohort analysis
  • Use adrc_cohort = 1 to filter for main research population
  • Insurance data contains concatenated financial class names from EPIC (e.g., "Medicare", "Commercial Insurance; Medicare")
  • first_condition_year_month provides timeline context without exact dates
  • Insurance information is available for Clarity patients who have coverage data in EPIC

Table 2: research.condition_occurrence

Purpose: Diagnosis records with age-based temporal data Row Level: One record per condition occurrence

Column List:

Column Name Data Type Description
condition_occurrence_id bigint Unique identifier for each condition occurrence
person_id integer Reference to person table
condition_concept_id integer OMOP concept ID for the condition
age_at_condition_start numeric Age when condition was first diagnosed
age_at_condition_end numeric Age when condition ended (if applicable)
condition_type_concept_id integer Type of condition record (EHR, claim, etc.)
stop_reason varchar Reason condition was stopped
provider_id integer Provider who recorded the condition
visit_occurrence_id bigint Associated visit
visit_detail_id bigint Associated visit detail
condition_source_value varchar Original condition code from source
condition_source_concept_id integer Source concept for the condition
condition_status_source_value varchar Source value for condition status
condition_status_concept_id integer Concept for condition status
facility_id integer Facility where condition was recorded
source_system varchar Data source system identifier
source_table varchar Source table name
source_recordid varchar Source record identifier

Key Analysis Notes:

  • Age fields replace all date fields for deidentification
  • Link to research.concept using condition_concept_id for condition names
  • condition_source_value contains original ICD codes (type:value format)

Table 3: research.drug_exposure

Purpose: Medication records with age-based temporal data Row Level: One record per drug exposure

Column List:

Column Name Data Type Description
drug_exposure_id bigint Unique identifier for each drug exposure
person_id integer Reference to person table
drug_concept_id integer OMOP concept ID for the drug
age_at_drug_start numeric Age when drug exposure started
age_at_drug_end numeric Age when drug exposure ended
age_at_verbatim_end numeric Age at verbatim end date
drug_type_concept_id integer Type of drug record (prescription, OTC, etc.)
stop_reason varchar Reason drug was stopped
refills integer Number of refills
quantity numeric Quantity dispensed
days_supply integer Days supply
sig varchar Prescription instructions
route_concept_id integer Route of administration concept
lot_number varchar Drug lot number
provider_id integer Prescribing provider
visit_occurrence_id bigint Associated visit
visit_detail_id bigint Associated visit detail
drug_source_value varchar Original drug code from source
drug_source_concept_id integer Source concept for the drug
route_source_value varchar Source value for route
dose_unit_source_value varchar Source value for dose unit
description varchar Drug description
facility_id integer Facility where drug was prescribed
source_system varchar Data source system identifier
source_table varchar Source table name
source_recordid varchar Source record identifier

Key Analysis Notes:

  • Three age fields provide comprehensive temporal information
  • quantity, days_supply, and refills enable dosage analysis
  • drug_source_value contains original NDC/RxNorm codes

Table 4: research.procedure_occurrence

Purpose: Procedure records with age-based temporal data Row Level: One record per procedure

Column List:

Column Name Data Type Description
procedure_occurrence_id bigint Unique identifier for each procedure
person_id integer Reference to person table
procedure_concept_id integer OMOP concept ID for the procedure
age_at_procedure numeric Age when procedure was performed
procedure_type_concept_id integer Type of procedure record
modifier_concept_id integer Procedure modifier concept
quantity integer Quantity of procedures performed
provider_id integer Provider who performed procedure
visit_occurrence_id bigint Associated visit
visit_detail_id bigint Associated visit detail
procedure_source_value varchar Original procedure code from source
procedure_source_concept_id integer Source concept for procedure
modifier_source_value varchar Source value for modifier
facility_id integer Facility where procedure was performed
source_system varchar Data source system identifier
source_table varchar Source table name
source_recordid varchar Source record identifier

Key Analysis Notes:

  • procedure_source_value contains original CPT codes (type:value format)
  • quantity field enables counting multiple procedures
  • Link to concept table for procedure descriptions

Table 5: research.measurement

Purpose: Lab results and clinical measurements with age-based temporal data Row Level: One record per measurement

Column List:

Column Name Data Type Description
measurement_id bigint Unique identifier for each measurement
person_id integer Reference to person table
measurement_concept_id integer OMOP concept ID for the measurement
age_at_measurement numeric Age when measurement was taken
measurement_time varchar Time of measurement (time only, no date)
measurement_type_concept_id integer Type of measurement record
operator_concept_id integer Operator concept (=, <, >, etc.)
value_as_number numeric Numeric measurement value
value_as_concept_id integer Categorical measurement value
unit_concept_id integer Unit of measurement concept
range_low numeric Normal range low value
range_high numeric Normal range high value
provider_id integer Provider who ordered measurement
visit_occurrence_id bigint Associated visit
visit_detail_id bigint Associated visit detail
measurement_source_value varchar Original measurement code from source
measurement_source_concept_id integer Source concept for measurement
unit_source_value varchar Source value for unit
unit_source_concept_id integer Source concept for unit
value_source_value varchar Source value as recorded
measurement_event_id bigint Associated measurement event
meas_event_field_concept_id integer Measurement event field concept
description varchar Measurement description
facility_id integer Facility where measurement was taken
source_system varchar Data source system identifier
source_table varchar Source table name
source_recordid varchar Source record identifier

Key Analysis Notes:

  • Rich measurement data with values, units, and normal ranges
  • value_as_number for quantitative labs, value_as_concept_id for categorical
  • range_low/range_high enable abnormal value detection

Table 6: research.observation

Purpose: Clinical observations and notes with age-based temporal data Row Level: One record per observation

Column List:

Column Name Data Type Description
observation_id bigint Unique identifier for each observation
person_id integer Reference to person table
observation_concept_id integer OMOP concept ID for the observation
age_at_observation numeric Age when observation was made
observation_type_concept_id integer Type of observation record
value_as_number numeric Numeric observation value
value_as_string varchar Text observation value
value_as_concept_id integer Categorical observation value
qualifier_concept_id integer Qualifier concept for observation
unit_concept_id integer Unit concept for numeric values
provider_id integer Provider who made observation
visit_occurrence_id bigint Associated visit
visit_detail_id bigint Associated visit detail
observation_source_value varchar Original observation code from source
observation_source_concept_id integer Source concept for observation
unit_source_value varchar Source value for unit
qualifier_source_value varchar Source value for qualifier
value_source_value varchar Source value as recorded
observation_event_id bigint Associated observation event
obs_event_field_concept_id integer Observation event field concept
description varchar Observation description
facility_id integer Facility where observation was made
source_system varchar Data source system identifier
source_table varchar Source table name
source_recordid varchar Source record identifier

Key Analysis Notes:

  • Multiple value formats: numeric, string, and concept-based
  • Includes clinical assessments, vital signs, and survey responses
  • qualifier_concept_id provides additional context

Table 7: research.visit_occurrence

Purpose: Healthcare visit records with age-based temporal data Row Level: One record per visit

Column List:

Column Name Data Type Description
visit_occurrence_id bigint Unique identifier for each visit
person_id integer Reference to person table
visit_concept_id integer OMOP concept ID for visit type
age_at_visit_start numeric Age at start of visit
age_at_visit_end numeric Age at end of visit
visit_type_concept_id integer Type of visit record
provider_id integer Primary provider for visit
care_site_id integer Care site where visit occurred
visit_source_value varchar Original visit code from source
visit_source_concept_id integer Source concept for visit
admitted_from_concept_id integer Where patient was admitted from
admitted_from_source_value varchar Source value for admission source
discharged_to_concept_id integer Where patient was discharged to
discharged_to_source_value varchar Source value for discharge destination
preceding_visit_occurrence_id bigint Previous visit in sequence
facility_id integer Facility where visit occurred
source_system varchar Data source system identifier
source_table varchar Source table name
source_recordid varchar Source record identifier

Key Analysis Notes:

  • Visit duration can be calculated from start/end ages
  • visit_concept_id distinguishes inpatient, outpatient, emergency, etc.
  • preceding_visit_occurrence_id enables visit sequence analysis

Table 8: research.death

Purpose: Death records with age-based temporal data Row Level: One record per person (if deceased)

Column List:

Column Name Data Type Description
person_id integer Reference to person table
age_at_death numeric Age at time of death
death_type_concept_id integer Type of death record
cause_concept_id integer Primary cause of death concept
cause_source_value varchar Original cause of death code
cause_source_concept_id integer Source concept for cause of death
source_system varchar Data source system identifier
source_table varchar Source table name
source_recordid varchar Source record identifier

Key Analysis Notes:

  • Mortality endpoint for survival analysis
  • cause_concept_id provides standardized cause of death
  • Link to person table to identify deceased patients

Table 9: research.neuro_clinic_notes

Purpose: Clinical notes from Columbia Neurology Aging and Dementia department Row Level: One record per clinical note

Column List:

Column Name Data Type Description
person_id bigint Reference to person table
pat_enc_csn_id bigint EPIC-specific encounter identifier (for internal tracking only)
contact_date timestamp Date and time of the clinical note/encounter
note_id bigint EPIC-specific note identifier (for internal tracking only)
note_text text Full text of the clinical note

Key Analysis Notes:

  • Contains unstructured clinical text from neurology encounters
  • Focus on patients with aging and dementia concerns
  • note_text field can be analyzed for symptoms, diagnoses, treatment plans
  • Useful for natural language processing (NLP) and text mining
  • Link to research.person via person_id for patient demographics
  • contact_date provides exact temporal context (not converted to age)
  • Note: pat_enc_csn_id and note_id are EPIC system identifiers; cannot be linked to OMOP tables

Clinical Note Content May Include:

  • Patient symptoms and complaints
  • Clinical assessments and findings
  • Medication prescriptions and changes
  • Treatment recommendations
  • Follow-up instructions
  • Provider observations

Table 10: research.cbdb_notes

Purpose: Clinical notes from all CBDB cohorts (ADRD, pre-AD, control) Row Level: One record per clinical note

Column List:

Column Name Data Type Description
person_id bigint Reference to person table
pat_enc_csn_id bigint EPIC-specific encounter identifier (for internal tracking only)
age_at_contact numeric Age at the time of the clinical note (replaces contact_date for deidentification)
note_id bigint EPIC-specific note identifier (for internal tracking only)
note_text text Full text of the clinical note

Key Analysis Notes:

  • Contains unstructured clinical text from all three CBDB cohorts (ADRD, pre-AD, control)
  • Unlike neuro_clinic_notes, this table covers all departments, not just neurology
  • age_at_contact replaces contact_date for deidentification (consistent with other research tables)
  • note_text field can be analyzed for symptoms, diagnoses, treatment plans
  • Useful for natural language processing (NLP) and text mining across the full cohort
  • Link to research.person via person_id for patient demographics
  • Note: pat_enc_csn_id and note_id are EPIC system identifiers; cannot be linked to OMOP tables

Clinical Note Content May Include:

  • Patient symptoms and complaints
  • Clinical assessments and findings
  • Medication prescriptions and changes
  • Treatment recommendations
  • Follow-up instructions
  • Provider observations

Table 11: research.concept_relationship

Purpose: OMOP vocabulary relationships between concepts (e.g., Maps to, Subsumes, Has ingredient) Row Level: One record per concept relationship

Column List:

Column Name Data Type Description
concept_id_1 integer Source concept ID in the relationship
concept_id_2 integer Target concept ID in the relationship
relationship_id varchar Type of relationship (e.g., 'Maps to', 'Subsumes', 'RxNorm has ing')
valid_start_date date When relationship became valid
valid_end_date date When relationship became invalid
invalid_reason varchar Reason relationship was invalidated

Key Analysis Notes:

  • Essential for mapping between vocabularies (e.g., ICD10CM to SNOMED)
  • Common relationship types:
  • 'Maps to': Maps source concept to standard concept
  • 'Subsumes': Hierarchical parent-child relationship
  • 'RxNorm has ing': Drug to ingredient mapping
  • 'Has asso proc': Condition to associated procedure
  • Use to find standard concepts from source codes
  • Enables concept hierarchy navigation
  • Critical for drug ingredient analysis (rolling up specific drugs to ingredients)

Common Use Cases:

-- Map ICD10 codes to standard SNOMED concepts
SELECT 
    c1.concept_code as icd10_code,
    c1.concept_name as icd10_name,
    c2.concept_code as snomed_code,
    c2.concept_name as snomed_name
FROM research.concept_relationship cr
JOIN research.concept c1 ON cr.concept_id_1 = c1.concept_id
JOIN research.concept c2 ON cr.concept_id_2 = c2.concept_id
WHERE c1.vocabulary_id = 'ICD10CM'
  AND c2.vocabulary_id = 'SNOMED'
  AND cr.relationship_id = 'Maps to';

-- Find all ingredients in a drug
SELECT 
    c1.concept_name as drug_name,
    c2.concept_name as ingredient_name
FROM research.concept_relationship cr
JOIN research.concept c1 ON cr.concept_id_1 = c1.concept_id
JOIN research.concept c2 ON cr.concept_id_2 = c2.concept_id
WHERE cr.relationship_id = 'RxNorm has ing';

Table 12: research.concept

Purpose: OMOP vocabulary lookup table (no deidentification needed) Row Level: One record per concept

Column List:

Column Name Data Type Description
concept_id integer Unique concept identifier
concept_name varchar Human-readable concept name
domain_id varchar Domain (Condition, Drug, Procedure, etc.)
vocabulary_id varchar Vocabulary system (ICD10CM, RxNorm, CPT4, etc.)
concept_class_id varchar Classification within vocabulary
standard_concept varchar Standard concept flag (S=standard, C=classification)
concept_code varchar Original code from vocabulary
valid_start_date date When concept became valid
valid_end_date date When concept became invalid
invalid_reason varchar Reason concept was invalidated

Key Analysis Notes:

  • Essential lookup table for translating concept_ids to readable names
  • Filter by domain_id to find concepts for specific domains
  • Use vocabulary_id to identify code systems (ICD10CM, CPT4, RxNorm, etc.)
  • standard_concept = 'S' identifies preferred standard concepts

Cross-Table Analysis Patterns

Common Joins:

-- All clinical events for ADRC cohort with insurance data
SELECT p.person_id, p.adrc_cohort, p.insurance, c.age_at_condition_start
FROM research.person p
JOIN research.condition_occurrence c ON p.person_id = c.person_id
WHERE p.adrc_cohort = 1;

-- Analyze conditions by insurance type
SELECT 
    p.insurance,
    COUNT(DISTINCT p.person_id) as patient_count,
    COUNT(c.condition_occurrence_id) as condition_count
FROM research.person p
JOIN research.condition_occurrence c ON p.person_id = c.person_id
WHERE p.adrc_cohort = 1 AND p.insurance IS NOT NULL
GROUP BY p.insurance
ORDER BY patient_count DESC;

-- Translate concepts to names
SELECT co.person_id, co.age_at_condition_start, c.concept_name
FROM research.condition_occurrence co
JOIN research.concept c ON co.condition_concept_id = c.concept_id;

-- Map source codes to standard concepts using concept_relationship
SELECT 
    co.person_id,
    co.condition_source_value,
    c_source.concept_name as source_condition_name,
    c_standard.concept_name as standard_condition_name
FROM research.condition_occurrence co
JOIN research.concept c_source ON co.condition_source_concept_id = c_source.concept_id
JOIN research.concept_relationship cr ON c_source.concept_id = cr.concept_id_1
JOIN research.concept c_standard ON cr.concept_id_2 = c_standard.concept_id
WHERE cr.relationship_id = 'Maps to';

-- Neurology clinic notes with patient demographics
SELECT 
    n.person_id,
    n.contact_date,
    p.year_of_birth,
    p.gender_concept_id,
    p.adrc_cohort,
    n.note_text
FROM research.neuro_clinic_notes n
JOIN research.person p ON n.person_id = p.person_id
WHERE p.adrc_cohort = 1;

-- Count notes per patient in neurology clinic
SELECT 
    p.person_id,
    p.year_of_birth,
    COUNT(n.note_id) as note_count,
    MIN(n.contact_date) as first_note_date,
    MAX(n.contact_date) as last_note_date
FROM research.person p
JOIN research.neuro_clinic_notes n ON p.person_id = n.person_id
GROUP BY p.person_id, p.year_of_birth
ORDER BY note_count DESC;

Key Analysis Variables:

  • Temporal: All age_at_* fields for timeline analysis; contact_date for neurology notes
  • Clinical Values: value_as_number, value_as_concept_id in measurements/observations
  • Unstructured Text: note_text in neuro_clinic_notes and cbdb_notes for NLP and text mining
  • Cohort: adrc_cohort flag in person table
  • Insurance: insurance field for healthcare access and utilization analysis
  • Birth Cohort: year_of_birth for generational and age-based analysis
  • Mortality: age_at_death for survival endpoints

Source documentation: