Skip to content

Code Rollup and Feature Counts

This guide explains the code rollup strategy used in CBDB and how to access pre-computed rolled-up data for research.

Why Code Rollup?

Raw EHR codes are highly granular. For example, there are thousands of ICD-10 codes for various conditions, making direct analysis challenging. Code rollup aggregates these granular codes into clinically meaningful categories:

Raw Code Type Rolled Up To Purpose
ICD-9/ICD-10 (Diagnoses) PheCode ~1,800 clinically meaningful phenotypes
CPT/HCPCS (Procedures) CCS ~250 clinical procedure categories
LOINC (Labs) LOINC Standard lab test codes
RxNorm (Drugs) RxNorm Ingredient Active drug ingredients

Available Tables

All tables are in the rollup schema and focus on the ADRD cohort.

Mapping Dictionaries

Simple lookup tables to translate raw codes to rolled-up categories.

ICD → PheCode Mapping

Table: rollup.map_icd_phecode

Column Description
icd_code Cleaned ICD code (uppercase, no punctuation)
icd_version ICD9 or ICD10
phecode PheCode (e.g., 290.11)
phecode_feature Feature name format: PheCode:XXX
phecode_description Human-readable description
phecode_category Broad disease category
-- Find all ICD codes that map to Alzheimer's
SELECT icd_code, icd_version, phecode_description
FROM rollup.map_icd_phecode
WHERE phecode = '290.11';

CPT → CCS Mapping

Table: rollup.map_cpt_ccs

Column Description
cpt_code CPT/HCPCS code
ccs_code CCS category code
ccs_feature Feature name format: CCS:XXX
ccs_description CCS category description
-- Look up what CCS category a CPT code belongs to
SELECT * FROM rollup.map_cpt_ccs WHERE cpt_code = '99213';

RxNorm → Ingredient Mapping

Table: rollup.map_rxnorm_ingredient

Column Description
rxnorm_code Source RxNorm code (drug product)
drug_name Drug product name
drug_class RxNorm concept class
ingredient_rxnorm Ingredient-level RxNorm code
ingredient_feature Feature name format: RXNORM_ING:XXX
ingredient_name Ingredient name
-- Find all drugs containing donepezil
SELECT rxnorm_code, drug_name, ingredient_name
FROM rollup.map_rxnorm_ingredient
WHERE ingredient_name ILIKE '%donepezil%';

Patient-Level Counts

Pre-aggregated counts per patient. All counts use the "count once per day" methodology (multiple occurrences on the same day = 1).

Diagnosis Counts (PheCodes)

Table: rollup.patient_phecode_counts

Column Description
person_id Patient identifier
phecode PheCode
phecode_feature Feature name: PheCode:XXX
phecode_description Description
phecode_category Disease category
day_count Number of unique days with this diagnosis
first_occurrence First date
last_occurrence Most recent date
-- Get all dementia-related diagnoses for a patient
SELECT phecode, phecode_description, day_count, first_occurrence
FROM rollup.patient_phecode_counts
WHERE person_id = 12345 AND phecode LIKE '290%';

Procedure Counts (CCS)

Table: rollup.patient_ccs_counts

Column Description
person_id Patient identifier
ccs_code CCS category
ccs_feature Feature name: CCS:XXX
ccs_description Description
day_count Number of unique days
first_occurrence First date
last_occurrence Most recent date
-- What procedures has this patient had?
SELECT ccs_code, ccs_description, day_count
FROM rollup.patient_ccs_counts
WHERE person_id = 12345
ORDER BY day_count DESC;

Lab Counts (LOINC)

Table: rollup.patient_loinc_counts

Column Description
person_id Patient identifier
loinc_code LOINC code
loinc_feature Feature name: LOINC:XXX
loinc_description Lab test name
day_count Number of unique days
first_occurrence First date
last_occurrence Most recent date
-- What labs does this patient have?
SELECT loinc_code, loinc_description, day_count
FROM rollup.patient_loinc_counts
WHERE person_id = 12345
ORDER BY day_count DESC LIMIT 20;

Drug Ingredient Counts

Table: rollup.patient_ingredient_counts

Column Description
person_id Patient identifier
ingredient_rxnorm Ingredient RxNorm code
ingredient_feature Feature name: RXNORM_ING:XXX
ingredient_name Ingredient name
day_count Number of unique days
first_occurrence First date
last_occurrence Most recent date
-- What medications (ingredients) has this patient taken?
SELECT ingredient_name, day_count, first_occurrence
FROM rollup.patient_ingredient_counts
WHERE person_id = 12345
ORDER BY day_count DESC;

Feature Dictionary

Table: rollup.feature_dictionary

Master lookup table for all features. Query any code to see its description, patient counts, and prevalence.

Column Description
feature_code Standardized feature code (TYPE:CODE format)
feature_type PheCode, CCS, LOINC, or RXNORM_ING
source_code Original code value
feature_description Human-readable description
feature_category Category grouping
n_patients Number of patients with this feature
total_cohort_patients Total patients in ADRD cohort
prevalence_pct Percentage of cohort with this feature
total_occurrences Total day-counts across all patients
avg_per_patient Average day-count per patient

Example Queries:

-- Find Alzheimer's related features
SELECT feature_code, feature_description, n_patients, prevalence_pct
FROM rollup.feature_dictionary
WHERE feature_description ILIKE '%alzheimer%';

-- Most common diagnoses in the cohort
SELECT feature_code, feature_description, n_patients, prevalence_pct
FROM rollup.feature_dictionary
WHERE feature_type = 'PheCode'
ORDER BY n_patients DESC
LIMIT 20;

-- Find features with prevalence > 10%
SELECT feature_code, feature_type, feature_description, prevalence_pct
FROM rollup.feature_dictionary
WHERE prevalence_pct > 10
ORDER BY prevalence_pct DESC;

-- Search for a specific code
SELECT * FROM rollup.feature_dictionary
WHERE feature_code = 'PheCode:290.11';

Common Use Cases

Find patients with a specific condition

-- Patients with Alzheimer's diagnosis (PheCode 290.11)
SELECT person_id, day_count, first_occurrence, last_occurrence
FROM rollup.patient_phecode_counts
WHERE phecode = '290.11';

Build a patient feature matrix

-- Wide format: one row per patient, features as columns (example for top 10 diagnoses)
SELECT 
  person_id,
  MAX(CASE WHEN phecode = '290.11' THEN day_count ELSE 0 END) as alzheimers,
  MAX(CASE WHEN phecode = '401.1' THEN day_count ELSE 0 END) as hypertension,
  MAX(CASE WHEN phecode = '250.2' THEN day_count ELSE 0 END) as diabetes
FROM rollup.patient_phecode_counts
GROUP BY person_id;

Export data for R/Python analysis

-- Export diagnosis features
\copy (SELECT person_id, phecode_feature, day_count FROM rollup.patient_phecode_counts) TO 'phecode_features.csv' CSV HEADER;

-- Export all patient counts (long format)
\copy (
  SELECT person_id, phecode_feature as feature, day_count FROM rollup.patient_phecode_counts
  UNION ALL
  SELECT person_id, ccs_feature, day_count FROM rollup.patient_ccs_counts
  UNION ALL
  SELECT person_id, loinc_feature, day_count FROM rollup.patient_loinc_counts
  UNION ALL
  SELECT person_id, ingredient_feature, day_count FROM rollup.patient_ingredient_counts
) TO 'all_features.csv' CSV HEADER;

PheCode Description
290.11 Alzheimer's disease
290.1 Dementias
290.13 Senile dementia
290.16 Vascular dementia
290.12 Frontotemporal dementia
290.2 Delirium
331 Other cerebral degenerations

Data Notes

  • Cohort: All tables contain ADRD cohort patients only
  • Count methodology: "Count once per day" - multiple occurrences on the same day = 1
  • Mapping coverage: Not all raw codes can be mapped; unmapped codes are excluded
  • Coding transitions: ICD-9 (pre-2015) and ICD-10 (post-2015) both map to PheCodes

Further Reading