Code Rollup and Feature Counts
This guide explains the code rollup strategy used in CBDB and how to access pre-computed rolled-up data for research.
Why Code Rollup?
Raw EHR codes are highly granular. For example, there are thousands of ICD-10 codes for various conditions, making direct analysis challenging. Code rollup aggregates these granular codes into clinically meaningful categories:
| Raw Code Type | Rolled Up To | Purpose |
|---|---|---|
| ICD-9/ICD-10 (Diagnoses) | PheCode | ~1,800 clinically meaningful phenotypes |
| CPT/HCPCS (Procedures) | CCS | ~250 clinical procedure categories |
| LOINC (Labs) | LOINC | Standard lab test codes |
| RxNorm (Drugs) | RxNorm Ingredient | Active drug ingredients |
Available Tables
All tables are in the rollup schema and focus on the ADRD cohort.
Mapping Dictionaries
Simple lookup tables to translate raw codes to rolled-up categories.
ICD → PheCode Mapping
Table: rollup.map_icd_phecode
| Column | Description |
|---|---|
icd_code |
Cleaned ICD code (uppercase, no punctuation) |
icd_version |
ICD9 or ICD10 |
phecode |
PheCode (e.g., 290.11) |
phecode_feature |
Feature name format: PheCode:XXX |
phecode_description |
Human-readable description |
phecode_category |
Broad disease category |
-- Find all ICD codes that map to Alzheimer's
SELECT icd_code, icd_version, phecode_description
FROM rollup.map_icd_phecode
WHERE phecode = '290.11';
CPT → CCS Mapping
Table: rollup.map_cpt_ccs
| Column | Description |
|---|---|
cpt_code |
CPT/HCPCS code |
ccs_code |
CCS category code |
ccs_feature |
Feature name format: CCS:XXX |
ccs_description |
CCS category description |
-- Look up what CCS category a CPT code belongs to
SELECT * FROM rollup.map_cpt_ccs WHERE cpt_code = '99213';
RxNorm → Ingredient Mapping
Table: rollup.map_rxnorm_ingredient
| Column | Description |
|---|---|
rxnorm_code |
Source RxNorm code (drug product) |
drug_name |
Drug product name |
drug_class |
RxNorm concept class |
ingredient_rxnorm |
Ingredient-level RxNorm code |
ingredient_feature |
Feature name format: RXNORM_ING:XXX |
ingredient_name |
Ingredient name |
-- Find all drugs containing donepezil
SELECT rxnorm_code, drug_name, ingredient_name
FROM rollup.map_rxnorm_ingredient
WHERE ingredient_name ILIKE '%donepezil%';
Patient-Level Counts
Pre-aggregated counts per patient. All counts use the "count once per day" methodology (multiple occurrences on the same day = 1).
Diagnosis Counts (PheCodes)
Table: rollup.patient_phecode_counts
| Column | Description |
|---|---|
person_id |
Patient identifier |
phecode |
PheCode |
phecode_feature |
Feature name: PheCode:XXX |
phecode_description |
Description |
phecode_category |
Disease category |
day_count |
Number of unique days with this diagnosis |
first_occurrence |
First date |
last_occurrence |
Most recent date |
-- Get all dementia-related diagnoses for a patient
SELECT phecode, phecode_description, day_count, first_occurrence
FROM rollup.patient_phecode_counts
WHERE person_id = 12345 AND phecode LIKE '290%';
Procedure Counts (CCS)
Table: rollup.patient_ccs_counts
| Column | Description |
|---|---|
person_id |
Patient identifier |
ccs_code |
CCS category |
ccs_feature |
Feature name: CCS:XXX |
ccs_description |
Description |
day_count |
Number of unique days |
first_occurrence |
First date |
last_occurrence |
Most recent date |
-- What procedures has this patient had?
SELECT ccs_code, ccs_description, day_count
FROM rollup.patient_ccs_counts
WHERE person_id = 12345
ORDER BY day_count DESC;
Lab Counts (LOINC)
Table: rollup.patient_loinc_counts
| Column | Description |
|---|---|
person_id |
Patient identifier |
loinc_code |
LOINC code |
loinc_feature |
Feature name: LOINC:XXX |
loinc_description |
Lab test name |
day_count |
Number of unique days |
first_occurrence |
First date |
last_occurrence |
Most recent date |
-- What labs does this patient have?
SELECT loinc_code, loinc_description, day_count
FROM rollup.patient_loinc_counts
WHERE person_id = 12345
ORDER BY day_count DESC LIMIT 20;
Drug Ingredient Counts
Table: rollup.patient_ingredient_counts
| Column | Description |
|---|---|
person_id |
Patient identifier |
ingredient_rxnorm |
Ingredient RxNorm code |
ingredient_feature |
Feature name: RXNORM_ING:XXX |
ingredient_name |
Ingredient name |
day_count |
Number of unique days |
first_occurrence |
First date |
last_occurrence |
Most recent date |
-- What medications (ingredients) has this patient taken?
SELECT ingredient_name, day_count, first_occurrence
FROM rollup.patient_ingredient_counts
WHERE person_id = 12345
ORDER BY day_count DESC;
Feature Dictionary
Table: rollup.feature_dictionary
Master lookup table for all features. Query any code to see its description, patient counts, and prevalence.
| Column | Description |
|---|---|
feature_code |
Standardized feature code (TYPE:CODE format) |
feature_type |
PheCode, CCS, LOINC, or RXNORM_ING |
source_code |
Original code value |
feature_description |
Human-readable description |
feature_category |
Category grouping |
n_patients |
Number of patients with this feature |
total_cohort_patients |
Total patients in ADRD cohort |
prevalence_pct |
Percentage of cohort with this feature |
total_occurrences |
Total day-counts across all patients |
avg_per_patient |
Average day-count per patient |
Example Queries:
-- Find Alzheimer's related features
SELECT feature_code, feature_description, n_patients, prevalence_pct
FROM rollup.feature_dictionary
WHERE feature_description ILIKE '%alzheimer%';
-- Most common diagnoses in the cohort
SELECT feature_code, feature_description, n_patients, prevalence_pct
FROM rollup.feature_dictionary
WHERE feature_type = 'PheCode'
ORDER BY n_patients DESC
LIMIT 20;
-- Find features with prevalence > 10%
SELECT feature_code, feature_type, feature_description, prevalence_pct
FROM rollup.feature_dictionary
WHERE prevalence_pct > 10
ORDER BY prevalence_pct DESC;
-- Search for a specific code
SELECT * FROM rollup.feature_dictionary
WHERE feature_code = 'PheCode:290.11';
Common Use Cases
Find patients with a specific condition
-- Patients with Alzheimer's diagnosis (PheCode 290.11)
SELECT person_id, day_count, first_occurrence, last_occurrence
FROM rollup.patient_phecode_counts
WHERE phecode = '290.11';
Build a patient feature matrix
-- Wide format: one row per patient, features as columns (example for top 10 diagnoses)
SELECT
person_id,
MAX(CASE WHEN phecode = '290.11' THEN day_count ELSE 0 END) as alzheimers,
MAX(CASE WHEN phecode = '401.1' THEN day_count ELSE 0 END) as hypertension,
MAX(CASE WHEN phecode = '250.2' THEN day_count ELSE 0 END) as diabetes
FROM rollup.patient_phecode_counts
GROUP BY person_id;
Export data for R/Python analysis
-- Export diagnosis features
\copy (SELECT person_id, phecode_feature, day_count FROM rollup.patient_phecode_counts) TO 'phecode_features.csv' CSV HEADER;
-- Export all patient counts (long format)
\copy (
SELECT person_id, phecode_feature as feature, day_count FROM rollup.patient_phecode_counts
UNION ALL
SELECT person_id, ccs_feature, day_count FROM rollup.patient_ccs_counts
UNION ALL
SELECT person_id, loinc_feature, day_count FROM rollup.patient_loinc_counts
UNION ALL
SELECT person_id, ingredient_feature, day_count FROM rollup.patient_ingredient_counts
) TO 'all_features.csv' CSV HEADER;
Key ADRD-Related PheCodes
| PheCode | Description |
|---|---|
| 290.11 | Alzheimer's disease |
| 290.1 | Dementias |
| 290.13 | Senile dementia |
| 290.16 | Vascular dementia |
| 290.12 | Frontotemporal dementia |
| 290.2 | Delirium |
| 331 | Other cerebral degenerations |
Data Notes
- Cohort: All tables contain ADRD cohort patients only
- Count methodology: "Count once per day" - multiple occurrences on the same day = 1
- Mapping coverage: Not all raw codes can be mapped; unmapped codes are excluded
- Coding transitions: ICD-9 (pre-2015) and ICD-10 (post-2015) both map to PheCodes
Further Reading
- PheWAS Catalog - PheCode mappings and phenotype associations
- AHRQ HCUP CCS - CCS documentation
- LOINC - Laboratory code standards
- RxNorm - Drug terminology