Comparison of ICD-9-CM to ICD-10-CM Crosswalks Derived by Physician and Clinical Coder vs. Automated Methods

By Jason C. Simeone, PhD; Xinyue Liu, PhD; Tarun Bhagnani, MS; Matthew W. Reynolds, PhD; Jenna Collins, MPH; and Edward A. Bortnichak, PhD, MPH, MBE


Purpose: To evaluate whether automated methods are sufficient for deriving ICD-10-CM algorithms by comparing ICD-9-CM to ICD-10-CM crosswalks from general equivalence mappings (GEMs) with physician/clinical coder-derived crosswalks.

Patients and methods: Forward mapping was used to derive ICD-10-CM crosswalks for 10 conditions. As a sensitivity analysis, forward-backward mapping (FBM) was also conducted for three clinical conditions. The physician/coder independently developed crosswalks for the same conditions. Differences between the crosswalks were summarized using the Jaccard similarity coefficient (JSC).

Results: Physician/coder crosswalks were typically far more inclusive than GEMs crosswalks. Crosswalks for peripheral artery disease were most dissimilar (JSC: 0.06), while crosswalks for mild cognitive impairment (JSC: 1.00) and congestive heart failure (JSC: 0.85) were most similar. FBM added ICD-10-CM codes for all three conditions but did not consistently increase similarity between crosswalks.

Conclusion: The GEMs and physician/coder algorithms rarely aligned fully; human review is still required for ICD-9-CM to ICD-10-CM crosswalk development.

Keywords: Coding algorithms, diagnosis codes, healthcare research, general equivalence mappings, ICD-10 transition.


Observational studies of administrative claims or electronic medical record (EMR) data frequently require the use of coding algorithms to identify patients with a particular comorbidity or outcome of interest.1 The coding algorithms developed in the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) coding system are now obsolete when analyzing contemporary data.

The United States Department of Health & Human Services required providers to switch from diagnosing medical conditions with the ICD-9-CM to the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) by October 1, 2015.2 This transition necessitated a “crosswalk” of existing ICD-9-CM coding algorithms to the ICD-10-CM system. The Centers for Medicare & Medicaid Services (CMS) published the GEMs as a software tool to facilitate this process, with the goal of allowing medical professionals, researchers, and others to identify ICD-9-CM to ICD-10-CM coding crosswalks.3

The GEMs are a comprehensive translation tool used to convert diagnosis and procedure codes from one ICD coding system to the other.4 These crosswalks assist healthcare professionals, researchers, and administrative staff by linking codes for data used for billing, tracking quality, recording morbidity/mortality, calculating reimbursement, and converting any ICD-9-CM-based application to ICD-10-CM/Procedure Coding System (PCS).4

There are two types of GEMs tools, which are directional. The forward mappings convert ICD-9-CM codes into ICD-10-CM, and the backward mappings convert ICD-10-CM codes into ICD-9-CM. The forward and backward GEMs tools are not exact duplicates of one another; they are independent mappings that differ in coverage.

The accuracy of GEMs and similar tools, however, is not well established. The CMS published the GEMs crosswalk tool to test and convert ICD-9-CM codes to ICD-10-CM, develop application-specific mappings, and link and analyze data in long-term clinical studies. Few direct matches exist between the coding systems, as some ICD-9-CM codes are represented by several ICD-10-CM codes, and vice versa. Approximate matches, therefore, may not necessarily identify the most clinically appropriate ICD-10-CM coding algorithm for a given condition. This study compared ICD-9-CM to ICD-10-CM crosswalks from GEMs with those derived by a physician and clinical coder and evaluated whether automated methods are sufficient for deriving ICD-10-CM algorithms.


This was a coding algorithm development and comparison study; no patient selection or analysis of patient data was performed.

Existing ICD-9-CM algorithms were compiled and evaluated by the study authors for the following 10 clinical conditions: acute myocardial infarction (AMI), cardiac arrhythmia, chronic kidney disease (CKD), congestive heart failure (CHF), diabetes, diabetic neuropathy, hypertension, hypoglycemia, mild cognitive impairment (MCI), and peripheral artery disease (PAD). The GEMs crosswalk software tool was used to identify the corresponding ICD-10-CM codes from each ICD-9-CM code in the existing algorithms for each clinical condition via forward mapping. As a sensitivity analysis, forward-backward mapping (FBM) was conducted on three of the clinical conditions: AMI, arrhythmia, and hypoglycemia. In FBM, both forward and backward dictionaries were used to search for ICD-10-CM codes corresponding to the ICD-9-CM codes in the algorithms.
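The difference between forward mapping and FBM can be sketched in a few lines of Python. This is a minimal illustration, not the GEMs software itself: the dictionaries `forward_gem` and `backward_gem`, the function names, and the placeholder codes (`9A`, `10X`, and so on) are all hypothetical. It assumes that FBM collects every ICD-10-CM code reached through the forward dictionary plus every ICD-10-CM code whose backward entry points to an ICD-9-CM code in the algorithm.

```python
from typing import Dict, Iterable, Set


def forward_map(icd9_codes: Iterable[str], forward_gem: Dict[str, Set[str]]) -> Set[str]:
    """Collect the ICD-10-CM targets listed in the forward dictionary."""
    targets: Set[str] = set()
    for code in icd9_codes:
        targets |= forward_gem.get(code, set())
    return targets


def forward_backward_map(
    icd9_codes: Iterable[str],
    forward_gem: Dict[str, Set[str]],
    backward_gem: Dict[str, Set[str]],
) -> Set[str]:
    """Forward targets plus any ICD-10-CM code whose backward entry
    points to one of the ICD-9-CM codes in the algorithm."""
    wanted = set(icd9_codes)
    targets = forward_map(wanted, forward_gem)
    for icd10_code, icd9_targets in backward_gem.items():
        if icd9_targets & wanted:
            targets.add(icd10_code)
    return targets


# Toy dictionaries with placeholder codes (not real ICD codes or GEMs entries).
forward_gem = {"9A": {"10X"}, "9B": {"10Y"}}
backward_gem = {"10X": {"9A"}, "10Y": {"9B"}, "10Z": {"9A"}, "10W": {"9C"}}

algorithm = ["9A", "9B"]
fm = forward_map(algorithm, forward_gem)
fbm = forward_backward_map(algorithm, forward_gem, backward_gem)

print(sorted(fm))   # ['10X', '10Y']
print(sorted(fbm))  # ['10X', '10Y', '10Z']
```

In this toy example the backward dictionary contributes one code (`10Z`) that forward mapping alone would miss. Under the assumption stated above, FBM can only add codes to a forward-mapped crosswalk, never remove them.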

A physician and a clinical coding expert also independently identified an appropriate ICD-10-CM algorithm for each clinical condition; no specifications or restrictions were placed on the means they used to identify those codes. A questionnaire (see Supplemental Table 1) was pre-filled with each clinical condition and corresponding ICD-9-CM algorithm and was provided to the physician and coder to guide their development of ICD-10-CM algorithms. The physician and clinical coder were asked to evaluate the differences and exercise their judgment to determine which ICD-10-CM codes were required for inclusion or exclusion from each algorithm.

The differences between the GEMs and the physician/coder crosswalks for each selected condition were then quantitatively summarized using descriptive statistics, including means, medians, and ranges. The unit of analysis for the differences was the billable code. The Jaccard similarity coefficient (JSC), a measure ranging from 0 (completely dissimilar) to 1 (completely similar), was also used to identify the degree to which GEMs-derived crosswalks were similar to the physician/coder crosswalks. The JSC was calculated as JSC(A,B) = |A∩B| / |A∪B|, with A representing the elements in set A (i.e., codes from the GEMs-derived crosswalk), B representing the elements in set B (i.e., codes from the physician/coder-derived crosswalk), |A∩B| representing the number of elements shared by both sets, and |A∪B| representing the total number of unique elements across both sets. The theoretical impact of differences between algorithms on sensitivity and specificity was assessed qualitatively.
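For readers who want to reproduce the similarity measure, the JSC can be computed directly with set operations. The sketch below is illustrative only: the function name `jaccard` and the placeholder codes are hypothetical, and returning 1.0 for two empty sets is a convention assumed here, not something specified in the study.

```python
from typing import Iterable


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity coefficient: |A intersect B| / |A union B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty crosswalks are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Placeholder codes, not real ICD-10-CM codes.
gems_crosswalk = {"C1", "C2", "C3"}
physician_coder_crosswalk = {"C2", "C3", "C4", "C5"}

jsc = jaccard(gems_crosswalk, physician_coder_crosswalk)
print(round(jsc, 2))  # 0.4 (2 shared codes out of 5 unique codes)
```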


The full ICD-10-CM crosswalks from the GEMs and the physician/coder algorithms are located in Supplemental Table 2.

As shown in Figure 1 and Figure 2, the crosswalks for diabetes and PAD had the most differences (>240 after comparing those from GEMs forward mapping with those identified by the physician/coder), while the crosswalks for MCI, CHF, and hypoglycemia had the fewest differences (<7) between the two sets. The JSC ranged from 0.06 for the PAD crosswalks to 1.00 for the MCI crosswalk. In general, the crosswalks identified by the physician/coder were far more inclusive than those identified by the GEMs system. When compared with crosswalks identified by the physician/coder, the crosswalks from GEMs were missing a mean of 61.2 individual billing-level ICD-10-CM codes (median: 9.0; range: 0–294). Conversely, the GEMs crosswalks sometimes contained codes that were not identified by the physician and coder. The algorithms identified by the physician/coder were missing a mean of 4.0 codes after comparison with the GEMs crosswalks (median: 3.0 codes; range: 0–15).
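The per-condition counts behind these summary statistics are set differences, which can be sketched as follows. The condition names, placeholder codes, and the helper `summarize` are hypothetical and do not reproduce the study data in Supplemental Table 2; the example only shows the shape of the calculation.

```python
from statistics import mean, median
from typing import Dict, List, Set, Tuple

# condition -> (GEMs crosswalk, physician/coder crosswalk); placeholder codes only.
crosswalks: Dict[str, Tuple[Set[str], Set[str]]] = {
    "condition_1": ({"A1", "A2"}, {"A1", "A2", "A3", "A4"}),
    "condition_2": ({"B1", "B2", "B3"}, {"B1", "B2"}),
    "condition_3": ({"C1"}, {"C1"}),
}

# Codes the physician/coder included that GEMs did not, per condition.
missing_from_gems: List[int] = [len(pc - gems) for gems, pc in crosswalks.values()]
# Codes GEMs included that the physician/coder did not, per condition.
missing_from_pc: List[int] = [len(gems - pc) for gems, pc in crosswalks.values()]


def summarize(counts: List[int]) -> Dict[str, object]:
    """Mean, median, and range of per-condition difference counts."""
    return {
        "mean": mean(counts),
        "median": median(counts),
        "range": (min(counts), max(counts)),
    }


print(missing_from_gems, summarize(missing_from_gems))
print(missing_from_pc, summarize(missing_from_pc))
```

Keeping the two directions of difference separate, as the article does, shows whether a low JSC is driven by omissions from the automated crosswalk or by extra codes in it.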

AMI. Interestingly, GEMs did not identify some codes that appeared to be clearly indicative of myocardial infarction, such as ICD-10-CM I21.01 (ST elevation myocardial infarction involving left main coronary artery). The JSC for the GEMs and physician/coder crosswalks was 0.53.

Arrhythmia. The physician/coder algorithm identified codes that appear to indicate a diagnosis of arrhythmia, even when the term did not appear in the description (including ICD-10-CM code I48.4 – atypical atrial flutter). In general, the crosswalk developed by the physician/coder should have a higher sensitivity, but perhaps lower specificity, than the crosswalk identified by GEMs (JSC: 0.40).

CKD. Broad differences were identified across the GEMs and physician/coder crosswalks (JSC: 0.29), and the physician/coder crosswalk likely has a higher sensitivity than the GEMs crosswalk. For instance, some ICD-10-CM codes for diabetes mellitus with diabetic CKD or kidney complications were included in the physician/coder crosswalk, but not in the crosswalk identified by GEMs.

CHF. The two sets of crosswalks were largely similar (JSC: 0.85), although the physician/coder algorithm is more inclusive than the GEMs algorithm. For example, ICD-10-CM I11.0 (hypertensive heart disease with heart failure) and P29.0 (neonatal cardiac failure) were found in the physician/coder algorithm but not in the GEMs algorithm.

Diabetes. Only one-third of codes were similar across the two sets of crosswalks (JSC: 0.33); due to the large number of codes identified for this condition by each approach, this level of dissimilarity resulted in 298 differences between the GEMs and physician/coder crosswalks. The physician/coder crosswalk included 294 codes that the GEMs crosswalk did not identify.

In some of those cases, GEMs did not include relevant codes, such as ICD-10-CM E10.32 (type 1 diabetes mellitus with mild non-proliferative diabetic retinopathy). Other GEMs omissions were related to the etiology of the disease, e.g., the physician/coder included ICD-10-CM codes for drug/chemical-induced diabetes and gestational diabetes.

Diabetic neuropathy. Again, the physician/coder crosswalk for diabetic neuropathy had higher sensitivity; just over half of the codes identified from both crosswalks were similar (JSC: 0.56). For example, codes E10.41 and E10.42 (type 1 diabetes mellitus with diabetic mononeuropathy and polyneuropathy, respectively) were included by the physician and coder only. The physician/coder also omitted some codes included by GEMs, including related conditions that could play a role in the development of diabetic neuropathy (such as E11.65/E10.65–type 2/1 diabetes mellitus with hyperglycemia).

Hypertension. Some codes included by the physician and coder but not GEMs identify medical conditions that specify hypertension, such as ICD-10-CM I15.1 (hypertension secondary to other renal disorders) and I15.9 (secondary hypertension, unspecified). Over half of the codes (JSC: 0.60) were similar across sets.

Hypoglycemia. Only half of the codes identified for hypoglycemia by each approach were similar (JSC: 0.50). The codes identified by the physician and coder specifically mention hypoglycemia (for example, ICD-10-CM code E11.641; type 2 diabetes mellitus with hypoglycemia with coma), while those identified only in the GEMs search did not (E71.0; maple-syrup-urine disease).

MCI. The ICD-9-CM algorithm for MCI included only one code (331.83: mild cognitive impairment), and both the GEMs and the physician/coder algorithms included only the analogous ICD-10-CM code, G31.84 (JSC: 1.00).

PAD. The crosswalks derived for PAD by each approach were most dissimilar among all conditions included in the study (JSC: 0.06). Nearly all (n = 243/249, or 97.6%) differences identified from the crosswalks for PAD were codes that were identified by the physician and coder but not the GEMs algorithms. Most of those codes were for diagnoses that should improve the identification of PAD, such as ICD-10-CM I70.2 (atherosclerosis of native arteries of the extremities).

Sensitivity analysis. Three conditions (AMI, arrhythmia, and hypoglycemia) were included in the sensitivity analysis to assess differences between the GEMs forward mapping and FBM approaches. Overall, FBM reduced the number of differences between the GEMs-derived algorithms and the physician/coder algorithms for two conditions, although it added five differences (codes now identified by GEMs FBM that were not identified by the physician/coder) for one condition. Eight differences between the GEMs and physician/coder approaches were identified for AMI after forward mapping (all were codes identified by the physician/coder but not by GEMs); all eight of those codes were identified by FBM, so no differences remained between the two approaches after FBM (JSC increased from 0.53 to 1.00). One additional ICD-10-CM code was added by FBM for the arrhythmia algorithm, but this code was not present in the physician/coder crosswalk, so similarity between the crosswalks decreased (JSC decreased from 0.40 to 0.38). FBM identified six additional ICD-10-CM codes for hypoglycemia, but only one of these matched with the codes in the physician/coder crosswalk, leaving five additional unmatched codes; therefore, the crosswalks were more dissimilar after FBM (JSC decreased from 0.50 to 0.42).
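As a worked check on the AMI result, the JSC can be recomputed from counts alone. The counts below are not reported in the article; they are back-calculated on the assumption that all eight forward-mapping differences were physician/coder-only codes (as stated above), in which case nine shared codes is the only whole number that reproduces a JSC of 0.53. The function name `jsc_from_counts` is hypothetical.

```python
def jsc_from_counts(shared: int, gems_only: int, pc_only: int) -> float:
    """Jaccard similarity from the three regions of a two-set Venn diagram."""
    return shared / (shared + gems_only + pc_only)


# Forward mapping: 8 physician/coder-only codes, 0 GEMs-only codes,
# and an assumed (back-calculated) 9 shared codes.
ami_forward = jsc_from_counts(shared=9, gems_only=0, pc_only=8)

# FBM recovers all 8 codes, so they move into the shared region.
ami_fbm = jsc_from_counts(shared=9 + 8, gems_only=0, pc_only=0)

print(round(ami_forward, 2), round(ami_fbm, 2))  # 0.53 1.0
```

The same function shows why adding unmatched codes lowers the JSC: any code that FBM adds to the GEMs-only region enlarges the denominator without changing the numerator.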


This study used two distinct methods to identify ICD-9-CM to ICD-10-CM crosswalks for 10 selected clinical conditions — an automated system (GEMs), and a process by which a physician and clinical coder applied their expertise (with the aid of a questionnaire) to assess the selected conditions. The similarity of crosswalks derived from each approach varied considerably, and results demonstrated that the crosswalks developed by the physician/coder were more inclusive than those identified via GEMs, except for hypoglycemia (three vs. four) and MCI (no differences). The general inclusiveness of the algorithms identified by the physician and clinical coder likely increased the sensitivity of the algorithms, while potentially decreasing the specificity, in comparison with those identified via GEMs.

The ability of a physician/coder to consider various clinical factors related to individual conditions made it possible to identify a larger number of potential codes. The GEMs method, on the other hand, struggled to identify clinical conditions with a broad scope and/or various etiologies (e.g., diabetes), those where the definition may vary somewhat between physicians (e.g., PAD), and conditions that may be a side effect of some therapeutic classes of medications (e.g., hypoglycemia).

Fung et al. have summarized the performance of various methods for using GEMs to generate ICD-9-CM to ICD-10-CM crosswalks and determined that FBM had better performance than conventional forward mapping.5 The forward and backward GEMs are not mirror images, and the FBM approach permits the user to identify codes that would not otherwise be identified through either forward or backward mapping alone. In the present study, our comparison of the AMI crosswalks from GEMs and the physician/coder yielded eight differences; the use of FBM eliminated all eight, resulting in fully similar crosswalks across both approaches. However, the use of FBM decreased the similarity of crosswalks derived for two other conditions (arrhythmia and hypoglycemia) included in the sensitivity analysis. While the use of FBM was an improvement compared with conventional forward mapping for the crosswalk developed for one condition (AMI), it was not a full substitute for physician and coder involvement in developing ICD-10-CM crosswalks in the present study.

Although the human element provided a clear advantage for creating crosswalks for existing algorithms, algorithms derived from any method should be reviewed and refined by researchers to ensure that they are appropriate for the study objectives. A retrospective analysis of claims data with a chart review could provide valuable information for validation of selected algorithms. The “gold standard” for identifying whether a patient did or did not have a condition of interest would involve a review of the medical charts or another approach to confirm the diagnosis. The sensitivity, specificity, and other performance metrics of the algorithms identified by GEMs and by a physician and coder could then be calculated against the data extracted for the validation. As no patient data were used in this study, such a validation was not possible.
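A validation of this kind reduces to a two-by-two comparison of algorithm-flagged patients against chart-confirmed patients. The sketch below uses invented patient identifiers and hypothetical function names (`ratio`, `validation_metrics`); it was not part of this study, which used no patient data, and only illustrates how the performance metrics would be derived.

```python
from typing import Dict, Optional, Set


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def validation_metrics(
    flagged: Set[int], confirmed: Set[int], population: Set[int]
) -> Dict[str, Optional[float]]:
    """Compare algorithm-flagged patients with chart-confirmed patients."""
    tp = len(flagged & confirmed)               # flagged and confirmed
    fp = len(flagged - confirmed)               # flagged, not confirmed
    fn = len(confirmed - flagged)               # confirmed, not flagged
    tn = len(population - flagged - confirmed)  # neither flagged nor confirmed
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented example: 10 patients, 4 confirmed by chart review, 4 flagged by codes.
population = set(range(1, 11))
confirmed = {1, 2, 3, 4}
flagged = {1, 2, 3, 5}

metrics = validation_metrics(flagged, confirmed, population)
print(metrics)
```

Running the same function on the patient sets flagged by a GEMs-derived algorithm and by a physician/coder-derived algorithm would turn the qualitative sensitivity and specificity statements in this article into measured values.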


The conclusions of this study may have differed if a larger, or alternate, set of conditions was used to evaluate the algorithm crosswalks. The algorithms were identified by one physician and one clinical coding expert; variations in expertise and resources could impact the review process — and ultimately, the results — of other physician/coding experts. Finally, only qualitative statements about the likely effect of differences between crosswalks on performance metrics, such as specificity, sensitivity, positive predictive value, and negative predictive value, could be made due to the lack of patient data/chart review.


The use of GEMs alone is likely insufficient for identifying appropriate ICD-10-CM crosswalks from ICD-9-CM algorithms; physicians and clinical coders use their expertise and other resources to identify additional codes required in the development of more accurate algorithms. Neither method is comprehensive, however, and algorithms should be thoroughly reviewed and validated, if possible, prior to implementation by researchers.


Merck Sharp & Dohme Corporation provided funding for this study. Co-authors employed by the study sponsors contributed to the study design, data interpretation, and drafting of the manuscript.


The authors thank Dr. Brian Sanderson and Kathleen Seenan of PPD for their contributions in developing the physician and clinical coder-derived ICD-10-CM crosswalks for this study, Dr. Dan Mines, senior director of epidemiology at RTI, for review of the codes, and the Editorial and Design Services team of Evidera for assistance in editing and preparation of this manuscript.

About the Authors

Jason C. Simeone, PhD, is senior research scientist and director of US database analytics, Evidera, Waltham, MA.

Xinyue Liu, PhD, is principal scientist, Pharmacoepidemiology Department, CORE, Merck Sharp & Dohme Corporation, North Wales, PA.

Tarun Bhagnani, MS, was research associate, Evidera, Waltham, MA.

Matthew W. Reynolds, PhD, was vice president, Epidemiology, Evidera, Waltham, MA.

Jenna Collins, MPH, is senior research associate, Evidera, Waltham, MA.

Edward A. Bortnichak, PhD, MPH, MBE, is executive director and global head, Pharmacoepidemiology Database Research Unit, Pharmacoepidemiology Department, CORE, Merck Sharp & Dohme Corporation, North Wales, PA.


1. Quan H, Sundararajan V, Halfon P, et al. “Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data.” Med Care. 2005;43(11):1130-1139.

2. US Department of Health and Human Services Office of the Secretary. 45 CFR Part 162 [CMS–0043–F] RIN 0938–AS31: Administrative Simplification: Change to the Compliance Date for the International Classification of Diseases, 10th Revision (ICD–10–CM and ICD–10–PCS) Medical Data Code Sets. Fed Regist. August 4, 2014;79(149):45128-45134.

3. Centers for Medicare & Medicaid Services (CMS). 2015 ICD-10-CM and GEMs. Page last modified September 29, 2014. Accessed 2016.

4. Centers for Medicare & Medicaid Services (CMS). General Equivalence Mappings: Frequently Asked Questions. Washington, DC: Department of Health and Human Services; 2016.

5. Fung KW, Richesson R, Smerek M, et al. “Preparing for the ICD-10-CM Transition: Automated Methods for Translating ICD Codes in Clinical Phenotype Definitions.” EGEMS (Wash DC). 2016;4(1):1211.
