Improving the Collection of Race, Ethnicity, and Language Data to Reduce Healthcare Disparities: A Case Study from an Academic Medical Center

by Wei-Chen Lee, PhD; Sreenivas P. Veeranki, MBBS, DrPH; Hani Serag, MD; Karl Eschbach, PhD; and Kenneth D. Smith, PhD


Well-designed electronic health records (EHRs) must integrate a variety of accurate information to support efforts to improve quality of care, particularly equity-in-care initiatives. This case study provides insight into the challenges those initiatives may face in collecting accurate race, ethnicity, and language (REAL) information in the EHR. We present the experience of an academic medical center strengthening its EHR for better collection of REAL data with funding from the EHR Incentive Programs for meaningful use of health information technology and the Texas Medicaid 1115 Waiver program. We also present a plan to address some of the challenges that arose during the course of the project. Our experience at an academic medical center can provide guidance about the likely challenges similar institutions may expect when they implement new initiatives to collect REAL data, particularly challenges regarding scope, personnel, and other resource needs.


The rapid growth in the use of electronic health records (EHRs) underscores the demand for efficiency and effectiveness of health services in the US healthcare system.1 The Centers for Medicare and Medicaid Services (CMS) began the EHR Incentive Programs to promote meaningful use (MU) in 2011. Since then, the benefits of employing EHRs have been widely documented and discussed.2, 3 Benefits include enhanced administrative efficiency, reduced healthcare costs, and improved coordination of care. In addition, accurate demographic information from EHRs can assist researchers in monitoring and addressing disparities in diagnoses, procedures, and outcomes, which in turn can facilitate clinical decision making and effective quality improvement planning.4 Despite these benefits, healthcare providers face challenges in transitioning from paper systems to EHRs and in ensuring that the data in the EHR are accurate and secure.

Although not a part of the MU incentive program, CMS funded an initiative to promote demographic data collection through one of the Texas Medicaid 1115 Waiver projects, “Race, Ethnicity, and Language (REAL) Data.” The purpose of the 1115 Waiver is to give states additional flexibility to design and improve their Medicaid and Children’s Health Insurance Program (CHIP) programs.5 There is a growing body of scholarly literature and guidelines discussing the value and benefits of implementing EHRs;6–9 the role of EHRs in informing policy alternatives while planning for and implementing health system transformation;10, 11 and detailed technical issues related to the use of different technologies and data collection, management, and interpretation.12, 13 Our study aims to provide empirical evidence to demonstrate the complexity of REAL data collection in practice. We assess the implementation of our REAL Data project, which requires complete and accurate collection of race, ethnicity, and preferred language data from patient populations. Given the persistent health disparities in the United States, the REAL Data project offers an opportunity to extend the use of EHRs meaningfully to identify and address healthcare disparities. In addition, we present three important challenges identified in the REAL Data project and present solutions to the challenges faced in a large academic medical center setting.


Meaningful Use of EHRs

An EHR system allows users to record patient information electronically rather than through the use of conventional paper forms. The concept of MU refers to the use of a certified system in a meaningful way that leads to benefits such as quality improvement and care coordination.14, 15 The MU incentive program incorporates a set of objectives and core measures for healthcare providers to follow with financial incentives for achieving success. Eligible hospitals receive payment only when they meet all 14 core objectives (e.g., record demographics), five menu measures (e.g., medication reconciliation), and 15 clinical quality measures (CQMs) (e.g., ischemic stroke patient discharge on statins) as required in Stage 1 of the MU incentive program. Then, in Stage 2, eligible hospitals have to advance their system to meet 16 core measures, three menu measures, and 16 CQMs.16 One of the core measures requires healthcare providers to record demographic information including preferred language, gender, race, ethnicity, and date of birth in the EHR. For this core measure, race and ethnicity codes are determined on the basis of the federal standards published by the Office of Management and Budget (OMB).17 Race includes six categories: American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, White, and other. Ethnicity includes two categories: Hispanic or Latino, and Not Hispanic or Latino. If a patient declines to provide demographic information, or does not know his or her race and ethnicity, a notation is entered into the system.

The REAL Data Project

In addition to funding from the EHR Incentive Programs, Texas received federal funding through its Medicaid 1115 Waiver, which gives the state authority to approve pilot or demonstration projects to promote the objectives of Medicaid and managed care expansion.18 One part of Medicaid 1115 Waiver funding supports Delivery System Reform Incentive Payment (DSRIP) programs to incentivize hospitals to improve their services.19 The purpose of a DSRIP program is to support healthcare providers to develop strategies and projects that can enhance access to healthcare, increase quality of care, provide cost-effective care to patients, and improve the health of patients and families. The Center to Eliminate Health Disparities (CEHD) at the University of Texas Medical Branch (UTMB) implemented a five-year DSRIP project called Strengthening UTMB Health Information System to Reduce Health Disparities, also known as the REAL (Race, Ethnicity, and Language) Data project, in 2011.20 UTMB is an academic health center that serves racially diverse patients with a large representation of Medicaid patients. The main service area is Galveston County, which consists of Galveston Island, Bolivar Peninsula, and several cities on the mainland. UTMB serves 84 percent of Hispanic patients, 80 percent of black patients, 71 percent of white patients, and 65 percent of Asian patients or those of other races who live on Galveston Island or on Bolivar Peninsula. Forty-two percent of Galveston Island’s black population has a household income below the poverty level, and 35 percent of Galveston Island’s Hispanic population is uninsured. These data suggest that services provided by UTMB are significant to racial/ethnic minority and low-income patients.

Similar to the MU program, the REAL Data project has established several specific achievement milestones related to the collection of patient demographics: (1) to establish/modify the registration screens and written materials in the EHR to collect accurate information, and (2) to develop a training module to guide staff to collect additional demographic data to be entered in the new system. However, the main purposes behind this project are (1) to improve the UTMB health information system to report patient outcomes, diagnoses, and quality measures stratified by race, ethnicity, language, and billing/insurance status, and (2) to identify priority disparities and develop and disseminate intervention plans to address them through effective partnerships with relevant stakeholder groups. Figure 1 illustrates the role of CEHD in the UTMB and Galveston County healthcare system as a whole. CEHD endeavors to identify populations affected by health disparities through analysis of EHR data, disseminate the findings of this analysis, and collaborate with stakeholders to develop specific action plans and either community-based or hospital-based interventions to eliminate health disparities in and around Galveston County.


Improvement of REAL Data Collection

UTMB is working to develop the infrastructure within its certified Epic EHR system to achieve the requirements of MU.21 The REAL Data project started to monitor the collection of valid REAL data on October 1, 2012. During the project’s first year, the REAL Data project team and UTMB’s employee education team organized training for the staff involved in recording and maintaining patients’ records to increase their capacity to collect REAL data in a more accurate and consistent manner. The training utilized a standardized protocol and included intensive follow-up sessions. The training focused on knowledge (relevancy of REAL data, categories of race and ethnicity), skills (how to probe to obtain accurate information), and attitude (respect for the patient’s choice of whether or not to report).

After one year of implementation, the percentage of valid REAL data collected in the EHR increased from 71.7 percent of 18,577 unique patients as of October 1, 2013, to 75.9 percent of 26,611 unique patients in the selected location by September 30, 2014. A discharge claim with “Unknown” for either race or ethnicity is considered an invalid case. In response to the results of this first phase of implementation, the project team, employee education team, and Epic management team added a warning to the EHR system that automatically reminds staff to ask new patients their race, ethnicity, and language or to ask existing patients for this information if it is missing. With this warning in place, the percentage of valid REAL data collected increased further to 84.1 percent of 27,491 unique patients as of September 30, 2015 (see Table 1). More accurate data collection will make it substantially easier to identify racial/ethnic disparities in access to and quality of care.
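The validity rule described above can be expressed as a short calculation. The following is a minimal sketch in Python, not the project's actual query: it assumes each patient record is a dictionary with hypothetical keys `race` and `ethnicity`, and that missing values are stored as the string "Unknown". The sample records are invented for illustration and do not come from the UTMB data.

```python
from typing import Iterable, Mapping

UNKNOWN = "Unknown"  # assumed placeholder for a missing value


def is_valid_real_record(record: Mapping[str, str]) -> bool:
    """Return False if either race or ethnicity is absent or 'Unknown'."""
    return (
        record.get("race", UNKNOWN) != UNKNOWN
        and record.get("ethnicity", UNKNOWN) != UNKNOWN
    )


def percent_valid(records: Iterable[Mapping[str, str]]) -> float:
    """Percentage of records with both race and ethnicity recorded."""
    records = list(records)
    if not records:
        return 0.0  # avoid dividing by zero on an empty extract
    valid = sum(is_valid_real_record(r) for r in records)
    return 100.0 * valid / len(records)


# Invented sample: two of four records are valid.
sample = [
    {"race": "White", "ethnicity": "Not Hispanic or Latino"},
    {"race": "Unknown", "ethnicity": "Hispanic or Latino"},
    {"race": "Asian", "ethnicity": "Unknown"},
    {"race": "Black or African American", "ethnicity": "Not Hispanic or Latino"},
]
print(percent_valid(sample))
```

Preferred language is left out of the check because the text defines an invalid case only in terms of race and ethnicity; a stricter rule could add it as a third condition.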

Based on REAL data as of September 30, 2015, the most interesting finding is that patients reported their race and ethnicity selectively (see Table 1). For example, 5.9 percent of the 7,000 patients of Hispanic origin did not report their race (i.e., race was recorded as unknown), whereas less than 1 percent of the 14,798 non-Hispanic patients had no race information. On the other hand, 19.5 percent of the 20,351 white patients did not provide information about their Hispanic origin status (i.e., ethnicity was recorded as unknown). This percentage is even higher in other groups: 24.1 percent of 5,862 black patients and 32.3 percent of 640 Asian patients did not provide information about Hispanic origin status. Potential explanations are discussed later in this article.
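Selective reporting of this kind can be examined by computing, within each group defined by one field, the share of records in which the other field is unknown. The sketch below is one possible way to do this in Python; the function name, the field names, and the "Unknown" placeholder are assumptions for illustration, and the records are invented rather than drawn from Table 1.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def unknown_rate_by_group(
    records: Iterable[Mapping[str, str]],
    group_field: str,
    target_field: str,
    unknown: str = "Unknown",
) -> Dict[str, float]:
    """For each value of group_field, percent of records whose target_field is unknown."""
    totals: Dict[str, int] = defaultdict(int)
    unknowns: Dict[str, int] = defaultdict(int)
    for record in records:
        group = record[group_field]
        totals[group] += 1
        if record[target_field] == unknown:
            unknowns[group] += 1
    return {group: 100.0 * unknowns[group] / totals[group] for group in totals}


# Invented records for illustration only.
records = [
    {"race": "Unknown", "ethnicity": "Hispanic or Latino"},
    {"race": "White", "ethnicity": "Hispanic or Latino"},
    {"race": "White", "ethnicity": "Unknown"},
    {"race": "White", "ethnicity": "Not Hispanic or Latino"},
    {"race": "Asian", "ethnicity": "Unknown"},
    {"race": "Asian", "ethnicity": "Not Hispanic or Latino"},
]

# Share of each ethnicity group with no race recorded.
race_unknown_by_ethnicity = unknown_rate_by_group(records, "ethnicity", "race")
# Share of each race group with no Hispanic origin status recorded.
ethnicity_unknown_by_race = unknown_rate_by_group(records, "race", "ethnicity")
```

Running the calculation in both directions matters here because the two fields are missing in different groups: race tends to be missing among Hispanic patients, while Hispanic origin status tends to be missing among patients with a recorded race.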

Challenges Faced by the REAL Data Project

Scope. Identifying race-, ethnicity-, and preferred language–based disparities in a patient population requires collection of complete and precise data from all patients. This, in turn, requires good tools to help registration staff ask the right questions, provide the right options, and enable patients to report precisely. The “Unknown” option that appears in the Epic EHR hinders the objective of obtaining complete information in REAL fields. In addition, the current options do not accommodate patients who report belonging to more than one race. Race, ethnicity, and language are also the only predictors of healthcare disparities currently captured in the system; we suggest adding socioeconomic predictors in the future.

People. Similar to other healthcare systems, UTMB used a paper system for patient registration before adopting the EHR. After UTMB started using the Epic EHR system, paper charts were gradually converted into an electronic format. Because REAL fields were not mandatory in the paper system, this conversion resulted in a high percentage of missing REAL data. Asking patients, especially those returning to UTMB, to report their race, ethnicity, and language has remained a challenge. Additionally, on several occasions patients felt uncomfortable being asked to report their race and ethnicity. Addressing this challenge requires further training of registration staff to explain the rationale for these questions and to ask them at the appropriate time during registration. It also requires educating patients about the importance of REAL data and fostering a more positive attitude toward data collection and reporting for their overall benefit.

Resources. The broader scope of data we aim to collect and the higher dollar amount needed to invest in EHR improvements to better stratify outcomes by demographic factors are additional challenges. The more efficiently we want to obtain complete and valid data, the greater the effort and number of information technology (IT) personnel required. Unfortunately, both financial and human resources are insufficient. UTMB has approximately one million patient encounters per year. The workload required by the Epic EHR system is already heavy, and, in the meantime, the volume of data requests from clinicians, administrators, and researchers to achieve a wide range of missions such as the Triple Aim (improving patient experience of care, improving the health of populations, and reducing the per capita cost of healthcare)22 continues to increase. Furthermore, misunderstandings around language and terminology decrease the efficiency and effectiveness of data queries. Researchers need to become familiar with the key features of the EHR system so that communication with the IT personnel who run data queries will be more effective. Such familiarity requires training researchers, which subsequently leads to additional demand for funds.


Numerous studies and reports have identified racial and ethnic disparities in health status and outcomes of health services.23,24 However, data on race, ethnicity, and language are not available, not complete, or not completely reliable.25 In this article, we focus on how programs such as MU and DSRIP responded to calls to action to document patients’ REAL data and how they supported healthcare systems in building capacity. We also present the results of data collection, the successes and challenges encountered in the process, and implications for other healthcare systems and providers.

A previous study suggested that self-reported race, ethnicity, and language data are more accurate than observation by the registration staff.26 Nevertheless, the project team received feedback from registration staff about challenges, such as the fact that some patients are not comfortable reporting this information. In addition, one unique challenge in our system is whether the categories of race and ethnicity reflect the actual demographics of our patient populations. For example, patients of Hispanic ethnic origin tend not to report their race. This may be due to an insufficient understanding of the different definitions of race and ethnicity. Likewise, registration staff may not document race information if patients are easily identified as black or Asian through observation. This raises the question of whether a two-question format (race and ethnicity as two separate questions) or a one-question format (race and ethnicity as one combined question) would better reduce missing or unknown data. Kawachi and colleagues pointed out three interpretations of racial disparities in health: race could be interpreted as a biological factor, a proxy for class, or an interactive effect of both class and race.27 In this regard, the Institute of Medicine suggests that healthcare providers identify the best approach to ensure the ability to collect, report, and use data within context.28


On the basis of the studies conducted in other healthcare systems29–31 and our internal study group discussions, we propose the following recommendations to further augment the REAL data collection in the EHR system to address patient healthcare disparities. This new plan is referred to as the A-B-C plan, which stands for (A) adjust the EHR system, (B) build awareness among all professionals and patients, and (C) collaborate and share lessons learned with other health systems. Each step is described below.

A: Adjust the EHR system. For the race question, “Unknown” should be replaced with a “Refused/Don’t know” response option. The system should also allow for the reporting of patients belonging to multiple races. For the ethnicity question, “Unknown” should also be replaced with a “Refused/Don’t know” response option. These three adjustments would provide better capacity for racial/ethnic disparities analyses, which further facilitates informed clinical and health policy decisions. Developing an equity dashboard to display disparity-specific measures, such as cancer incidence rates across diverse racial/ethnic groups, will be the next step, followed by development of interventions tailored to specific racial/ethnic groups.32 Such measurements can enhance the accountability of the institution to provide equitable care to its served population. Analyzing EHR data and disseminating aggregated results stratified by patient demographics via an equity dashboard will help track progress toward providing equitable care over time and generate timely feedback to both administrative leaders and clinicians.
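The proposed adjustments to the race question can be illustrated with a small validation routine. This is a hedged sketch in Python, not a description of how Epic or any other EHR is configured: the function name is hypothetical, the option list follows the six race categories named earlier in this article, and the rule that “Refused/Don’t know” cannot be combined with a race category is our assumption rather than something stated in the recommendation.

```python
from typing import FrozenSet, Iterable

# Race categories as listed in this article, plus the proposed replacement
# for the "Unknown" option.
RACE_OPTIONS = frozenset({
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
    "Other",
})
REFUSED_OR_DONT_KNOW = "Refused/Don't know"


def normalize_race_selection(selected: Iterable[str]) -> FrozenSet[str]:
    """Validate a race response that may contain more than one category."""
    choices = frozenset(selected)
    if not choices:
        raise ValueError("A response is required; use the refused option if needed.")
    if REFUSED_OR_DONT_KNOW in choices:
        # Assumption: a refusal cannot be mixed with race categories.
        if len(choices) > 1:
            raise ValueError("'Refused/Don't know' cannot be combined with a race.")
        return choices
    unrecognized = choices - RACE_OPTIONS
    if unrecognized:
        raise ValueError(f"Unrecognized race option(s): {sorted(unrecognized)}")
    return choices


# A patient who reports two races is stored with both categories.
multiracial = normalize_race_selection(["Asian", "White"])
```

Storing the response as a set of categories, rather than as a single value, is what allows later analyses to count patients who belong to more than one race instead of forcing them into one group or into “Other.”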

B: Build awareness among all professionals and patients. Education/training for staff has been indicated as an effective strategy for better data collection in earlier studies.33, 34 In addition, a warning in the EHR has brought to the attention of registration staff the need to input a patient’s race/ethnicity if the status is “unknown.” Another effective strategy for improved REAL data collection is to develop collaborations across different departments, such as employee education teams and health information teams. The key concepts of MU, EHRs, and collection of REAL data need to be well communicated to raise awareness among all professionals and patients. For professionals, we recommend developing a script that outlines the rationale for data collection, and incorporating the key concepts into continual training programs for all UTMB employees. For patients, we suggest developing simplified (i.e., in plain language) flyers that would be distributed to patients in waiting rooms to create awareness about the need for collecting REAL data and to make them more comfortable when registration staff ask them their race and ethnicity.

C: Collaborate and share lessons learned with other health systems. Both DSRIP and MU programs provide a unique opportunity for collaborative learning among healthcare providers. Healthcare providers regularly report on challenges they face and solutions they innovate. This sharing would enable the programs’ beneficiaries to identify useful approaches from a growing list of solutions for potential challenges. These programs also encourage providers to convene meetings in which different providers present and discuss the lessons learned in their practices.

The A-B-C plan is anticipated to help define a new sustainable environment in which an academic medical center would create a virtuous cycle of healthcare delivery transformation (see Figure 2). Accurate data enable appropriate analyses; the right analyses support the right treatments; the right treatments bring the right payments; and the right payments support the right data collection. Within this new environment, the contribution of each component brings out the best in the next component and ultimately brings the benefits back to the starting point.


The EHR Incentive Programs have successfully encouraged providers to demonstrate meaningful use of certified EHRs, such as by measuring quality and quantity of services. Furthermore, the incentives of the Medicaid 1115 Waiver DSRIP projects have offered additional resources for providers to strengthen the EHR to better capture accurate and complete demographic information. To address scope, personnel, and resource issues, we recommend the A-B-C plan: adjusting the EHR system, building awareness among all professionals and patients, and collaborating and sharing lessons learned with other health systems. These solutions are based on the experiences of the UTMB REAL Data project. With the rise of hospital performance reporting and benchmarking, the time is ripe to consider disparities as one of a hospital’s key performance measurements. More efforts to facilitate REAL data collection and its meaningful use to address racial, ethnic, and language disparities are highly recommended.



We thank the University of Texas Medical Branch Office of Health Policy and Legislative Affairs; Office of the President; Waiver Operations; Clinical Data Management; Oliver Center for Patient Safety and Quality Healthcare; Quality, Safety and Clinical Information Office; Admitting & Registration Services; and Ambulatory Training & Development for their support.


Competing Interest

The authors have no financial interest or relationship with a for-profit company to declare.


Wei-Chen Lee, PhD, is a health disparities analyst in the Center to Eliminate Health Disparities at the University of Texas Medical Branch in Galveston, TX.

Sreenivas P. Veeranki, MBBS, DrPH, is an assistant professor in the Department of Preventive Medicine and Community Health at the University of Texas Medical Branch in Galveston, TX.

Hani Serag, MD, is a health system research fellow in the Center to Eliminate Health Disparities at the University of Texas Medical Branch in Galveston, TX.

Karl Eschbach, PhD, is a professor in the Department of Internal Medicine at the University of Texas Medical Branch in Galveston, TX.

Kenneth D. Smith, PhD, is the interim director of the Center to Eliminate Health Disparities at the University of Texas Medical Branch in Galveston, TX.


  1. gov. “Why Adopt EHRs?” 2014. Available at
  2. gov. “Benefits of Electronic Health Records.” 2014. Available at
  3. Hoyt, R. “Benefits of Switching to an Electronic Health Record.” 2014. Available at
  4. Practice Fusion Blog. “Healthcare Disparities and Electronic Health Records (II).” April 28, 2010. Available at
  5. gov. “Section 1115 Demonstrations.” 2016. Available at
  6. Health Research & Educational Trust. Reducing Health Care Disparities: Collection and Use of Race, Ethnicity and Language Data. 2013. Available at
  7. Keller, M. E., S. E. Kelling, D. C. Cornelius, H. A. Oni, and D. R. Bright. “Enhancing Practice Efficiency and Patient Care by Sharing Electronic Health Records.” Perspectives in Health Information Management (Fall 2015): 1–18.
  8. Rizer, M. K., B. Kaufman, C. J. Sieck, J. L. Hefner, and A. S. McAlearney. “Top 10 Lessons Learned from Electronic Medical Record Implementation in a Large Academic Medical Center.” Perspectives in Health Information Management (Summer 2015): 1–9.
  9. Robert Wood Johnson Foundation. “Using Data to Reduce Disparities and Improve Quality: A Guide to Health Care Organizations.” 2014. Available at–a-guide-fo.html.
  10. Consumer Partnership for eHealth. Leveraging Meaningful Use to Reduce Health Disparities: An Action Plan. 2013. Available at
  11. Unger, M. D., A. M. Aldrich, J. L. Hefner, and M. K. Rizer. “A Journey through Meaningful Use at a Large Academic Medical Center: Lessons of Leadership, Administration, and Technical Implementation.” Perspectives in Health Information Management (Fall 2014): 1–12.
  12. Bowens, F. M., P. A. Frye, and W. A. Jones. “Health Information Technology: Integration of Clinical Workflow into Meaningful Use of Electronic Health Records.” Perspectives in Health Information Management (Fall 2010): 1–18.
  13. Carroll, M., T. Cullen, S. Ferguson, N. Hogge, M. Horton, and J. Kokesh. “Innovation in Indian Healthcare: Using Health Information Technology to Achieve Health Equity for American Indian and Alaska Native Populations.” Perspectives in Health Information Management (Winter 2011): 1–9.
  14. Blumenthal, D., and M. Tavenner. “The Meaningful Use Regulation for Electronic Health Records.” New England Journal of Medicine 363 (2010): 501–4.
  15. Centers for Disease Control and Prevention. “Meaningful Use.” 2016. Available at
  16. gov. “Step 5: Achieve Meaningful Use Stage 1.” 2014. Available at
  17. Office of Management and Budget. “Revisions to the Standards for the Classification of Federal Data on Race and Ethnicity.” Federal Register October 30, 1997. Available at
  18. Texas Health and Human Services Commission. “Waiver Overview and Background Resources.” 2011. Available at
  19. University of Texas Medical Branch. “Project Details.” 2016. Available at:
  20. University of Texas Medical Branch. “Medical Records.” 2016. Available at:
  21. Institute for Healthcare Improvement. “The IHI Triple Aim.” 2015. Available at
  22. Smedley, B. D., A. Y. Stith, and A. R. Nelson. Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care. Washington, DC: National Academies Press, 2003.
  23. Centers for Disease Control and Prevention. “CDC Health Disparities & Inequalities Report.” 2013. Available at
  24. Ver Ploeg, M., and E. Perrin. Eliminating Health Disparities: Measurement and Data Needs. National Research Council (US) Panel on DHHS Collection of Race and Ethnic Data. Washington, DC: National Academies Press, 2004.
  25. Hasnain-Wynia, R., and D. W. Baker. “Obtaining Data on Patient Race, Ethnicity, and Primary Language in Health Care Organizations: Current Challenges and Proposed Solutions.” Health Services Research 41, no. 4, pt. 1 (2006): 1501–18.
  26. Kawachi, I., N. Daniels, and D. E. Robinson. “Health Disparities by Race and Class: Why Both Matter.” Health Affairs 24, no. 2 (2005): 343–52.
  27. Ulmer, C., B. McFadden, and D. Nerenz. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement. Washington, DC: National Academies Press, 2009.
  28. Baker, D. W., R. Hasnain-Wynia, N. R. Kandula, J. A. Thompson, and E. R. Brown. “Attitudes toward Health Care Providers, Collecting Information about Patients’ Race, Ethnicity, and Language.” Medical Care 45, no. 11 (2007): 1034–42.
  29. Gazmararian, J., R. Carreon, N. Olson, and B. Lardy. “Exploring Health Plan Perspectives in Collecting and Using Data on Race, Ethnicity, and Language.” American Journal of Managed Care 18, no. 7 (2012): e254–e261.
  30. Thorlby, R., S. Jorgensen, B. Siegel, and J. Z. Ayanian. “How Health Care Organizations Are Using Data on Patients’ Race and Ethnicity to Improve Quality of Care.” Milbank Quarterly 89, no. 2 (2011): 226–55.
  31. Santiam Hospital. “Community Health Assessment: Disparities Dashboard.” 2015. Available at
  32. Hasnain-Wynia, R., and D. W. Baker. “Obtaining Data on Patient Race, Ethnicity, and Primary Language in Health Care Organizations: Current Challenges and Proposed Solutions.”
  33. Ulmer, C., B. McFadden, and D. Nerenz. Race, Ethnicity, and Language Data: Standardization for Health Care Quality Improvement.


Wei-Chen Lee, PhD; Sreenivas P. Veeranki, MBBS, DrPH; Hani Serag, MD; Karl Eschbach, PhD; and Kenneth D. Smith, PhD. “Improving the Collection of Race, Ethnicity, and Language Data to Reduce Healthcare Disparities: A Case Study from an Academic Medical Center.” Perspectives in Health Information Management (Fall 2016): 1–11.
