This article provides an overview of the field of bioinformatics and its implications for the various participants. It identifies next-generation issues facing developers (programmers), users (molecular biologists), and the general public (patients) who would benefit from the potential applications. The goal is to create awareness and debate on the opportunities (such as career paths) and the challenges (such as privacy) that arise.
A triad model of the participants’ roles and responsibilities is presented along with the identification of the challenges and possible solutions.
“Bioinformatics” is defined by the National Institutes of Health as the “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.” 1 The exponential growth in the amount of such data has necessitated the use of computers for information cataloging and retrieval. A more global perspective in the quest for new insights into health and disease, and the data mining that results, also underscores the need for bioinformatics. 2
Recently, the field of bioinformatics has experienced rapid growth. However, as with other young disciplines, it now faces a host of critical issues. Successfully addressing these key issues is essential to further progress in the field.
The Triad Model
Computing professionals, including developers, programmers, consultants, and vendors, should be concerned with building and testing robust applications; with performance issues such as correctness of data, reliability, and real-time processing; and with the integration and management of data deployed to serve multiple purposes simultaneously.
Roles and Responsibilities
The intersecting area in Figure 1 depicts the overlapping roles and responsibilities of participants in the application of the triad model. For example, the public should decide what is ethically and legally permissible. This decision directly limits the type of research the user may perform. The user, in turn, can and should join this public debate. Once it has been decided what can be done, the relationship between the user and the computing professional comes into play to determine how computing technology can assist the user. This is not a static relationship but a dynamic equilibrium: the participants in the model must decide on the point of equilibrium at a particular time and in a specific social context.
- Who has the right to access and use our personal genetic data?
- Who controls the data?
- If medical records are used as a community resource, should they not be available to all research facilities within the community?
- Will the medication for a disease discovered through population genetics studies be available to the participants?
- Can anybody own pieces of our genome through patents, copyrights, and so on?
- Should genetic testing be done, and how scientifically reliable is it?
- How will other citizens perceive an individual whose genetic tests reveal a potential disease?
- Will the data lead to discrimination?
The triad model described here can be generalized for the larger field of health information management (HIM), which encompasses all aspects of the healthcare industry, including the flow of information therein. The participants would include patients, healthcare providers (including physicians, nurses, health maintenance organizations [HMOs], insurance companies, hospitals, pharmacies, and medical testing agencies), and federal programs such as Medicare. The gathering, storage, processing, and dissemination of the disparate and complex medical information generated by the overlapping interaction between these entities will result in the need to address privacy and security issues. The dynamics of the interaction and the resultant outcomes can be studied using the triad model.
Table 1 summarizes the roles and responsibilities of the participants in the triad model in greater detail. It must be noted that these can also be construed as the challenges and tasks faced by the discipline as a whole. The challenges include gene discovery and analysis, and issues in the potential revelation of previously unknown relationships with respect to genetic structure and function. This becomes particularly critical in light of the vast amount of data being produced by the Human Genome Project.
New Challenges for Computing Professionals
The scientific community has marked a significant milestone in the study of genes: the completion of the “working draft” of the human genome. This work, which was recorded in special issues of the journals Nature and Science in 2001, heralds a new beginning for advances in the prevention, diagnosis, and treatment of many genetic and genomic disorders. The availability of this wealth of raw data has a significant effect on the field of bioinformatics, with a great deal of effort being spent on storing and accessing these data effectively and efficiently, as well as on new methods aimed at mining the data in order to make revolutionary medical discoveries. 7 These advances have generated numerous new and exciting challenges with which computing professionals will have to grapple.
An Integrative Framework
Additionally, collaborative research requires conceptualization and implementation of an integrative framework. Apart from standardization of data formats, this will require development of Web-based user interfaces, standards for access to the data and data warehousing capabilities, as well as interoperable software components. The development of a standardized, Web-based, globally distributed view is critical in the light of researchers working together across several languages and countries. A standardized interface to the multiple heterogeneous databases is an important objective for developers.
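As a rough illustration of what a standardized interface over heterogeneous databases could look like, the following Python sketch wraps two differently formatted sources behind one common method. Everything in it is an assumption made for illustration: the names `SequenceRecord`, `SequenceSource`, `PipeDelimitedSource`, `DictSource`, and `lookup`, as well as both storage formats, are hypothetical and do not correspond to any real database or vendor API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class SequenceRecord:
    """Common record shape returned by every adapter (hypothetical)."""
    accession: str
    organism: str
    sequence: str


class SequenceSource(Protocol):
    """The standardized interface each database adapter must provide."""

    def fetch(self, accession: str) -> Optional[SequenceRecord]: ...


class PipeDelimitedSource:
    """Adapter for a made-up source that stores 'accession|organism|sequence' lines."""

    def __init__(self, lines: list[str]) -> None:
        self._lines = lines

    def fetch(self, accession: str) -> Optional[SequenceRecord]:
        for line in self._lines:
            acc, organism, seq = line.split("|")
            if acc == accession:
                return SequenceRecord(acc, organism, seq.upper())
        return None


class DictSource:
    """Adapter for a made-up source that stores nested dictionaries."""

    def __init__(self, entries: dict[str, dict[str, str]]) -> None:
        self._entries = entries

    def fetch(self, accession: str) -> Optional[SequenceRecord]:
        entry = self._entries.get(accession)
        if entry is None:
            return None
        return SequenceRecord(accession, entry["species"], entry["dna"].upper())


def lookup(sources: list[SequenceSource], accession: str) -> Optional[SequenceRecord]:
    """Ask each source in turn and return the first matching record."""
    for source in sources:
        record = source.fetch(accession)
        if record is not None:
            return record
    return None
```

The caller of `lookup` never sees the native format of either source, which is the point of a standardized interface: a new database can be added by writing one more adapter rather than changing every client.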
Two distinct approaches have been used for data warehousing. IBM uses a federated database, in which the data remain in the original separate sources and are accessible with a single query. In the alternative approach, the data from various sources are brought into a data warehouse, where data freshness depends on the frequency of data replication. Which approach is more useful, and under what circumstances, is yet to be determined.
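The trade-off between the two approaches can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of IBM's or any other vendor's product: the classes `FederatedView` and `Warehouse`, the method `replicate`, and the use of in-memory dictionaries as stand-ins for source databases are all hypothetical.

```python
import copy


class FederatedView:
    """Queries the original sources on every request; no data is copied."""

    def __init__(self, sources):
        self._sources = sources  # references to the live source dictionaries

    def query(self, key):
        return [src[key] for src in self._sources if key in src]


class Warehouse:
    """Holds a copy of all sources, refreshed only when replicate() runs."""

    def __init__(self, sources):
        self._sources = sources
        self._store = []
        self.replicate()

    def replicate(self):
        # Freshness of the warehouse depends on how often this is called.
        self._store = [copy.deepcopy(src) for src in self._sources]

    def query(self, key):
        return [src[key] for src in self._store if key in src]


# Two stand-in source databases (hypothetical contents).
source_a = {"gene1": "ACGT"}
source_b = {"gene2": "TTGA"}

federated = FederatedView([source_a, source_b])
warehouse = Warehouse([source_a, source_b])

# A source is updated after the warehouse was last replicated.
source_b["gene1"] = "ACGA"

federated_hits = federated.query("gene1")  # sees the update immediately
stale_hits = warehouse.query("gene1")      # still reflects the old snapshot
warehouse.replicate()
fresh_hits = warehouse.query("gene1")      # up to date after replication
```

In this sketch the federated view always reflects the current state of the sources but depends on them for every query, while the warehouse answers from its own copy and is only as current as its last replication.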
Examples of data sources for a federated database or data warehouse are the three primary sequence databases: GenBank (NCBI), the Nucleotide Sequence Database (EMBL), and the DNA Data Bank of Japan (DDBJ). These are repositories for raw sequence data, but each entry is extensively annotated and has a features table to highlight the important aspects of each sequence. The three databases exchange data on a daily basis. 8
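To make the idea of an annotated entry with a features table concrete, here is a minimal Python sketch of such a record. It is a simplified assumption, not the actual GenBank, EMBL, or DDBJ format: the classes `Feature` and `SequenceEntry`, their fields, and the accession `EXAMPLE001` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One row of a features table (hypothetical, simplified)."""
    kind: str       # for example "gene" or "CDS"
    start: int      # 1-based position, inclusive
    end: int        # 1-based position, inclusive
    note: str = ""


@dataclass
class SequenceEntry:
    """Raw sequence data plus its annotation."""
    accession: str
    description: str
    sequence: str
    features: list = field(default_factory=list)

    def feature_sequence(self, feature):
        """Return the stretch of sequence that a feature annotates."""
        return self.sequence[feature.start - 1:feature.end]

    def features_of_kind(self, kind):
        """Return all features of the given kind."""
        return [f for f in self.features if f.kind == kind]


# A made-up entry: the accession and sequence are not real database records.
entry = SequenceEntry(
    accession="EXAMPLE001",
    description="toy entry for illustration",
    sequence="ATGGCCTAAGG",
    features=[Feature("CDS", 1, 9, "toy coding region")],
)
coding = entry.features_of_kind("CDS")[0]
coding_sequence = entry.feature_sequence(coding)
```

Keeping the features separate from the raw sequence lets the annotation be queried or extended without altering the sequence data itself.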
Interoperability among software components is a crucial goal for successful collaborative work. The Object Management Group (OMG) and its Life Sciences Research Domain Task Force aim to establish the Common Object Request Broker Architecture (CORBA) as the standard for interoperable software components, an effort that offers potential. 9
Future Computing Needs
- Availability—continuous access to the distributed data warehouse and Web sites
- Security—appropriate controls for access and information assurance
- Data protection—loss of data is decidedly unacceptable, and backup is critical
- Data mobility—data need to be available to the right user, at the right time, in the right place
- Data purpose—the same data may have multiple purposes and views
- Data sharing—access to all information by all participants
- Real-time availability—data must be available at all times in a global setting 15
IBM, a leading vendor of bioinformatics tools, proposes the following:
- Secure access to data from a growing number of increasingly diverse data sources and the ability to put that data to use quickly
- Simplified sharing of data and functionality among the diverse applications and tools used in different research areas
- Easier collaboration internally and externally to turn data into knowledge, as well as the ability to manage and share that knowledge more efficiently
- Secure storage and easier management of data
- Faster installation of new applications and integration with valuable existing systems, making research and product development more efficient
- Smooth integration of outsourced functions
Computing professionals in bioinformatics will also have to deal with many of the following public issues:
- Bioethics—The moral and ethical implications in the application of bioinformatics to genetics. For example, is the manipulation of human cells via genetic engineering contrary to the laws of nature and religion? Cloning is yet another issue.
- Intellectual property—The ownership of the human genome is probably the most critical issue. Researchers at universities where a great deal of bioinformatics research is done should clarify intellectual property issues with the university. Ownership of the successful experiments performed “in silico” (via the computer chip) is an unresolved question.
- Responsibility—Who is responsible for the results? When errors cause injury or damage, who will be responsible?
- Access—Who should have access to the data and for what use? Should law enforcement, insurance companies, HMOs, and employers have access?
- Privacy—How will privacy be protected? Who controls the information? How will conformance to laws like HIPAA be enforced?
- Standards—In terms of gene therapy, what is normal and what is a disability or disorder?
- Technology access—How will the digital divide between those who do and do not have access to expensive technologies be reconciled?
- Outsourcing—How will outsourcing affect the field? Given the sensitive nature of research in bioinformatics, what additional legal and intellectual property rights issues will develop?
These are exciting times for bioinformatics and computing, with great career opportunities in developing sophisticated computing tools, including databases and data warehouses, Web-based retrieval and query applications, search engines, analytical and data mining software, knowledge management, and storage applications. The design, implementation, and use of these tools in genomic and related research areas will keep computing professionals busy and productive for a long time. The overall impact of the Human Genome Project can be felt in the need for new types of scientific professionals willing to work in the integrative field of bioinformatics. Not only is knowledge regarding computing crucial, but so is domain knowledge in genomics and the life sciences. The challenge lies in training more individuals who are excited and willing to work in these interdisciplinary areas. Simultaneously, collaboration with other disciplines, including the life sciences and molecular biology, as well as consideration of public and social policy issues will enhance the debate on future applications.
1. Available at the National Center for Biotechnology Information’s Web site at www.ncbi.nlm.nih.gov.
2. Attwood, Teresa K. “Genomics. The Babel of Bioinformatics.” Science 290, no. 5491 (2000): 471-473.
3. Gibson, Greg and Spencer V. Muse. A Primer on Genome Science. Sunderland, MA: Sinauer Associates, Inc., Publishers, 2002.
4. Thornton, Janet M. “From Genome to Function.” Science 292, no. 5524 (2001): 2095-2097.
5. Hlodan, Oksana. “For Sale: Iceland’s Genetic History.” Available at the American Institute of Biological Sciences’ Web site at www.actionbioscience.org/genomic/hlodan.html.
6. Baxevanis, Andreas D. and B. F. Francis Ouellette (Editors). Bioinformatics, 2nd edition. New York: John Wiley & Sons, Inc., 2001.
7. Westhead, David R., J. Howard Parish, and Richard M. Twyman. Bioinformatics. Oxfordshire, UK: BIOS Scientific Publishers, 2002.
8. Swope, William C. “Deep Computing for the Life Sciences.” IBM Systems Journal 40, no. 2 (2001): 248-262.
9. Mount, D. W. Bioinformatics: Sequence and Genome Analysis. New York: Cold Spring Harbor Laboratory Press, 2001.
10. Lesk, A. M. Introduction to Bioinformatics. Oxford, UK: Oxford University Press, 2002.
11. Head-Gordon, Teresa and John C. Wooley. “Computational Challenges in Structural and Functional Genomics.” IBM Systems Journal 40, no. 2 (2001): 265-291.
12. Westhead, David R., J. Howard Parish, and Richard M. Twyman. Bioinformatics.
13. Graham-Rowe, Duncan. “Software Agents Could Tackle Human Genome Data Explosion.” New Scientist 179, no. 2407 (2003): 22.
14. Goble, Carole A. et al. “Transparent Access to Multiple Bioinformatics Information Sources.” IBM Systems Journal 40, no. 2 (2001): 532-551.
15. Sensen, Christoph W. (Editor). Essentials of Genomics and Bioinformatics. Weinheim, Germany: Wiley-VCH Verlag GmbH & Co., 2002.
16. Regalado, Antonio and Leila Abboud. “New Genetics Map to Explore Links to Ailments.” The Wall Street Journal, October 30, 2002, p. D4.
Article citation: Perspectives in Health Information Management 1; 9, Winter 2004