Cathy H. Wu, Ph.D.
Systems integration is becoming the driving force for 21st-century biology. Researchers are systematically tackling gene functions and complex regulatory processes by studying organisms at different levels of organization, from genomes, transcriptomes and proteomes to metabolomes and interactomes. Fully realizing the value of such high-throughput data requires advanced bioinformatics for integration, mining, comparative analysis, and functional interpretation. My group conducts bioinformatics and computational biology research and has developed a bioinformatics resource at the Protein Information Resource (PIR) with integrated databases and analytical tools to support genomics, proteomics and systems biology research [Wu et al., 2003]. PIR is a member of the UniProt Consortium, which provides the central international resource on protein sequence and function [Wu et al., 2006]. The PIR web site and the UniProt web site at PIR are accessed by researchers worldwide, receiving over 4 million hits per month from over 100,000 unique sites.
Our research encompasses protein evolution-structure-function relationships, biological text mining, protein ontology, proteomic bioinformatics, computational systems biology, and bioinformatics cyberinfrastructure. The protein-centric bioinformatics framework we are developing connects data mining, text mining and ontology for functional analysis of genes and proteins in the systems biology context. This integrative approach reveals hidden relationships among the various components of biological systems, allows researchers to ask complex biological questions and gain a better understanding of disease processes, and facilitates target discovery. We will further establish a new Center for Bioinformatics and Computational Biology at the University of Delaware to foster collaborative interdisciplinary research and to offer graduate degree programs in Bioinformatics and Computational Biology to train the next generation of researchers and educators.
- Protein family classification, functional annotation, and structure-function analysis - As a central approach to protein annotation for the UniProt Knowledgebase, we employ a classification-driven, rule-based method. The PIRSF system classifies proteins from superfamily to subfamily levels to reflect the evolutionary relationships of proteins and their domain architecture, allowing comparative studies of protein function and evolution [Wu et al., 2004; Nikolskaya et al., 2006]. Coupled with manually curated, structure-guided rules, the system supports the standardization and accurate annotation of protein names, functions, and functional sites [Wu et al., 2006]. This systematic approach provides high-quality functional annotation while keeping pace with the exponential growth of molecular sequence data.
- Biological text mining - With an ever-increasing volume of scientific literature now available electronically, we have been collaborating with several Natural Language Processing research groups to develop algorithms for text mining and information extraction [Hirschman et al., 2002]. Several projects have led to tools directly accessible from the iProLINK text mining resource [Hu et al., 2004], including the BioThesaurus of gene/protein names that allows the identification of synonymous and ambiguous names [Liu et al., 2006] and the RLIMS-P text mining system to extract phosphorylation information (kinase, protein substrate, and phosphorylation sites) from Medline abstracts [Hu et al., 2005]. We plan to develop a “configurable, intelligent and integrated” text mining system as the link bridging PubMed and databases for knowledge discovery. We co-organize the BioCreative Challenge Evaluations, bringing together both the text mining and biological research communities to evaluate and guide the future development of text mining systems.
- Biomedical ontology - As biomedical ontologies have emerged as critical tools in biological research for semantic integration of complex data in disparate resources, we have developed a Protein Ontology (PRO) in the OBO (Open Biomedical Ontologies) Foundry framework [Natale et al., 2007]. Extending from the evolutionary relationships of protein classes to the representation of multiple protein forms of genes (e.g., isoforms, post-translational modifications), PRO allows precise definition of protein objects in biological context (e.g., pathways, networks, complexes) and specification of relationships with other ontologies (such as Gene Ontology) [Arighi et al., 2009]. The project aims to capture the knowledge of protein biology embedded in the scientific literature to facilitate pathway, network and disease modeling.
- Omics data integration and pathway/network analysis - Designed for data integration in a distributed environment, the iProClass database provides rich protein annotation with data from over 100 molecular databases [Wu et al., 2004]. It is also the underlying data warehouse for gene/protein ID and name mapping. Building upon iProClass and UniProt, we have developed the iProXpress system for functional profiling and pathway analysis of large-scale gene expression and proteomic data [Huang et al., 2007]. iProXpress has been applied to several studies, including proteomic profiling of melanosomes and lysosome-related organelles, identification of signaling pathways and networks underlying estrogen-induced apoptosis of breast cancer cells, and analysis of cellular pathways in radiation-resistant cells [Chi et al., 2006; Hu et al., 2007; 2008]. As part of the NIAID biodefense proteomics program, we have integrated various omics data on pathogens and their hosts, allowing biologists to query and analyze data on pathogen-host relationships from multiple disparate proteomic centers. We have conducted integrative bioinformatics analysis of protein structure, function and evolution to identify potential targets for hemorrhagic viruses [Mazumder et al., 2007]. We plan to further develop network mining, visualization and prediction methods and to couple them with the integrative bioinformatics approach to facilitate data-driven hypothesis generation.
- Cecilia Arighi, Ph.D. - Research Assistant Professor and Senior Bioinformatics Scientist (Ph.D., University of Buenos Aires, Argentina). Protein functional annotation, biological text mining and protein ontology.
- Chuming Chen, Ph.D. - Research Assistant Professor and Senior Bioinformatics Scientist (Ph.D., University of South Carolina). Bioinformatics system development and proteomic bioinformatics.
- Hongzhan Huang, Ph.D. - Research Associate Professor and Bioinformatics Team Lead (Ph.D., University of California, Davis). Bioinformatics infrastructure development, omics data integration and network analysis.
- Natalia Petrova, Ph.D. - Research Assistant Professor and Senior Bioinformatics Scientist (Ph.D., Georgetown University). Protein evolution-structure-function analysis and rule-based system development.
- Yongxing Chen, M.S. - Senior Bioinformatics Programmer (M.S., University of Northern Virginia). Database design and system development.
- Alvaro Gonzalez, M.S. - Ph.D. Graduate Student (M.S., University of Delaware). Biological network visualization and modeling.
- Jules Nchoutmboube, M.S. - Bioinformatics Research Assistant (M.S., Georgetown University). Protein ontology, biological text mining and literature-based curation.
- Amy Siu, M.E. - Ph.D. Graduate Student (M.E., Cornell University). Biological text mining, natural language processing.
- Nisha Subramanian, B.Tech. - M.S. Graduate Student (B.Tech., Dr. M.G.R. Engineering College, India). Protein annotation rule system development.
- Wu CH, Chen CM, eds. Bioinformatics for Comparative Proteomics. Humana Press; 2010. Methods in Molecular Biology.
- Arighi CN, Liu H, Natale DA, et al. TGF-beta signaling proteins and the protein ontology. BMC Bioinformatics. 2009;10(Suppl 5):S3.
- Hu Z-Z, Huang H, Cheema A, Jung M, Dritschilo A, Wu CH. Integrated Bioinformatics for Radiation-Induced Pathway Analysis from Proteomics and Microarray Data. J Proteomics Bioinform. 2008;1(2):47–60.
- Hu Z-Z, Valencia JC, Huang H, et al. Comparative Bioinformatics Analyses and Profiling of Lysosome-Related Organelle Proteomes. Int J Mass Spectrom. 2007;259(1-3):147–160.
- Huang H, Hu Z-Z, Arighi CN, Wu CH. Integration of bioinformatics resources for functional analysis of gene expression and proteomic data. Front Biosci. 2007;12:5071–5088.
- Mazumder R, Hu Z-Z, Vinayaka CR, et al. Computational analysis and identification of amino acid sites in dengue E proteins relevant to development of diagnostics and vaccines. Virus Genes. 2007;35(2):175–186.
- Qiu P, Wang ZJ, Liu KJR, Hu Z-Z, Wu CH. Dependence network modeling for biomarker identification. Bioinformatics. 2007;23(2):198–206.
- Chi A, Valencia JC, Hu Z-Z, et al. Proteomic and bioinformatic characterization of the biogenesis and function of melanosomes. J Proteome Res. 2006;5(11):3135–3144.
- Liu H, Hu Z-Z, Zhang J, Wu C. BioThesaurus: a web-based thesaurus of protein and gene names. Bioinformatics. 2006;22(1):103–105.
- Nikolskaya AN, Arighi C, Huang H, Barker WC, Wu CH. PIRSF family classification system for protein functional and evolutionary analysis. Evol Bioinform Online. 2006;2:209–221.
- Petrova NV, Wu CH. Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties. BMC Bioinformatics. 2006;7:312.
- Wu CH, Apweiler R, Bairoch A, et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006;34(Database issue):D187–91.
- Hu Z-Z, Narayanaswamy M, Ravikumar KE, Vijay-Shanker K, Wu CH. Literature mining and database annotation of protein phosphorylation using a rule-based system. Bioinformatics. 2005;21(11):2759–2765.
- Hu Z-Z, Mani I, Hermoso V, Liu H, Wu CH. iProLINK: an integrated protein resource for literature mining. Comput Biol Chem. 2004;28(5-6):409–416.
- Wu CH, Huang H, Nikolskaya A, Hu Z, Barker WC. The iProClass integrated database for protein functional analysis. Comput Biol Chem. 2004;28(1):87–96.
- Wu CH, Nikolskaya A, Huang H, et al. PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Res. 2004;32(Database issue):D112–4.
- Wang J, Wu CH, Wang P, eds. Computational Biology and Genome Informatics. World Scientific; 2003.
- Wu CH, Yeh L-SL, Huang H, et al. The Protein Information Resource. Nucleic Acids Res. 2003;31(1):345–347.
- Hirschman L, Park JC, Tsujii J, Wong L, Wu CH. Accomplishments and challenges in literature data mining for biology. Bioinformatics. 2002;18(12):1553–1561.
- Wu CH, McLarty J. Neural Networks and Genome Informatics. Elsevier Science; 2000. Methods in Computational Biology and Biochemistry 1.
- Wu CH. Gene classification artificial neural system. Methods Enzymol. 1996;266:71–88.
- Wu CH, Zhao S, Chen HL. A protein class database organized with ProSite protein groups and PIR superfamilies. J Comput Biol. 1996;3(4):547–561.
Edward G. Jefferson Professor of Bioinformatics & Computational Biology
Professor, Department of Computer & Information Sciences
Affiliated Faculty, Delaware Biotechnology Institute
Director, Protein Information Resource
Adjunct Professor, Georgetown University Medical Center
Phone: (302) 831-8869
Fax: (302) 831-4841
Office: 205 DBI
Delaware Biotechnology Institute
15 Innovation Way, Suite 205
University of Delaware
Newark, DE 19711
- B.S. - National Taiwan University (Taiwan)
- Ph.D. - Purdue University
- Postdoctoral - Michigan State University