Syed Ahmad Chan Bukhari

Assistant Professor
With the advancement of high-throughput technologies, data in the life sciences and healthcare are growing exponentially. Curating such large amounts of data will be beyond the capability of traditional methods of data annotation, management, and retrieval. Proficient curation of scientific data improves our ability to make biomedical data and other digital research assets findable, accessible, interoperable, and reusable (FAIR) and helps assure the reproducibility of research. There is therefore now, more than ever, a need to research and develop new techniques and infrastructures that leverage vast amounts of data effectively.
My research focuses on devising techniques to semantically annotate and federate heterogeneous biomedical data in order to derive meaningful information and improve experimental reproducibility. These techniques leverage Semantic Web technologies (ontologies), natural language processing (NLP), machine learning, automated logical reasoning, and advanced data science technologies. They also alleviate many data-access challenges faced by biologists and clinicians, including data fragmentation, the necessity to combine data with computation, the need to use declarative knowledge in querying, and the difficulty non-technical users have in accessing data. Secondary analyses and meta-analyses, which combine independent studies, could enhance reproducibility and facilitate new scientific discoveries, provided that the data are shared in a FAIR manner. In this context, I have developed the CAIRR and CEDAR-to-NCBI pipelines for the standardized authoring, validation, and submission of scientific data to the NCBI repositories (BioProject, BioSample, Sequence Read Archive (SRA), and GenBank). CEDAR OnDemand is an example of a repository-agnostic approach that I have developed to facilitate the creation of FAIR scientific data.
In addition to my FAIR infrastructure development efforts, I am actively involved in community-driven data standardization efforts. For example, my colleagues and I have developed MiAIRR (Minimal Information about Adaptive Immune Receptor Repertoire), a standard for the FAIR sharing of Adaptive Immune Receptor Repertoire experimental data. As part of the Human Immunology Project Consortium (HIPC), we have developed data sharing standards and templates for the immunology community.

Research Interests

Biomedical and Health Informatics, Semantic Web, Patient-Centered Outcomes Research, Machine Learning, Information Retrieval

Courses Taught