Quantitative Assessment of an International Health Science Program by Processing Research Publications
Brown Bag Lecture by Dr. Dharitri Misra | 4/3/2012 11AM-12PM | 7th Floor Conference Room, Bldg 38A
Abstract: Important biomedical information is often recorded in unstructured or semi-structured text, from which it is difficult to retrieve meaningful content cost-effectively. Nevertheless, with suitable techniques and tools, it is feasible to identify and extract relevant, domain-specific metadata from the contents of the documents, and then use this metadata to search for specific patterns, trends and events meaningful to a target community.
In this presentation, we discuss one such case: a biomedical collection held by NIAID, consisting of a 50-year archive of research publications from the Joint Cholera Panels of the U.S.-Japan Cooperative Medical Science Program (CMSP), a collaboration by more than 60 countries in finding a cure for cholera. The major goal of processing this collection was to programmatically analyze its contents and retrieve relevant data so as to quantitatively assess the effectiveness of the program, along with the impact of related funding and policy issues.
We show how SPER, a customizable digital preservation R&D system developed at CEB, was effectively used to meet this goal. We discuss the techniques used for automated extraction of metadata from different types of CMSP documents, and show how different metadata elements were post-processed, combined and analyzed to yield the required data. Finally, we present some quantitative results and discuss how such results would be used by NIAID in evaluating the CMSP.
Bio: Dr. Dharitri Misra is a Staff Scientist at the Communications Engineering Branch. She received her M.S. and Ph.D. degrees in Physics from the University of Maryland. Dr. Misra is working on the Digital Preservation Research project at CEB to develop techniques and tools for the preservation of, and access to, biomedical collections. Her current tasks focus on automated metadata extraction, information retrieval and knowledge discovery from large text-based datasets.