System For Preservation of Electronic Resources (SPER)
SPER is part of the Digital Preservation Research project at the Lister Hill Center’s Communications Engineering Branch. Its main objective is to support the cost-effective, long-term preservation of digitized and born-digital documents at the National Library of Medicine.
As part of ongoing research, SPER provides a testbed for exploring and experimenting with important digital preservation standards, tools, and techniques. It also includes a prototype system that performs actual preservation of digital documents in a convenient manner, using selected open-source tools. An important component of SPER’s preservation function is the automated extraction of metadata from textual documents using machine learning tools, which significantly lowers the cost of metadata acquisition compared with manual input.
The following sections describe the SPER preservation framework (also called SPER for simplicity) and its automated metadata extraction component.
Misra D, Thoma GR. Use of descriptive metadata as a knowledgebase for analyzing data in large textual collections. Proc. IS&T Archiving 2013. Washington D.C. pg 193-199.
Misra D, Hall RH, Payne SM, Thoma GR. Digital preservation and knowledge discovery based on documents from an international health science program. Proc. 12th ACM/IEEE-CS JCDL, pg 23-26 (2012). doi: 10.1145/2232817.2232823.
URL: http://dl.acm.org/citation.cfm?id=2232823
Chen S, Misra D, Thoma GR. Efficient Automatic OCR Word Validation Using Word Partial Format Derivation and Language Model. Document Recognition and Retrieval XVII. Proceedings of the SPIE. San Jose, CA. January 2010;7534:75340O-75340O-8.
Hsu W, Long LR, Antani SK. SPIRS: A Framework for Content-based Image Retrieval from Large Biomedical Databases. Stud Health Technol Inform. 2007;129(Pt 1):188-92.
PMID: 17911704
Misra D, Mao S, Rees J, Thoma GR. Archiving a Historic Medico-legal Collection: Automation and Workflow Customization. Proc IS&T Archiving 2007. Arlington, Virginia, May 2007; 157-61.
Thoma GR, Mao S, Misra D, Rees J. Design of a Digital Library for Early 20th Century Medico-legal Documents. Proc ECDL 2006. Eds: Gonzalo J et al. Berlin: Springer-Verlag; LNCS 4172: 147-57.
Demner-Fushman D, Humphrey SM, Ide NC, Loane RF, Ruch P, Ruiz ME, Smith LH, Tanabe LK, Wilbur WJ, Aronson AR. Finding Relevant Passages in Scientific Articles: Fusion of Automatic Approaches vs. an Interactive Team Effort. Proc TREC 2006, 569-76.
Le DX, Thoma GR. Automatically Creating Biomedical Bibliographic Records from Printed Volumes of Old Indexes. In: Callaos N, Lesso W, editors. SCI 2005. Proc 9th World Multiconference on Systemics, Cybernetics and Informatics; 2005 Jul 10-13; Vol. 3, Computer Science and Engineering. Orlando (FL): International Institute of Informatics and Systemics; c2005. 267-74.
Mao S, Misra D, Seamans J, Thoma GR. Design Strategies for a Prototype Electronic Preservation System for Biomedical Documents. IS&T Archiving 2005 Conference, April 2005; 48-53.