System For Preservation of Electronic Resources (SPER)

Overview

SPER is part of the Digital Preservation Research project at the Lister Hill Center’s Communications Engineering Branch. Its main objective is to support cost-effective, long-term preservation of digitized and born-digital documents at the National Library of Medicine.

As part of ongoing research, SPER provides a testbed for experimenting with important digital preservation standards, tools, and techniques. It also includes a prototype system that performs actual preservation of digital documents using selected open source tools. An important component of SPER’s preservation function is the automated extraction of metadata from textual documents using machine learning tools, which significantly lowers the cost of metadata acquisition compared with manual input.

The following sections describe the SPER preservation framework (also called SPER for simplicity) and its automated metadata extraction component.

Publications/Tools

Pearson G, Gill MJ. An Evaluation of Motion JPEG 2000 for Video Archiving. Proc. Archiving 2005. Washington, D.C. April 2005:237-43.

Misra D, Thoma GR. Use of descriptive metadata as a knowledgebase for analyzing data in large textual collections. Proc. IS&T Archiving 2013. Washington D.C. pg 193-199.

Misra D, Hall RH, Payne SM, Thoma GR. Digital preservation and knowledge discovery based on documents from an international health science program. Proc. 12th ACM/IEEE-CS JCDL, pg 23-26 (2012). doi: 10.1145/2232817.2232823.
URL: http://dl.acm.org/citation.cfm?id=2232823

Lingappa G, Thoma GR, Antani SK. Web Interface: MyMorph

Pearson G. Methods to Store Metadata within Motion JPEG 2000 Files. Technical Report Preprint. May 2005.

Misra D, Mao S, Rees J, Thoma GR. Archiving a Historic Medico-legal Collection: Automation and Workflow Customization. Proc IS&T Archiving 2007. Arlington, Virginia, May 2007; 157-61.

Misra D, Seamans J, Thoma GR. Testing the Scalability of a DSpace-based Archive. Proc. IS&T Archiving 2008. Bern, Switzerland. June 2008:36-40.

Mao S, Misra D, Seamans J, Thoma GR. Design Strategies for a Prototype Electronic Preservation System for Biomedical Documents. IS&T Archiving 2005 Conference, April 2005; 48-53.

Chen S, Misra D, Thoma GR. Efficient Automatic OCR Word Validation Using Word Partial Format Derivation and Language Model. Document Recognition and Retrieval XVII. Proceedings of the SPIE. San Jose, CA. January 2010;7534:75340O-75340O-8.

Walker FL, Thoma GR. A Web-Based Paradigm for File Migration. Proc. of IS&T’s Archiving Conference. April 2004.

Demner-Fushman D, Humphrey SM, Ide NC, Loane RF, Ruch P, Ruiz ME, Smith LH, Tanabe LK, Wilbur WJ, Aronson AR. Finding Relevant Passages in Scientific Articles: Fusion of Automatic Approaches vs. an Interactive Team Effort. Proc TREC 2006, 569-76.

Demner-Fushman D, Lin J. Answering Clinical Questions with Knowledge-based and Statistical Techniques. Computational Linguistics. 2007 Jan;33(1):63-103.

Hsu W, Long LR, Antani SK. SPIRS: A Framework for Content-based Image Retrieval from Large Biomedical Databases. Stud Health Technol Inform. 2007;129(Pt 1):188-92.
PMID: 17911704

Thoma GR, Mao S, Misra D. Automated Metadata Extraction to Preserve the Digital Contents of Biomedical Collections. Proc VIIP 2005. September 2005. Benidorm, Spain; 214-19.

Thoma GR, Mao S, Misra D, Rees J. Design of a Digital Library for Early 20th Century Medico-legal Documents. Proc ECDL 2006. Eds: Gonzalo J et al. Berlin: Springer-Verlag; LNCS 4172: 147-57.

Le DX, Thoma GR. Automatically Creating Biomedical Bibliographic Records from Printed Volumes of Old Indexes. In: Callaos N, Lesso W, editors. SCI 2005. Proc 9th World Multiconference on Systemics, Cybernetics and Informatics; 2005 Jul 10-13; Vol. 3, Computer Science and Engineering. Orlando (FL): International Institute of Informatics and Systemics; c2005. 267-74.