Kim J, Le DX, Thoma GR
In: Callaos N, Lesso W, editors. SCI 2005. Proc 9th World Multiconference on Systemics, Cybernetics and Informatics; 2005 Jul 10-13; Vol. 4; Orlando (FL): International Institute of Informatics and Systemics; c2005. 406-11
An automated labeling (AL) module has been developed to automate the extraction of bibliographic data (e.g., article title, authors, affiliation, abstract, and others) from online biomedical journals for the National Library of Medicine’s MEDLINE database. The AL module employs string matching, statistics, and fuzzy rule-based algorithms to identify segmented zones in an article’s HTML pages as specific bibliographic data. Experiments conducted with 1,267 medical articles from 64 journal issues show about 97.71% accuracy.