Le DX, Thoma GR
Proc. 8th World Multiconference on Systemics, Cybernetics and Informatics. 2004 Jul.;5:462-6.
As part of research into Web-based document analysis including Web page downloading and classification, an algorithm has been developed to automatically identify article links in Web-based online journals. This algorithm is based on feature vectors calculated from attributes and contents of links extracted from HTML files, and an instance-based learning algorithm using a nearest neighbor methodology to identify article links. The performance of the algorithm has been evaluated using a sample size of several thousand HTML links of Web-based medical journals. Evaluation shows that the algorithm is capable of identifying article links at an accuracy greater than 99 percent.