Place-Based Information Systems: Textual Location Identification and Visualization
Brown Bag Lecture by Prof. Hanan Samet | 5/22/2012 11AM-12PM | 7th Floor Conference Room, Bldg 38A
Abstract: The popularity of web-based mapping services such as Google Earth/Maps and Microsoft Virtual Earth (Bing) has led to an increasing awareness of the importance of location data and its incorporation into both web-based search applications and the databases that support them. In the past, attention to location data had been primarily limited to geographic information systems (GIS), where locations correspond to spatial objects and are usually specified geometrically. In web-based applications, however, location data often corresponds to place names and is usually specified textually. An advantage of such a specification is that the same specification can be used regardless of whether the place name is to be interpreted as a point or a region. Thus the place name acts as a polymorphic data type in the parlance of programming languages. Its drawback, however, is that it is ambiguous. In particular, a given specification may have several interpretations, not all of which are names of places. For example, “Jordan” may refer to a person as well as a place. Moreover, there is additional ambiguity even when the specification does have a place name interpretation. For example, “Jordan” can refer to a river or a country, while there are a number of cities named “London”. In this talk we examine the extension of GIS concepts to textually specified location data and review search engines that we have developed to retrieve documents where the similarity criterion is based not solely on an exact match of elements of the query string but also on spatial proximity. Thus we want to take advantage of spatial synonyms so that, for example, a query seeking a rock concert in Beverly Hills would be satisfied by a result finding a rock concert in Hollywood or Santa Monica.
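To make the spatial-synonym idea concrete, here is a minimal Python sketch. It is not the ranking function used in STEWARD or NewsStand, and it makes several assumptions. First, the hypothetical table `PLACE_COORDS` holds approximate coordinates and treats each place as a point, which is only one of the two interpretations mentioned above. Second, the hypothetical `Doc` record has a topic that is assumed to be already matched textually. Third, relevance decays exponentially with great-circle distance, using an arbitrary 25 km scale.

```python
import math
from dataclasses import dataclass

# Hypothetical lookup table of approximate (lat, lon) points; a real system
# would draw these from a gazetteer and might also model places as regions.
PLACE_COORDS = {
    "Beverly Hills": (34.0736, -118.4004),
    "Hollywood": (34.0928, -118.3287),
    "Santa Monica": (34.0195, -118.4912),
    "Boston": (42.3601, -71.0589),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


@dataclass
class Doc:
    title: str
    topic: str   # assumed to be already matched textually against the query
    place: str   # place name tagged in the document


def spatial_score(query_place, doc_place, scale_km=25.0):
    """1.0 for the same place, decaying exponentially with distance (scale is arbitrary)."""
    d = haversine_km(PLACE_COORDS[query_place], PLACE_COORDS[doc_place])
    return math.exp(-d / scale_km)


def search(docs, topic, place):
    """Return titles of on-topic documents, nearest places first, instead of requiring an exact place match."""
    hits = [d for d in docs if d.topic == topic]
    hits.sort(key=lambda d: spatial_score(place, d.place), reverse=True)
    return [d.title for d in hits]


docs = [
    Doc("Boston club gig", "rock concert", "Boston"),
    Doc("Santa Monica pier concert", "rock concert", "Santa Monica"),
    Doc("Hollywood Bowl show", "rock concert", "Hollywood"),
    Doc("Beverly Hills art fair", "art fair", "Beverly Hills"),
]
print(search(docs, "rock concert", "Beverly Hills"))
# ['Hollywood Bowl show', 'Santa Monica pier concert', 'Boston club gig']
```

A query for Beverly Hills returns the Hollywood and Santa Monica concerts ahead of the distant one, even though none of them names Beverly Hills. A smooth decay was chosen over a hard radius cutoff so that results are ranked rather than simply filtered.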
We have applied this idea to develop the STEWARD (Spatio-Textual Extraction on the Web Aiding Retrieval of Documents) system for finding documents on the website of the Department of Housing and Urban Development. This system relies on a document tagger that automatically identifies spatial references in text, PDF, Word, and other unstructured documents. The thesaurus for the document tagger is a gazetteer, assembled from publicly available data sets, containing the names of places in the world. Search results are ranked according to the extent to which they satisfy the query, which is determined in part by the prevalent spatial entities present in the document. We have also adapted the same ideas to collections of news articles and Twitter tweets, resulting in the NewsStand and TwitterStand systems, respectively. These will be demonstrated along with the STEWARD system, together with a discussion of some of the underlying issues that arose and the techniques used in their implementation. Future work involves applying these ideas to spreadsheet data.
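A gazetteer-based tagger of the kind described above can be sketched in a few lines of Python. Everything here is an illustrative assumption and not STEWARD's actual pipeline:

- a toy in-memory `GAZETTEER` with rough population figures;
- exact word matching in `find_toponyms`;
- a simple disambiguation heuristic in `resolve`, which prefers the interpretation whose country agrees with the document's unambiguous place names and then falls back to the more populous one;
- `geographic_focus`, which takes the most frequently mentioned resolved place as a stand-in for the document's prevalent spatial entity.

The sketch does not attempt the harder person-versus-place decision (e.g., “Jordan”).

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    country: str      # ISO country code of the interpretation
    population: int   # rough, illustrative figure used as a prior


# Hypothetical toy gazetteer: one name may have several interpretations.
GAZETTEER = {
    "London": [Place("London", "GB", 8_900_000), Place("London", "CA", 400_000)],
    "Ontario": [Place("Ontario", "CA", 14_000_000)],
    "Paris": [Place("Paris", "FR", 2_100_000), Place("Paris", "US", 25_000)],
    "Texas": [Place("Texas", "US", 29_000_000)],
}

# Longest names first so multi-word entries win over their prefixes.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _NAMES)) + r")\b")


def find_toponyms(text):
    """Return every gazetteer name mentioned in text, in order of appearance."""
    return _PATTERN.findall(text)


def resolve(names):
    """Choose one interpretation per mention using document context, then population."""
    # Context: countries of mentions that have exactly one interpretation.
    context = Counter(GAZETTEER[n][0].country for n in names if len(GAZETTEER[n]) == 1)
    return [max(GAZETTEER[n], key=lambda p: (context[p.country], p.population)) for n in names]


def geographic_focus(text):
    """Most frequently mentioned resolved place, or None if no place is found."""
    places = resolve(find_toponyms(text))
    if not places:
        return None
    return Counter(places).most_common(1)[0][0]


doc = "London, Ontario: London council met as London crews cleared snow across Ontario."
print(geographic_focus(doc))
# Place(name='London', country='CA', population=400000)
```

In this example, the unambiguous Ontario mentions pull each London toward London, Canada. A bare mention of London with no other context falls back to the more populous London, England.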
Bio: Hanan Samet (http://www.cs.umd.edu/~hjs/) received the B.S. degree in engineering from UCLA, and the M.S. degree in operations research and the M.S. and Ph.D. degrees in computer science from Stanford University. At Stanford, he was a member of the Stanford Artificial Intelligence Lab, where he was one of the developers of the SAIL programming language compiler. His doctoral dissertation dealt with proving the correctness of translations of LISP programs, which was the first work in translation validation.
In 1975 he joined the Computer Science Department at the University of Maryland, College Park, where he is a Professor. He is a member of the Computer Vision Laboratory and leads a number of research projects on the use of hierarchical data structures for geographic information systems, computer graphics, image processing, and search. His research group has developed the QUILT system, a GIS based on hierarchical spatial data structures such as quadtrees and octrees; the SAND system, which integrates spatial and non-spatial data; the SAND Browser (http://www.cs.umd.edu/~brabec/sandjava), which enables browsing through a spatial database using a graphical user interface; the VASCO spatial indexing applet (http://www.cs.umd.edu/~hjs/quadtree/index.html); the MARCO system for map retrieval by content, which includes a sophisticated pictorial query specification method; the STEWARD system for identifying the geographic focus of documents, thereby enabling spatio-textual searches that rank results by spatial proximity rather than by exact match; and the NewsStand and TwitterStand systems, which apply these ideas to continuously updated databases of news articles and tweets, respectively, and make them accessible through a map query interface.
He is on the editorial boards of GeoInformatica, Journal of Visual Languages and Computing, and Image Understanding. He is the founding chair of the ACM SIG on Spatial Information. He has served as the co-general chair of the 2007 and 2008 ACM SIGSPATIAL Conference on Geographic Information Systems (ACM GIS). He has also served on the program committees of many conferences, symposia, and workshops.
His research interests include data structures, computer graphics, geographic information systems, computer vision, robotics, database management systems, and programming languages, and he is the author of over 300 publications on these topics. He is the author of the recent book “Foundations of Multidimensional and Metric Data Structures” (http://www.cs.umd.edu/~hjs/multidimensional-book-flyer.pdf), published by Morgan Kaufmann, an imprint of Elsevier, in 2006. The book was an award winner in the 2006 competition for the best book in Computer and Information Science held by the Professional and Scholarly Publishers (PSP) Group of the Association of American Publishers (AAP). He is also the author of the first two books on spatial data structures, “Design and Analysis of Spatial Data Structures” and “Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS”, both published by Addison-Wesley in 1990.
He is a Fellow of the IEEE, ACM, AAAS, and IAPR (International Association for Pattern Recognition). He was also elected to the ACM Council, where he served as the Capital Region Representative from 1989 to 1991. He received best paper awards at the 2008 ACM SIGMOD and ACM SIGSPATIAL conferences.
Most recently, he received the 2011 ACM Paris Kanellakis Theory and Practice Award for fundamental contributions to the development of multidimensional spatial data structures and indexing.