Monday, September 24, 2007

OpenHTMM Released



Statistical methods of text analysis have become increasingly sophisticated over the years. A good example is automated topic analysis using latent models, two variants of which are Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA).

Earlier this year, Amit Gruber, a Ph.D. student at the Hebrew University of Jerusalem, presented a technique for analyzing the topical content of text at the Eleventh International Conference on Artificial Intelligence and Statistics in Puerto Rico.

Gruber's approach, dubbed Hidden Topic Markov Models (HTMM), was developed in collaboration with Michal Rosen-Zvi and Yair Weiss. It differs notably from other approaches in that, rather than treating each document as a single "bag of words," it imposes a temporal Markov structure on the document. This lets it account for topics that shift within a document. As a result, it provides a topic segmentation of the document, and it also appears to distinguish effectively among the different senses that the same word may have in different contexts within the same document.
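To make the "temporal Markov structure" a little more concrete, here is a minimal Python sketch of an HTMM-style generative process. It assumes, following our reading of the HTMM paper, that all words in a sentence share a single topic and that at each sentence boundary the topic is redrawn from the document's topic distribution with probability `epsilon` and otherwise kept. The function names, parameters, and toy vocabularies are hypothetical illustrations, not the OpenHTMM API. The actual model works in the opposite direction, inferring the hidden topic sequence from observed words, whereas this sketch only samples one.

```python
import random


def sample_htmm_document(theta, topic_words, n_sentences, sentence_len, epsilon, rng):
    """Sample a document from a simplified, hypothetical HTMM-style process.

    theta:       per-document topic distribution (list of weights)
    topic_words: for each topic, a list of (word, probability) pairs
    epsilon:     probability of redrawing the topic at a sentence boundary
    Returns (topics, sentences): one topic per sentence and its sampled words.
    """
    topics, sentences = [], []
    current = None
    for _ in range(n_sentences):
        # The first sentence always draws a topic; later sentences redraw it
        # from theta with probability epsilon (the redraw may pick the same topic).
        if current is None or rng.random() < epsilon:
            current = rng.choices(range(len(theta)), weights=theta)[0]
        topics.append(current)
        words, probs = zip(*topic_words[current])
        # Every word in the sentence is drawn from the sentence's single topic.
        sentences.append(rng.choices(words, weights=probs, k=sentence_len))
    return topics, sentences


def segments(topics):
    """Group consecutive sentences sharing a topic into (start, end, topic) runs."""
    runs = []
    for i, t in enumerate(topics):
        if runs and runs[-1][2] == t:
            runs[-1] = (runs[-1][0], i + 1, t)  # extend the current run
        else:
            runs.append((i, i + 1, t))  # a topic change starts a new segment
    return runs


if __name__ == "__main__":
    # Toy two-topic example with made-up vocabularies.
    topic_words = [
        [("goal", 0.5), ("match", 0.3), ("league", 0.2)],
        [("vote", 0.5), ("poll", 0.3), ("senate", 0.2)],
    ]
    topics, sentences = sample_htmm_document(
        [0.5, 0.5], topic_words, n_sentences=8, sentence_len=4,
        epsilon=0.3, rng=random.Random(42))
    print(segments(topics))
```

The segments printed here are simply the runs in the sampled topic sequence; in HTMM the analogous segmentation falls out of inferring which sentence-level topic transitions are most likely given the words.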

Amit is currently doing a graduate internship at Google. As part of his project, he has developed a fresh implementation of his method in C++. We are pleased to release it to the research community as the OpenHTMM package under the Apache 2 license, in the hope that it will be of general interest and facilitate further research in this area.

Thursday, September 20, 2007

The Sky is Open



We've gotten an incredible amount of positive feedback about Sky in Google Earth, which lets Google Earth users explore the sky above them, with hundreds of millions of stars and galaxies drawn from astronomical imagery.

From the start, though, we have wanted to open the sky up to everyone. As a first step, we've been hard at work developing tools to let astronomers add their own imagery, and we think we've come up with something that does the job nicely. We're pleased to announce the availability of wcs2kml, an open source project for importing astronomical imagery into Sky.

Modern telescopes output imagery in the FITS (Flexible Image Transport System) binary format, which contains a set of headers known as a World Coordinate System (that's the "wcs" part) specifying where the image lies on the sky. The wcs2kml tool transforms this imagery into the projection system used by Google Earth (the "kml" part) so that it can be viewed directly in Sky. wcs2kml also includes tools that simplify uploading the data to a web server and sharing it with friends.
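As an illustration of what the WCS headers encode, here is a minimal Python sketch that maps a pixel to right ascension and declination for one common case: a gnomonic (TAN) projection described by the standard FITS keywords `CRPIX1/2`, `CRVAL1/2`, and the `CD` matrix. It assumes the header has already been parsed into a plain dictionary, ignores distortion terms and alternative keyword forms such as `CDELT`/`PC`, and is not taken from wcs2kml; real survey imagery is better handled by a full WCS library such as WCSLIB.

```python
import math


def pixel_to_sky(header, px, py):
    """Map 1-based FITS pixel (px, py) to (RA, Dec) in degrees.

    Sketch only: assumes a TAN (gnomonic) projection with CD-matrix keywords,
    no distortion terms, and a header already parsed into a dict.
    """
    # Offset from the reference pixel, then apply the CD matrix to get
    # intermediate world coordinates in degrees.
    dx = px - header["CRPIX1"]
    dy = py - header["CRPIX2"]
    x = header["CD1_1"] * dx + header["CD1_2"] * dy
    y = header["CD2_1"] * dx + header["CD2_2"] * dy

    xi, eta = math.radians(x), math.radians(y)
    ra0 = math.radians(header["CRVAL1"])
    dec0 = math.radians(header["CRVAL2"])

    rho = math.hypot(xi, eta)
    if rho == 0.0:
        # The reference pixel maps exactly to the reference sky position.
        return header["CRVAL1"] % 360.0, header["CRVAL2"]

    # Inverse gnomonic projection about (ra0, dec0).
    c = math.atan(rho)
    s = math.cos(c) * math.sin(dec0) + eta * math.sin(c) * math.cos(dec0) / rho
    dec = math.asin(max(-1.0, min(1.0, s)))  # clamp against rounding error
    ra = ra0 + math.atan2(
        xi * math.sin(c),
        rho * math.cos(dec0) * math.cos(c) - eta * math.sin(dec0) * math.sin(c),
    )
    return math.degrees(ra) % 360.0, math.degrees(dec)


if __name__ == "__main__":
    # Hypothetical header for a ~1 arcsec/pixel image; RA increases to the left,
    # as is conventional, so CD1_1 is negative.
    header = {
        "CRPIX1": 512.0, "CRPIX2": 512.0,
        "CRVAL1": 83.8, "CRVAL2": -5.4,
        "CD1_1": -2.8e-4, "CD1_2": 0.0,
        "CD2_1": 0.0, "CD2_2": 2.8e-4,
    }
    for pixel in [(512.0, 512.0), (1.0, 1.0), (1024.0, 1024.0)]:
        print(pixel, pixel_to_sky(header, *pixel))
```

Reprojecting an image for Sky means applying a mapping of this kind (or its inverse) across every pixel and resampling; that step, and packaging the result as KML, are not shown here.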

We were astounded at the imagery and novel applications people created when we opened the Google Earth API to our users. Now, by opening Sky in Google Earth to the astronomy community, we hope to open a floodgate of new imagery for Sky!