Tuesday, December 21, 2010

More researchers dive into the digital humanities



When we started Google Book Search back in 2004, we were driven by the desire to make books searchable and discoverable online. But as that corpus grew -- we’ve now scanned approximately 10% of all books published in the modern era -- we began to realize how useful it would be for scholarly work. Humanities researchers have started to ask and answer questions about history, society, linguistics, and culture via quantitative techniques that complement traditional qualitative methods.

We’ve been gratified at the positive response to our initial forays into the digital humanities, from our Digital Humanities Research Awards earlier this year, to the Google Books Ngram Viewer and datasets made public just last week. Today we’re pleased to announce a second set of awards focusing on European universities and research centers.

We’ve given awards to 12 projects led by 15 researchers at 13 institutions:
  • Humboldt-Universität zu Berlin. Annotated Corpora in Studying and Teaching Variation and Change in Academic German, Anke Lüdeling
  • LIMSI/CNRS, Université Paris Sud. Building Multi-Parallel Corpora of Classical Fiction, François Yvon
  • Radboud Universiteit. Extracting Factoids from Dutch Texts, Suzan Verberne
  • Slovenian Academy of Sciences and Arts, Jožef Stefan Institute. Language models for historical Slovenian, Matija Ogrin and Tomaž Erjavec
  • Université d'Avignon, Université de Provence. Robust and Language Independent Machine Learning Approaches for Automatic Annotation of Bibliographical References in DH Books, Articles and Blogs, Patrice Bellot and Marin Dacos
  • Université François Rabelais-Tours. Full-text retrieval and indexation for Early Modern French, Marie-Luce Demonet
  • Université François Rabelais-Tours. Using Pattern Redundancy for Text Transcription, Jean-Yves Ramel and Jean-Charles Billaut
  • Universität Frankfurt. Towards a “Corpus Caucasicum”: Digitizing Pre-Soviet Cyrillic-Based Publications on the Languages of the Caucasus, Jost Gippert
  • Universität Hamburg. CLÉA: Literature Éxploration and Annotation Environment for Google Books Corpora, Jan-Christoph Meister
  • Universität zu Köln. Integrating Charter Research in Old and New Media, Manfred Thaller
  • Universität zu Köln. Validating Metadata-Patterns for Google Books' Ancient Places and Sites, Reinhard Foertsch
  • University of Zagreb. A Profile of Croatian neo-Latin, Neven Jovanović
Projects like these, blending empirical data and traditional scholarship, are springing up around the world. We’re eager to see what results they yield and what broader impact their success will have on the humanities.

(Cross-posted from the European Public Policy Blog)

Saturday, December 18, 2010

Robot hackathon connects with Android, browsers and the cloud



With a beer fridge stocked and music blasting, engineers from across Google—and the world—spent the month of October soldering and hacking in their 20% time to connect hobbyist and educational robots with Android phones. Just two months later we’re psyched to announce three ways you can play with your iRobot Create(R), LEGO(R) MINDSTORMS(R) or VEX Pro(R) through the cloud:

For the month of October, we invited any Googler who wanted to help connect robots to Google’s services in the cloud to pool their 20% time and participate in as much of the process as they could, from design to hard-core coding.

Thanks to our hardware partners (iRobot, LEGO Group, and VEX Robotics), we never suffered a shortage of supplies. Designers flew in from London, and prototypes were passed between engineers in Tel-Aviv, Hyderabad, Zurich, Munich and California. In Mountain View, we gathered around every Thursday night, rigging up a projector against the wall to share our week’s worth of demos while chowing on pizza. And here is what we produced (so far!):



We hope these applications provide some fun and inspire you to build upon this lightweight connectivity between robots, Android, the cloud and your browser.

Friday, December 17, 2010

Find out what’s in a word, or five, with the Google Books Ngram Viewer



[Cross-posted from the Google Books Blog]

Scholars interested in topics such as philosophy, religion, politics, art and language have employed qualitative approaches such as literary and critical analysis with great success. As more of the world’s literature becomes available online, it’s increasingly possible to apply quantitative methods to complement that research. So today Will Brockman and I are happy to announce a new visualization tool called the Google Books Ngram Viewer, available on Google Labs. We’re also making the datasets backing the Ngram Viewer, produced by Matthew Gray and intern Yuan K. Shen, freely downloadable so that scholars will be able to create replicable experiments in the style of traditional scientific discovery.

Comparing instances of [flute], [guitar], [drum] and [trumpet] (blue, red, yellow and green respectively) in English literature from 1750 to 2008

Since 2004, Google has digitized more than 15 million books worldwide. The datasets we’re making available today to further humanities research are based on a subset of that corpus, weighing in at 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish. The datasets contain phrases of up to five words with counts of how often they occurred in each year.
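
For readers who want to dig into the raw files, here is a minimal sketch of how one might tally the yearly counts for a single phrase. The tab-separated layout and the file name used here are illustrative assumptions, not a specification of the released format.

```python
import csv
from collections import defaultdict

# Minimal sketch: tally how often a phrase appears per year in one of the
# downloadable n-gram files. The tab-separated layout assumed here
# (ngram, year, match_count, ...) is an illustration, not a format spec.
def yearly_counts(path, phrase):
    counts = defaultdict(int)
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            ngram, year, match_count = row[0], int(row[1]), int(row[2])
            if ngram == phrase:
                counts[year] += match_count
    return dict(sorted(counts.items()))

# Hypothetical usage (file name is made up for illustration):
# print(yearly_counts("googlebooks-eng-1gram-flute.tsv", "flute"))
```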

These datasets were the basis of a research project led by Harvard University’s Jean-Baptiste Michel and Erez Lieberman Aiden published today in Science and coauthored by several Googlers. Their work provides several examples of how quantitative methods can provide insights into topics as diverse as the spread of innovations, the effects of youth and profession on fame, and trends in censorship.

The Ngram Viewer lets you graph and compare phrases from these datasets over time, showing how their usage has waxed and waned over the years. One of the advantages of having data online is that it lowers the barrier to serendipity: you can stumble across something in these 500 billion words and be the first person ever to make that discovery. Below I’ve listed a few interesting queries to pique your interest:

World War I, Great War
child care, nursery school, kindergarten
fax, phone, email
look before you leap, he who hesitates is lost
virus, bacteria
tofu, hot dog
burnt, burned
flute, guitar, trumpet, drum
Paris, London, New York, Boston, Rome
laptop, mainframe, microcomputer, minicomputer
fry, bake, grill, roast
George Washington, Thomas Jefferson, Abraham Lincoln
supercalifragilisticexpialidocious

We know nothing can replace the balance of art and science that is the qualitative cornerstone of research in the humanities. But we hope the Google Books Ngram Viewer will spark some new hypotheses ripe for in-depth investigation, and invite casual exploration at the same time. We’ve started working with some researchers already via our Digital Humanities Research Awards, and look forward to additional collaboration with like-minded researchers in the future.

Thursday, December 16, 2010

Letting everyone do great things with App Inventor



In July, we announced App Inventor for Android, a Google Labs experiment that makes it easier for people to access the capabilities of their Android phone and create apps for their personal use. We were delighted (and honestly a bit overwhelmed!) by the interest that our announcement generated. We were even more delighted to hear the stories of what you were doing with App Inventor. All sorts of people (teachers and students, parents and kids, programming hobbyists and programming newbies) were building Android apps that perfectly fit their needs.

For example, we’ve heard of people building vocabulary apps for their children, SMS broadcasting apps for their community events, apps that track their favorite public transportation routes and—our favorite—a marriage proposal app.

We are so impressed with the great things people have done with App Inventor that we want to give more people the opportunity to do great things. So we’re excited to announce that App Inventor (beta) is now available in Labs to anyone with a Google account.

Visit the App Inventor home page to get set up and start building your first app. And be sure to share your App Inventor story on the App Inventor user forum. Maybe this holiday season you can make a new kind of homemade gift—an app perfectly designed for the recipient’s needs!

Thursday, December 9, 2010

$6 million to faculty in Q4 Research Awards



We've just completed the latest round of Google Research Awards, our program which identifies and supports faculty pursuing research in areas of mutual interest. We had a record number of submissions this round, and are funding 112 awards across 20 different areas—for a total of more than $6 million. We’re also providing more than 150 Android devices for research and curriculum development to faculty whose projects rely heavily on Android hardware.

The areas that received the highest level of funding, due to the large number of proposals in these areas, were systems and infrastructure, human computer interaction, security and multimedia. We also continue to support international research; in this round, 29 percent of the funding was awarded to universities outside the U.S.

Some examples from this round of awards:
  • Injong Rhee, North Carolina State University. Experimental Evaluation of Increasing TCP Initial Congestion Window (Systems)
  • James Jones, University of California, Irvine. Bug Comprehension Techniques to Assist Software Debugging (Software Engineering)
  • Yonina Eldar, Technion, Israel. Semi-Supervised Regression with Auxiliary Knowledge (Machine Learning)
  • Victor Lavrenko, University of Edinburgh, United Kingdom. Interactive Relevance Feedback for Mobile Search (Information Retrieval)
  • James Glass, MIT. Crowdsourcing to Acquire Semantically Labelled Text and Speech Data for Speech Understanding (Speech)
  • Chi Keung Tang, The Hong Kong University of Science and Technology. Quasi-Dense 3D Reconstruction from 2D Uncalibrated Photos (Geo/Maps)
  • Phil Blunsom, Oxford, United Kingdom. Unsupervised Induction of Multi-Nonterminal Grammars for Statistical Machine Translation (Machine Translation)
  • Oren Etzioni, University of Washington. Accessing the Web utilizing Android Phones, Dialogue, and Open Information Extraction (Mobile)
  • Matthew Salganik, Princeton. Developments in Bottom-Up Social Data Collection (Social)

The full list of this round’s award recipients can be found in this PDF. For more information on our research award program, visit our website. And if you’re a faculty member, we welcome you to apply for one of next year’s two rounds. The deadline for the first round is February 1.

Wednesday, December 8, 2010

Four Googlers elected ACM Fellows this year



I am delighted to share with you that, like last year, the Association for Computing Machinery (ACM) has announced that four Googlers have been elected ACM Fellows in 2010, the most this year from any single corporation or institution.

Luiz Barroso, Dick Lyon, Muthu Muthukrishnan and Fernando Pereira were chosen for their contributions to computing and computer science that have provided fundamental knowledge to the field and have generated multiple innovations.

On behalf of Google, I congratulate our colleagues, who join the 10 other ACM Fellows and other professional society awardees at Google in exemplifying our extraordinarily talented people. I’ve been struck by the breadth and depth of their contributions, and I hope that they will serve as inspiration for students and computer scientists around the world.

You can read more detailed summaries of their achievements below, including the official citations from ACM—although it’s really hard to capture everything they’ve accomplished in one paragraph!

Dr. Luiz Barroso: Distinguished Engineer
For contributions to multi-core computing, warehouse scale data-center architectures, and energy proportional computing
Over the past decade, Luiz has played a leading role in the definition and implementation of Google’s cluster architecture, which has become a blueprint for the computing systems behind the world’s leading Internet services. As the first manager of Google’s Platforms Engineering team, he helped deliver multiple generations of cluster systems, including the world’s first container-based data center. His theoretical and engineering insights into the requirements of this class of machinery have influenced the processor industry roadmap towards more effective products for server-class computing. His book "The Datacenter as a Computer" (co-authored with Urs Hoelzle) was the first authoritative publication describing these so-called warehouse-scale computers for computer systems professionals and researchers. Luiz was among the first computer scientists to recognize and articulate the importance of energy-related costs for large data centers, and to identify energy proportionality as a key property of energy-efficient data centers. Prior to Google, at Digital Equipment's Western Research Lab, he worked on Piranha, a pioneering chip-multiprocessing architecture that inspired today’s popular multi-core products. His papers, ideas and numerous presentations as one of Piranha's lead architects and designers stimulated much of the research that led to those products years later.
Richard Lyon: Research Scientist
For contributions to machine perception and for the invention of the optical mouse
In the last four years at Google, Dick led the team developing new camera systems and improved photographic image processing for Street View, while leading another team developing technologies for machine hearing and their application to sound retrieval and ranking. He is now writing a book with Cambridge University Press, and will teach a Stanford course this fall on "Human and Machine Hearing," returning to a line of work that he carried out at Xerox, Schlumberger, and Apple while also working on the optical mouse, bit-serial VLSI computing machines, and handwriting recognition. The optical mouse (1980) is especially called out in the citation, because it exemplifies the field of "semi-digital" techniques that he developed, which also led to his work on the first single-chip Ethernet device. More recently, as chief scientist at Foveon, Dick invented and developed several new techniques for color image sensing and processing, and delivered acclaimed cameras and end-user software. A hallmark of Dick’s work during his distinguished career has been the close interplay between theory, including biological theory, and practical computing.
Dr. S. Muthukrishnan: Research Scientist
For contributions to efficient algorithms for string matching, data streams, and Internet ad auctions
Muthu has made significant contributions to the theory and practice of Internet ad systems during his more than four years at Google. Muthu's breakthrough WWW’09 paper presented a general stable matching framework that produces a (desirable) truthful mechanism capturing all of the common variations and more, in contradiction to prevailing wisdom. In display ads, where image, video and other types of ads are shown as users browse, Muthu led Ad Exchange at Google to automate the placement of display ads that were previously negotiated offline by sales teams. Prior to Google, Muthu was well known for his pioneering work in the area of data stream algorithmics (including a definitive book on the subject), which led to theoretical and practical advances still in use today to monitor the health and smooth operation of the Internet. Muthu has a talent for bringing new perspectives to longstanding open problems, as exemplified by the work he did on string processing. Muthu has made influential contributions to many other areas and problems including IP networks, data compression, scheduling, computational biology, distributed algorithms and database technology. As an educator, Muthu’s avant-garde teaching style won him the Award for Excellence in Graduate Teaching at Rutgers CS, where he is on the faculty. As a student remarked in his blog: "there is a magic in his class which kinda spellbinds you and it doesn't feel like a class. It’s more like a family sitting down for dinner to discuss some real world problems. It was always like that even when we were 40 people jammed in for cs-513."
Dr. Fernando Pereira: Research Director
For contributions to machine-learning models of natural language and biological sequences
For the past three years, Fernando has been leading some of Google’s most advanced natural language understanding efforts and some of the most important applications of machine learning technology. He has just the right mix of forward-thinking ideas and the ability to put ideas into practice. With this balance, Fernando has helped his team of research scientists apply their ideas at the scale needed for Google. From when he wrote the first Prolog compiler (for the PDP-10, with David Warren) to his days as Chair at the University of Pennsylvania, Fernando has demonstrated a unique understanding of the challenges and opportunities facing companies like Google, with their unprecedented access to massive data sets, in speech recognition, natural language processing and machine translation. At SRI, he pioneered probabilistic language models at a time when logic-based models were more popular. At AT&T, his work on a toolkit for finite-state models became an industry standard, both as a useful piece of software and in setting the direction for building ever larger language models. And his year at WhizBang had an influence on other leaders of the field, such as Andrew McCallum at the University of Massachusetts and John Lafferty and Tom Mitchell at Carnegie Mellon University, with whom Fernando developed the Conditional Random Field model for sequence processing that has become one of the leading tools of the trade.

Finally, we also congratulate Professor Christos Faloutsos of Carnegie Mellon, who is on sabbatical and a Visiting Faculty Member at Google this academic year. Professor Faloutsos is cited for contributions to data mining, indexing, fractals and power laws.

Update 12/8: Updated Dick Lyon's title and added information about Professor Faloutsos.

Friday, December 3, 2010

Google Launches Cantonese Voice Search in Hong Kong



On November 30th 2010, Google launched Cantonese Voice Search in Hong Kong. Google Search by Voice has been available in a growing number of languages since we launched our first US English system in 2008. In addition to US English, we already support Mandarin for Mainland China, Mandarin for Taiwan, Japanese, Korean, French, Italian, German, Spanish, Turkish, Russian, Czech, Polish, Brazilian Portuguese, Dutch, Afrikaans, and Zulu, along with special recognizers for English spoken with British, Indian, Australian, and South African accents.

Cantonese is widely spoken in Hong Kong, where it is written using traditional Chinese characters, similar to those used in Taiwan. Chinese script is much harder to type than the Latin alphabet, especially on mobile devices with small or virtual keyboards. People in Hong Kong typically use either “Cangjie” (倉頡) or “Handwriting” (手寫輸入) input methods. Cangjie (倉頡) has a steep learning curve and requires users to break the Chinese characters down into sequences of graphical components. The Handwriting (手寫輸入) method is easier to learn, but slow to use. Neither is an ideal input method for people in Hong Kong trying to use Google Search on their mobile phones.

Speaking is generally much faster and more natural than typing. Moreover, some Chinese characters – like “滘” in “滘西州” (Kau Sai Chau) and “砵” in “砵典乍街” (Pottinger Street) – are so rarely used that people often know only the pronunciation, and not how to write them. Our Cantonese Voice Search begins to address these situations by allowing Hong Kong users to speak queries instead of entering Chinese characters on mobile devices. We believe our development of Cantonese Voice Search is a step towards solving the text input challenge for devices with small or virtual keyboards for users in Hong Kong.

There were several challenges in developing Cantonese Voice Search, some unique to Cantonese, some typical of Asian languages and some universal to all languages. Here are some examples of problems that stood out:
  • Data Collection: In contrast to English, there are few existing Cantonese datasets that can be used to train a recognition system. Building a recognition system requires both audio and text data so it can recognize both the sounds and the words. For audio data, our efficient DataHound collection technique uses smartphones to record and upload large numbers of audio samples from local Cantonese-speaking volunteers. For text data, we sample from anonymized search query logs from http://www.google.com.hk to obtain the large amounts of data needed to train language models.
  • Chinese Word Boundaries: Chinese writing doesn’t use spaces to indicate word boundaries. To limit the size of the vocabulary for our speech recognizer and to simplify lexicon development, we use characters, rather than words, as the basic units in our system and allow multiple pronunciations for each character (see the short sketch after this list).
  • Mixing of Chinese Characters and English Words: We found that Hong Kong users mix more English into their queries than users in Mainland China and Taiwan. To build a lexicon for both Chinese characters and English words, we map English words to a sequence of Cantonese pronunciation units.
  • Tone Issues: Linguists disagree on the best count of the number of tones in Cantonese – some say 6, some say 7, or 9, or 10. In any case, it’s a lot. We decided to model tone-plus-vowel combinations as single units. In order to limit the complexity of the resulting model, some rarely-used tone-vowel combinations are merged into single models.
  • Transliteration: We found that some users use English words while others use the Cantonese transliteration (e.g., “Jordan” vs. “佐敦”). This makes it challenging to develop and evaluate the system, since it’s often impossible for the recognizer to distinguish between an English word and its Cantonese transliteration. During development we use a metric that simply checks whether the correct search results are returned.
  • Different Accents and Noisy Environment: People speak in different styles with different accents. They use our systems in a variety of environments, including offices, subways, and shopping malls. To make our system work in all these different conditions, we train it using data collected from many different volunteers in many different environments.
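As a rough illustration of the character-as-unit idea from the word-boundaries item above, the toy function below splits a mixed Cantonese/English query into modeling units, keeping each Chinese character separate while leaving English words whole. It is only a sketch of the general approach, not the production tokenizer.

```python
import re

# Toy sketch (not the production system): break a mixed Cantonese/English
# query into modeling units, i.e. individual Chinese characters plus whole
# English words, as described in the list above.
def query_to_units(query):
    # \u4e00-\u9fff covers the main CJK Unified Ideographs block.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", query)

print(query_to_units("Jordan 佐敦 地鐵站"))
# ['Jordan', '佐', '敦', '地', '鐵', '站']
```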
Cantonese is Google’s third spoken language for Voice Search in the Chinese linguistic family, after Mandarin for Mainland China and Mandarin for Taiwan. We plan to continue to use our data collection and language modeling technologies to help speakers of Chinese languages easily input text and look up information.

Wednesday, November 10, 2010

Voice Search in Underrepresented Languages



Welkom*!

Today we’re introducing Voice Search support for Zulu and Afrikaans, as well as South African-accented English. The addition of Zulu in particular represents our first effort in building Voice Search for underrepresented languages.

We define underrepresented languages as those which, while spoken by millions, have little presence in electronic and physical media, e.g., webpages, newspapers and magazines. Underrepresented languages have also often received little attention from the speech research community. Their phonetics, grammar, acoustics, etc., haven’t been extensively studied, making the development of ASR (automatic speech recognition) voice search systems challenging.

We believe that the speech research community needs to start working on many of these underrepresented languages to advance progress and build speech recognition, translation and other Natural Language Processing (NLP) technologies. The development of NLP technologies in these languages is critical for enabling information access for everybody. Indeed, these technologies have the potential to break language barriers.

We also think it’s important that researchers in these countries take a leading role in advancing the state of the art in their own languages. To this end, we’ve collaborated with the Multilingual Speech Technology group at South Africa’s North-West University led by Prof. Etienne Barnard (also of the Meraka Research Institute), an authority in speech technology for South African languages. Our development effort was spearheaded by Charl van Heerden, a South African intern and a student of Prof. Barnard. With the help of Prof. Barnard’s team, we collected acoustic data in the three languages, developed lexicons and grammars, and Charl and others used those to develop the three Voice Search systems. A team of language specialists traveled to several cities collecting audio samples from hundreds of speakers in multiple acoustic conditions such as street noise, background speech, etc. Speakers were asked to read typical search queries into an Android app specifically designed for audio data collection.

For Zulu, we faced the additional challenge of few text sources on the web. We often analyze the search queries from local versions of Google to build our lexicons and language models. However, for Zulu there weren’t enough queries to build a useful language model. Furthermore, since it has few online data sources, native speakers have learned to use a mix of Zulu and English when searching for information on the web. So for our Zulu Voice Search product, we had to build a truly hybrid recognizer, allowing free mixture of both languages. Our phonetic inventory covers both English and Zulu and our grammars allow natural switching from Zulu to English, emulating speaker behavior.

This is our first release of Voice Search in a native African language, and we hope that it won’t be the last. We’ll continue to work on technology for languages that have until now received little attention from the speech recognition community.

Salani kahle!**

* “Welcome” in Afrikaans
** “Stay well” in Zulu

Friday, November 5, 2010

Suggesting a Better Remote Control



It seems clear that the TV is a growing source of online audio-video content that you select by searching. Entering characters of a search string one by one using a traditional remote control and onscreen keyboard is extremely tiresome. People have been working on building better ways to search on the TV, ranging from small keyboards to voice input to interesting gestures you might make to let the TV know what you want. But currently the traditional left-right-up-down clicker dominates as the family room input device. To enter the letters of a show, you click over and over until you get to the desired letter on the on-screen keyboard and then you hit enter to select it. You repeat this mind-numbingly slow process until you type in your query string or at least enough letters that the system can put up a list of suggested completions. Can we instead use a Google AutoComplete style recommendation model and novel interface to make character entry less painful?

We have developed an interaction model that reduces the distance to the predicted next letter without scrambling or moving letters on the underlying keyboard (which is annoying and increases the time it takes to find the next letter). We reuse the highlight ring around the currently selected letter and fill it with 4 possible characters that might be next, but we do not change the underlying keyboard layout. With 4 slots to suggest the next letter and a good prediction model trained on the target corpus, the next letter is often right where you are looking and just a click away.
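
As a rough sketch of the kind of prediction model involved (our own toy example, not the actual QuickSuggest implementation), the snippet below trains character bigram counts on a handful of titles and proposes the four most likely next letters for whatever the user has typed so far.

```python
from collections import Counter, defaultdict

# Rough sketch of a next-letter predictor: count character bigrams over a
# corpus of titles, then suggest the 4 most likely letters after a prefix.
def train_bigrams(titles):
    counts = defaultdict(Counter)
    for title in titles:
        chars = "^" + title.lower()          # "^" marks the start of a title
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def suggest_next(counts, prefix, k=4):
    last = ("^" + prefix.lower())[-1]
    return [c for c, _ in counts[last].most_common(k)]

model = train_bigrams(["the office", "the daily show", "top gear", "house"])
print(suggest_next(model, "th"))   # letters seen after 'h', most frequent first
```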

To learn more about this combination of User Experience and Machine Learning to address a growing problem with searching on TVs, check out our WWW 2010 publication, QuickSuggest.

Tuesday, October 26, 2010

Exploring Computational Thinking



Over the past year, a group of California-credentialed teachers along with our own Google engineers came together to discuss and explore ideas about how to incorporate computational thinking into the K-12 curriculum to enhance student learning and build this critical 21st century skill in everyone.

What exactly is computational thinking? Well, that would depend on whom you ask, as several existing resources on the web define this term slightly differently. We define computational thinking (CT) as a set of skills that software engineers use to write the programs that underlie all of the computer applications you use every day. Specific CT techniques include:
  • Problem decomposition: the ability to break down a problem into sub-problems
  • Pattern recognition: the ability to notice similarities, differences, properties, or trends in data
  • Pattern generalization: the ability to filter out unnecessary details and generalize those that are necessary in order to define a concept or idea in general terms
  • Algorithm design: the ability to build a repeatable, step-by-step process to solve a particular problem (see the short example after this list)
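Here is the short example referred to above: a classroom-style toy of our own (not taken from the curriculum materials) that applies decomposition and algorithm design to a familiar question, checking whether a word is a palindrome.

```python
# A classroom-style illustration (our own toy example): applying algorithm
# design, a repeatable step-by-step process, to decide whether a word
# reads the same forwards and backwards.
def is_palindrome(word):
    letters = [c.lower() for c in word if c.isalpha()]   # decompose: keep only letters
    while len(letters) > 1:                              # repeatable step:
        if letters[0] != letters[-1]:                    # compare the two ends,
            return False
        letters = letters[1:-1]                          # then shrink toward the middle
    return True

print(is_palindrome("Level"), is_palindrome("Google"))   # True False
```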
Given the increasing prevalence of technology in our day-to-day lives and in most careers outside of computer science, we believe that it is important to raise this base level of understanding in everyone.

To this end, we’d like to introduce you to a new resource: Exploring Computational Thinking. Similar to some of our other initiatives in education, including CS4HS and Google Code University, this program is committed to providing educators with access to our curriculum models, resources, and communities to help them learn more about CT, discuss it as a strategy for teaching and understanding core curriculum, as well as easily incorporate CT into their own curriculum, whether it be in math, science, language, history or beyond. The materials developed by the team reflect both the teachers’ expertise in pedagogy and K-12 curriculum as well as our engineers’ problem-solving techniques that are critical to our industry.

Prior to launching this program, we reached out to several educators and classrooms and had them try our materials. Here’s some of the feedback we received:
  • CT as a strategy for teaching and student learning works well with many subjects, and can easily be incorporated to support the existing K-12 curriculum
  • Our models help to call out the specific CT techniques and provide more structure around the topics taught by educators, many of whom were already unknowingly applying CT in their classrooms
  • Including programming exercises in the classroom can significantly enrich a lesson by both challenging the advanced students and motivating the students who have fallen behind
  • Our examples provide educators with a means of re-teaching topics that students have struggled with in the past, without simply going through the same lesson that frustrated them before
To learn more about our program or access CT curriculum materials and other resources, visit us at www.google.com/edu/ect.

Tuesday, October 19, 2010

Google at the Conference on Empirical Methods in Natural Language Processing (EMNLP '10)



The Conference on Empirical Methods in Natural Language Processing (EMNLP '10) was recently held at the MIT Stata Center in Massachusetts. Natural Language Processing is at the core of many of the things that we do here at Google. Googlers have therefore traditionally been part of this research community, participating as program committee members, paper authors and attendees.

At this year's EMNLP conference, Google Fellow Amit Singhal gave an invited keynote talk on "Challenges in running a commercial search engine," in which he highlighted some of the exciting opportunities, as well as challenges, that Google is currently facing. Furthermore, Terry Koo (who recently joined Google), David Sontag (former Google PhD Fellowship recipient) and their collaborators from MIT received the Fred Jelinek Best Paper Award for their innovative work on syntactic parsing, "Dual Decomposition for Parsing with Non-Projective Head Automata".

Here is a complete list of the papers presented by Googlers at the conference:

Friday, October 15, 2010

Kuzman Ganchev Receives Presidential Award from the Republic of Bulgaria



We would like to congratulate Kuzman Ganchev for being the runner-up for the John Atanasoff award from the President of the Republic of Bulgaria. Kuzman recently joined our New York office as a research scientist, after completing his doctoral studies at the University of Pennsylvania.

The John Atanasoff award was established in 2003 and is given annually to a Bulgarian scientist under 35 for scientific or practical contributions to the development of computer and information technology worldwide, and of significant economic or social importance for Bulgaria. Kuzman received the award for his contributions to computational linguistics and machine learning. Kuzman is the co-author of more than 20 publications that have appeared in international conferences and journals.

Thursday, October 14, 2010

Korean Voice Input -- Have you Dictated your E-Mails in Korean lately?



Google Voice Search has been available in various flavors of English since 2008, in Mandarin and Japanese since 2009, in French, Italian, German and Spanish since June 2010 (see also in this blog post), and shortly after that in Taiwanese. On June 16th 2010, we took the next step by launching our Korean Voice Search system.

Korean Voice Search, by focusing on finding the correct web page for a spoken query, has been quite successful since launch. We have improved the acoustic models several times which resulted in significantly higher accuracy and reduced latency, and we are committed to improving it even more over time.

While voice search significantly simplifies input for search, especially for longer queries, there are numerous applications on any smartphone that could also benefit from general voice input, such as dictating an email or an SMS. Our experience with US English has taught us that voice input is as important as voice search, as the time savings from speaking rather than typing a message are substantial. Korean is the first non-English language where we are launching general voice input. This launch extends voice input to emails, SMS messages, and more on Korean Android phones. Now every text field on the phone will accept Korean speech input.

Creating a general voice input service had different requirements and technical challenges compared to voice search. While voice search was optimized to give the user the correct web page, voice input was optimized to minimize (Hangul) character error rate. Voice inputs are usually longer than searches (short full sentences or parts of sentences), and the system had to be trained differently for this type of data. The current system’s language model was trained on millions of Korean sentences that are similar to those we expect to be spoken. In addition to the queries we used for training voice search, we also used parts of web pages, selected blogs, news articles and more. Because the system expects spoken data similar to what it was trained on, it will generally work well on normal spoken sentences, but may yet have difficulty on random or rare word sequences -- we will work to keep improving on those.
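
For concreteness, here is a minimal sketch of a character error rate metric of the kind mentioned above: edit distance over Hangul characters, normalized by the length of the reference transcript. It illustrates the idea only; the metric used for the production system may differ in its details.

```python
# Minimal sketch of a (Hangul) character error rate: Levenshtein edit distance
# between the reference and the hypothesis, divided by the reference length.
def char_error_rate(ref, hyp):
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, row[0] = row[0], i
        for j, h in enumerate(hyp, 1):
            prev, row[j] = row[j], min(row[j] + 1,        # deletion
                                       row[j - 1] + 1,    # insertion
                                       prev + (r != h))   # substitution or match
    return row[len(hyp)] / max(len(ref), 1)

print(char_error_rate("안녕하세요", "안녕하세용"))  # 0.2 (one substitution out of five)
```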

Korean voice input is part of Google’s long-term goal to make speech input an acceptable and useful form of input on any mobile device. As with voice search, our cloud computing infrastructure will help us to improve quality quickly, as we work to better support all noise conditions, all Korean dialects, and all Korean users.

Clustering Related Queries Based on User Intent



People today use search engines for all their information needs, but when they pose a particular search query, they typically have a specific underlying intent. However, when looking at any query in isolation, it might not be entirely clear what that intent is. For example, when querying for mars, a user might be looking for more information about the planet Mars, or the planets in the solar system in general, or the Mars candy bar, or Mars the Roman god of war. The ambiguity in intent is most pronounced for queries that are inherently ambiguous and for queries about prominent entities for which various types of information exist on the Internet. Given such ambiguity, modern search engines try to complement their search results with lists of related queries that can be used to further explore a particular intent.

In a recent paper, we explored the problem of clustering the related queries as a means of understanding the different intents underlying a given user query. We propose an approach that combines an analysis of anonymized document-click logs (what results do users click on) and query-session logs (what sequences of queries do users pose in a search session). We model typical user search behavior as a traversal of a graph whose nodes are related queries and clicked documents. We propose that the nodes in the graph, when grouped based on the probability of a typical user visiting them within a single search session, yield clusters that correspond to distinct user intents.
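
As a greatly simplified sketch of this intuition (not the method from the paper), the snippet below groups related queries whose clicked documents overlap heavily, so that queries landing on largely the same pages end up in the same cluster.

```python
# Simplified sketch of the intuition (not the paper's method): group related
# queries whose clicked documents overlap heavily, using a greedy pass over
# pairwise Jaccard similarity of their click sets.
def cluster_queries(clicks, threshold=0.3):
    # clicks: dict mapping each related query to the set of documents clicked for it
    clusters = []
    for query, docs in clicks.items():
        for cluster in clusters:
            rep_docs = clicks[cluster[0]]
            jaccard = len(docs & rep_docs) / len(docs | rep_docs)
            if jaccard >= threshold:
                cluster.append(query)
                break
        else:
            clusters.append([query])
    return clusters

clicks = {
    "mars planet":   {"nasa.gov/mars", "wikipedia.org/Mars"},
    "mars missions": {"nasa.gov/mars", "jpl.nasa.gov"},
    "mars bar":      {"mars.com", "wikipedia.org/Mars_(chocolate_bar)"},
}
print(cluster_queries(clicks))
# [['mars planet', 'mars missions'], ['mars bar']]
```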

Our results show that underlying intents (clusters of related queries) almost always correspond to well-understood, high-level concepts. For example, for mars, in addition to re-constructing each of the intents listed earlier, we also find distinct clusters grouping queries about NASA’s missions to the planet, about specific interest in life on Mars, as well as a Japanese comic series, and a grocery chain named Mars. We found that our clustering approach yields better results than earlier approaches that either only used document-click or only query-session information. More details about our proposed approach and an analysis of the resulting clusters can be found in our paper that was presented at the International World Wide Web conference earlier this year.

Wednesday, October 13, 2010

Google at USENIX Symposium on Operating Systems Design and Implementation (OSDI ‘10)


The 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI ‘10) was recently held in Vancouver, B.C. This biennial conference is one of the premier forums for presenting innovative research in distributed systems from both academia and industry, and we were glad to be a part of it.

In addition to sponsoring this conference since 2002, Googlers contributed to the exchange of scientific ideas through authoring or co-authoring 3 published papers, organizing workshops, and serving on the program committee. A short summary of the contributions:

  • Large-scale Incremental Processing Using Distributed Transactions and Notifications.
    Google replaced its batch-oriented indexing system with an incremental system, Percolator. Rather than running a series of high-latency map-reduces over large batches of documents, we now index individual documents at very low latency. The result is a 50% reduction in search result age; our paper discusses this project and the implications of the result.
  • Availability in Globally Distributed Storage Systems.
    Reliable and efficient storage systems are a key component of cloud-based services. In this paper we characterize the availability properties of cloud storage systems based on extensive monitoring of Google's main storage infrastructure and present statistical models that enable further insight into the impact of multiple design choices, such as data placement and replication strategies. We demonstrate the utility of these models by computing data availability under a variety of replication schemes given the real patterns of failures observed in our fleet.
  • Onix: A Distributed Control Platform for Large-scale Production Networks.
    There has been recent interest in a new networking paradigm called Software-Defined Networking (SDN). The crucial enabler for SDN is a distributed control platform that shields developers from the details of the underlying physical infrastructure and allows them to write sophisticated control logic against a high-level API. Onix provides such a control platform for large-scale production networks.

In addition to the papers presented by current Googlers, we were also happy to see that the recipient of the 2009 Google Ph.D. Fellowship in Cloud Computing, Roxana Geambasu, presented her work on Comet: An active distributed key-value store.

Videos of all of the talks from OSDI are available on the conference website for attendees and current USENIX members. There is also a USENIX YouTube channel with a growing subset of the conference videos open to everyone.

Google is making substantial progress on many of the grand challenge problems in computer science and artificial intelligence as part of its mission to organize the world's information and make it useful. Given the continuing increase in the scale of our distributed systems, it’s fair to say we’ll have some other exciting new work to share at the next OSDI. Hope to see you in 2012.

Tuesday, October 12, 2010

Making an Impact on a Thriving Speech Research Community



While we continue to launch exciting new speech products--most recently Voice Actions and Google Search by Voice in Russian, Czech and Polish--we also strive to contribute to the academic research community by sharing both innovative techniques and experiences with large-scale systems.

This year’s gathering of the world’s experts in speech technology research, Interspeech 2010 in Makuhari, Japan, which Google co-sponsored, was a fantastic demonstration of the momentum of this community, driven by new challenges such as mobile voice communication, voice search, and the increasing international reach of speech technologies.

Googlers published papers that showcased the breadth and depth of our speech recognition research. Our work addresses both fundamental problems in acoustic and language modeling and the practical issues of building scalable speech interfaces that real people use every day to make their lives easier.

Here is a list of the papers presented by Googlers at the conference:

Friday, October 8, 2010

Bowls and Learning



It is easy to find the bottom of a bowl no matter where you start -- if you toss a marble anywhere into the bowl, it will roll downhill and find its way to the bottom.

What does this have to do with Machine Learning? A natural way to try to construct an accurate classifier is to minimize the number of prediction errors the classifier makes on training data. The trouble is, even for moderate-sized data sets, minimizing the number of training errors is a computationally intractable problem. A popular way around this is to assign different training errors different costs and to minimize the total cost. If the costs are assigned in a certain way (according to a “convex loss function”), the total cost can be efficiently minimized the way a marble rolls to the bottom of a bowl.
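
To make the marble-in-a-bowl picture concrete, here is a small sketch of our own (not taken from the paper) that minimizes one such convex surrogate, the logistic loss, by plain gradient descent on a linear classifier.

```python
import numpy as np

# Minimal illustration: instead of counting training errors (intractable to
# minimize directly), minimize a convex surrogate, the logistic loss, by
# gradient descent. Because the loss surface is bowl-shaped, plain descent
# reaches the bottom from any starting point.
def train_logistic(X, y, steps=1000, lr=0.1):
    w = np.zeros(X.shape[1])                 # start anywhere in the "bowl"
    for _ in range(steps):
        margins = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)
        w -= lr * grad                       # roll a little further downhill
    return w

# Tiny toy data: labels in {-1, +1}, two features plus a bias column.
X = np.array([[1.0, 2.0, 1.0], [2.0, 1.0, 1.0], [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = train_logistic(X, y)
print(np.sign(X @ w))                        # [ 1.  1. -1. -1.], training points separated
```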

In a recent paper, Rocco Servedio and I show that no algorithm that works this way can achieve a simple and natural theoretical noise-tolerance guarantee that can be achieved by other kinds of algorithms. A result like this is interesting for two reasons: first, it's important to understand what you cannot do with convex optimization in order to get a fuller understanding of what you can do with it. Second, this result may spur more research into noise-tolerant training algorithms using alternative approaches.

Wednesday, October 6, 2010

Poetic Machine Translation



Once upon a midnight dreary, long we pondered weak and weary,
Over many a quaint and curious volume of translation lore.
When our system does translation, lifeless prose is its creation;
Making verse with inspiration no machine has done before.
So we want to boldly go where no machine has gone before.
Quoth now Google, "Nevermore!"

Robert Frost once said, “Poetry is what gets lost in translation.” Translating poetry is a very hard task even for humans, and is clearly beyond the capability of current machine translation systems. We therefore, out of academic curiosity, set about testing the limits of translating poetry and were pleasantly surprised with the results!

We are going to present a paper on poetry translation at the EMNLP conference this year. In this paper, we investigate the purely technical challenges around generating translations with fixed rhyme and meter schemes.

The value of preserving meter and rhyme in poetic translation has been highly debated. Vladimir Nabokov famously claimed that, since it is impossible to preserve both the meaning and the form of the poem in translation, one must abandon the form altogether. Another authority (and for us computer scientists, perhaps the more familiar one), Douglas Hofstadter argues that preserving the form is very important to maintaining the feeling and the sound of a poem. It is in this spirit that we decided to experiment with translating not only poetic meaning, but form as well.

A Statistical Machine Translation system, like Google Translate, typically performs translations by searching through a multitude of possible translations, guided by a statistical model of accuracy. However, to translate poetry, we not only considered translation accuracy, but meter and rhyming schemes as well. In our paper we describe in more detail how we altered our translation model, but in general we chose to sacrifice a little of the translation’s accuracy to get the poetic form right.
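
As a greatly simplified illustration of that trade-off (a toy of our own, not the system described in the paper), one could re-rank candidate translations by combining the translation model's score with a penalty for missing the target syllable count of the verse form:

```python
# Toy illustration of trading translation accuracy for form (not the actual
# system): re-rank candidate lines by model score minus a penalty for
# deviating from a target syllable count.
def rough_syllables(line):
    # Crude approximation: count groups of consecutive vowels.
    count, prev_vowel = 0, False
    for ch in line.lower():
        is_vowel = ch in "aeiouy"
        count += is_vowel and not prev_vowel
        prev_vowel = is_vowel
    return count

def rerank(candidates, target_syllables, weight=0.5):
    # candidates: list of (translation, model_score) pairs
    def combined(pair):
        text, score = pair
        return score - weight * abs(rough_syllables(text) - target_syllables)
    return max(candidates, key=combined)

candidates = [("So here's the dear child under land", -1.0),
              ("So here lies the dear child beneath the earth and soil", -0.6)]
# The better-formed line wins even though its model score is lower.
print(rerank(candidates, target_syllables=8))
```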

As a pleasant side-effect, the system is also able to translate anything into poetry, allowing us to specify the genre (say, limericks or haikus), or letting the system pick the one it thinks fits best. At the moment, the system is too slow to be made publicly accessible, but we thought we’d share some excerpts:

A stanza from Essai monographique sur les Dianthus des Pyrénées françaises by Edouard Timbal-Lagrave and Eugène Bucquoy, translated to English as a pair of couplets in iambic tetrameter:
So here's the dear child under land,
will not reflect her beauty and
besides the Great, no alter dark,
the pure ray, fronts elected mark.

Voltaire’s La Henriade, translated as a couplet in dactylic tetrameter:
These words compassion forced the small to lift her head
gently and tell him to whisper: “I'm not dead."

Le Miroir des simples âmes, an Old French poem by Marguerite Porete, translated to Modern French by M. de Corberon, and then to haiku by us:
“Well, gentle soul”, said
Love, “say whatever you please,
for I want to hear.”

More examples and technical details can be found in our research paper (as well as clever commentary).

Thursday, September 30, 2010

Veni, Vidi, Verba Verti



To tear down the barriers between languages and make the world's knowledge open and useful, we have built translation tools for the languages of many nations. Today we announce our first tool for translating a language that no one now speaks natively: Latin. Although few people speak Latin day to day, more than a hundred thousand American students take the National Latin Exam every year, and many more people around the world study Latin.

We understand that this Latin translation tool will rarely be used to translate emails or captions for YouTube videos. However, many old books on philosophy, physics, and mathematics were written in Latin, and many thousands of books in Google Books contain notable Latin passages.

Machine translation from Latin is difficult, and we realize our grammar is not without fault. Latin is unusual, however, in that most Latin books were written long ago and few new ones will be written in the future. Many have been translated into other languages, and we use those translations to train our translation system. Since the system translates most easily books similar to those it learned from, our ability to translate famous books (such as Caesar's Commentaries on the Gallic War) is already good.

The next time you come across a Latin passage or need help with a Latin text, give it a try.

Saturday, September 18, 2010

Remembering Fred Jelinek



It is with great sadness that we note the passing of Fred Jelinek, teacher and colleague to many of us here at Google. His seminal contributions to statistical modeling of speech and language influenced not only us, but many more members of the research community.

Several of us at Google remember Fred:

Ciprian Chelba:
Fred was my thesis advisor at CLSP. My ten years of work in the field after graduation led me to increasingly appreciate the values that Fred instilled by personal example: work on the hard problem because it simply cannot be avoided, bring fundamental and original contributions that steer clear of incrementalism, exercise your creativity despite the risks entailed, and pursue your ideas with determination.

I recently heard a comment from a colleague, “A natural born leader is someone you follow even if only out of curiosity.” I immediately thought of Fred. Working with him marked a turning point in my life, and his influential role will be remembered.

Bob Moore:
I first met Fred Jelinek in 1984 at an IBM-sponsored workshop on natural-language processing. Fred's talk was my first exposure to the application of statistical ideas to language, and about the only thing I understood was the basic idea of N-gram language modeling: estimate the probability of the next word in a sequence based on a small fixed number of immediately preceding words. At the time, I was so steeped in the tradition of linguistically-based formal grammars that I was sure Fred's approach could not possibly be useful.

Starting about five years later, however, I began to interact with Fred often at speech and language technology meetings organized by DARPA, as well as events affiliated with the Association for Computational Linguistics. Gradually, I (along with much of the computational linguistics community) began to understand and appreciate the statistical approach to language technology that Fred and his colleagues were developing, to the point that it now dominates the field of computational linguistics, including my own research. The importance of Fred's technical contributions and visionary leadership in bringing about this revolution in language technology cannot be overstated. The field is greatly diminished by his passing.

Fernando Pereira:
I met Fred first at a DARPA-organized workshop where one of the main topics was how to put natural language processing research on a more empirical, data-driven path. Fred was leading the charge for the move, drawing from his successes in speech recognition. Although I had already started exploring those ideas, I was not fully convinced by Fred’s vision. Nevertheless, Fred’s program raised many interesting research questions, and I could not resist some of them. Working on search for speech recognition at AT&T, I was part of the small team that invented the finite-state transducer representation of recognition models. I gave what I think was the first public talk on the approach at a workshop session that Fred chaired. It was Fred’s turn to be skeptical, and we had a spirited exchange in the discussion period. At the time, I was disappointed that I had failed to interest Fred in the work, but later I was delighted when Fred became a strong supporter of our work after a JHU Summer workshop where Michael Riley led the use of our software tools in successful experiments with a team of JHU researchers and students. Indeed, in hindsight, Fred was right to be skeptical before we had empirical validation for the approach, and his strong support when the results started coming in was thus much more meaningful and gratifying. Through these experiences and much more, I came to respect immensely Fred’s pioneer spirit, vision, and sharp mind. Many of my most successful projects benefited directly or indirectly from his ideas, his criticism, and his building of thriving institutions, from CLSP to links with the research team at Charles University in Prague. I saw Fred last at ACL in Uppsala. He was in great form, and we had a good discussion on funding for the summer workshops. I am very sad that he will not be with us to continue these conversations.

Shankar Kumar:
Fred was my academic advisor at CLSP/JHU and I interacted with him throughout my Ph.D. program. I had the privilege of having him on my thesis committee. My very first exposure to research in speech and NLP was through an independent study that I did under him. A few years later, I was his teaching assistant for the speech recognition class. Fred's energy and passion for research made a strong impression on me back then and continues to influence my work to this day. I remember Fred carefully writing up his ideas and sending them out as a starting point to our discussions. While I found this curiously amusing at the time, I now think this was his unique approach to ensure clarity of thought and to steer the discussion without distractions. Fred's enthusiasm for learning new concepts was infectious! I attended several classes and guest lectures with him - graphical models, NLP, and many more. His insightful questions and his active participation in each one of these classes made them memorable for me. He epitomized what a life-long learner should be. I will always recall Fred's advice on sharing credit generously. In his own words, "The contribution of a research paper does not get divided by the number of authors". By his passing, we have lost a role model who dedicated his life to research and whose contributions will continue to impact and shape the field for years to come.

Michael Riley:
I got to know Fred pretty well, having attended two of the CLSP six-week summer workshops, worked on a few joint grants, and visited CLSP in between. If there is a ‘father of speech recognition’, it’s got to be Fred Jelinek - he led the IBM team that invented and popularized many of the key methods used today. His intellect, wide knowledge, and force of will served him well later as the leader of the JHU Center for Language and Speech Processing - a sort of academic hearth where countless speech/NLP researchers and students interacted over the years in seminars and workshops. I was impressed that at an age when many have retired, and when most of his IBM colleagues had gone into (very lucrative) financial engineering, he remained a vigorous, leading academic. Fernando mentioned the initial skepticism he had for our work on weighted FSTs for ASR. Some years later, though, I heard that he praised the work to my lab director, Larry Rabiner, on a plane ride, which likely helped my promotion shortly thereafter. And no discussion of Fred would be complete without a mention of his inimitable humor, delivered in that loud Czech-accented voice:
Riley [at workshop planning meeting]: “Could they hold the summer workshop in some nicer place than Baltimore to help attract people?”
Fred: “Riley, we’ll hold it in Rome next year and get better people than you!”

Seminar presenter: [fumbling with Windows configuration for minutes].
Fred [very loud]: “How long do we have to endure this high-tech torture?”

The website of The Johns Hopkins University’s Center for Language and Speech Processing links to Fred’s own descriptions of his life and technical achievements.

Friday, September 17, 2010

Frowns, Sighs, and Advanced Queries -- How does search behavior change as search becomes more difficult?



How does search behavior change as search becomes more difficult?

At Google, we strive to make finding information easy, efficient, and even fun. However, we know that once in a while, finding a specific piece of information turns out to be tricky. Based on dozens of user studies over the years, we know that it’s relatively easy for an observer to notice that the user is having problems finding the information, by watching changes in language, body language, and facial expressions:



Computers, however, don’t have the luxury of observing a user the way another person would. But would it be possible for a computer to somehow tell that the user is struggling to find information?

We decided to find out. We first ran a study in the usability lab where we gave users search tasks, some of which we knew to be difficult. The first couple of searches always looked pretty much the same independent of task difficulty: users formulated a query, quickly scanned the results and either clicked on a result or refined the query. However, after a couple of unsuccessful searches, we started noticing interesting changes in behavior. In addition to many of them sighing or starting to bite their nails, users sometimes started to type their searches as natural language questions, they sometimes spent a very long time simply staring at the results page, and they sometimes completely changed their approach to the task.

We were fascinated by these findings as they seemed to be signals that the computer could potentially detect while the user is searching. We formulated the initial findings from the usability lab study as hypotheses which we then tested in a larger web-based user study.

The overall findings were promising: we found five signals that seemed to indicate that users were struggling in the search task. Those signals were: use of question queries, use of advanced operators, spending more time on the search results page, formulating the longest query in the middle of the session, and spending a larger proportion of the time on the search results page. None of these signals alone is a strong enough predictor of users having problems in search tasks. However, when used together, we believe we can use them to build a model that will one day make it possible for computers to detect frustration in real time.
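
To make this concrete, here is a minimal sketch of how such session-level signals might be combined into a single "struggle" score using a simple logistic combination. The feature names, weights, and bias below are purely illustrative assumptions and are not values from our paper, where the actual model would be learned from data.

import math

# Sketch only: combine five hypothetical session-level signals into a score.
# Feature names and weights are illustrative assumptions, not from the paper.

def struggle_score(session):
    features = {
        "has_question_query": 1.0 if session["has_question_query"] else 0.0,
        "uses_advanced_operators": 1.0 if session["uses_advanced_operators"] else 0.0,
        "mean_time_on_results_min": session["mean_time_on_results_sec"] / 60.0,
        "longest_query_in_middle": 1.0 if session["longest_query_in_middle"] else 0.0,
        "fraction_time_on_results": session["fraction_time_on_results"],
    }
    weights = {  # hypothetical weights; in practice these would be learned
        "has_question_query": 1.2,
        "uses_advanced_operators": 0.8,
        "mean_time_on_results_min": 1.5,
        "longest_query_in_middle": 0.7,
        "fraction_time_on_results": 1.0,
    }
    bias = -2.5
    z = bias + sum(weights[k] * features[k] for k in features)
    return 1.0 / (1.0 + math.exp(-z))  # logistic squash into [0, 1]

session = {
    "has_question_query": True,
    "uses_advanced_operators": False,
    "mean_time_on_results_sec": 45,
    "longest_query_in_middle": True,
    "fraction_time_on_results": 0.6,
}
print(f"estimated struggle probability: {struggle_score(session):.2f}")

No single feature dominates the score, which mirrors the finding above that the signals are only useful in combination.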

You can read the full text of the paper here.

Wednesday, September 15, 2010

Focusing on Our Users: The Google Health Redesign



When I relocated to New York City a few years ago, some of the most important health information for me to have on hand was my immunization history. At the time, though, my health records were scattered, and it felt like a daunting task to organize them -- a problem many people face. For me, the solution came when Google Health became available in May of 2008, and I started using it to organize my health information and keep it more manageable. I also saw the potential to do much more within Google Health, such as tracking my overall fitness goals. When I joined the Google Health team as the lead user experience researcher, I was curious about the potential for Google Health to impact people’s lives beyond things like immunization tracking, and about how we could make the product a lot easier to use. So I set out to explore how to expand and improve Google Health.

Here at Google, we focus on the user throughout the entire product development process. So before Google Health was first launched, we interviewed many people about how they managed their medical records and other health information to better understand their needs. We then iteratively created and tested multiple concepts and designs. After our initial launch, we followed up with actual Google Health users through surveys, interviews, and usability studies to understand how well we were meeting their needs.



From this user research, we learned what was working in the product and what needed to be improved. Here are some of the things our users found especially useful:
  • Organizing and tracking health-related information in a single place that is accessible from anywhere at any time
  • Sharing medical records easily with loved ones and health care providers, either by allowing online access or by printing out health summaries
  • Referencing rich information about health topics, aggregated from trusted sources and Google search results

Our users also described to us the benefits they saw from using Google Health:
“Google Health gives me many tools to research my prescriptions and symptoms, and to track all of the many tests I keep having. Google Health made several necessary and cumbersome tasks easy and worry free.”

“For years now, I've tried to remember my son’s allergies and medications, but the list has grown so long, that I kept forgetting one or two when a doctor asked me about them. That can't happen again because I now have a single place to keep up with them. And I love the fact that I can print off information for situations when I really need it.”

“I really like that I can share my profile with others. I want my mom to know my medical information, just in case anything ever happens to me.”

While we learned that our users were clearly getting positive results from using Google Health, our research also taught us that more was needed. We found that we needed to make fundamental changes to fully meet the needs of all of our current and prospective users, such as those who are chronically ill, those who care for family members, and especially those looking to track and improve their wellness and fitness.

On this last point, our user surveys had already pointed out that there was more we could do to help our users track and manage their wellness, not just their sickness, so we conducted further research about how people collect, monitor, track, and analyze their wellness data. We interviewed several people in their homes and invited others into our usability labs. As a result, we identified several areas where we could improve Google Health to make it a more useful wellness tool (see the sketch after this list), including:
  • Dedicated wellness tracking including pre-built and custom trackers
  • Efficient manual data entry as well as automatic data collection through devices
  • A customizable summary dashboard of wellness and other health topics
  • Goal setting and progress tracking using interactive charts
  • Personalized pages for each topic with rich charts, journaling, and related information
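
As a rough illustration only, here is one way a custom tracker with goals, manual entries, and device readings might be modeled. The class and field names are hypothetical and are not taken from Google Health.

from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical data model for a wellness tracker; names are illustrative only.

@dataclass
class Reading:
    day: date
    value: float
    source: str = "manual"      # "manual" entry or "device" upload

@dataclass
class Tracker:
    name: str                   # e.g. "Weight" or a custom metric
    unit: str                   # e.g. "kg", "steps"
    goal: Optional[float] = None
    readings: List[Reading] = field(default_factory=list)

    def add(self, day: date, value: float, source: str = "manual") -> None:
        self.readings.append(Reading(day, value, source))

    def progress_toward_goal(self) -> Optional[float]:
        """Fraction of the way from the first reading to the goal (if set)."""
        if self.goal is None or len(self.readings) < 2:
            return None
        start, latest = self.readings[0].value, self.readings[-1].value
        total = self.goal - start
        return (latest - start) / total if total else 1.0

weight = Tracker(name="Weight", unit="kg", goal=75.0)
weight.add(date(2010, 9, 1), 82.0)
weight.add(date(2010, 9, 15), 80.5, source="device")
print(weight.progress_toward_goal())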

These insights led us to a whole new set of design proposals. We gathered feedback on the resulting sketches, wireframes, and screenshots from active and new Google Health users. The results throughout this process were eye-opening. While we were on the right track for some parts of the design, other parts had to be corrected or even redesigned. We went through several iterations until we had a design that tested well and that we felt met the user needs our research had uncovered. Finally, throughout the product development process we conducted several usability studies with a functioning prototype to continuously improve usability and function.



In the end, the collaboration between the user experience, engineering, and product management teams resulted in an entirely new user experience for Google Health, along with a set of new features, that is now available for you to try out at www.google.com/health. See for yourself how the old and new versions compare. Here is a screenshot of a health profile in the new version:



And this is how the same account and profile looked in the old user interface:



As a Google Health user, I am excited to take advantage of the new design and have already started using it for my own exercise and weight tracking. And on behalf of the user experience team and the entire Google Health team, we’re excited about being able to bring you a new design and more powerful tool that we think will meet more of your health and wellness needs.

We look forward to continuing to explore how we can make Google Health even more useful and easier to use for people like you. As you use Google Health, you may see a link to a feedback survey at the top of the application. If you do, please take the time to fill it out - we will be listening to your input!

Tuesday, September 14, 2010

Discontinuous Seam Carving for Video Retargeting



Videos come in different sizes, resolutions, and aspect ratios, but the device used for playback, be it your TV, mobile phone, or laptop, has a fixed resolution and form factor. As a result, you cannot watch your favorite old show that came in 4:3 on your new 16:9 HDTV without black bars on the sides, referred to as letterboxing. Likewise, widescreen movies and user videos uploaded to YouTube are shot with various cameras in wide-ranging formats, so they do not fit the screen completely. As an alternative to letterboxing, many devices either upscale the content uniformly, which changes the aspect ratio and makes everything look stretched out, or simply crop the frame, discarding any content that cannot fit the screen after scaling.

At Google Research, together with collaborators from Georgia Tech, we have developed an algorithm that resizes (or retargets) videos to fit the form factor of a given device without cropping, stretching, or letterboxing. Our approach uses all of the screen’s precious pixels while striving to deliver as much of the original video content as possible. The result is a video that adapts to your needs, so you don’t have to adapt to the video.


Six frames from the result of our retargeting algorithm applied to a sub-clip of “Apologize”, © 2006 One Republic. Original frame is shown on the left, our resized result on the right. The original content is fit to a new aspect ratio.

The key insight is that we can separate the video into salient and non-salient content, which are then treated differently. Think of salient content as actors, faces, or structured objects, where the viewer expects specific, important details to be preserved in order to perceive the content as correct and unaltered. We cannot change this content beyond uniform scaling without it being noticeable. Non-salient content, on the other hand, such as sky, water, or a blurry out-of-focus background, can be squished or stretched without changing the overall appearance and without the viewer noticing a dramatic change.

Our technique, which we call discontinuous seam carving -- so named because it modifies the video by adding or removing disconnected seams (or chains) of pixels -- allows greater freedom in the resizing process than previous approaches. By optimizing for the retargeted video to be consistent with the original, we carefully preserve the shape and motion of the salient content while being less restrictive with non-salient content. The key innovations of our research include: (a) a solution that maintains temporal continuity of the video in addition to preserving its spatial structure, (b) space-time smoothing for automatic as well as interactive (user-guided) salient content selection, and (c) sequential frame-by-frame processing conducive to videos of arbitrary length and to streaming video. The outcome is a scalable system capable of retargeting videos featuring complex motions of actors and cameras, highly dynamic content, and camera shake. For more details, please refer to our paper or visit the project website.
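
For readers curious about the underlying mechanics, the short Python sketch below implements classic, single-image seam carving: a connected vertical seam of lowest total "energy" (here, a simple gradient magnitude) is found by dynamic programming and removed. This is only a simplified illustration of the general idea; it does not implement the discontinuous, temporally coherent seams or the saliency handling described in our paper.

import numpy as np

# Sketch of classic single-image seam carving, for illustration only.
# The paper's method generalizes this to video with *discontinuous* seams.

def energy_map(gray):
    """Simple gradient-magnitude energy for a 2-D grayscale array."""
    gy, gx = np.gradient(gray.astype(float))
    return np.abs(gx) + np.abs(gy)

def min_vertical_seam(energy):
    """Dynamic programming: cheapest top-to-bottom connected path of pixels."""
    h, w = energy.shape
    cost = energy.copy()
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w, x + 2)
            cost[y, x] += cost[y - 1, lo:hi].min()
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam

def remove_seam(gray, seam):
    h, w = gray.shape
    return np.array([np.delete(gray[y], seam[y]) for y in range(h)])

frame = np.random.rand(90, 160)           # stand-in for one grayscale frame
seam = min_vertical_seam(energy_map(frame))
narrower = remove_seam(frame, seam)
print(frame.shape, "->", narrower.shape)  # (90, 160) -> (90, 159)

Because the seam threads through low-energy (non-salient) pixels, each removal narrows the frame while leaving high-energy regions largely untouched, which is the same separation of salient and non-salient content exploited above.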

Friday, September 10, 2010

Google Search by Voice: A Case Study



Wind the clock back two years with your smart phone in hand. Try to recall doing a search for a restaurant or the latest scores of your favorite sports team. If you’re like me, you probably won’t even bother, or you’ll suffer through tiny keys and fat fingers on a touch screen. With Google Search by Voice, all that has changed. Now you just tap the microphone, speak, and within seconds you see the result. No more fat fingers.

Google Search by Voice is a result of many years of investment in speech at Google. We started by building our own recognizer (aka GReco) from the ground up. Our first foray into search by voice was local search with GOOG-411. Then, in November 2008, we launched Google Search by Voice. Now you can search the entire Web using your voice.

What makes search by voice really interesting is that it requires much more than just a good speech recognizer. You also need a good user interface and a good phone, like an Android device, in the hands of millions of people. Beyond an excellent computational platform and data availability, the project succeeded because of Google’s culture, built around teams that wholeheartedly tackle such challenges with the conviction that they will set a new bar.

In our book chapter, “Google Search by Voice: A Case Study”, we describe the basic technology, the supporting technologies, and the user interface design behind Google Search by Voice. We describe how we built it and what lessons we have learned. As the product required many helping hands to build, this chapter required many helping hands to write. We believe it provides a valuable contribution to the academic community.

The book, Advances in Speech Recognition, is available for purchase from Springer.

Thursday, September 2, 2010

Towards Energy-Proportional Datacenters



This is part of the series highlighting some notable publications by Googlers.

At Google, we operate large datacenters containing clusters of servers, networking switches, and more. While this gear costs a lot of money, an increasingly important cost -- in terms of both dollars and environmental impact -- is the electricity that drives the computing clusters and the cooling infrastructure. Since our clusters often do not run at full utilization, Google recently put forth a call to industry and researchers to develop energy-proportional computer systems. With such systems, the power consumed by our clusters would be directly proportional to utilization. Servers consume the most electricity, and therefore researchers have responded to Google’s call by focusing their attention on servers. As servers become increasingly energy proportional, however, the “always on” network fabric that connects them will consume an increasing fraction of datacenter power unless it too becomes energy proportional.

In a paper recently published at the International Symposium on Computer Architecture (ISCA), we push further towards the goal of energy-proportional computing by focusing on the energy usage of high-bandwidth, highly-scalable cluster networking fabrics. This research considers a broad set of architectural and technological solutions to optimize energy usage without sacrificing performance. First, we show how the Flattened Butterfly network topology uses less power since it uses less switching chips and fewer links than a comparable-performance network built using the more conventional Fat Tree topology. Second, our approach takes advantage of the observation that when network demand is low, we can reduce the speed at which links transmit data. We show via simulation, that by tuning the speeds of the links very rapidly, we can reduce power consumption with little impact on performance. Finally, our research is a further call to action for the academic and industry research communities to make energy efficiency, and energy proportionality in particular, a first-class citizen in networking research. Put together, our proposed techniques can reduce energy cost for typical Google workloads seen in our production datacenters by millions of dollars!