Saturday, July 31, 2010

Google Publications


We often get asked if Google scientists and engineers publish technical papers, and the answer is, “Most certainly, yes.” Indeed, we have a formidable research capability, and we encourage publications as well as other forms of technical dissemination, including our contributions to open source and standards and the introduction of new APIs and tools, some of which have proven foundational.

Needless to say, with our great commitment to technical excellence in computer science and related disciplines, we find it natural and rewarding to contribute to the scientific community and to ongoing technical debates. And we know that it is important for Google to help create the fundamental building blocks upon which continuing advances can occur.

To be specific, Googlers publish hundreds of technical papers every year in journals, books, and conference and workshop proceedings. These deal with specific applications and engineering questions, algorithmic and data-structure problems, and important theoretical problems in computer science, mathematics, and other areas that can guide our algorithmic choices. While the publications are interesting in their own right, they also offer a glimpse at some of the key problems we face when dealing with very large data sets and illustrate other questions that arise in our engineering design at Google.

We’d like to highlight a few of the more noteworthy papers from the first trimester of this year. The papers reflect the breadth and depth of the problems on which we work. We find that virtually all aspects of computer science, from systems and programming languages, to algorithms and theory, to security, data mining, and machine learning are relevant to our research landscape. A more complete list of our publications can be found here.

In the coming weeks we will be offering a more in-depth look at these publications, but here are some summaries:

Speech Recognition

"Google Search by Voice: A Case Study," by Johan Schalkwyk, Doug Beeferman, Francoise Beaufays, Bill Byrne, Ciprian Chelba, Mike Cohen, Maryam Garrett, Brian Strope, to appear in Advances in Speech Recognition: Mobile Environments, Call Centers, and Clinics, Amy Neustein (Ed.), Springer-Verlag 2010.

Google Search by Voice is a result of many years of investment in speech at Google. In our book chapter, “Google Search by Voice: A Case Study,” we describe the basic technology, the supporting technologies, and the user interface design behind Google Search by Voice. We describe how we built it and what lessons we have learned. Google Search by Voice is growing rapidly and is being built for many languages. Along the way we constantly encounter new research problems, providing the perfect atmosphere for doing research on real-world problems.

Computer Architecture & Networks & Distributed Systems

"Energy-proportional Datacenter Networks," by Dennis Abts, Mike Marty, Philip Wells, Peter Klausler, Hong Liu, International Symposium on Computer Architecture, ISCA, June 2010.

Google researchers have called on industry and academia to develop energy-proportional computing systems, where the energy consumed is directly proportional to the utilization of the system. In this work, we focus on the energy usage of high-bandwidth, highly scalable cluster networks. Through a combination of an energy-efficient topology and dynamic fine-grained control of link speeds, our proposed techniques show the potential to significantly reduce both electricity and environmental costs.
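
The energy-proportionality gap the paper targets can be illustrated with a toy power model (the wattage figures below are illustrative placeholders, not measurements from the paper):

```python
# Toy comparison of a conventional server's power curve with the
# energy-proportional ideal. Wattage figures are illustrative only.

def typical_power(utilization, idle_watts=200.0, peak_watts=350.0):
    """Conventional server: large constant idle draw plus a roughly
    linear term in utilization (0.0 to 1.0)."""
    return idle_watts + utilization * (peak_watts - idle_watts)

def proportional_power(utilization, peak_watts=350.0):
    """Energy-proportional ideal: power scales directly with load."""
    return utilization * peak_watts

# At 10% utilization, the conventional server still draws roughly 61%
# of peak power (about 215 W), while the ideal draws only about 35 W.
gap = typical_power(0.1) - proportional_power(0.1)
```

The same reasoning applies to network links, which is why dynamically scaling link speeds to match traffic can cut so much of the idle draw.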

Economics & Market Algorithms

"Quasi-Proportional Mechanisms: Prior-free Revenue Maximization," by Vahab S. Mirrokni, S. Muthukrishnan, Uri Nadav, Latin American Theoretical Informatics Symposium, LATIN, April 2010.

Say a seller wishes to sell an item, but the buyers value it vastly differently. What is a suitable auction to sell the item, in terms of efficiency as well as revenue? First and second price auctions will be efficient but will only extract the lower value in equilibrium; if one knows the distributions from which values are drawn, then setting a reserve price will get optimal revenue but will not be efficient. This paper views this problem as prior-free auction and proposes a quasi-proportional allocation in which the probability that an item is allocated to a bidder depends (quasi-proportionally) on their bids. The paper also proves existence of an equilibrium for quasi-proportional auctions and shows how to compute them efficiently. Finally, the paper shows that these auctions have high efficiency and revenue.
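
One simple family of quasi-proportional rules allocates the item with probability proportional to a concave power of each bid; a minimal sketch (the particular exponent form here is an illustrative assumption, not necessarily the one analyzed in the paper):

```python
def quasi_proportional_allocation(bids, gamma=0.5):
    """Allocation probabilities p_i = b_i**gamma / sum_j b_j**gamma.
    gamma=1 gives plain proportional allocation; gamma<1 flattens the
    allocation, so a low-value bidder retains a meaningful chance."""
    weights = [b ** gamma for b in bids]
    total = sum(weights)
    return [w / total for w in weights]

# Two bidders with vastly different values: the high bidder usually
# wins, but the low bidder keeps enough probability to force the high
# bidder to bid seriously, which is what drives up revenue.
probs = quasi_proportional_allocation([100.0, 1.0], gamma=0.5)
```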

"Auctions with Intermediaries," Jon Feldman, Vahab Mirrokni, S. Muthukrishnan, Mallesh Pai, ACM Conference on Electronic Commerce, EC, June 2010.

We study an auction where the bidders are middlemen, looking in turn to auction off the item if they win it. This setting arises naturally in online advertisement exchange systems, where the participants in the exchange are ad networks looking to sell ad impressions to their own advertisers. We present optimal strategies for both the bidders and the auctioneer in this setting. In particular, we show that the optimal strategy for bidders is to choose a randomized reserve price, and the optimal reserve price of the central auctioneer may depend on the number of bidders (unlike the case when there are no middlemen).

Computer Vision

"Discontinuous Seam-Carving for Video Retargeting," Matthias Grundmann, Vivek Kwatra, Mei Han, Irfan Essa, Computer Vision and Pattern Recognition, CVPR, June 2010.

Playing a video on devices with different form factors requires resizing (or retargeting) the video to fit the resolution of the given device. We have developed a content-aware technique for video retargeting based on discontinuous seam-carving, which unlike standard methods like uniform scaling and cropping, strives to retain salient content (such as actors, faces and structured objects) while discarding relatively unimportant pixels (such as the sky or a blurry background). The key innovations of our research include: (a) a solution that maintains temporal continuity of the video in addition to preserving its spatial structure, (b) space-time smoothing for automatic as well as interactive (user-guided) salient content selection, and (c) sequential frame-by-frame processing conducive for arbitrary length and streaming video.
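
The classical seam-carving dynamic program, which this work extends to allow discontinuous seams, removes one minimal-cost connected path of pixels at a time; a minimal sketch for a 2-D energy map:

```python
def min_seam_cost(energy):
    """Classical seam-carving DP: cumulative cost of the cheapest
    8-connected vertical seam through a 2-D energy map (list of rows).
    Each pixel extends the cheapest of its three upper neighbors."""
    rows = [list(energy[0])]
    for r in range(1, len(energy)):
        prev, cur = rows[-1], []
        for c in range(len(energy[r])):
            best_above = min(prev[max(c - 1, 0):min(c + 2, len(prev))])
            cur.append(energy[r][c] + best_above)
        rows.append(cur)
    return min(rows[-1])

# Low-energy pixels (sky, blurred background) form the cheapest seam.
cost = min_seam_cost([[1, 2, 3],
                      [4, 1, 5],
                      [6, 7, 1]])
```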

Machine Learning

"Random classification noise defeats all convex potential boosters," Philip M. Long, Rocco A. Servedio, Machine Learning, vol. 78 (2010), pp. 287-304.

A popular approach that has been used to tackle many machine learning problems recently is to formulate them as optimization problems in which the goal is to minimize some “convex loss function.” This is an appealing formulation because these optimization problems can be solved in much the same way that a marble rolls to the bottom of a bowl. However, it turns out that there are drawbacks to this formulation. In "Random Classification Noise Defeats All Convex Potential Boosters," we show that any learning algorithm that works in this way can fail badly if there are noisy examples in the training data. This research motivates further study of other approaches to machine learning, for which there are algorithms that are provably more robust in the presence of noise.
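
The failure mechanism can be illustrated with a toy computation using the exponential potential (the numbers below are illustrative; the paper proves the phenomenon in general):

```python
import math

def exp_loss(margin):
    """Exponential potential used by AdaBoost-style boosters: the loss
    grows exponentially as an example is more badly misclassified."""
    return math.exp(-margin)

# Margins y * f(x): positive = correctly classified, negative = wrong.
clean_margins = [2.0] * 100   # 100 confidently correct examples
noisy_margin = -6.0           # one badly mislabeled (noisy) example

clean_total = sum(exp_loss(m) for m in clean_margins)
noisy_total = exp_loss(noisy_margin)
# The single noisy example contributes more loss than all 100 clean
# ones combined, so minimizing the convex loss bends the classifier
# toward the noise.
```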

IR

"Clustering Query Refinements by User Intent," Eldar Sadikov, Jayant Madhavan, Lu Wang, Alon Halevy, Proceedings of the International World Wide Web Conference, WWW, April 2010.

When users pose a search query, they usually have an underlying intent or information need, and the sequence of queries they pose in a single search session is usually determined by that underlying intent. Our research demonstrates that there are typically only a small number of prominent underlying intents for a given user query. Further, these intents can be identified very accurately by analyzing anonymized search query logs. Our results show that underlying intents almost always correspond to well-understood high-level concepts.

HCI

"How does search behavior change as search becomes more difficult?", Anne Aula, Rehan Khan, Zhiwei Guan, Proceedings of the ACM Conference on Human Factors in Computing Systems, CHI , April 2010.

Seeing that someone is getting frustrated with a difficult search task is easy for another person: just look for the frowns and listen for the sighs. But could a computer tell that you're getting frustrated from just the limited behavior a search engine can observe? Our study suggests that it can: when getting frustrated, users start to formulate question queries, they start to use advanced operators, and they spend a larger proportion of their time on the search results page. Together, these signals can be used to build a model that potentially detects user frustration.
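
As a sketch of how such behavioral signals might be combined into a single score (the weights below are illustrative placeholders, not coefficients from the study):

```python
import math

def frustration_score(question_query_rate, advanced_op_rate,
                      serp_time_fraction):
    """Toy logistic model over the three behavioral signals from the
    study. The weights and bias are illustrative placeholders."""
    z = (3.0 * question_query_rate
         + 2.0 * advanced_op_rate
         + 4.0 * serp_time_fraction
         - 3.0)
    return 1.0 / (1.0 + math.exp(-z))

# A session with no question queries or operators scores low; one with
# frequent question queries, operators, and long time on the results
# page scores high.
calm = frustration_score(0.0, 0.0, 0.2)
frustrated = frustration_score(0.4, 0.3, 0.6)
```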

Friday, July 30, 2010

Google North American Faculty Summit - Day 1



Thursday, July 29 was the first day of the Google North American Faculty Summit, our sixth annual event bringing together Google engineers and subject matter experts with leading computer science faculty, mostly from North America but some from as far away as Japan and China. This year’s summit is focused on three topics: cloud computing, security and privacy, and social networking. It was these first two areas that we discussed yesterday, in a series of talks by Googlers, informal meetings and small round-table discussions.

After an introduction from Alfred Spector, Google’s VP of Research and Special Initiatives, we dove right into the technical talks, covering the “arms race” of malware detection, privacy and public policy, passwords and authentication, and operations and infrastructure security at large scale. I gave a talk on the changes that cloud computing brings to security, covering both challenges, such as privacy and authentication, and opportunities for security improvements, which I’ll summarize briefly below.

Cloud services have defined a new model for end-user cloud applications that are accessed via single-user devices or browsers. Unlike software on personal computers, or on time-shared servers, cloud applications execute logically on stateless clients accessing a substrate of redundant back-end servers. While a single client may execute multiple applications, those applications are typically isolated and communicate only via the cloud, thus eliminating local dependencies and simplifying device management. As well as being isolated and stateless, clients are also provisioned with software upon use, which makes any client pretty much the same as any other and facilitates transparent access from different locations and devices.

There are many clear security benefits that accrue from this cloud application software model. To start with, it eliminates much of the complex, error-prone management traditionally required for each client. Also, because clients and servers are replicated or stateless, security policies can be enforced using simple, conservative fail-stop mechanisms. Cloud applications are also highly dynamic, with new software versions easily deployed through client restart or rolling server upgrades. Not only does this greatly simplify deploying fixes to software vulnerabilities, it also allows for the possibility of deploying specialized software versions, with custom security aspects, to different clients and servers. Such software instrumentation could be used for many diverse security purposes, especially when combined with randomization: these include artificially-induced heterogeneity as well as the large-scale construction and enforcement of models for appropriate software behavior. In short, cloud applications help with basic, but hard-to-answer security questions such as: Am I running the right software? Or, is it known to be bad? Is it behaving maliciously, and can I recover if it is?

Following my talk, faculty attendees had a variety of insightful questions—as they did for all the presenters today. Roy Campbell, from University of Illinois at Urbana-Champaign, raised the issue of zero-day attacks, and how they might be handled and prevented. My response was that while it might be impossible to eliminate all security bugs, it is possible to get strong guarantees and higher assurance about fundamental software aspects. As an example, I mentioned the Native Client open source Google project that establishes strong, verifiable guarantees about the safety of low-level software. Another question raised was whether Multics-like protection rings were relevant to today's cloud computing applications. Although the mechanisms may not be the same as in Multics, my reply was that layered security and defense in depth are more important than ever, since cloud computing by necessity makes use of deep software stacks that extend from the client through multiple, nested back-end services.

On Friday’s agenda: the technical possibilities of the social web. We’ll be back with more highlights from the summit soon—stay tuned.

Wednesday, July 28, 2010

And the award goes to...



Google's very own Tushar Chandra, along with his coauthors Vassos Hadzilacos and Sam Toueg, received the prestigious Dijkstra Prize in Distributed Computing at the ACM Symposium on Principles of Distributed Computing in Zürich. The award recognizes outstanding papers whose impact on the theory and practice of distributed computing has been evident for over a decade.

Their papers introduced and precisely characterized the notion of unreliable failure detection in a distributed system.

Tushar currently works on large-scale machine learning and distributed systems at Google.

You can find more information about the award and the papers here.

Congratulations to Tushar, Vassos, and Sam!

Googlers receive multiple awards at the 2010 International Conference on Machine Learning



Googlers were recognized in three of the four paper awards at ICML 2010.
I feel a particular connection to the last of these papers, as Rob and Yoram were members of technical staff and Erin was a student intern at the department I headed at AT&T Labs when this work was done.

Congratulations to all!

Thursday, July 22, 2010

Announcing our Q2 Research Awards



We’re excited to announce the latest round of Google Research Awards, our program, which identifies and supports full-time faculty pursuing research in areas of mutual interest. From a record number of submissions, we are funding 75 awards across 18 different areas—a total of more than $4 million.

The areas that received the highest level of funding for this round were systems and infrastructure, human-computer interaction, multimedia, and security. We also continue to develop more collaborations internationally. In this round, 26 percent of the funding was awarded to universities outside the U.S.

Here are some examples from this round of awards:
  • Jeremy Cooperstock, McGill University. A Spatialized Audio Map System for Mobile Blind Users (Geo/maps): A mobile audio system that provides location-based information, primarily for use by the blind and visually impaired communities.
  • Alexander Pretschner, Karlsruhe Institute of Technology, Germany. Towards Operational Privacy (Security and privacy): Provide a framework for precise semantic definitions in policies for domain-specific applications to give users a way to define the exact behaviour they expect from a system in application-specific contexts.
  • Erik Brynjolfsson, Massachusetts Institute of Technology. The Future of Prediction - How Google Searches Foreshadow Housing Prices and Quantities (Economics and market algorithms): How data from search engines like Google provide a highly accurate but simple way to predict future business activities.
  • Stephen Pulman, Oxford University Computing Laboratory. Automatic Generation of Natural Language Descriptions of Visual Scenes (Natural language processing): Develop a system that automatically generates a description of a visual scene.
  • Jennifer Rexford, Princeton. Rethinking Wide-Area Traffic Management (Software and hardware systems infrastructure): Drawing on mature techniques from optimization theory, design new traffic-management solutions where the hosts, routers, and management system cooperate in a more effective way.
  • John Quinn, Makerere University, Uganda. Mobile Crop Surveillance in the Developing World (Multimedia search and audio/video processing): A computer vision system using camera-enabled mobile devices to monitor the spread of viral disease among staple crops.
  • Allison Druin, University of Maryland. Understanding how Children Change as Searchers (Human-computer interaction): Do children change as searchers as they age? How do searchers typically shift between roles over time? If children change, how many of them become Power Searchers? If children don’t change, what roles do they typically demonstrate?
  • Ronojoy Adhikari, The Institute of Mathematical Sciences, India. Machine Learning of Syntax in Undeciphered Scripts (Machine learning): Devise algorithms that would learn to search for evidence of semantics in datasets such as the Indus script.

You can find the full list of this round’s award recipients here (pdf). More information on our research award program can be found on our website.

Thursday, July 15, 2010

Google PhD Fellowships go international



(Cross-posted from the Official Google Blog)

We introduced the Google Fellowship program last year in the United States to broaden our support of university research. The students who were awarded the 2009 fellowships were a truly impressive group, many having high-profile internships this past summer, and a few even taking up faculty appointments in the upcoming year.

Universities continue to be the source of some of the most innovative research in computer science, and in particular it’s the students that they foster who are the future of our field. This year, we’re going global and extending the fellowship program to Europe, Israel, China and Canada. We’re very happy to be continuing our support of excellence in graduate studies and offer our sincere congratulations to the following PhD students for receiving Google Fellowships in 2010:

Google European Doctoral Fellowships
  • Roland Angst, Google Europe Fellowship in Computer Vision (Swiss Federal Institute of Technology Zurich, Switzerland)
  • Arnar Birgisson, Google Europe Fellowship in Computer Security (Chalmers University of Technology, Sweden)
  • Omar Choudary, Google Europe Fellowship in Mobile Security (University of Cambridge, U.K.)
  • Michele Coscia, Google Europe Fellowship in Social Computing (University of Pisa, Italy)
  • Moran Feldman, Google Europe Fellowship in Market Algorithms (Technion - Israel Institute of Technology, Israel)
  • Neil Houlsby, Google Europe Fellowship in Statistical Machine Learning (University of Cambridge, U.K.)
  • Kasper Dalgaard Larsen, Google Europe Fellowship in Search and Information Retrieval (Aarhus University, Denmark)
  • Florian Laws, Google Europe Fellowship in Natural Language Processing (University of Stuttgart, Germany)
  • Cynthia Liem, Google Europe Fellowship in Multimedia (Delft University of Technology, Netherlands)
  • Ofer Meshi, Google Europe Fellowship in Machine Learning (The Hebrew University of Jerusalem, Israel)
  • Dora Spenza, Google Europe Fellowship in Wireless Networking (Sapienza University of Rome, Italy)
  • Carola Winzen, Google Europe Fellowship in Randomized Algorithms (Saarland University / Max Planck Institute for Computer Science, Germany)
  • Marek Zawirski, Google Europe Fellowship in Distributed Computing (University Pierre and Marie Curie / INRIA, France)
  • Lukas Zich, Google Europe Fellowship in Video Analysis (Czech Technical University, Czech Republic)
Google China PhD Fellowships
  • Fangtao Li, Google China Fellowship in Natural Language Processing (Tsinghua University)
  • Ming-Ming Cheng, Google China Fellowship in Computer Vision (Tsinghua University)
Google United States/Canada PhD Fellowships
  • Chong Wang, Google U.S./Canada Fellowship in Machine Learning (Princeton University)
  • Tyler McCormick, Google U.S./Canada Fellowship in Statistics (Columbia University)
  • Ashok Anand, Google U.S./Canada Fellowship in Computer Networking (University of Wisconsin)
  • Ramesh Chandra, Google U.S./Canada Fellowship in Web Application Security (Massachusetts Institute of Technology)
  • Adam Pauls, Google U.S./Canada Fellowship in Machine Translation (University of California, Berkeley)
  • Nguyen Dinh Tran, Google U.S./Canada Fellowship in Distributed Systems (New York University)
  • Moira Burke, Google U.S./Canada Fellowship in Human Computer Interaction (Carnegie Mellon University)
  • Ankur Taly, Google U.S./Canada Fellowship in Language Security (Stanford University)
  • Ilya Sutskever, Google U.S./Canada Fellowship in Neural Networks (University of Toronto)
  • Keenan Crane, Google U.S./Canada Fellowship in Computer Graphics (California Institute of Technology)
  • Boris Babenko, Google U.S./Canada Fellowship in Computer Vision (University of California, San Diego)
  • Jason Mars, Google U.S./Canada Fellowship in Compiler Technology (University of Virginia)
  • Joseph Reisinger, Google U.S./Canada Fellowship in Natural Language Processing (University of Texas, Austin)
  • Maryam Karimzadehgan, Google U.S./Canada Fellowship in Search and Information Retrieval (University of Illinois, Urbana-Champaign)
  • Carolina Parada, Google U.S./Canada Fellowship in Speech (Johns Hopkins University)
The students will receive fellowships consisting of full coverage of tuition, fees and stipend for up to three years. These students have been exemplary thus far in their careers, and we’re looking forward to seeing them build upon their already impressive accomplishments. Congratulations to all of you!

Wednesday, July 14, 2010

Our commitment to the digital humanities


(Cross-posted from the Official Google Blog)

It can’t have been very long after people started writing that they started to organize and comment on what was written. Look at the 10th century Venetus A manuscript, which contains scholia written fifteen centuries earlier about texts written five centuries before that. Almost since computers were invented, people have envisioned using them to expose the interconnections of the world’s knowledge. That vision is finally becoming real with the flowering of the web, but in a notably limited way: very little of the world’s culture predating the web is accessible online. Much of that information is available only in printed books.

A wide range of digitization efforts have been pursued with increasing success over the past decade. We’re proud of our own Google Books digitization effort, having scanned over 12 million books in more than 400 languages, comprising over five billion pages and two trillion words. But digitization is just the starting point: it will take a vast amount of work by scholars and computer scientists to analyze these digitized texts. In particular, humanities scholars are starting to apply quantitative research techniques for answering questions that require examining thousands or millions of books. This style of research complements the methods of many contemporary humanities scholars, who have individually achieved great insights through in-depth reading and painstaking analysis of dozens or hundreds of texts. We believe both approaches have merit, and that each is good for answering different types of questions.

Here are a few examples of inquiries that benefit from a computational approach. Shouldn’t we be able to characterize Victorian society by quantifying shifts in vocabulary—not just of a few leading writers, but of every book written during the era? Shouldn’t it be easy to locate electronic copies of the English and Latin editions of Hobbes’ Leviathan, compare them and annotate the differences? Shouldn’t a Spanish reader be able to locate every Spanish translation of “The Iliad”? Shouldn’t there be an electronic dictionary and grammar for the Yao language?
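
The first of these questions, quantifying shifts in vocabulary across an era, is the kind of analysis that becomes mechanical once the texts are digitized; a minimal sketch (corpus names and tokens below are purely hypothetical):

```python
from collections import Counter

def vocabulary_shift(era_a_tokens, era_b_tokens, top_n=3):
    """Compare relative word frequencies between two corpora and return
    the words whose share of the text grew the most from era A to B."""
    freq_a, freq_b = Counter(era_a_tokens), Counter(era_b_tokens)
    total_a, total_b = len(era_a_tokens), len(era_b_tokens)
    vocab = set(freq_a) | set(freq_b)
    growth = {w: freq_b[w] / total_b - freq_a[w] / total_a
              for w in vocab}
    return sorted(growth, key=growth.get, reverse=True)[:top_n]

# Hypothetical mini-corpora from two periods:
early = "the carriage the horse the carriage".split()
late = "the railway the railway the telegraph".split()
risers = vocabulary_shift(early, late)
```

Run over every book of an era rather than a handful, the same comparison becomes a statement about the society, not just the sample.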

We think so. Funding agencies have been supporting this field of research, known as the digital humanities, for years. In particular, the National Endowment for the Humanities has taken a leadership role, having established an Office of Digital Humanities in 2007. NEH chairman Jim Leach says: "In the modern world, access to knowledge is becoming as central to advancing equal opportunity as access to the ballot box has proven to be the key to advancing political rights. Few revolutions in human history can match the democratizing consequences of the development of the web and the accompanying advancement of digital technologies to tap this accumulation of human knowledge."

Likewise, we’d like to see the field blossom and take advantage of resources such as Google Books that are becoming increasingly available. We’re pleased to announce that Google has committed nearly a million dollars to support digital humanities research over the next two years.

Google’s Digital Humanities Research Awards will support 12 university research groups with unrestricted grants for one year, with the possibility of renewal for an additional year. The recipients will receive some access to Google tools, technologies and expertise. Over the next year, we’ll provide selected subsets of the Google Books corpus—scans, text and derived data such as word histograms—to both the researchers and the rest of the world as laws permit. (Our collection of ancient Greek and Latin books is a taste of corpora to come.)

We've given awards to 12 projects led by 23 researchers at 15 universities:
  • Steven Abney and Terry Szymanski, University of Michigan. Automatic Identification and Extraction of Structured Linguistic Passages in Texts.
  • Elton Barker, The Open University, Eric C. Kansa, University of California-Berkeley, Leif Isaksen, University of Southampton, United Kingdom. Google Ancient Places (GAP): Discovering historic geographical entities in the Google Books corpus.
  • Dan Cohen and Fred Gibbs, George Mason University. Reframing the Victorians.
  • Gregory R. Crane, Tufts University. Classics in Google Books.
  • Miles Efron, Graduate School of Library and Information Science, University of Illinois. Meeting the Challenge of Language Change in Text Retrieval with Machine Translation Techniques.
  • Brian Geiger, University of California-Riverside, Benjamin Pauley, Eastern Connecticut State University. Early Modern Books Metadata in Google Books.
  • David Mimno and David Blei, Princeton University. The Open Encyclopedia of Classical Sites.
  • Alfonso Moreno, Magdalen College, University of Oxford. Bibliotheca Academica Translationum: link to Google Books.
  • Todd Presner, David Shepard, Chris Johanson, James Lee, University of California-Los Angeles. Hypercities Geo-Scribe.
  • Amelia del Rosario Sanz-Cabrerizo and José Luis Sierra-Rodríguez, Universidad Complutense de Madrid. Collaborative Annotation of Digitalized Literary Texts.
  • Andrew Stauffer, University of Virginia. JUXTA Collation Tool for the Web.
  • Timothy R. Tangherlini, University of California-Los Angeles, Peter Leonard, University of Washington. Northern Insights: Tools & Techniques for Automated Literary Analysis, Based on the Scandinavian Corpus in Google Books.
We have selected these proposals in part because the resulting techniques, tools and data will be broadly useful: they’ll help entire communities of scholars, not just the applicants. We look forward to working with them, and hope that over time the field of digital humanities will fulfill its promise of transforming the ways in which we understand human culture.

Thursday, July 1, 2010

Google launches Korean Voice Search



On June 16th, we launched our Korean voice search system. Google Search by Voice has been available in various flavors of English since 2008, in Mandarin and Japanese since 2009, and in French, Italian, German and Spanish just a few weeks ago (some more details in a recent blog post).

Korean speech recognition has received less attention than English, which has been studied extensively around the world by teams in both English- and non-English-speaking countries. Fundamentally, our methodology for developing a Korean speech recognition system is similar to the process we have used for other languages. We created a set of statistical models: an acoustic model for the basic sounds of the language, a language model for the words and phrases of the language, and a dictionary mapping the words to their pronunciations. We trained our acoustic model using a large quantity of recorded and transcribed Korean speech. The language model was trained using anonymized Korean web search queries. Once these models are trained, given an audio input, we can compute and display the most likely spoken phrase along with its search results.
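
The way the two models combine at recognition time is the standard noisy-channel decomposition: pick the phrase W maximizing P(A|W)·P(W). A toy sketch (the candidate phrases and scores below are invented for illustration):

```python
import math

def decode(acoustic_log_scores, language_model):
    """Pick the phrase maximizing log P(A|W) + log P(W): the standard
    noisy-channel combination of acoustic and language models."""
    return max(
        acoustic_log_scores,
        key=lambda w: acoustic_log_scores[w] + math.log(language_model[w]),
    )

# Hypothetical acoustic log-likelihoods for two candidate transcripts,
# and language-model probabilities estimated from query logs. The
# acoustics slightly favor the wrong phrase, but the language model,
# knowing what people actually search for, overrules it.
acoustic = {"seoul weather": -10.0, "soul weather": -9.5}
lm = {"seoul weather": 0.02, "soul weather": 0.001}
best = decode(acoustic, lm)
```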

There were several challenges in developing a Korean speech recognition system, some unique to Korean, some typical of Asian languages and some universal to all languages. Here are some examples of problems that stood out:

  • Developing a Korean dictionary: Unlike English, where there are many publicly-available dictionaries for mapping words to their pronunciations, there are very few available for Korean. Since our Korean recognizer knows several hundred thousand words, we needed to create these mappings ourselves. Luckily, Korean has one of the most elegant and simple writing systems in the world (created in the 15th century!) and this makes mapping Korean words to pronunciations relatively straightforward. However, we found that Koreans also use quite a few English words in their queries, which complicates the mapping process. To predict these pronunciations, we built a statistical model using data from an existing (smaller) Korean dictionary.
  • Korean word boundaries: Although Korean orthography uses spaces to indicate word boundaries (unlike Japanese or Mandarin), we found that people use word boundaries inconsistently for search queries. To limit the size of the vocabulary generated from the search queries, we used statistical techniques to cut rare long words into smaller sub-words (similarly to the system we developed for Japanese).
  • Pronunciation exceptions: Korean (like all other languages) has many exceptions for pronunciations that are not immediately obvious. For example, numbers are often written as digit sequences but not necessarily spoken this way (2010 = 이천십). The same is true for many common alphanumeric sequences like “mp3”, “kbs2” or mixed queries like “삼성 tv”, which often contain spelled letters and possibly English spoken digits as opposed to Korean ones.
  • Encoding issues: Korean script (Hangul) is written in syllabic blocks, with each block containing at least two of the 24 modern Hangul letters (Jamo): at least one consonant and one vowel. Including the normal ASCII characters, this brings the total number of possible basic characters to over 10,000, not including Hanja (used mostly in the formal spelling of names). So, despite its simple writing system, Korean still presents the same challenge of handling a large alphabet that is typical of Asian languages.
  • Script ambiguity: We found that some users like to use English native words and others the Korean transliteration (example: “ncis season 6” vs. “ncis 시즌6”). This makes it challenging to train and evaluate the system. We use a metric that estimates whether our transcription will give the correct web page result on the user’s smart phone screen, and such script variations make this tricky.
  • Recognizing rare words: The recognizer is good at recognizing things users often type into the search engine, such as cities, shops, addresses, common abbreviations, common product model numbers and well-known names like “김연아”. However, rare words (like many personal names) are often harder for us to recognize. We continue to work on improving those.
  • Every speaker sounds different: People speak in different styles, slow or fast, with an accent or without, have lower or higher pitched voices, etc. To make our system work for all these different conditions, we trained our system using data from many different sources to capture as many conditions as possible.
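
The syllabic-block structure mentioned under “Encoding issues” follows a fixed arithmetic in Unicode, which is what makes decomposing a block into its component Jamo straightforward; a minimal sketch:

```python
def decompose_hangul(syllable):
    """Split one precomposed Hangul syllable (U+AC00..U+D7A3) into its
    lead-consonant, vowel, and optional tail-consonant Jamo indices,
    using the standard Unicode block arithmetic."""
    s = ord(syllable) - 0xAC00
    if not (0 <= s < 19 * 21 * 28):
        raise ValueError("not a precomposed Hangul syllable")
    lead = s // (21 * 28)
    vowel = (s % (21 * 28)) // 28
    tail = s % 28  # 0 means no final consonant
    return lead, vowel, tail

# 19 leads x 21 vowels x 28 tails = 11,172 possible syllable blocks,
# which is where the large-alphabet challenge comes from.
lead, vowel, tail = decompose_hangul("한")  # ㅎ + ㅏ + ㄴ
```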

When speech recognizers make errors, it is usually because the models are not good enough, which often means they haven't been trained on enough data. For Korean (and all other languages), our cloud computing infrastructure allows us to retrain our models frequently, using an ever-growing amount of data, to continually improve performance. We are committed to improving the system regularly over time to make speech a user-friendly input method on mobile devices.