Tuesday, December 12, 2006

Google Research Picks for Videos of the Year



Everyone else is giving you year-end top ten lists of their favorite movies, so we thought we'd give you ours, but we're skipping Cars and The Da Vinci Code and giving you autonomous cars and open source code. Our top twenty (we couldn't stop at ten):

  1. Winning the DARPA Grand Challenge: Sebastian Thrun stars in the heartwarming drama of a little car that could.
  2. The Graphing Calculator Story: A thriller starring Ron Avitzur as the engineer who snuck into the Apple campus to write code.
  3. Should Google Go Nuclear?: Robert Bussard (former Asst. Director of the AEC) talks about inertial electrostatic fusion.
  4. A New Way to Look at Networking: Van Jacobson as the old pro discovering that the old problems have not gone away.
  5. Python 3000: Guido van Rossum always looks on the bright side of life in this epic look at the future of Python.
  6. How to Survive a Robot Uprising: Daniel Wilson stars in this sci-fi horror story.
  7. The New "Bill of Rights of Information Society": Raj Reddy talks about how to get the right information to the right people at the right time.
  8. Practical Common Lisp: In this foreign film, Peter Seibel introduces the audience to a new language. Subtitles in parentheses.
  9. Debugging Backwards in Time: Starring Bil Lewis in this sequel to Back to the Future.
  10. Building Large Systems at Google: Narayanan Shivakumar takes us behind the scenes to see how Google builds large distributed systems. Like Charlie and the Chocolate Factory but without the Oompa-Loompas.
  11. The Science and Art of User Experience at Google: Jen Fitzpatrick continues the behind-the-scenes look.
  12. Universally Accessible Demands Accessibility for All of Humanity: MacArthur "Genius Award" Fellow Jim Fruchterman talks about accessibility for the blind and others.
  13. DNA and the Brain: Nobel Laureate James Watson explains how the key to understanding the brain is in our genes.
  14. Steve Wozniak: This one-man show is playing to boffo reviews.
  15. Jane Goodall: The celebrated primatologist discusses her mission to empower individuals to improve the environment.
  16. Computers Versus Common Sense: Doug Lenat reprises his role as the teacher trying to get computers to understand.
  17. The Google Story: David Vise talks about his book on Google.
  18. The Search: John Battelle talks about his book on Google.
  19. The Archimedes Palimpsest: Like Da Vinci Code, only true.
  20. The Paradox of Choice - Why More is Less: With Barry Schwartz. Hmm, maybe I should have made this a top three list?

Wednesday, November 29, 2006

CSCW 2006: Collaborative editing 20 years later



9am Mountain View, California. 6pm Zurich, Switzerland. The two of us sit separated by thousands of miles, telephones tucked under our ears, talking about this blog post and typing words and edits into Google Docs. As we talk about the title, we start typing into the same paragraph -- and Lilly gets a warning: "You've edited a paragraph that Jens has been editing!" Lilly stops typing so she doesn't lose her thoughts and coordinates with Jens over the phone. Then we realize: "We just talked about this problem at the conference we're writing about!"
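
The warning Lilly saw comes from per-paragraph conflict detection. Here's a minimal Python sketch of one way such a check can work -- the class and the revision scheme are our own illustration, not how Google Docs is actually implemented: each edit carries the revision it was based on, and an edit against a stale revision is rejected so the writer can reconcile first.

```python
class ParagraphDoc:
    """Toy per-paragraph conflict detection: each paragraph tracks a
    revision number, and an edit is accepted only if it was based on
    the paragraph's current revision."""

    def __init__(self, paragraphs):
        self.paragraphs = list(paragraphs)
        self.revisions = [0] * len(paragraphs)

    def edit(self, index, new_text, base_revision):
        if base_revision != self.revisions[index]:
            return False  # someone else edited this paragraph first
        self.paragraphs[index] = new_text
        self.revisions[index] += 1
        return True

doc = ParagraphDoc(["Working title", "Body text"])
base = doc.revisions[0]                           # both authors see revision 0
ok = doc.edit(0, "Jens's title", base)            # True: Jens's edit lands first
conflict = doc.edit(0, "Lilly's title", base)     # False: Lilly's edit is based on a stale revision
```

Lilly can then fetch the new revision and retry, which is roughly what coordinating over the phone accomplished for us.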

Two weeks ago four Googlers ventured north to attend ACM CSCW in Banff, Alberta, Canada. CSCW is ACM's conference on Computer Supported Cooperative Work; it brings together computer scientists, social scientists, and designers interested in how people live their lives -- at work, at play, and in between -- with and around technology, with a focus on understanding the design of technological systems. Topics like the issues and implementation of collaborative editing are staples at CSCW.

As this year was the conference's 20th anniversary, we had a chance to hear from many of the founders of CSCW: Irene Greif, Jonathan Grudin, Tom Malone, Judy Olson, Lucy Suchman, among others. Not surprisingly, the mood was introspective, with many speakers tracing the impact of the community over time and looking critically and constructively at the future paths the research community might take. Many sessions focused on less traditional areas of research, such as how Facebook figures into college students' school transitions and how tagging vocabularies evolve and are shaped by technology in a movie community. Jens also gave a talk on his pre-Google research on how photos and voice profiles affect people's choice of gaming partners. And he participated in a workshop exploring how people trust -- and learn to trust -- in online environments.

Apart from actively taking part in the debates and Q&As, we also demo-ed Google's tools for getting things done, collaboratively or solo: Google Docs & Spreadsheets and Google Notebook. These were met with much interest, as these publicly available Google tools build on insights gained in the CSCW field over the last 20 years.

If you're interested in these issues, you'd be a great addition to our team. Learn about available positions in user experience research and design.

Friday, September 22, 2006

And the Awards Go To ...



We're usually a modest bunch, but we couldn't help but let you know about some honors and awards bestowed on Googlers recently:

  • Ramakrishnan Srikant is the winner of the 2006 ACM SIGKDD Innovation Award for his work on pruning techniques for the discovery of association rules, and for developing new data mining approaches that respect the privacy of people in the database.

  • Henry Rowley and Shumeet Baluja, along with CMU professor Takeo Kanade, received the Longuet-Higgins prize for "a contribution which has stood the test of time," namely their 1996 paper Neural Network based face detection. The award was given at the 2006 Computer Vision and Pattern Recognition (CVPR) Conference.

  • Team Smartass, consisting of Christopher Hendrie, Derek Kisman, Ambrose Feinstein and Daniel Wright won first place in the ICFP (International Conference on Functional Programming) programming contest, using a combination of C++, Haskell and 2D. Third place went to Can't Spell Awesome without ASM, a team consisting of Google engineer Jon Dethridge, former Google interns Ralph Furmaniak and Tomasz Czajka, and Reid Barton of Harvard. They got the judges at the functional programming conference to admit "Assembler is not too shabby."

  • Peter Norvig was named a Berkeley Distinguished Alumnus in Computer Science, and gave the keynote commencement address. We'd also like to congratulate Prabhakar Raghavan, Head of Yahoo Research, who was a co-recipient of this award.

  • Simon Quellen Field's book Return of Gonzo Gizmos was a selection of the Scientific American Book Club.

  • Google summer intern Rion Snow (along with Stanford professors Dan Jurafsky and Andrew Ng) got the best paper award at the 2006 ACL/COLING (computational linguistics) conference for his paper titled Semantic taxonomy induction from heterogenous evidence.

  • Google summer intern Lev Reyzin won the outstanding student paper award at ICML (International Conference on Machine Learning) for work with Rob Schapire of Princeton on How Boosting the Margin Can Also Boost Classifier Complexity.

  • As we mentioned earlier, Michael Fink, Michele Covell and Shumeet Baluja won a best paper award for Social- and Interactive-Television Applications Based on Real-Time Ambient-Audio Identification.

  • Update 13 Oct 2006: Paul Rademacher has been named one of the top innovators under 35 by MIT's Technology Review. He was cited for his mashup of Google Maps and Craigslist housing data at housingmaps.com.

  • Update 31 Oct 2006: We forgot Alon Halevy, who won the VLDB 10 Year Best Paper Award for Querying Heterogeneous Information Sources Using Source Descriptions with Anand Rajaraman and Joann J. Ordille.

Friday, August 4, 2006

All Our N-gram are Belong to You



Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of one trillion words from public Web pages.

We believe that the entire research community can benefit from access to such massive amounts of data. It will advance the state of the art, it will focus research in the promising direction of large-scale, data-driven approaches, and it will allow all research groups, no matter how large or small their computing resources, to play together. That's why we decided to share this enormous dataset with everyone. We processed 1,024,908,267,229 words of running text and are publishing the counts for all 1,176,470,663 five-word sequences that appear at least 40 times. There are 13,588,391 unique words, after discarding words that appear less than 200 times.

Watch for an announcement from the Linguistic Data Consortium (LDC), which will be distributing the data soon, and then order your set of 6 DVDs. And let us hear from you -- we're excited to hear what you will do with the data, and we're always interested in feedback about this dataset, or other potential datasets that might be useful for the research community.

Update (22 Sept. 2006): The LDC now has the data available in their catalog. The counts are as follows:

File sizes: approx. 24 GB compressed (gzip'ed) text files

Number of tokens: 1,024,908,267,229
Number of sentences: 95,119,665,584
Number of unigrams: 13,588,391
Number of bigrams: 314,843,401
Number of trigrams: 977,069,902
Number of fourgrams: 1,313,818,354
Number of fivegrams: 1,176,470,663

The following is an example of the 3-gram data contained in this corpus:

ceramics collectables collectibles 55
ceramics collectables fine 130
ceramics collected by 52
ceramics collectible pottery 50
ceramics collectibles cooking 45
ceramics collection , 144
ceramics collection . 247
ceramics collection </S> 120
ceramics collection and 43
ceramics collection at 52
ceramics collection is 68
ceramics collection of 76
ceramics collection | 59
ceramics collections , 66
ceramics collections . 60
ceramics combined with 46
ceramics come from 69
ceramics comes from 660
ceramics community , 109
ceramics community . 212
ceramics community for 61
ceramics companies . 53
ceramics companies consultants 173
ceramics company ! 4432
ceramics company , 133
ceramics company . 92
ceramics company </S> 41
ceramics company facing 145
ceramics company in 181
ceramics company started 137
ceramics company that 87
ceramics component ( 76
ceramics composed of 85
ceramics composites ferrites 56
ceramics composition as 41
ceramics computer graphics 51
ceramics computer imaging 52
ceramics consist of 92

The following is an example of the 4-gram data in this corpus:

serve as the incoming 92
serve as the incubator 99
serve as the independent 794
serve as the index 223
serve as the indication 72
serve as the indicator 120
serve as the indicators 45
serve as the indispensable 111
serve as the indispensible 40
serve as the individual 234
serve as the industrial 52
serve as the industry 607
serve as the info 42
serve as the informal 102
serve as the information 838
serve as the informational 41
serve as the infrastructure 500
serve as the initial 5331
serve as the initiating 125
serve as the initiation 63
serve as the initiator 81
serve as the injector 56
serve as the inlet 41
serve as the inner 87
serve as the input 1323
serve as the inputs 189
serve as the insertion 49
serve as the insourced 67
serve as the inspection 43
serve as the inspector 66
serve as the inspiration 1390
serve as the installation 136
serve as the institute 187
serve as the institution 279
serve as the institutional 461
serve as the instructional 173
serve as the instructor 286
serve as the instructors 161
serve as the instrument 614
serve as the instruments 193
serve as the insurance 52
serve as the insurer 82
serve as the intake 70
serve as the integral 68
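
To give a sense of how such counts get used, here's a small Python sketch that parses lines in the format of the samples above (whitespace-separated tokens followed by a count; the released files may differ in delimiter details) and estimates a conditional probability by maximum likelihood, with no smoothing:

```python
# A toy n-gram table, using a few of the 4-gram lines shown above.
sample_4grams = """\
serve as the incoming 92
serve as the incubator 99
serve as the index 223
serve as the initial 5331
serve as the input 1323
serve as the inspiration 1390
"""

def load_counts(text):
    """Parse 'token ... token count' lines into a dict of tuples to counts."""
    counts = {}
    for line in text.splitlines():
        *tokens, count = line.split()
        counts[tuple(tokens)] = int(count)
    return counts

counts = load_counts(sample_4grams)

def p_next(context, word, counts):
    """Estimate P(word | context) from raw n-gram counts (no smoothing)."""
    total = sum(c for ngram, c in counts.items() if ngram[:-1] == tuple(context))
    return counts.get(tuple(context) + (word,), 0) / total if total else 0.0

p = p_next(["serve", "as", "the"], "initial", counts)  # 5331 / 8458, about 0.63
```

A real language model built on the full dataset would of course need smoothing to handle the five-word sequences pruned by the 40-count threshold, but the counting machinery is exactly this simple.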

Thursday, July 13, 2006

Call for attendees - Conference on Test Automation



As we noted earlier, we're hosting our first-ever Conference on Test Automation in London in September.

We've heard from many interested parties, and now have 13 excellent presentations lined up. Now we are soliciting people who want to attend. Because we expect lots of interest and space is limited, we're asking everyone who's interested to write a short note (400 words or less) on why you want to be there. There's an easy form for requesting a spot, and we hope to hear from you. The deadline for writing in is July 28th - and you'll hear back by August 4.

Wednesday, June 7, 2006

Interactive TV: Conference and Best Paper



Euro ITV (the interactive television conference) took place in Athens last week. The presentations included a diverse collection of user studies, new application areas, and exploratory business models. One of the main themes was the integration of multiple information sources. For example, during a time-out in a live sporting event, some viewers may enjoy reviewing highlight footage, while others may prefer to switch to a parallel program with player profiles and statistics before being automatically returned to the match once play resumes.

Other papers explored the idea of selecting and recommending videos. When many videos are available, such as through IPTV or digital cable, we see a heavy-tailed distribution of content accesses (much like that on the internet): there are a small number of popular channels, but the combined viewings of thousands of "niche" channels outweigh them. As on the web, the problem that arises from this situation is one of discovery. A TV-guide-style resource is not practical; methods like collaborative filtering can help. Nonetheless, new ideas and interfaces are needed.

We also presented our work at the conference. Our paper [pdf] (which received the best paper award :) focused on using broadcast viewing to automatically present relevant information on a web browser. We showed how to sample the ambient sound emitted from a TV and automatically determine what is being watched from a small signature of the sound -- all with complete privacy and minuscule effort. The system could keep up with users while they channel surf, presenting them with a real-time forum about a live political debate one minute and an ad-hoc chat room for a sporting event the next. And all of this would be done without users ever having to type or even know the name of the program or channel being viewed. Taking this further, we could collect snippets from the web describing the actors appearing in a movie, or present maps of locales within the movie as it takes place (no matter whether users are watching it as a live broadcast or as a recorded broadcast).
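
The paper describes the actual system; purely as an illustration of the matching idea, here's a toy Python sketch. The windowed-energy signature below is a stand-in we invented for this post -- far cruder than real audio fingerprinting -- but it shows the shape of the pipeline: compress audio into a compact bit signature, then find the closest known signature.

```python
import math

def signature(samples, window=4):
    """Toy audio signature: one bit per window boundary, set when the
    window's energy rises relative to the previous window."""
    energies = [sum(s * s for s in samples[i:i + window])
                for i in range(0, len(samples) - window + 1, window)]
    return [1 if b > a else 0 for a, b in zip(energies, energies[1:])]

def hamming(a, b):
    """Number of positions where two equal-length bit signatures differ."""
    return sum(x != y for x, y in zip(a, b))

def identify(clip, database):
    """Return the known program whose signature is closest to the clip's."""
    sig = signature(clip)
    return min(database, key=lambda name: hamming(sig, database[name]))

# Stand-ins for captured audio: an oscillating signal and a rising one.
debate = [math.sin(i / 3) for i in range(64)]
game = [i / 64 for i in range(64)]
database = {"political debate": signature(debate),
            "sporting event": signature(game)}

identify(debate, database)  # matches "political debate"
```

A production system must of course be robust to room noise, volume, and compression, which is where the real signal-processing work in the paper comes in.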

Friday, June 2, 2006

Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken



I remember vividly Jon Bentley's first Algorithms lecture at CMU, where he asked all of us incoming Ph.D. students to write a binary search, and then dissected one of our implementations in front of the class. Of course it was broken, as were most of our implementations. This made a real impression on me, as did the treatment of this material in his wonderful Programming Pearls (Addison-Wesley, 1986; Second Edition, 2000). The key lesson was to carefully consider the invariants in your programs.

Fast forward to 2006. I was shocked to learn that the binary search program that Bentley proved correct and subsequently tested in Chapter 5 of Programming Pearls contains a bug. Once I tell you what it is, you will understand why it escaped detection for two decades. Lest you think I'm picking on Bentley, let me tell you how I discovered the bug: The version of binary search that I wrote for the JDK contained the same bug. It was reported to Sun recently when it broke someone's program, after lying in wait for nine years or so.

So what's the bug? Here's a standard binary search, in Java. (It's the one I wrote for java.util.Arrays):

1:     public static int binarySearch(int[] a, int key) {
2:         int low = 0;
3:         int high = a.length - 1;
4:
5:         while (low <= high) {
6:             int mid = (low + high) / 2;
7:             int midVal = a[mid];
8:
9:             if (midVal < key)
10:                low = mid + 1;
11:            else if (midVal > key)
12:                high = mid - 1;
13:            else
14:                return mid; // key found
15:        }
16:        return -(low + 1); // key not found.
17:    }

The bug is in this line:
 6:             int mid =(low + high) / 2;

In Programming Pearls Bentley says that the analogous line "sets m to the average of l and u, truncated down to the nearest integer." On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

This bug can manifest itself for arrays whose length (in elements) is 2^30 or greater (roughly a billion elements). This was inconceivable back in the '80s, when Programming Pearls was written, but it is common these days at Google and other places. In Programming Pearls, Bentley says "While the first binary search was published in 1946, the first binary search that works correctly for all values of n did not appear until 1962." The truth is, very few correct versions have ever been published, at least in mainstream programming languages.

So what's the best way to fix the bug? Here's one way:
 6:             int mid = low + ((high - low) / 2);

Probably faster, and arguably as clear is:
 6:             int mid = (low + high) >>> 1;

In C and C++ (where you don't have the >>> operator), you can do this:
 6:             mid = ((unsigned int)low + (unsigned int)high) >> 1;
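
To see the failure concretely, here's a small Python sketch that emulates Java's 32-bit wraparound (Python integers don't overflow, so we wrap by hand) and checks that the corrected midpoint stays in bounds:

```python
def int32_add(a, b):
    """Add with 32-bit signed wraparound, like Java's int addition."""
    s = (a + b) & 0xFFFFFFFF
    return s - 0x100000000 if s >= 0x80000000 else s

# Searching the upper half of a ~1.6-billion-element array:
low, high = 1 << 30, (1 << 30) + (1 << 29)

buggy_mid = int32_add(low, high) // 2  # what (low + high) / 2 computes after overflow
safe_mid = low + (high - low) // 2     # the corrected version never overflows

print(buggy_mid)  # negative: an invalid array index
print(safe_mid)   # 1342177280: between low and high, as intended
```

The same arithmetic holds for the `>>> 1` and unsigned-cast fixes above: both reinterpret the wrapped sum so the high bit is treated as magnitude rather than sign.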

And now we know the binary search is bug-free, right? Well, we strongly suspect so, but we don't know. It is not sufficient merely to prove a program correct; you have to test it too. Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible. With concurrent programs, it's even worse: You have to test for all internal states, which is, for all practical purposes, impossible.

The binary-search bug applies equally to mergesort, and to other divide-and-conquer algorithms. If you have any code that implements one of these algorithms, fix it now before it blows up. The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code.

We programmers need all the help we can get, and we should never assume otherwise. Careful design is great. Testing is great. Formal methods are great. Code reviews are great. Static analysis is great. But none of these things alone are sufficient to eliminate bugs: They will always be with us. A bug can exist for half a century despite our best efforts to exterminate it. We must program carefully, defensively, and remain ever vigilant.

Update 17 Feb 2008: Thanks to Antoine Trux, Principal Member of Engineering Staff at Nokia Research Center Finland for pointing out that the original proposed fix for C and C++ (Line 6), was not guaranteed to work by the relevant C99 standard (INTERNATIONAL STANDARD - ISO/IEC - 9899 - Second edition - 1999-12-01, 3.4.3.3), which says that if you add two signed quantities and get an overflow, the result is undefined. The older C Standard, C89/90, and the C++ Standard are both identical to C99 in this respect. Now that we've made this change, we know that the program is correct;)

Saturday, April 29, 2006

Statistical machine translation live



Because we want to provide everyone with access to all the world's information, including information written in every language, one of the exciting projects at Google Research is machine translation. Most state-of-the-art commercial machine translation systems in use today have been developed using a rules-based approach and require a lot of work by linguists to define vocabularies and grammars.

Several research systems, including ours, take a different approach: we feed the computer billions of words of text, both monolingual text in the target language and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model. We have achieved very good results in research evaluations.
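
As a toy illustration of the flavor of this approach -- the real models are far more sophisticated, and the three sentence pairs below are made up -- even bare co-occurrence counts over aligned text start to point toward reasonable word translations:

```python
from collections import Counter, defaultdict

# A tiny aligned corpus (illustrative; real systems use billions of words).
aligned = [
    ("the house", "la maison"),
    ("the book", "le livre"),
    ("a house", "une maison"),
]

# Count how often each source word co-occurs with each target word.
cooc = defaultdict(Counter)
for src, tgt in aligned:
    for s in src.split():
        for t in tgt.split():
            cooc[s][t] += 1

def best_translation(word):
    """Pick the target word that co-occurs most often with the source word."""
    return cooc[word].most_common(1)[0][0]

print(best_translation("house"))  # prints "maison": it co-occurs twice, more than any other word
```

Real statistical MT replaces this crude counting with probabilistic alignment models and adds a language model over the target language, but the core idea -- let the parallel data speak -- is the same.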

Now you can see the results for yourself. We recently launched an online version of our system for Arabic-English and English-Arabic. Try it out! Arabic is a very challenging language to translate to and from: it requires long-distance reordering of words and has a very rich morphology. Our system works better for some types of text (e.g. news) than for others (e.g. novels) -- and you probably should not try to translate poetry ... but do stay tuned for more exciting developments.

Update: We've just opened a discussion forum for all topics related to machine translation.

Update: Fixed broken link to NIST results.

Thursday, April 27, 2006

Our conference on automated testing



Automated testing is one of my passions: it has hard problems to be solved, and they get harder every day. Over the past few years, I've had the opportunity to work on several automation projects, and now I'm getting a chance to combine my passion for automation with my love for the city of London.

I'm happy to announce that Google will be hosting a Conference on Test Automation in our London office on September 7 and 8, 2006. Our goal is to create a collegial atmosphere where participants can discuss challenges facing people on the cutting edge of test automation, and evaluate solutions for meeting those challenges.

Call for Presentations
We're looking for speakers with exciting ideas and new approaches to test automation. If you have a subject you'd like to talk about, please send me email at londontestconf@google.com that includes a description of your 60- or 90-minute talk in 500 words or less. Deadline for submissions is June 1.

We're planning to have 10 people give presentations at the conference followed by adequate time for discussion. If you'd like to attend as a non-speaker, watch this space. Once we've got a slate of speakers, we'll post it along with details on attending.

Sunday, April 23, 2006

See you at CHI



The raison d’être of our user experience research team is Google's keen interest in focusing on the user. So we help many product teams provide the best possible experience to everyone around the world, primarily by inviting thousands of people to take part in usability tests in our labs, and by analyzing our logs to identify problems which need fixing. From this we get the data that helps our engineers make Google products as easy as possible to use for the millions of people out there who think computers are far too complicated. People like my Mum, Dad, girlfriend, Gran — and pretty much everyone I know!

We’re one of several Google teams that publish research at academic and industry conferences, and this week a number of us will be attending the CHI (Computer-Human Interaction) conference in Montreal, the world's premier gathering for CHI researchers and practitioners. Googlers from several teams will take part in eight sessions, each focusing on different aspects of human-computer interaction. (The full program is here – it’s a PDF file.)

A Large Scale Study of Wireless Search Behavior: Google Mobile Search – In a session on Search and Navigation: Mobiles and Audio, we'll present the first large-scale study of search behavior for mobile users, highlighting some shortcomings of wireless search interfaces.

Scaling the card sort method to over 500 items: Restructuring the Google AdWords Help Center – Here we adapt the popular card-sorting research methodology to large information sets where the traditional approach is impractical and discuss how we've applied this technique.

No IM Please, We’re Testing – During the Usability Evaluations: Challenges and Solutions session we’ll discuss the use of instant messaging tools like Google Talk in usability tests, and the benefits of this technique for enabling live collaboration between test moderators and observers.

Add a Dash of Interface: Taking Mash-Ups to the Next Level – Here we contribute to the discussion of how extendable interfaces like Google Maps are enabling exciting new online innovation through the combining of data sources.

Why Do Tagging Systems Work? – This panel will address the design challenges of scaling tagging systems to meet their recent surge in popularity. Gmail is an example of email tagging that offers more flexibility than traditional hierarchical systems.

Design Communication: How Do You Get Your Point Across? – A key challenge for UI designers is communicating solutions and challenges within product teams. This panel focuses on effective ways to do that.

“It’s About the Information, Stupid!” Why We Need a Separate Field of Human Information Interaction – This interdisciplinary panel will discuss arguments for and against a distinct field focusing on information rather than computing technology. One for the theoreticians? (-;

Incorporating Eyetracking into User Studies at Google – In this Eyetracking in Practice workshop, we’ll talk about some of the challenges we’ve encountered in studies of eyetracking in our labs.

If you work in, or study, the area of human-computer interaction, the user experience team is hiring. Right now we’re looking for user experience researchers (including those with specialized quantitative skills), UI designers, and more.

Thursday, March 23, 2006

First Robots



With 4 seconds left to go, the Team Cheesy Poofs robot shouldered its way onto the 3-foot platform, pivoted 90 degrees into scoring position, and rapid-fired 10 balls directly into the 3-point goal. They won the match, and the Google Silicon Valley Regional Championship of US FIRST, a non-profit whose name stands for "For Inspiration and Recognition of Science and Technology."

Google jumped at the opportunity to sponsor this organization after Dean Kamen (inventor of the Segway and the first implantable dialysis pump) spoke to a packed Google audience about his lifelong crusade to improve education in the United States. Dean founded US FIRST over 15 years ago, and from humble beginnings in the Northeast, FIRST has now grown to involve over 60,000 high school students all over the United States and the world.

FIRST was a natural partner for Google, given their focus on science and technology, their passion for changing the world for the better, and their single-minded focus on making education fun for students. When the final buzzer rang at the recent championship match, the students jumped and hugged like they'd won the Super Bowl. And in a way, they had. This event has all the excitement, tension, and drama of a major sporting event -- and then some.

Beyond sponsoring the FIRST tournament, Google also funded half a dozen teams in the Bay Area, ranging from East Palo Alto High School to Notre Dame High School. Several dozen employees also served as team mentors, meeting with the students once a week to help construct the competition robots over the frantic six-week design/build cycle. Others volunteered at the Regional event as judges, coordinators, and referees, and plenty of Googlers were on hand to watch the exciting matches.




We congratulate all the teams at the regional tournament for their hard work and innovation. We wish the six Bay Area teams who qualified for the finals in Atlanta the best of luck. Bring home the gold!

Sunday, March 12, 2006

Hiring: The Lake Wobegon Strategy



You know the Google story: a small start-up of highly skilled programmers in a garage grows into a large international company. But how do you maintain the skill level while roughly doubling in size each year? We rely on the Lake Wobegon Strategy, which says to hire only candidates who are above the mean of your current employees. An alternative strategy (popular in the dot-com boom period) is to justify a hire by saying "this candidate is clearly better than at least one of our current employees." The following graph compares the mean employee skill level under the two strategies: hire-above-the-mean (or Lake Wobegon) in blue and hire-above-the-min in red. I ran a simulation of 1000 candidates with skill levels sampled uniformly from the 0 to 100th percentile (but evaluated by the interview process with noise of ±15%), starting from a core team of 10 employees with mean 75 and min 65. You can see how hire-above-the-min leads to a precipitous drop in skill level -- one we've been able to avoid.



Another hiring strategy we use is no hiring manager. Whenever you give project managers responsibility for hiring for their own projects, they'll take the best candidate in the pool, even if that candidate is sub-standard for the company, because every manager wants some help for their project rather than no help. That's why we do all hiring at the company level, not the project level. First we decide which candidates are above the hiring threshold, and then we decide what projects they can best contribute to. The orange line in the graph above is a simulation of the hiring-manager strategy, with the same candidates and the same number of hires as the no-hiring-manager strategy in blue. Candidates are grouped into pools of random size from 2 to 14, and the hiring manager chooses the best one from each pool. We're pleased that these little simulations show our hiring strategy is on top. You can learn more about our hiring and working philosophy.
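
A back-of-the-envelope version of these simulations is easy to reproduce. Here's a Python sketch comparing the first two strategies; the core-team numbers and noise model are our own simplified parameters, not the exact simulation behind the graph:

```python
import random

def simulate(strategy, n_candidates=1000, seed=1):
    """Hire from a noisy interview signal. 'mean' hires candidates judged
    above the current team mean; 'min' hires anyone judged above the
    weakest current employee. Returns the team's final mean true skill."""
    rng = random.Random(seed)
    # Core team of 10 with mean around 75 and min 65.
    team = [65.0, 70.0, 72.0, 74.0, 75.0, 76.0, 77.0, 78.0, 80.0, 83.0]
    for _ in range(n_candidates):
        true_skill = rng.uniform(0, 100)
        observed = true_skill + rng.uniform(-15, 15)  # noisy interview estimate
        threshold = sum(team) / len(team) if strategy == "mean" else min(team)
        if observed > threshold:
            team.append(true_skill)  # the hire's true skill joins the team
    return sum(team) / len(team)

print(f"hire-above-the-mean: {simulate('mean'):.1f}")
print(f"hire-above-the-min:  {simulate('min'):.1f}")
```

The hire-above-the-min spiral is visible in the mechanics: one below-average hire lowers the minimum, which lowers the bar for the next hire, and so on down; the mean-based threshold has no such feedback loop.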

Tuesday, March 7, 2006

An experimental study of P2P VoIP



VoIP (Voice-over-IP) systems are one of the fastest growing means of communication on the Internet, enabling free or low-cost phone calls. But to date, researchers have had little data to work with to learn how to build VoIP systems better. Some of these systems are proprietary, and obtaining data about their operational characteristics has been particularly challenging. For instance, even though the Skype network has tens of millions of users, it has been hard for researchers to benefit from its commercial success.

Data was collected from a Skype 'supernode' running at Cornell. Skype is a Peer-to-Peer (P2P) system in which clients (for example, a home user's PC) communicate directly to exchange voice packets with other clients (also called peers). However, their communication is facilitated by special peers called supernodes, which allow peers to connect even if they are behind firewalls or other network elements such as NATs (Network Address Translators). P2P in Skype already connects millions of users behind NATs today. Prior to our research, not much was known about how Skype users and clients behave, how supernodes are selected, or what kinds of demands supernodes place on the networks they reside in.

We learned a couple of things from the data. For example, we found that Skype users typically keep their client software open during the workday, as opposed to users of file-sharing P2P systems (such as KaZaA), who typically join and leave the network with much greater frequency. In further contrast to P2P file-sharing applications, which tend to be bandwidth hogs, Skype clients and supernodes use relatively little bandwidth and CPU, even when they relay VoIP calls. So this means you can run Skype without having it slow down your Internet connection.

You'll find even more results discussed in the paper. In addition to better P2P systems, researchers can use the data to design a better Internet. Based on what we've learned, perhaps researchers can design a next-generation P2P-friendly Internet that is commercially viable.

Sunday, March 5, 2006

Teamwork for problem-solving



Google Research is about teamwork with outstanding engineers to solve novel and challenging problems that have an impact. But it's also about being at the forefront of scientific innovation. We're an active part of the research community, and we like to interact with researchers and scientists in academia. We're happy to serve as a hub where researchers can come to discuss their latest findings and get exposed to the large-scale problems and challenges that we face. Robert Tarjan, John Lafferty, and Brian Kernighan are among the professors who have spent time here.

We host world-renowned scientists spanning diverse areas including neuroscience, climatology, internet security and e-commerce -- and of course, computer science. In the fall, our Research Seminars attracted such prominent figures as John Hopcroft and Michael Rabin. This spring we're welcoming Christos Papadimitriou and Vladimir Vapnik, to name just a few.

So if you're curious about the latest meteor findings in Antarctica or interested in high-end computing and scientific visualization at NASA, do check out our "tech talks" on Google Video. You don't actually need to work at Google to "attend" the talks -- but if you're interested, we're always looking.

Saturday, February 18, 2006

Making a difference



We've been asked what Google Research is like, and we thought the best way to answer is with a blog. First let me say that we're not like the stereotype of a Research Lab: the place where you hide all the Ph.D.s to keep them away from the engineers who do the real work.

We're different for two reasons.

First, Google Engineering is different: it contains many world-class Ph.D. researchers. For example, the top download from the ACM digital library last month was The Google File System, written by Google Ph.D.s who happen to be "engineers" (although in their previous jobs, two were at research labs and one was a grad student). This week's cover story in Nature describes work by Google Earth engineers in partnership with colleagues at CMU and NASA Ames.

Second, Google Research is different: we also have lots of world-class Ph.D.s (and a few non-Ph.D.s). Yes, we write papers and prove theorems, but we're all here because we want to discover and build useful things that will change the world.

So who are we? We're experts in machine translation who came here to work with the largest corpus of bilingual and monolingual text ever assembled. We're experts in machine learning algorithms who came to work on one of the world's largest computing clusters. We're researchers in natural language, vision, security, human-computer interaction, and a dozen other fields who came to help a user base of hundreds of millions of people. And we're working side by side with the engineering team -- not in a separate building or site. Some of us are launching projects on google.com this week and wearing pagers, and some of us are working on goals for the year 2020.

So we're different, and we like it that way. We hope you do too, and hope that you'll learn more about us from this blog.