Friday, May 15, 2009

Google Fellowships, the Nuts and Bolts



As you may have read, today we announced the recipients of the 2009 Google Fellowships. (You can read the announcement over on the Official Google Blog.) This is fantastic news, and the blog post makes the Google Fellowship Program sound very polished. But the truth is that a lot more work (and scrambling) went on in the background. Here's a quick snapshot.

We first conceived of the idea of the fellowships late last year. Google already funds academic research through the Google Research Awards, but we really wanted to support the graduate students who are doing a lot of the research and are the future of their respective fields. Idea: why don't we search out the best and brightest PhD students and pay their tuition and expenses, plus give them an Android phone and hook them up with a Google researcher so we can all share really cool ideas? Done and done.

After we made the decision to do the fellowships in 2009, we were in for some hard work. We quickly spread the word about the fellowships to give the universities and students time to prepare and send us information about themselves and their research. The nominated students were doing research on a vast array of subjects: Cloud Computing, Computer Graphics, Market Algorithms, Machine Learning, Natural Language Processing, Social Computing, Information Retrieval, Compilers, and Computer Vision, to name a few. I relied upon a small army of research scientists and distinguished engineers to help me review the nominations. On top of lending their scientific expertise to the Google Research Awards, not to mention their "day jobs", these forty-five Googlers also provided feedback on the students in record time. These guys are champs. Then came a whirlwind review with Alfred Spector, VP of Research and Special Initiatives at Google, and just six months later we are proud to announce the 2009 Google Fellowship recipients.

It was a jam-packed 6 months, and I'm really proud of how the program turned out this year. That said, I'm already looking forward to our sophomore year in 2010. You should expect to see a broader program covering more areas of research, more schools, and more geographies. I can't wait.

The best and the brightest



[Also posted on the Official Google Blog]

I can't think of a better environment than academia for asking hard questions and trying to solve the unsolvable. It's at universities that graduate students perform some of the most exciting and game-changing research in computer science and technology. These university labs foster the students that are going to be the next innovators and leaders in research.

We started the Google Fellowship Program this year to support graduate students in their quest to discover and achieve great things. Our goal was to find the best and brightest PhD students and award them a unique fellowship that highlights their contributions to research and supports them through their graduate studies. Several top universities submitted their students for consideration by research scientists, distinguished engineers and executives at Google. The breadth of research covered by these students and the scope of their vision were astounding. Learning about them was exciting; choosing from among them was truly difficult.

After careful review, we are proud to announce the 2009 Google Fellowship recipients:
  • Roxana Geambasu, Google Fellowship in Cloud Computing (University of Washington)
  • Michael Piatek, Google Fellowship in Computer Networking (University of Washington)
  • David Sontag, Google Fellowship in Machine Learning (Massachusetts Institute of Technology)
  • Ali Farhadi, Google Fellowship in Computer Vision Image Interpretation (University of Illinois at Urbana-Champaign)
  • Nicholas Chen, Google Fellowship in Human-Computer Interaction (University of Maryland)
  • Siddhartha Sen, Google Fellowship in Fault Tolerant Computing (Princeton University)
  • Ryan Peterson, Google Fellowship in Distributed Systems (Cornell University)
  • Eric Gilbert, Google Fellowship in Social Computing (University of Illinois at Urbana-Champaign)
  • Micha Elsner, Google Fellowship in Natural Language Processing (Brown University)
  • Subhransu Maji, Google Fellowship in Computer Vision Object Recognition (University of California, Berkeley)
  • Nicolas Lambert, Google Fellowship in Market Algorithms (Stanford University)
  • Han Liu, Google Fellowship in Statistics (Carnegie Mellon University)
  • Lixia Liu, Google Fellowship in Compiler Technology (Purdue University)
These students exemplify excellence in all areas, and we look forward to the impact that they are sure to have on their fields and the world. The Google Fellowship will provide them with funding to cover their tuition and expenses, plus an Android-powered phone and a Google mentor. Our sincere congratulations to all of them!

Tuesday, May 12, 2009

ACM Multimedia 2009 Grand Challenges



At Google Research we interact closely with the academic research community through programs such as the Research Awards and the Visiting Faculty Program, and through active participation in conferences. Dealing with large quantities of data gives us some unique challenges and perspectives on various problems. In many cases entirely new problem classes begin to emerge. These problems often have not received attention from a broad part of the research community. In an effort to bridge this gap for multimedia problems, we participated in setting Grand Challenges for this year's ACM Multimedia Conference. We proposed "Robust, As-Accurate-As-Human Genre Classification for Video" as a challenge.

The majority of research in video analysis today focuses on surveillance video. While this is critical for a lot of security applications, it does not cover the challenges that come up when we tackle a video retrieval and discovery application like YouTube. Analysis work beyond surveillance is often limited to specific categories like News and Sports that have well-defined structures that the solution methods can explicitly work with. Our challenge aims to encourage more work in the area of semantic understanding of a broad variety of videos. Genre classification is a problem that's representative of some of the challenges that stem from the sheer diversity that can exist across video categories. The challenge will encourage new methods to solve these problems, as well as attempts at standardizing datasets to represent this problem. With internet video gaining popularity at an astounding rate, we believe this challenge will steer the multimedia research community towards the challenges posed by the magnitude and variety of this new problem area.

We are grateful to Mor Naaman (Rutgers University) and Tat-Seng Chua (National University of Singapore) for organizing this industry challenge track at ACM Multimedia and inviting us to be a part of it.

Details of our challenge can be found here.

Friday, May 8, 2009

The bar-bet phenomenon: increasing diversity in mobile searches



Historically, research suggests that web search on mobile phones has been limited when compared to the diverse set of queries which comprise computer-based search. Researchers attribute the homogeneous mobile search behavior in part to the phone's form factor and browsing capabilities. However, our new logs-based study indicates that high-end phones, like the iPhone, are changing the landscape of mobile search. We found that search from these phones has evolved not only to mimic computer web search patterns, but to exceed the expectations set by conventional web search in some cases.

We see iPhone searches mimicking computer-based search behavior in terms of query length (~3 words per query for computer and iPhone queries, as opposed to 2.5 words per query for conventional mobile queries) and query classification (notably, the percentage of Adult and Entertainment searches has decreased on the iPhone relative to conventional mobile phones). But what is most surprising to us is that frequent searchers on iPhone surpass frequent searchers on computers in terms of the diversity of queries they issue. In other words, people are using high-end phones to search for a more diverse set of information needs than computers are used for; we jokingly refer to this as the "bar-bet" phenomenon -- or the "pub-quiz" phenomenon for those of you in the UK.

We devised a metric for quantifying the variability of a user’s search intentions across time. This variability metric, entro-percent, is a normalized entropy metric which compares the number of search tasks issued by a user to the number of categories those search tasks fall under. User variability for conventional mobile web search is much lower than for computer-based search, confirming the hypothesis that mobile web users query over a much less diverse set of topics. The surprising news is that iPhone users, on the other hand, had a higher variability than computer-based users, indicating their information needs are more diverse! This shows that the challenges posed by a phone's form factor can be outweighed by its "always on, always in your pocket" benefits.
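To make the idea of a normalized entropy concrete, here is a minimal Python sketch. It is not the entro-percent equation from the paper: the function name, the choice of normalizer (the logarithm of the number of tasks) and the sample category labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def normalized_entropy_percent(task_categories):
    """Illustrative normalized entropy of one user's search tasks, in percent.

    task_categories holds one category label per search task. This is an
    assumed formulation, not the entro-percent equation from the paper.
    """
    total = len(task_categories)
    if total < 2:
        return 0.0  # zero or one task carries no variability
    counts = Counter(task_categories)
    # Shannon entropy of the category distribution (natural log).
    entropy = sum((n / total) * math.log(total / n) for n in counts.values())
    # Assumed normalizer: the entropy reached when every task falls in
    # its own category, which is log(total).
    return 100.0 * entropy / math.log(total)


# Hypothetical users: one always searches the same category, one never repeats.
focused = ["News", "News", "News", "News"]
mixed = ["News", "Sports", "Travel", "Food"]
print(normalized_entropy_percent(focused))
print(normalized_entropy_percent(mixed))
```

Under these assumptions the focused user scores 0 and the mixed user scores roughly 100, so a higher value means the same number of tasks is spread over more categories.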

To understand the meaning of the entro-percent equation, read our full paper summarizing the findings of our logs-based study of search patterns on conventional mobile phones, iPhones and conventional computers and get all the juicy details.

Tuesday, April 28, 2009

Cloud Computing and the Internet



[adapted from the speech given on the occasion of the honoris causa ceremony at the Universidad Politécnica de Madrid]

The Internet is largely a software artifact, and a layered one, as my distinguished colleague Sir Tim Berners-Lee has observed on many occasions. The layering has permitted a remarkable versatility in the implementation of the Internet and its applications. New technology can be used to implement each layer and, as long as the interfaces between the layers remain static, the changes do not affect the functionality of the system. In this way, the Internet has evolved and adopted new transmission and switching technology into its lower layers and has supported new upper layers such as the HTTP, HTML and SSL protocols of the World Wide Web.
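The layering principle can be illustrated with a toy Python sketch. Every name here (Link, WholeFrameLink, ChunkedLink, MessageLayer) is hypothetical and corresponds to no real Internet protocol; the only point is that the upper layer keeps working when the lower layer is replaced, as long as the interface between them does not change.

```python
from typing import Protocol


class Link(Protocol):
    """The fixed interface between the two layers in this toy example."""

    def transmit(self, payload: bytes) -> bytes:
        """Carry payload to the other side and return what arrived."""
        ...


class WholeFrameLink:
    """Lower layer, version 1: moves the payload in one piece."""

    def transmit(self, payload: bytes) -> bytes:
        return bytes(payload)


class ChunkedLink:
    """Lower layer, version 2: a different technique behind the same interface."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size

    def transmit(self, payload: bytes) -> bytes:
        # Split the payload into fixed-size chunks, then reassemble them
        # as the receiving end would.
        chunks = [payload[i:i + self.chunk_size]
                  for i in range(0, len(payload), self.chunk_size)]
        return b"".join(chunks)


class MessageLayer:
    """Upper layer: depends only on the Link interface, not on how it is built."""

    def __init__(self, link: Link) -> None:
        self.link = link

    def deliver(self, text: str) -> str:
        return self.link.transmit(text.encode("utf-8")).decode("utf-8")


# The same upper layer runs unchanged over either lower layer.
for link in (WholeFrameLink(), ChunkedLink()):
    print(MessageLayer(link).deliver("hola, Madrid"))
```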

In recent years, the term “cloud computing” has emerged to make reference to the idea that from the standpoint of a device, say a laptop, on the Internet, many of the applications appear to be operating somewhere in the network “cloud.” Google, Amazon, Microsoft and others, as well as enterprise operators, are constructing these cloud computing centers. Generally, each cloud knows only about itself and is unaware of the existence of other cloud computing facilities. In some ways, cloud computing is like the networks of the 1960s when my colleagues and I began to think about connecting computers together on networks. Each network was typically proprietary. IBM had Systems Network Architecture; Digital Equipment Corporation had its DECNET; Hewlett-Packard had its Distributed System. These networks were specific to each manufacturer and did not interconnect nor even have a way to express the idea of connecting to another network. The Internet was the solution that Robert Kahn and I developed to allow all such networks to be interconnected in a uniform way.

Cloud computing is at the same stage. Each cloud is a system unto itself. There is no way to express the idea of exchanging information between distinct computing clouds because there is no way to express the idea of “another cloud.” Nor is there any way to describe the information that is to be exchanged. Moreover, if the information contained in one computing cloud is protected from access by any but authorized users, there is no way to express how that protection is provided and how information about it should be propagated to another cloud when the data is transferred.

Interestingly, my colleague, Sir Tim Berners-Lee, has been pursuing ideas that may inform the so-called “inter-cloud” problem. His idea of data linking may prove to be a part of the vocabulary needed to interconnect computing clouds. The semantics of data and of the actions one can take on the data, and the vocabulary in which these actions are expressed appear to me to constitute the beginning of an inter-cloud computing language. This seems to me to be an extremely open field in which creative minds everywhere can be free to contribute ideas and to experiment with new concepts. It is a new layer in the Internet architecture and, like the many layers that have been invented before, it is an open opportunity to add functionality to an increasingly global network.

There are many unanswered questions that can be posed about this new problem. How should one reference another cloud system? What functions can one ask another cloud system to perform? How can one move data from one cloud to another? Can one request that two or more cloud systems carry out a series of transactions? If a laptop is interacting with multiple clouds, does the laptop become a sort of “cloudlet”? Could the laptop become an unintended channel of information exchange between two clouds? If we implement an inter-cloud system of computing, what abuses may arise? How will information be protected within a cloud and when transferred between clouds? How will we refer to the identity of authorized users of cloud systems? What strong authentication methods will be adequate to implement data access controls?

Because the Internet is primarily a software artifact, there seems to be no end to its possibilities. It is an endless frontier, open to exploration by virtually anyone. I cannot guess what will be discovered in these explorations but I am sure that we will continue to be surprised by the richness of the Internet’s undiscovered territory in the decades ahead.

The Continuing Metamorphosis of the Web



I just returned from giving a talk at the 18th World Wide Web Conference in Madrid and was pleased to see a healthy and dynamic conference despite difficult economic conditions. Madrid had beautiful spring weather, and magnificent modern architecture abounds throughout the city. I will say, though, that the Madrid subway does not vibrate (shake, rattle, and roll) one’s soul quite as much as does our local NYC subway.

My talk was entitled The Continuing Metamorphosis of the Web. In it, I noted that the initial web standards were so simple and sensible that they engendered a path of stepwise innovations, which taken together have aggregated into amazing accomplishments. Metaphorically, I feel our community has been on a kind of pseudo-random walk that has taken us to remarkable places. The truly great results have included the creation of a virtual Library of Alexandria, the creation of the search engine (to be that library’s super-card catalog), the empowerment of the long tail (in diverse communities), and great innovations in doing business. I argued that the bottom-up evolution is continuing (perhaps even accelerating) today, and that the current stepwise improvements are still leading to broad innovations, which we will come to view as being as extraordinary as any that have occurred to date.

Here are three great achievements currently a-brewing:
  1. “Totally Transparent Processing.” By this, I argued that our use of the web (whether for search, communication, or information access) can increasingly occur in a fluid manner that is independent of the device we are using, independent of the human language we prefer, independent of the modality of the data, and independent of the corpus of information on which our interaction is based. In effect, processing can be transparent ∀d∈D, ∀l∈L, ∀m∈M, ∀c∈C. Our barriers to using information technology are fading away and becoming transparent.
  2. “Ideal Distributed Computing.” While we have known the fundamentals of distributed computing for many decades, only today are we reaching a state where we can achieve a powerful and efficient balance of computation between all end-user devices and a vast collection of shared storage and computational resources. Cloud computing is today’s term of art, but I talked more generally about systems flexible enough that computation and data can move across computers within a cluster, across clusters of computers and, of course, between clusters and all other (say, end-user) devices. The result is the efficient, even awesome, capability to provide communication, computation and data to a vast collection of people and applications.
  3. “Hybrid, Not Artificial, Intelligence.” Systems are regularly augmenting the capability of all of us in day-to-day life, and our collective use of those systems is, in turn, augmenting the capabilities of those systems in a beneficial virtuous circle. The virtuous circle is operating already in the search engine, voice recognition systems, recommendation systems, and more. There is every reason to think the effect will become ever more potent as computers are applied to more domains and used by larger populations. The result may not be artificially intelligent machines that pass the Turing Test, but instead systems that will be ever more capable of helping us achieve our goals in life -- in a kind of partnership. For a related take on this, you might look at a Google Official Blog post, “The Intelligent Cloud,” which Franz Och and I posted last Fall.
More explanation and many examples, based on Google research and services, are available in the slides I used with my talk. A PDF file of those slides is available on the WWW2009 website under the papers and presentations link.

Friday, April 24, 2009

Congratulations to NSF CLuE Grant awardees



The first goal of the Academic Cluster Computing Initiative was to familiarize the academic community with the methods necessary to run very large datasets on massive distributed computer networks. By expanding that program to include research grants through the National Science Foundation's Cluster Exploratory (CLuE) program, we're also hoping to enable new and better approaches to data-intensive research across a range of disciplines.

Now that the NSF has announced the 2009 CLuE grants in addition to some previous Small Grant for Exploratory Research (SGER) grants, we're excited to congratulate the recipient researchers and wish them the best as they bring new projects online and continue to run existing SGER projects on the Google/IBM cluster.

The NSF selected projects based on their potential to advance computer science as well as to benefit society as a whole, and researchers at 14 institutions are tackling ambitious problems in everything from computer science to bioinformatics. The institutions receiving CLuE grants are Purdue, UC Santa Barbara, University of Washington, University of Massachusetts-Amherst, UC San Diego, University of Virginia, Yale, MIT, University of Wisconsin-Madison, Carnegie Mellon, University of Maryland-College Park, University of Utah and UC Irvine. Florida International University, Carnegie Mellon and University of Maryland will continue other projects with existing SGER grants. These grantees will run their projects on a Google/IBM-provided cluster running an open source implementation of Google's MapReduce and File System.

We're excited to help foster new approaches to difficult, data-intensive problems across a range of fields, and we can't wait to see more students and researchers come up with creative applications for massive, highly distributed computing.

Saturday, April 18, 2009

Airtel free access sites

Airtel users with a balance of less than 30 ps can access all secure sites (https://__) for free.
Here are two sites you can access for free:

https://mobitamilan.net/live
https://isaitamil.in

You can get a lot of free downloads there.

Friday, April 17, 2009

Socially Adjusted CAPTCHAs



Unfortunately, there is a war going on between humans and 'bots. Software
'bots are attempting to generate massive numbers of computer accounts
which are then sold in bulk to spammers. Spammers use these accounts to
inundate emails and discussion boards. Meanwhile humans are trying to
simply create an account and don't want to spend a lot of time proving
that they are not a program.

Typically we use CAPTCHAs -- we present an image of some distorted text
and then ask the applicant to type in the letters. As image processing gets
more sophisticated, these letter sequences tend to get longer and more
distorted, sometimes to the point where humans fail too.

So we switched the game. We show an image, say an airplane, but it
is randomly rotated and we ask the applicant to rotate it to "up." This
is generally hard for computers but easy for people. Well, for the most
part.

Since computers are good at detecting faces, skies, text, etc., we sift
through our database of images running state-of-the-art "up" detectors to
remove the images they can orient. But of the images that remain, some are
too hard for people to figure out. Which way is up for a plate or a piece of
abstract art?

So here is where it gets interesting. We show people several images, one
of which is a "candidate" and we see how people do. If everyone rotates
it the same way, it is a keeper. If there is a lot of variation, we
discard it. As a bonus, it turns out that even if the original image was
taken at an angle, it does not matter, since people, in large numbers,
socially adjust the CAPTCHA.
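
The keep-or-discard step can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not the method from the paper: it
assumes each person's answer is recorded as an angle in degrees relative to
the stored image, it measures agreement with the circular variance, and the
0.05 threshold and both function names are made up for the example.

```python
import math


def circular_mean_and_spread(angles_deg):
    """Return (mean direction in degrees modulo 360, circular variance in [0, 1])."""
    if not angles_deg:
        raise ValueError("need at least one answer")
    n = len(angles_deg)
    # Average the unit vectors so that 350 and 10 degrees count as neighbours.
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean = math.degrees(math.atan2(s, c)) % 360.0
    spread = 1.0 - math.hypot(s, c)  # 0 = full agreement, 1 = no agreement
    return mean, spread


def socially_adjusted_up(angles_deg, max_spread=0.05):
    """Return the agreed "up" angle, or None if the candidate should be discarded.

    max_spread is a hypothetical threshold chosen for this sketch.
    """
    mean, spread = circular_mean_and_spread(angles_deg)
    return mean if spread <= max_spread else None


# A photo taken at a tilt: people still agree, on roughly 30 degrees.
print(socially_adjusted_up([28, 32, 30, 31, 29]))
# An abstract image: answers point everywhere, so the candidate is dropped.
print(socially_adjusted_up([0, 90, 180, 270]))
```

In this sketch a tilted original is not a problem: whatever direction the
crowd agrees on simply becomes the reference "up" for that image.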

Read the full paper here (posted with the permission of WWW'09).

Thursday, April 16, 2009

The Grill: Google's Alfred Spector on the hot seat



Alfred Spector, Google's VP of Research, tells COMPUTERWORLD the ins and outs of Research at Google and where it's headed for the future. Read the complete interview here.