
Friday, June 15, 2012

Recap of NAACL-12 including two Best Paper awards for Googlers



This past week, researchers from across the world descended on Montreal for the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). NAACL, as with other Association for Computational Linguistics meetings (ACL), is a premier meeting for researchers who study natural language processing (NLP). This includes applications such as machine translation and sentiment analysis, but also low-level language technologies such as the automatic analysis of morphology, syntax, semantics and discourse.

Like many applied fields in computer science, NLP underwent a transformation in the mid-’90s from a primarily rule- and knowledge-based discipline to one whose methods are predominantly statistical and leverage advances in large data and machine learning. This trend continues at NAACL. Two common themes dealt with a historical deficiency of machine-learned NLP systems -- that they require expensive and difficult-to-obtain annotated data in order to achieve high accuracies. To this end, there were a number of studies on unsupervised and weakly-supervised learning for NLP systems, which aim to learn from large corpora containing little to no linguistic annotation, instead relying only on observed regularities in the data or easily obtainable annotations. This typically led to much talk during the question periods about how reliable it might be to use services such as Mechanical Turk to get the detailed annotations needed for difficult language prediction tasks. Multilinguality in statistical systems also appeared to be a common theme, as researchers have continued to move their focus from building systems for resource-rich languages (e.g., English) to building systems for the rest of the world’s languages, many of which do not have any annotated resources. Work here ranged from focused studies on single languages to studies aiming to develop techniques for a wide variety of languages, leveraging morphology, parallel data and regularities across closely-related languages.

There was also an abundance of papers on text analysis for non-traditional domains. This includes the now-standard tracks on sentiment analysis and, combined with this, a new focus on social media, in particular NLP for microblogs. There was even a paper on predicting whether a given bill will pass committee in the U.S. Congress based on the text of the bill. The presentation of this paper included the entire video on how a bill becomes a law.

There were two keynote talks. The first talk, by Ed Hovy of the Information Sciences Institute of the University of Southern California, was on “A New Semantics: Merging Propositional and Distributional Information.” Prof. Hovy gave his insights into the challenge of bringing together distributional (statistical) lexical semantics and compositional semantics, a need espoused recently by many leaders in the field. The second, by James W. Pennebaker, was called “A, is, I, and, the: How our smallest words reveal the most about who we are.” As a psychologist, Prof. Pennebaker represented the “outsider” keynote that typically draws a lot of interest from the audience, and he did not disappoint. Prof. Pennebaker spoke about how the use of function words can provide interesting social observations. One example was personal pronouns like “we,” whose increased usage now causes people to feel that the speaker is colder and more distant, rather than engaging the audience and appearing accessible. This is partly due to a second and increasingly common meaning of “we” that is much more like “you,” e.g., when a boss says: “We must increase sales”.

Finally, this year the organizers of NAACL decided to do something new called “NLP Idol.” The idea was to have four senior researchers in the community each select a paper from the past that they think will have (or should have) more impact on future directions of NLP research. The goal was to pluck a paper from obscurity and bring it into the limelight. Each researcher presented their case and three judges gave feedback American Idol-style, with Brian Roark hosting à la Ryan Seacrest. The winner was "PAM - A Program That Infers Intentions," published in Inside Computer Understanding in 1981 by Robert Wilensky, which was selected and presented by Ray Mooney. PAM (“Plan Applier Mechanism”) was a system for understanding agents and their plans, and more generally, what is happening in a discourse and why. Some of the questions that PAM could answer were astonishing, which reminded the audience (or me at least) that while statistical methods have brought NLP broader coverage, this often comes at the loss of the specificity and deep knowledge representation that previous closed-world language understanding systems could achieve. This echoed sentiments in Prof. Hovy’s invited talk.

Ever since the early days of Google, Googlers have had a presence at NAACL and other ACL-affiliated events. NAACL this year was no different. Googlers authored three papers at the conference, one of which merited the conference’s Best Full Paper Award, and another the Best Student Paper Award:

Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure - IBM Best Student Paper Award
Oscar Täckström (Google intern), Ryan McDonald (Googler), Jakob Uszkoreit (Googler)

Vine Pruning for Efficient Multi-Pass Dependency Parsing - Best Full Paper Award
Alexander Rush (Google intern) and Slav Petrov (Googler)

Unsupervised Translation Sense Clustering
Mohit Bansal (Google intern), John DeNero (Googler), Dekang Lin (Googler)

Many Googlers were also active participants in the NAACL workshops, June 7 - 8:

Computational Linguistics for Literature 
David Elson (Googler), Anna Kazantseva, Rada Mihalcea, Stan Szpakowicz

Automatic Knowledge Base Construction/Workshop on Web-scale Knowledge Extraction
Invited Speaker - Fernando Pereira, Research Director (Googler)

Workshop on Inducing Linguistic Structure
Accepted Paper - Capitalization Cues Improve Dependency Grammar Induction
Valentin I. Spitkovsky (Googler), Hiyan Alshawi (Googler) and Daniel Jurafsky

Workshop on Statistical Machine Translation
Program Committee members - Keith Hall, Shankar Kumar, Zhifei Li, Klaus Macherey, Wolfgang Macherey, Bob Moore, Roy Tromble, Jakob Uszkoreit, Peng Xu, Richard Zens, Hao Zhang (Googlers)

Workshop on the Future of Language Modeling for HLT 
Invited Speaker - Language Modeling at Google, Shankar Kumar (Googler)
Accepted Paper - Large-scale discriminative language model reranking for voice-search
Preethi Jyothi, Leif Johnson (Googler), Ciprian Chelba (Googler) and Brian Strope (Googler)

First Workshop on Syntactic Analysis of Non-Canonical Language
Invited Speaker - Keith Hall (Googler)
Shared Task Organizers - Slav Petrov, Ryan McDonald (Googlers)

Evaluation Metrics and System Comparison for Automatic Summarization
Program Committee member - Katja Filippova (Googler)

Wednesday, March 21, 2012

Google at INFOCOM 2012



The computer networking community will get together in Orlando, Florida the week of March 25th for INFOCOM 2012, the Annual IEEE International Conference on Computer Communications.

At the conference, we will discuss topics such as traffic engineering, traffic anomaly detection, and random walk algorithms for topology-aware networks. We serve so much internet traffic to Google users and exchange so much data between our data centers that computer networking is naturally something we care about. As traffic grows with richer content (photos, video, ...), new modes of engagement (cloud computing, social networking, ...) and an increasing number of users, engineering and research efforts are necessary to help networks scale.

The following papers were co-authored by Googlers from offices around the world:

  • Near-optimal random walk sampling in distributed networks by Atish Das Sarma, Anisur Molla, and Gopal Pandurangan
  • How to split a flow by Tzvika Hartman, Avinatan Hassidim, Haim Kaplan, Danny Raz, and Michal Segalov
  • Upward max-min fairness by Emilie Danna, Avinatan Hassidim, Haim Kaplan, Alok Kumar, Yishay Mansour, Danny Raz, and Michal Segalov (runner up for best paper)
  • A practical algorithm for balancing the max-min fairness and throughput objectives in traffic engineering by Emilie Danna, Subhasree Mandal, and Arjun Singh
  • Traffic anomaly detection based on the IP size distribution by Fabio Soldo and Ahmed Metwally

If you are attending, stop by and say hi!

Thursday, February 23, 2012

Announcing Google-hosted workshop videos from NIPS 2011



At the 25th Neural Information Processing Systems (NIPS) conference in Granada, Spain last December, we engaged in dialogue with a diverse population of neuroscientists, cognitive scientists, statistical learning theorists, and machine learning researchers. More than twenty Googlers participated in an intensive single-track program of talks, nightly poster sessions and a workshop weekend in the Spanish Sierra Nevada mountains. Check out the NIPS 2011 blog post for full information on Google at NIPS.

In conjunction with our technical involvement and gold sponsorship of NIPS, we recorded the five workshops that Googlers helped to organize on various topics from big learning to music. We’re now pleased to provide access to these rich workshop experiences to the wider technical community.

Watch videos of Googler-led workshops on the YouTube Tech Talks Channel:


To highlight a few workshops: The Domain Adaptation workshop organized by Google, which fused theoretical and practical domain adaptation, featured invited talks from Shai Ben-David and Googler Mehryar Mohri from the theory side and Dan Roth from the applications side. This was just next door to Googlers Doug Eck and Ryan Rifkin's workshop on Machine Learning and Music, with musical demonstrations loud enough for the next-door neighbors to ask them to “turn it down a bit, please.” In addition to the Googler-run workshops, the Integrating Language and Vision workshop showcased invited talks by Google postdoctoral fellow Percy Liang on the pragmatics of visual scene description and Josh Tenenbaum on physical models as a cognitively plausible mechanism for bridging language and vision. Finally, Google consultant Andrew Ng was one of the organizers of the Deep Learning and Unsupervised Feature Learning workshop, which offered an extended tutorial, several inspiring talks, and two panel discussions (one with Googler Samy Bengio as panelist) exploring the question of “How deep is deep?”

As the workshop weekend drew to a close, an airline strike in Spain left NIPS attendees scrambling to get home for the holidays. We hope the skies look clear for 2012 when NIPS lands in Google’s neck of the woods, Lake Tahoe!

Tuesday, August 23, 2011

Google at the Joint Statistical Meetings in Miami



The Joint Statistical Meetings (JSM) were held in Miami, Florida, this year. Nearly 5,000 participants from academia and industry came to present and discuss the latest in statistical research, methodology, and applications. Similar to previous years, several Googlers shared expertise in large-scale experimental design and implementation, statistical inference with massive datasets and forecasting, data mining, parallel computing, and much more.

Our session "Statistics: The Secret Weapon of Successful Web Giants" attracted over one hundred people, surprising for an 8:30 AM session! Revolution Analytics reviewed it in their official blog post "How Google uses R to make online advertising more effective."

The following talks were given by Googlers at JSM 2011. Please check the upcoming Proceedings of the JSM 2011 for the full papers.

Google has participated in JSM each year since 2004. We have been increasing our involvement significantly by providing sponsorship, organizing and giving talks at sessions and roundtables, teaching courses and workshops, hosting a booth with demos of new Google products, submitting posters, and more. This year Googlers participated in sessions sponsored by ASA sections for Statistical Learning and Data Mining, Statistics and Marketing, Statistical Computing, Bayesian Statistical Science, Health Policy Statistics, Statistical Graphics, Quality and Productivity, Physical and Engineering Sciences, and Statistical Education.

We also hosted the Google faculty reception, which was well-attended by faculty and their promising students. Google hires a growing number of statisticians and we were happy to participate in JSM again this year. People had a chance to talk to Googlers, ask about working here, encounter elements of Google culture (good food! T-shirts! 3D puzzles!), meet old and make new friends, and just have fun!

Thanks to everyone who presented, attended, or otherwise engaged with the statistical community at JSM this year. We’re looking forward to seeing you in San Diego next year.