Using Perceptual Context to Ground Language

David Chen
Joint work with Joohyun Kim and Raymond Mooney
Department of Computer Sciences, University of Texas at Austin
2009 IBM Statistical Machine Learning and Its Application (SMILe) Open House, October 8th and 9th, 2009

Learning Language from Perceptual Context
Children do not learn language from annotated corpora.
The natural way to learn language is to perceive language in the context of its use in the physical and social world.

Challenge
A linguistic input may correspond to many possible events:
Pass(GermanyPlayer1, GermanyPlayer2)
Block(SpanishGoalie)
Kick(GermanyPlayer2)

Overview
Sportscasting task
Tactical generation
Human evaluation

Tractable Challenge Problem: Learning to Be a Sportscaster
Goal: Learn from realistic data of natural language used in a representative context while avoiding difficult issues in computer perception (i.e., speech and vision).
Solution: Learn from textually annotated traces of activity in a simulated environment.
Example: Traces of games in the Robocup simulator paired with textual sportscaster commentary.

Robocup Simulation League
Example commentary: Purple goalie blocked the ball

Learning to Sportscast
Learn to sportscast by observing sample human sportscasts.
Build a function that maps between natural language (NL) and meaning representation (MR).
NL: Textual commentaries about the game
MR: Predicate logic formulas that represent events in the game

Mapping between NL/MR
NL: Purple3 passes the ball to Purple5
Semantic Parsing (NL → MR)
Tactical Generation (MR → NL)
MR: Pass(Purple3, Purple5)
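To make the MR side of this mapping concrete, here is a minimal sketch of how a predicate-logic MR such as Pass(Purple3, Purple5) could be held in code. The `MR` class and the `parse_mr` helper are hypothetical names introduced only for illustration; the slides do not say how the actual system stores its MRs.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MR:
    """A predicate-logic meaning representation, e.g. Pass(Purple3, Purple5)."""
    predicate: str
    args: Tuple[str, ...] = ()

    def __str__(self) -> str:
        # Zero-argument events such as "ballstopped" print without parentheses.
        if not self.args:
            return self.predicate
        return f"{self.predicate}({', '.join(self.args)})"


# A predicate name, optionally followed by a parenthesised argument list.
_MR_PATTERN = re.compile(r"^\s*(\w+)\s*(?:\(\s*(.*?)\s*\))?\s*$")


def parse_mr(text: str) -> MR:
    """Parse strings like 'Pass ( Purple3, Purple5 )' or 'ballstopped'."""
    match = _MR_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid MR: {text!r}")
    predicate, arg_text = match.group(1), match.group(2)
    args = tuple(a.strip() for a in arg_text.split(",")) if arg_text else ()
    return MR(predicate, args)


example = parse_mr("Pass ( Purple3, Purple5 )")
print(example.predicate, example.args)
```

Semantic parsing would produce such an object from a sentence, and tactical generation would go the other way; both directions are learned in the work described here rather than hand-coded.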

Robocup Sportscaster Trace
Natural Language Commentary:
  Purple goalie turns the ball over to Pink8
  Purple team is very sloppy today
  Pink8 passes the ball to Pink11
  Pink11 looks around for a teammate
  Pink11 makes a long pass to Pink8
  Pink8 passes back to Pink11
Meaning Representation:
  badpass(Purple1, Pink8)
  turnover(Purple1, Pink8)
  kick(Pink8)
  pass(Pink8, Pink11)
  kick(Pink11)
  kick(Pink11)
  ballstopped
  kick(Pink11)
  pass(Pink11, Pink8)
  kick(Pink8)
  pass(Pink8, Pink11)

Robocup Sportscaster Trace (the same trace, with predicates and constants replaced by arbitrary symbols)
Natural Language Commentary:
  Purple goalie turns the ball over to Pink8
  Purple team is very sloppy today
  Pink8 passes the ball to Pink11
  Pink11 looks around for a teammate
  Pink11 makes a long pass to Pink8
  Pink8 passes back to Pink11
Meaning Representation:
  P6(C1, C19)
  P5(C1, C19)
  P1(C19)
  P2(C19, C22)
  P1(C22)
  P1(C22)
  P0
  P1(C22)
  P2(C22, C19)
  P1(C19)
  P2(C19, C22)

Robocup Data
Collected human textual commentary for the 4 Robocup championship games from 2001-2004.
Avg # events/game = 2,613
Avg # English sentences/game = 509
Avg # Korean sentences/game = 499
Each sentence matched to all events within the previous 5 seconds.
Avg # MRs/sentence = 2.5 (min 1, max 12)
Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only).
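The pairing rule above (each sentence is matched to all events in the previous 5 seconds) is what makes the training data ambiguous. A minimal sketch of that rule follows. The timestamps, the function names, and the choice to include both window boundaries are assumptions made for illustration, not details taken from the slides.

```python
from typing import List, Tuple

WINDOW_SECONDS = 5.0  # window length stated on the slide

TimedEvent = Tuple[float, str]      # (timestamp in seconds, MR string)
TimedSentence = Tuple[float, str]   # (timestamp in seconds, commentary)


def candidate_events(sentence_time: float,
                     events: List[TimedEvent],
                     window: float = WINDOW_SECONDS) -> List[str]:
    """Return every MR whose event falls in the window before the sentence."""
    return [mr for t, mr in events
            if sentence_time - window <= t <= sentence_time]


def build_ambiguous_data(sentences: List[TimedSentence],
                         events: List[TimedEvent],
                         window: float = WINDOW_SECONDS):
    """Pair each sentence with all of its candidate MRs."""
    return [(text, candidate_events(t, events, window))
            for t, text in sentences]


# Hypothetical timestamps, for illustration only.
events = [(1.0, "pass(Pink8, Pink11)"),
          (3.0, "kick(Pink11)"),
          (9.0, "ballstopped")]
sentences = [(4.0, "Pink8 passes the ball to Pink11"),
             (10.0, "Pink11 looks around for a teammate")]

for text, candidates in build_ambiguous_data(sentences, events):
    print(text, candidates)
```

Note that the second sentence is paired with an event it does not actually describe. Sentences like this, whose meaning is not among the candidates at all, are part of what the learner has to cope with.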

Tactical Generation
Learn how to generate NL from MR.
Example: Pass(Pink2, Pink3) → Pink2 kicks the ball to Pink3
Two steps:
1. Disambiguate the training data
2. Learn a language generator

WASP: Word Alignment-based Semantic Parsing
Uses statistical machine translation techniques:
  Synchronous context-free grammars (SCFG) [Wu, 1997; Melamed, 2004; Chiang, 2005]
  Word alignments [Brown et al., 1993; Och & Ney, 2003]
SCFG supports both:
  Semantic Parsing: NL → MR
  Tactical Generation: MR → NL

WASPER: WASP with EM-like Retraining
Ambiguous Training Data (sportscaster commentary paired with Robocup simulator events):
  Commentary:
    Purple7 loses the ball to Pink2
    Pink2 kicks the ball to Pink5
    Pink5 makes a long pass to Pink8
    Pink8 shoots the ball
  Events:
    Pass(purple5, purple7)
    Turnover(purple7, pink2)
    Kick(pink2)
    Pass(pink2, pink5)
    Kick(pink5)
    Pass(pink5, pink8)
    Ballstopped
    Kick(pink8)
1. WASP learns an Initial Semantic Parser from the Ambiguous Training Data.
2. The Initial Semantic Parser picks one MR per sentence, giving Unambiguous Training Data:
    Purple7 loses the ball to Pink2 → Kick(pink2)
    Pink2 kicks the ball to Pink5 → Pass(pink2, pink5)
    Pink5 makes a long pass to Pink8 → Kick(pink5)
    Pink8 shoots the ball → Kick(pink8)
3. WASP retrains the Semantic Parser on the Unambiguous Training Data, and the retrained parser updates the matches:
    Purple7 loses the ball to Pink2 → Turnover(purple7, pink2)
    Pink2 kicks the ball to Pink5 → Pass(pink2, pink5)
    Pink5 makes a long pass to Pink8 → Kick(pink5)
    Pink8 shoots the ball → Kick(pink8)
4. After further retraining:
    Purple7 loses the ball to Pink2 → Turnover(purple7, pink2)
    Pink2 kicks the ball to Pink5 → Pass(pink2, pink5)
    Pink5 makes a long pass to Pink8 → Pass(pink5, pink8)
    Pink8 shoots the ball → Kick(pink8)
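The retraining loop shown on this slide can be sketched in a few lines. This is only an illustration of the loop structure and is not WASP: a simple word/predicate co-occurrence scorer (an assumption made here) stands in for the semantic parser, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[str, List[str]]   # (sentence, candidate MR strings)
Pair = Tuple[str, str]            # (sentence, one MR string)


def predicate_of(mr: str) -> str:
    """'Pass(pink2, pink5)' -> 'pass'."""
    return mr.split("(")[0].strip().lower()


def train(pairs: List[Pair]) -> Dict[Tuple[str, str], float]:
    """Stand-in for WASP: estimate P(predicate | word) from co-occurrence."""
    joint: Counter = Counter()
    word_totals: Counter = Counter()
    for sentence, mr in pairs:
        for word in sentence.lower().split():
            joint[(word, predicate_of(mr))] += 1
            word_totals[word] += 1
    return {key: n / word_totals[key[0]] for key, n in joint.items()}


def score(model: Dict[Tuple[str, str], float], sentence: str, mr: str) -> float:
    """Sum the per-word evidence for the MR's predicate."""
    pred = predicate_of(mr)
    return sum(model.get((w, pred), 0.0) for w in sentence.lower().split())


def retrain(data: List[Example], iterations: int = 5) -> List[Pair]:
    """EM-like loop: train, re-pick the best MR per sentence, repeat."""
    # Start from the ambiguous data: every sentence paired with every candidate.
    pairs = [(s, mr) for s, cands in data for mr in cands]
    for _ in range(iterations):
        model = train(pairs)
        new_pairs = [(s, max(cands, key=lambda mr: score(model, s, mr)))
                     for s, cands in data if cands]
        if new_pairs == pairs:   # matches stopped changing
            break
        pairs = new_pairs
    return pairs


# Toy data, for illustration only.
data = [
    ("purple7 passes to purple5", ["pass(purple7, purple5)", "kick(purple7)"]),
    ("pink2 passes to pink5", ["pass(pink2, pink5)"]),
    ("pink8 shoots", ["kick(pink8)"]),
]
for sentence, mr in retrain(data):
    print(sentence, "->", mr)
```

The design point carried over from the slide is that each round's model is trained on the matches chosen by the previous round's model, so early mistakes can be corrected once unambiguous examples sharpen the estimates.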

Additional Systems
WASPER-GEN: uses tactical generator instead of semantic parser
WASPER-GEN-IGSL: same as WASPER-GEN, except that it uses Iterative Generation Strategy Learning (IGSL) to initialize the first iteration
WASP with random matching (lower baseline)
WASP with gold matching (upper baseline)

Matching
4 Robocup championship games from 2001-2004
Avg # events/game = 2,613
Avg # English sentences/game = 509
Avg # Korean sentences/game = 499
Leave-one-game-out cross-validation
Metrics:
  Precision: % of the system's annotations that are correct
  Recall: % of gold-standard annotations produced
  F-measure: harmonic mean of precision and recall
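The three matching metrics can be computed as below. This sketch assumes that both the system output and the gold standard are represented as sets of (sentence, MR) pairs; that representation and the function name are choices made here for illustration.

```python
from typing import Set, Tuple

Match = Tuple[str, str]  # (sentence id, MR string)


def precision_recall_f(predicted: Set[Match], gold: Set[Match]):
    """Return (precision, recall, F-measure) for a set of predicted matches."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example: three predicted matches, two gold matches, one in common.
predicted = {("s1", "turnover(purple7, pink2)"),
             ("s2", "kick(pink2)"),
             ("s3", "kick(pink5)")}
gold = {("s1", "turnover(purple7, pink2)"),
        ("s2", "pass(pink2, pink5)")}

p, r, f = precision_recall_f(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```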

Matching Results

Tactical Generation
Measure how accurately the NL generator produces English sentences for chosen MRs in the test games.
Use gold-standard matches to determine the correct sentence for each MR that has one.
Leave-one-game-out cross-validation
Metric: BLEU score [Papineni et al., 2002], N=4
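For readers unfamiliar with BLEU, the following is a minimal corpus-level sketch with N=4. It assumes one reference sentence per generated sentence, whitespace tokenization, and no smoothing. Those are simplifications made here, and the slides do not say which BLEU implementation was actually used.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]],
                references: List[List[str]],
                max_n: int = 4) -> float:
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


reference = "pink2 kicks the ball to pink3".split()
generated = "pink2 passes the ball to pink3".split()
print(corpus_bleu([generated], [reference]))
```

Because unsmoothed BLEU drops to zero whenever some n-gram order has no match, it is usually reported over a whole test set rather than per sentence, which is why the sketch accumulates counts across all sentence pairs.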

Tactical Generation Results

Human Evaluation
Used Amazon's Mechanical Turk to recruit human judges (~40 judges per video).
8 commented game clips:
  4-minute clips randomly selected from each of the 4 games
  Each clip commented once by a human and once by the machine
Presented in random counter-balanced order.
Judges were not told which ones were human or machine generated.

Human Evaluation
Score  English Fluency  Semantic Correctness  Sportscasting Ability
5      Flawless         Always                Excellent
4      Good             Usually               Good
3      Non-native       Sometimes             Average
2      Disfluent        Rarely                Bad
1      Gibberish        Never                 Terrible

Human Evaluation
Game  Commentator  Syntax  Semantic  Overall  Human?
2001  Human        3.735   3.588     3.147    0.206
2001  Machine      3.888   3.806     3.611    0.4
2002  Human        4.132   4.579     4.027    0.421
2002  Machine      3.971   3.735     3.286    0.118
2003  Human        3.541   3.730     2.611    0.135
2003  Machine      3.893   4.263     3.368    0.193
2004  Human        4.029   4.171     3.543    0.2
2004  Machine      4.125   4.375     4.0      0.563

Conclusion
Current language learning work uses expensive, annotated training data.
We have developed a language learning system that can learn from language paired with an ambiguous perceptual environment.
We have evaluated it on the task of learning to sportscast simulated Robocup games.
The system learns to sportscast as well as CS students who don't watch soccer.

Demo Clip
Game clip commentated using WASPER-GEN with IGSL, since this gave the best results for generation.
FreeTTS was used to synthesize speech from textual output.
YouTube link: http://www.youtube.com/watch?v=l_mirs7nbpu

Backup Slides

Overview
Sportscasting task
Related works
Tactical generation
Strategic generation
Human evaluation

Semantic Parser Learners
Learn a function from NL to MR.
NL: Purple3 passes the ball to Purple5
Semantic Parsing (NL → MR)
Tactical Generation (MR → NL)
MR: Pass(Purple3, Purple5)
We experiment with two semantic parser learners:
  WASP (Wong & Mooney, 2006; 2007)
  KRISP (Kate & Mooney, 2006)

KRISP: Kernel-based Robust Interpretation by Semantic Parsing
Productions of the MR language are treated like semantic concepts.
An SVM classifier is trained for each production with a string subsequence kernel.
These classifiers are used to compositionally build MRs of the sentences.
More resistant to noisy supervision, but incapable of tactical generation.

Strategic Generation
Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation).
For automated sportscasting, one must be able to effectively choose which events to describe.

Example of Strategic Generation
pass(purple7, purple6)
ballstopped
kick(purple6)
pass(purple6, purple2)
ballstopped
kick(purple2)
pass(purple2, purple3)
kick(purple3)
badpass(purple3, pink9)
turnover(purple3, pink9)

Strategic Generation
For each event type (e.g., pass, kick), estimate the probability that it is described by the sportscaster.
Requires correct NL/MR matching. Two options:
  Use estimated matching from tactical generation
  Iterative Generation Strategy Learning
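The first option, estimating per-event-type probabilities from a given matching, amounts to a ratio of counts. The sketch below uses the event sequence from the strategic generation example together with a hypothetical set of matched events. The matched set and the function names are illustrative assumptions, and this is not the IGSL procedure.

```python
from collections import Counter
from typing import Dict, List


def event_type(mr: str) -> str:
    """'pass(purple6, purple2)' -> 'pass'; 'ballstopped' -> 'ballstopped'."""
    return mr.split("(")[0].strip().lower()


def comment_probabilities(all_events: List[str],
                          matched_events: List[str]) -> Dict[str, float]:
    """P(commented | event type) = matched events of type / all events of type."""
    total = Counter(event_type(e) for e in all_events)
    described = Counter(event_type(e) for e in matched_events)
    return {t: described[t] / n for t, n in total.items()}


all_events = [
    "pass(purple7, purple6)", "ballstopped", "kick(purple6)",
    "pass(purple6, purple2)", "ballstopped", "kick(purple2)",
    "pass(purple2, purple3)", "kick(purple3)",
    "badpass(purple3, pink9)", "turnover(purple3, pink9)",
]
# Hypothetical matching: the events assumed to have been commented on.
matched_events = [
    "pass(purple6, purple2)",
    "pass(purple2, purple3)",
    "turnover(purple3, pink9)",
]

for etype, prob in comment_probabilities(all_events, matched_events).items():
    print(f"{etype}: {prob:.2f}")
```

A generator could then choose to describe an event whenever its type's probability is high enough. The quality of these estimates depends directly on the quality of the matching, which is why the slide stresses that correct NL/MR matching is required.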

Iterative Generation Strategy Learning (IGSL)
Directly estimates the likelihood of an event being commented on.
Self-training iterations to improve estimates.
Uses events not associated with any NL as negative evidence.

Robocup Sportscaster Trace
Natural Language Commentary:
  purple6 passes to purple2
  purple2 makes a short pass to purple3
  purple3 loses the ball to pink9
Meaning Representation:
  ballstopped
  kick(purple6)
  pass(purple6, purple2)
  kick(purple2)
  pass(purple2, purple3)
  kick(purple3)
  turnover(purple3, pink9)
Pass events associated with commentary: pass(purple6, purple2), pass(purple2, purple3)
Kick events associated with commentary: kick(purple6), kick(purple2), kick(purple3)
Negative Evidence: additional kick events (kick(purple3), kick(purple2), kick(purple3)) that are not associated with any commentary

Strategic Generation Performance
Evaluate how well the system can predict which events a human comments on.
Metrics:
  Precision: % of the system's annotations that are correct
  Recall: % of gold-standard annotations correctly produced
  F-measure: harmonic mean of precision and recall

Strategic Generation Results
[Bar chart: F-measure (axis from 0 to 0.8), average results on leave-one-game-out cross-validation. Bars: inferred from WASP, inferred from KRISPER, inferred from WASPER, inferred from WASPER-GEN, IGSL, inferred from gold matching.]

Demo Clip
Game clip commentated using WASPER-GEN with IGSL, since this gave the best results for generation.
FreeTTS was used to synthesize speech from textual output.
English: http://www.youtube.com/watch?v=l_mirs7nbpu
Korean: http://www.youtube.com/watch?v=dur9k5aik8y