Supplementary Material. Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training

Similar documents
Supplementary Material: Convolutional Image Captioning

Encyclopedia of World Sport

COAST TO COAST AMENITIES WORKSHEET Web PDF Amenities Worksheet

2015/2016 State of Nordic

Flyers. Reading & Writing. Cambridge Young Learners English. My name is:... There are 50 questions. You have 40 minutes.

UNIDADE 1. Conteúdos UNIDADE 2. Conteúdos

Vocabulary 1 Find six sports in the puzzle and complete the sentences below. 2 Choose the word that doesn t belong. 3 Choose the correct answer.

SHIN LING GOES SKATING Hal Ames

Squeak the Skater Goes Surfing By Michael Stahl

9 Technology in sport

TROOP 349 Falls Church, VA. Troop 349 Ski Trip

One off the Best Beach in Holgate. All Fees/Taxes Included in Prices

R E RIVERSIDE ENCLOSURE A T H E N L E Y R O Y A L R E G A T T A

COPE PARK MASTER PLAN

Activities to help students add the suffixes ing and ed to words.

DATE: June 22, General Release SUBMITTED BY: LAND DEVELOPMENT SERVICES. RE: City Centre Survey Results

I am going to visit Titanic Belfast. It is a big, exciting place with lots to see and do. I may travel there by bus, car or walking.

CAUTION BIKE ROUTE AUTOMATIC DOOR BIKE CROSSING

ACTIVE TRANSPORTATION Active Community Checklist

Bicycle Count Corner of 116th Street & Broadway Manhattan, New York City

Sreyasi Nag Chowdhury, Niket Tandon, Gerhard Weikum. Max Planck Institute for Informatics, Saarbrücken, Germany

Your web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore

Nervous System: Reaction Time Student Version

Resort Information Pack - La Plagne Montalbert

Fulton Market Streetscape

Steps to a Healthier Way of Life SM

Phrase-based Image Captioning

Walking Audit Supporting Information

Higher Level. Test Listening. Name: Class: There s a rugby match between and Australia. The match is on Channel 1.

An educational tool provided by the experts at

QUEEN ELIZABETH SCHOOL PARK AND GLENGARRY PARK USAGE SURVEY RESULTS Integrated Strategic Development- Citizen Services- City of Edmonton

WALKABILITY CHECKLIST How walkable is your community?

the little boy 1 a good boy 1 then you give 1 is about me 1 was to come 1 old and new 1 that old man 1 what we know 1 not up here 1 in and out 1


Montmorency Park Facility Audit

Wheelchair Use Confidence Scale for Manual Wheelchair Users (WheelCon-M, Version 3.0)

Athlete Information Packet

LoveladiesCottage--Classic and Relaxed Lagoon- Front Home, Dogs OK

THE VELVETEEN RABBIT VISUAL STORY

LBC Shetland Division 2018 Rules

ENGLISCH. Schularbeits-Trainer. Tapescripts

Bar Harbor Route 3 Gateway Project Advisory Committee September 21, 1-3 PM Atlantic Oakes Hotel Minutes

Fill in the rating for each section. Total up the ratings to see how your neighbourhood scores overall for walkability.

Multiple Meaning Words: Kindergarten to Grade 2 More Teaching Tools at

kids english book 5 international edition

» 100 Ways to Add 2,000 Steps

AN EARTHWORK HILL, RINGED WITH SEATING AND SHAPED TO SLOPE DOWN TO THE RIVER, FORMS THE CENTERPIECE TO A NEW SOCIAL PIER. A WOOD SEATING TERRACE

Vail Ski Experience. For ski season

GIS Based Data Collection / Network Planning On a City Scale. Healthy Communities Active Transportation Workshop, Cleveland, Ohio May 10, 2011

Tip & Hints for Equipment

CHAPTER 1. Knowledge. (a) 8 m/s (b) 10 m/s (c) 12 m/s (d) 14 m/s

Val Thorens Sensations

Orientation and Conferencing Plan Stage 4

Buddy System (what to do if lost)

Autism on the Seas Past Guest Excursion Reviews St Maarten

Co-operation Ireland Maracycle Saturday 27 & Sunday 28 June 2015

ANATOMY OF A DRAGON BOAT

City of Toronto Complete Streets Guidelines

Meeting the Needs of those Living with Autism Across the Lifespan

Les Deux Alpes. Discover the sun-kissed Alps and their symbolic glacier. Germany Isère

Before Statements After Personal Thoughts. Agree/Disagree outdoor activity among Canadians. Agree/Disagree. Agree/Disagree.

JULY 26th, 2015 ATHLETE GUIDE

cardboard _G3U5W5_ indd 1 2/19/10 4:43 PM

International Physical Activity Prevalence Study SELF-ADMINISTERED ENVIRONMENTAL MODULE

Countdown to moving BEACH

PROPS TO TRAIN GOALIES

Basic Rules of Pedestrian Safety (Primary, Elementary)

Active Neighborhood Checklist: Protocol

European Canoe Sprint Championships Juniors and Under 23

HAWTHORN BLAST UPDATE

Val d Isere Resort Information Getting There

[05:01:05.24] Aerial footage showing lots of smoke. Aerial footage of 4 helicopters flying across the smoky area.

Categories of sports

DEAR ATHLETE We are pleased to welcome you to the 6 th edition of this triathlon event in Aalborg. Dear Athlete General information...

Crime Prevention Bulletin January 30 February 12, Child Safety Tips

AfL Playbook: 5-8 years old 12 days of active fun for kids, parents, and caregivers

York Region Rapid Transit Corporation

THE FIRST MOUNTAIN BIKE TRAIL CENTRE IN THE GCC & MIDDLE EAST

Q: Do you like/enjoy/love ing? A: Yes, I do. Yes, I like ing. A: No, I don t. No, I don t like ----ing. Find Someone Who...

Rose Parsonage. April Good Associate Broker April s Cell:

Memorial Week Special 9 nights $4000 Oceanfront floor to ceiling views

Lou Gehrig 20 th Annual Summer Classic Girls Fastpitch Softball Tournaments Amherst, New York (20 Minutes from Niagara Falls)

Harmonic Motion: Pendulums Student Version

SDTrucksprings. Hawaii Off-roading/4x4 Guide Copyright 2015 We Specialize In:

McLaren Park. Disc Golf Meeting. April 20, 2010

19th St/Oakland Station Modernization

What type of bats do Major League Baseball players use?...38

THIRD&GRAND. Public Workshop #1. Transportation Hub Area Plan. June 12, 2013

Winterdreams & Summerholidays

The Wendell Buzz Wendell Parks and Recreation February 2016

Bikes & Boards. written by Andrew Funk STAPLE HERE

Baseball Hitting Principles Version 2

Guide to Creating Walking Route Maps for Safe Routes to School

SheShreds.co NOBODY CAN STOP YOU FROM BEING YOU. BUT YOU.

GIS Based Non-Motorized Transportation Planning APA Ohio Statewide Planning Conference. GIS Assisted Non-Motorized Transportation Planning

Plainfield Gateway. Plainfield Context

L-8-3 (L-8-3) Which of these is one way that mechanical waves differ from electromagnetic waves?

Audit information collected by: Foot Auto Both. Location information collected by: Foot Auto Both LAND USE ENVIRONMENT

COMPLETE STREETS PLANNER S PORTFOLIO

Ball Hockey Rules. May (amended August 2018)

Transcription:

Supplementary Material Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training Rakshith Shetty Marcus Rohrbach 2, Lisa Anne Hendricks 2 Mario Fritz Bernt Schiele Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany 2 UC Berkeley EECS, CA, United States Facebook AI Research We present several qualitative examples to illustrate the strengths of our adversarially trained caption generator. All the examples are from the led versions of the adversarial (adv-) and the baseline (base-) models presented in the main paper. We show qualitative examples to highlight two main merits of the adversarial caption generator. First, we illustrate diversity across images in Section. Next, we demonstrate diversity when ling multiple captions for each image in Section 2.. Illustrating diversity across images Our adversarial model produces diverse captions across different images containing similar content, whereas the baseline model tends to use common generic captions which is mostly correct but not descriptive. We quantify this by visualizing the most frequently generated captions by the baseline model on the test set in Table a. Note that only the most likely caption according to the model is considered. Table a shows that the baseline model repeatedly generates identical captions for different images. However, the adversarial model is less prone to repeating generic captions, as seen in Table b. This is visualized in Figures, 2, and. Here we show sets of five images for which the baseline model generates identical generic caption. The five images are picked from among the images corresponding to captions in Table a, starting from the most frequent caption. Some entries are skipped, for example the caption in the last row, to avoid repeated concepts. While the baseline model produces a fixed generic caption for these images, we see that the adversarial model produces diverse captions which are often more specific to the contents of the image. For example, in the last row of Figure 2 we can see that the baseline model produces the generic caption a man riding skis down a snow covered slope, whereas captions produced by the adversarial model include more image specific terms like jumping, turn, steep, and corss country skiing. 2. Illustrating diversity in captions for a single image To qualitatively demonstrate the diverse captions produced by our adversarial model for each image, we visualize three captions produced by the adversarial and the baseline model for each input image. This is shown in figures 4 and 5. The captions are obtained by retaining the top three caption les out of five (ranked by models probability) from each model. Here bi-grams which are top- frequent bi-grams in the training set are highlighted in red (e.g., a group and group of ). Additionally captions which are replicas from training set are marked with a symbol. We can see that adversarial generator produces more diverse sets of captions for each image without over-using more frequent bi-grams and producing more novel sentences. For example, in the two figures, we see that the baseline model produces 22 captions (out of 45) which are copies from the training set, whereas the adversarial model does so only six times.

Sentence # baseline # adversarial Sentence # adversarial # baseline a man riding a wave on top of a surfboard a bathroom with a toilet and a sink a baseball player swinging a bat at a ball a man riding skis down a snow covered slope a man holding a tennis racquet on a tennis court a bathroom with a sink and a mirror a man riding a snowboard down a snow covered slope a baseball player holding a bat on a field a man riding a skateboard down a street a bus is parked on the side of the road 54 2 2 54 44 7 29 2 5 6 6 5 26 25 24 4 5 22 2 2 2 a man riding a wave on top of a surfboard a skateboarder is attempting to do a trick a female tennis player in action on the court a living room filled with furniture and a flat screen tv a bus that is sitting in the street a long long train on a steel track a close up of a sandwich on a plate a baseball player swinging at a pitched ball a bus that is driving down the street a boat that is floating in the water 9 8 (a) (b) Table : Most frequently repeated captions generated by (a) the baseline model and (b) the adversarial model on the test set of 5 images. Columns show the number of times the caption was produced by the two models. The caption a man riding a wave on top of a surfboard is also the most frequently generated caption by the adversarial model, albeit less than half the times of the baseline model. a group of men sitting around a meeting room a group of people sitting at a bar drinking wine a laptop and a desktop computer sit on a desk a group of people sitting at tables outside a group of people sitting around a table a person is working on a computer screen a group of friends enjoying lunch outdoors at a outdoor event a cup of coffee sitting next to a laptop a laptop computer sitting on top of a desk next to a desktop computer a laptop computer sitting on top of a desk Figure : Illustrating diversity across images a couple of men that are working on laptops a picture of a computer on a desk

Adv- a surfer rides a large wave in the ocean a surfer is falling off his board as he rides a wave Adv- a bathroom with a walk in shower and a sink Adv- a baseball player getting ready to swing at a pitch Adv- a person on skis jumping over a ramp a group of kids playing baseball on a field a cross country skier makes his way through the snow a crowd watches a baseball game being played a skier is headed down a steep slope Figure 2: Illustrating diversity across images a small bathroom has a broken toilet and a broken sink a baseball game in progress with the batter upt to swing a man riding skis down a snow covered slope a surfer rides a small wave in the ocean a white toilet in a public restroom stall a baseball player swinging a bat at a ball a skier is making a turn on a course a view of a very nice looking rest room a bathroom with a toilet and a sink a boy in a baseball uniform swinging a bat a man surfing on a surfboard in rough waters a man riding a wave on top of a surfboard a dirty bathroom with a broken toilet and sink a person on a surfboard riding a wave a person cross country skiing on a trail

Adv- a tennis player gets ready to return a serve two men dressed in costumes and holding tennis rackets Adv- a young boy riding a skateboard down a street Adv- a dish with noodles and vegetables in it Adv- a group of people standing around a shop a meal consisting of rice meat and vegetables a group of soldiers stand in front of microphones a plate with some meat and vegetables on it a couple of women standing next to a man in front of a store Figure : Illustrating diversity across images a boy is skateboarding down a street in a neighborhood a close up of some meat and potatoes a group of people standing around a table a man in white is about to serve a tennis ball a boy in white shirt doing a trick on a skateboard a plate of food with meat and vegetables a group of young people standing around talking on cell phones a boy wearing a helmet and knee pads riding a skateboard a man riding a skateboard down a steet a plate of food that has some fried eggs on it a male tennis player in action on the court a man holding a tennis racquet on a tennis court a skateboarder is attempting to do a trick a tennis player hits the ball during a match a group of people posing for a photo in formal wear

a red motorcycle parked on the side of the Advroad a motorcycle is parked on a city street a red motorcycle parked on a street in a city a man riding a motorcycle down a street Base a person riding a motorcycle on a city street a motorcycle is parked on the side of the road a skier is jumping over a snow covered hill a person on skis jumping over a hill a skier is in mid air after completing a jump a man riding skis down a snow covered slope a man riding skis down a snow covered slope a person on a snowboard jumping over a ramp a man standing next to a mail truck Adva picture of people standing outside a business a couple of people standing by a bus a group of people standing around a white truck Basea group of people standing around a white truck a group of people standing on a street next to a white truck a motor cycle parked outside a building with people nearby a motor bike parked in front of a building a row of bicycles parked outside of a building a motorcycle parked in front of a group of people a police officer on a motorcycle in front of a crowd of people a police officer on his motorcycle in front of a crowd a group of people riding bikes down a street a group of people on a street with motorcycles a group of people on a street with motorcycles a motorcycle parked on the side of a road a motorcycle is parked on the side of a road a motorcycle parked on the side of a road with a person walking by a group of people watching a skateboarder do stunts a group of skateboarders performing tricks at a skate park a group of skateboarders watch as others watch a stop sign with graffiti written on it a man doing a trick on a skateboard at a skate park a man riding a skateboard down a rail a man doing a trick on a skateboard in a park a bouquet of flowers in a vase on a table a bouquet of flowers in a vase on a table a stop sign with a few stickers on it a stop sign has graffiti written all over it a stop sign with a street sign on top of it a stop sign with a street sign on top of it a stop sign with a street sign above it a vase full of purple flowers sitting on a table a plant in a vase on a wooden porch a small blue flower vase sitting on a wooden porch a large flower arrangement in a vase on a corner a vase filled with flowers sitting on top of a table a vase filled with flowers sitting on top of a table a vase of flowers sitting on a table a vase with flowers in it sitting on a table a vase of flowers sitting on a table a vase with flowers in it on a table Figure 4: Comparing captions led from adversarial model to the baseline model. Bi-grams which are top- frequent bi-grams in the training set are highlighted in red. Captions which are replicas from training set are marked with a.

a long line of stairs leading to a church a large church with a very tall tower several stop signs in front of some buildings a large cathedral filled with lots of pews a large tall brick building with a clock on it a stop sign in front of some graffiti writing a cathedral with stained glass windows and a a church steeple with a clock on its side several stop signs lined up in a row few people a large building with a large window and a a large clock tower in a city a stop sign with graffiti on it building a row of benches in front of a building a large clock tower in a city with a sky background a stop sign is shown with a lot of graffiti on it a church with a large window and a large a large clock tower in the middle of a park a stop sign and a stop sign in front of a building building a family enjoying pizza at a restaurant party a view of a city street at dusk a white toilet sitting underneath a shower curtain a group of friends enjoying pizza and drinking a city street with many buildings and buildings a white toilet sitting underneath a bathroom beer a group of kids enjoying pizza and drinking beer a view of a city intersection in the evening window a bathroom with a shower curtain open and a toilet in it a group of people sitting at a table with pizza a traffic light in a city with tall buildings a bathroom with a toilet and a shower a group of people sitting at a table with pizza a traffic light in a city with tall buildings a bathroom with a toilet and a shower and drinks a group of people sitting at a table with pizza a traffic light and a street sign on a city street a white toilet sitting next to a shower in a bathroom Figure 5: Comparing captions led from adversarial model to the baseline model. Bi-grams which are top- frequent bi-grams in the training set are highlighted in red. Captions which are replicas from training set are marked with a.