INTERVIEW by MEGGIE O’DELL | PHOTOGRAPHY courtesy of DAVID & YALE
Robots aren’t just for science fiction anymore — they build our cars, vacuum our floors, and even help surgeons operate. They also fight our wars: unmanned military drones have proven to be one of the most controversial elements of recent U.S. operations in Afghanistan and Pakistan. But can robots learn to understand, not just follow, orders? MIT researchers Yale Song and David Demirdjian think so. They’re developing software that will help robotic aircraft recognize the hand and arm signals of landing crews, enabling them to land planes on moving targets without a pilot.
So, first off, I want to say that your project sounds pretty amazingly space-age: robots, drones… can you explain the basic way your software works and what it does?
David: Our system is built to recognize specific arm and hand gestures. The particular application we developed allows a flight director to control a drone using hand signals and safely guide it across the tarmac. The gesture recognition software is divided into two modules. The first module detects the user, fitting a human model to the 3-dimensional input. The second module recognizes gestures by analyzing the shape and movements of the human model.
Some people might not be familiar with unmanned drones as part of military strategy. What are the unmanned drones used for, and what advantages do they have over piloted planes?
David: Unmanned drones have become a critical asset in modern warfare. They are generally used to fly over hostile areas to collect data using video cameras, infrared cameras, and radar. Military operators use this data, for instance, to locate and identify the enemy’s activity. The fact that drones are unmanned saves lives. Before drones were introduced, reconnaissance missions were given to actual pilots, unfortunately resulting in occasional casualties. Drones are also much cheaper to make and operate than piloted aircraft.
How did this project get started? Were you approached by the military, or was this just something you thought would be cool and the brass agreed?
Yale: The project is funded by the Office of Naval Research.
David: I have always been fascinated with creating ways for humans to interact with machines. I have built human-tracking and gesture-based human-machine interaction systems for over 10 years now. When Yale told me about this opportunity, I jumped at the chance to collaborate with him and Professor Davis.
A lot of our readers are probably familiar with the Xbox 360’s similar movement-recognition system. How is this different?
Yale: The Xbox 360 system performs full-body tracking, but not gesture recognition. An easy analogy: it knows you are waving your arm left and right, but it doesn’t know you are saying goodbye. Also, the Xbox 360 system does not estimate hand shapes, so it cannot distinguish a thumbs-up from a thumbs-down.
How do you test the accuracy of the software? What have the test results been?
Yale: My colleagues and I at MIT went to the naval training station in Pensacola, Florida, to learn the aircraft-handling gestures in a realistic setting. After coming back to MIT, we recruited 20 people at the institute and taught them how to perform the gestures, accounting for real-world factors such as fatigue and abbreviated gestures. Each person performed each of the 24 gestures 20 times, giving us 400 samples per gesture. To test the accuracy of our system, we split the entire dataset so that the test split contains data from the first five volunteers, the validation split contains data from the next five, and the training split contains data from the remaining ten. To date, the recognition accuracy is 75.37% on the test split and 86.35% on the validation split.
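The volunteer-based split Yale describes can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual code: the integer volunteer and gesture IDs, the tuple layout, and the `split_by_volunteer` helper are all assumptions made for the example. Only the counts (20 volunteers, 24 gestures, 20 repetitions, and the 5/5/10 split) come from the interview.

```python
# Hypothetical sketch of a split by volunteer; IDs and names are illustrative.
NUM_VOLUNTEERS = 20   # people recruited
NUM_GESTURES = 24     # aircraft-handling gestures
REPETITIONS = 20      # times each person performed each gesture

# One record per performed gesture: (volunteer_id, gesture_id, repetition).
samples = [
    (v, g, r)
    for v in range(NUM_VOLUNTEERS)
    for g in range(NUM_GESTURES)
    for r in range(REPETITIONS)
]


def split_by_volunteer(records):
    """Assign each record to a split based only on who performed it."""
    splits = {"test": [], "validation": [], "train": []}
    for record in records:
        volunteer = record[0]
        if volunteer < 5:        # first five volunteers
            splits["test"].append(record)
        elif volunteer < 10:     # next five volunteers
            splits["validation"].append(record)
        else:                    # remaining ten volunteers
            splits["train"].append(record)
    return splits


splits = split_by_volunteer(samples)

# 20 volunteers x 20 repetitions = 400 samples for any single gesture.
samples_per_gesture = sum(1 for s in samples if s[1] == 0)

print(samples_per_gesture)
print({name: len(records) for name, records in splits.items()})
```

Splitting by volunteer rather than by individual sample means the test split contains only people the system never saw during training, which is a stricter measure of how it would handle an unfamiliar flight director.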
How will you improve on the system before it gets put into use?
David: Before it gets deployed, the system performance would have to be improved to guarantee its safe use. Since the drone movements on the tarmac are driven by recognized gestures, it is crucial that our system be near error-free. In addition to technological improvements to human tracking and gesture recognition algorithms, a complete system would require the development of autonomous navigation. This would provide enough safeguards for the deployment of our system in areas where soldiers are present.
Yale: Using contextual information could help improve the accuracy significantly. Also, to ensure that the interaction is natural, we need an appropriate feedback mechanism from the system to humans. For example, the system should be able to say to deck handlers, “I didn’t get it, could you do it again?” in a natural way.
What do you think could be some other applications for the software? Will it have civilian as well as military utility?
David: The applications for technology like ours are endless and span everything from military to civilian use. Gestures and speech are central to human communication and therefore create a natural way to communicate with computer systems. Interfaces based on gestures and speech seem to be the best choice because they are intuitive and could be used across different systems. “Living room” devices such as video gaming consoles and TVs have already integrated this technology. The next step is the ubiquitous home interface. Imagine pointing at any device in your house and activating it using spoken commands: lights, heater, blinds, stove, etc.
I’m sure you must have done a lot of work in programming and robotics up to now — can you talk about some of your other projects? What else are you working on now…if it’s not Top Secret!
Yale: My work concentrates on probabilistic modeling of multimodal human behavior. When humans communicate with each other, we make use of all sorts of modalities: facial expression, speech, arm and hand gestures, etc. My goal is to build robust and efficient computational models that could understand complex human behavior in a probabilistic setting.
David: As project leader at Vecna Technologies, a Cambridge-based company, I am in charge of DoD projects that develop computer vision and artificial intelligence technology for robotic platforms.
What can Daily BR!NK readers do to support you and your projects?
David: Creative ideas and feedback are always welcome.