Answering even simple questions about the objects, agents, and actions in a visual scene requires integrating vision with language understanding. In this unit, you will learn about state-of-the-art automated question answering systems; models that combine visual recognition and tracking with language understanding to describe the content of a video in linguistic terms; and a system that can understand stories. Turning to biology, you will learn how semantic information is represented in the brain, as revealed by fMRI studies.
(Image © Journal of Artificial Intelligence Research. All rights reserved. This content is excluded from our Creative Commons license. Source: Yu, H., N. Siddharth, A. Barbu, and J. M. Siskind. “A Compositional Framework for Grounding Language Inference, Generation, and Acquisition in Video.” J. Artif. Intell. Res. (JAIR) 52 (2015): 601-713.)
Boris Katz describes key elements of the START system, an online question answering system that has been operating for over two decades, and compares its capabilities to those of IBM’s Watson system, which can beat human players at Jeopardy!
Andrei Barbu shows how the simple ability to compare an English sentence with a video clip can form the basis for many tasks, such as recognition, image and video retrieval, video caption generation, question answering, and language acquisition.
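The key idea above is that one sentence–video compatibility score can drive several tasks at once. The sketch below is a toy illustration, not the actual system: the `score` function is a hypothetical stand-in based on word overlap with per-clip labels, whereas the real framework scores sentences against object tracks detected in the video. Retrieval and captioning then both reduce to maximizing the same score.

```python
# Toy sketch: one compatibility score shared across tasks.
# The score here is a hypothetical stand-in (word overlap with
# hand-provided clip labels), not the real sentence-video model.

def score(sentence, video_labels):
    """Compatibility between a sentence and a video (higher = better match)."""
    words = set(sentence.lower().split())
    return len(words & set(video_labels))

# Hypothetical annotated clips and candidate sentences.
videos = {
    "clip1": {"person", "picked", "up", "ball"},
    "clip2": {"dog", "chased", "cat"},
}
sentences = ["The person picked up the ball", "The dog chased the cat"]

def retrieve(query):
    """Video retrieval: return the clip best matching a query sentence."""
    return max(videos, key=lambda v: score(query, videos[v]))

def caption(video):
    """Caption generation: return the sentence best describing a clip."""
    return max(sentences, key=lambda s: score(s, videos[video]))
```

Under this toy scoring, `retrieve("The dog chased the cat")` selects `clip2`, and `caption("clip1")` selects the sentence about the person and the ball; the point is only that both tasks invert the same score.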
Patrick Winston addresses a cognitive ability that distinguishes human intelligence from that of other primates: the ability to tell, understand, and recombine stories. The Genesis story understanding system is a powerful and flexible platform for exploring this capability.
Guest speaker Tom Mitchell shows how the neural representations of language meaning can be understood using machine learning methods that can decode fMRI signals to reveal the semantics of words experienced by a viewer.
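The decoding idea described above can be illustrated with a minimal sketch, assuming synthetic data: each word is given a "prototype" voxel activation pattern, an observed trial is a noisy copy, and the decoder picks the word whose prototype best correlates with the observed scan. Real studies use actual fMRI images of subjects reading the words; everything below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each word has a "true" activation pattern over
# 50 voxels; an observed trial is that pattern plus noise.
words = ["hammer", "house", "dog"]
n_voxels = 50
prototypes = {w: rng.normal(size=n_voxels) for w in words}

def decode(observed):
    """Return the word whose prototype best correlates with the scan."""
    return max(words, key=lambda w: np.corrcoef(observed, prototypes[w])[0, 1])

# A noisy synthetic "scan" of a subject reading "dog".
trial = prototypes["dog"] + 0.3 * rng.normal(size=n_voxels)
```

With this noise level the trial decodes back to "dog"; the correlation-based nearest-prototype rule is a simplified stand-in for the regression-based models used in the actual fMRI work.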
- Introductions to machine learning and neuroscience