Helping computers fill in the gaps between video frames

Thursday, September 13, 2018 - 23:30 in Mathematics & Economics

Given only a few frames of a video, humans can usually surmise what is happening and what will happen on screen. If we see an early frame of stacked cans, a middle frame with a finger at the stack’s base, and a late frame showing the cans toppled over, we can guess that the finger knocked down the cans. Computers, however, struggle with this kind of inference. In a paper being presented at this week’s European Conference on Computer Vision, MIT researchers describe an add-on module that helps artificial intelligence systems called convolutional neural networks, or CNNs, fill in the gaps between video frames, greatly improving the networks’ activity recognition. The researchers’ module, called Temporal Relation Network (TRN), learns how objects change in a video at different times. It does so by analyzing a few key frames depicting an activity at different stages of the video, such as stacked objects that are...

Read the whole article on MIT Research
