Sizing samples
Across science and engineering, computers are often enlisted to find patterns in data. The data might be genetic information about a population, and the pattern could be which gene variants predispose people to asthma. Or the data might be frames of video, and the patterns could be objects that move or stand still from frame to frame, which data-compression or image-sharpening algorithms might want to locate.

In most cases, more data means more reliable inference of patterns. But how much data is enough? Vincent Tan, a graduate student in the Department of Electrical Engineering and Computer Science, and his colleagues in Professor Alan Willsky’s Stochastic Systems Group have taken the first steps toward answering that question.

Tan, Willsky and Animashree Anandkumar, a postdoc in Willsky’s group, envision data sets as what mathematicians call graphs. A graph is anything with nodes and edges: Nodes are generally depicted as circles and edges as lines...