Data diversity
When data sets get too big, sometimes the only way to do anything useful with them is to extract much smaller subsets and analyze those instead. Those subsets have to preserve certain properties of the full sets, however, and one property that’s useful in a wide range of applications is diversity. If, for instance, you’re using your data to train a machine-learning system, you want to make sure that the subset you select represents the full range of cases that the system will have to confront.

Last week at the Conference on Neural Information Processing Systems, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory and its Laboratory for Information and Decision Systems presented a new algorithm that makes the selection of diverse subsets much more practical. Whereas the running times of earlier subset-selection algorithms depended on the number of data points in the complete data set, the running time of the new...