XDATA: Enabling Efficient Data Science at Scale
By: Wade Shen, Program Manager for the Information Innovation Office (I2O) at Defense Advanced Research Projects Agency (DARPA)
In both business and government applications, understanding data is increasingly important to decision-making. These data are often large, unstructured, noisy, and complex, yet the tools we have for processing them are either:
1) easy to use, but unable to scale to large datasets, or
2) difficult to use, but able to scale well.
Furthermore, processing unstructured data in the form of images, video, speech, text, and sensor readings often requires deep technical expertise to handle properly.
XDATA aims to meet these challenges by developing computational techniques and software tools for processing and analyzing large, imperfect, and incomplete data. For scalable analytics, this approach includes research on distributed databases, statistical sampling methods, and new algorithms that lower the computational complexity of pattern matching. For information visualization, the approach includes developing human-computer interaction tools that are web-based, factor computation between client and server, and build on an open code base to enable rapid customization to different missions.
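As an illustration of the statistical-sampling idea, and not a description of any specific XDATA deliverable, a classic technique such as reservoir sampling (Algorithm R) lets an analytic operate on a fixed-size, uniformly random sample of an arbitrarily large stream, trading a little accuracy for a large reduction in computation and memory:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    Algorithm R: each item ends up in the sample with probability k/n,
    using O(k) memory regardless of how large the stream is.
    """
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)            # fill the reservoir first
        else:
            j = rng.randrange(i + 1)       # pick an index uniformly from 0..i
            if j < k:
                sample[j] = item           # replace with probability k/(i+1)
    return sample

# Example: approximate a statistic over a large stream from a small sample.
stream = range(1_000_000)
sample = reservoir_sample(stream, 1000, rng=random.Random(42))
approx_mean = sum(sample) / len(sample)   # close to the true mean of ~499999.5
```

A single pass and constant memory make this kind of sampling attractive as a preprocessing step before more expensive pattern-matching analytics are applied.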
In this talk we describe the work and outcomes of the XDATA program. We show how DARPA-sponsored open-source tools can enable large-scale data exploration and discovery, and how these tools reduce the time and effort needed to produce engineered solutions for complex data problems.