Boris Glavic, assistant professor of computer science, along with collaborators from the University at Buffalo and New York University, has received almost $405,000 as a subcontractor on a $2.7 million grant from the National Science Foundation (NSF) for a project that aims to make data cleaning and wrangling easier. The Vizier project is part of an NSF effort to develop software for data exploration, cleaning, curation, and visualization.
In the current world, data is ubiquitous. There are sensors in phones, watches, homes, roads, and factories. Open government regulations put statistics like health code violations and legislative decision-making within reach of an average person. Today, this data is used by doctors, sociologists, business owners, and even ordinary citizens trying to improve their communities. However, using such data to answer simple questions like “Where do police issue the most traffic tickets?” or “What am I doing when my heart rate goes over 90 bpm?” is still hard. The data might be available, but that does not mean it is fit for use. It may contain errors, inconsistencies, and other data quality problems. Such errors are everywhere and must be resolved to ensure that analysis results are correct. In corporate settings, analysts will spend days, weeks, or even months “cleaning” their data before asking a single question.
The Vizier project will streamline the data curation process, making it easier and faster to explore and analyze raw data. The project's tool, also called Vizier, will combine a simple “notebook-style” interface with powerful back-end tools that track changes, edits, and the effects of automation. These forms of “provenance” capture the exploratory curation process: how the cleaning workflows evolve and how data changes over time.
To learn more about the Vizier project, visit the website.
Illinois Tech and Glavic are listed as subcontractors and are receiving $404,979.