New scientific opportunities are emerging as data organization, access, and usage become increasingly effective. Many fields of study have been transformed by new tools and data infrastructure. For example, the analysis of DNA sequence data has transformed medical research. Currently, however, much of this effort is focused in the natural sciences, where data is generated by digital instruments (e.g., satellite data, telescope data). We need to push the frontier of the social sciences by doing the same with the digital data available about our society; this will enable us to gain fundamental insights into its many facets. A key source of information about all aspects of our society resides in government administrative data. From the day we are born until our death, almost all of our activities leave footprints in various government data systems. Birth, marriage, and death certificates are filed with the government, education records remain with departments of public instruction, and traces of employment can be found in the ESC UI (Employment Security Commission Unemployment Insurance) wage data. Without a doubt, a well-integrated data system that encompasses many of these government data systems will hold the footprints of our society, our social genome. The two main hurdles to building such a system to transform the social sciences are (1) privacy concerns and the laws in place to protect individual confidentiality, and (2) the physiology of administrative data, which is fragmented, short-lived, and sometimes of questionable reliability.
Our group’s research focuses on resolving these two barriers to building a federated system of government administrative data. Once they are resolved, we can build the social genome data infrastructure that could finally allow us to understand how current policies play out in our society and to make informed policy using the information and knowledge gathered from administrative data.