Overcoming Barriers to Data Sharing in the United States
Both public- and private-sector actors in the United States face legal, social, technical, and economic barriers to data sharing that inhibit much-needed innovation and discovery. Overly restrictive data privacy laws and a lack of technical standards hinder data sharing within sectors such as education and health care, and past data-sharing experiments that misfired have left behind both a lack of trust and data silos. This report details the challenges associated with data sharing and the steps U.S. policymakers can take to overcome these barriers and bring the social and economic benefits of data to all Americans.
There are countless facets of the economy and society that could be improved with better data. Data enables people and organizations to better understand the world and use that understanding to make better decisions, large and small. Better data would help researchers understand how to best treat infectious diseases and which interventions are most likely to alleviate poverty. Better data would allow scientists to improve predictions about extreme weather events and natural disasters. And better data would enable educators to understand which pedagogical practices work best for which kinds of students.
But better data requires more data sharing, and getting the right data to the right place at the right time is not always easy. For example, one government agency might need data held by a different government agency or by a private-sector firm. Organizations may need to transfer, aggregate, or combine datasets before they can use or reuse data. However, legal, social, technical, and economic barriers may impede data sharing. When organizations cannot obtain data already collected by another organization, they must either proceed without it (leading to suboptimal services) or collect it again (creating duplicative costs that are eventually passed on to consumers and taxpayers, and subjecting individuals to a barrage of repeated requests for their personal information). Moreover, persistent obstacles to data sharing can greatly inhibit the burgeoning AI economy. For example, large language models are only as capable as the data they are trained on. Effective data-sharing mechanisms are therefore essential for individuals and organizations to overcome these barriers and obtain the social and economic benefits of data.
While many organizations in the United States already share data, whether internally, through agreements with other parties, or via data brokers, more sharing is needed, particularly in high-value areas. Certain parts of the economy, including health care, financial services, and education, share less data than they could despite the potential for data-driven innovation. This shortfall stems from a variety of challenges that come with data sharing. For example, privacy laws in some sectors, such as the Health Insurance Portability and Accountability Act (HIPAA) in health care, tend to restrict rather than enable sharing, leading organizations to avoid sharing information for fear of penalties for noncompliance. Likewise, anti-data advocates have fueled fear and mistrust of data sharing, making people averse to it. Moreover, data sharing can be costly for the participating actors and can require complex technical components that under-resourced areas are unlikely to prioritize.
Without policy change, the United States will continue trending toward data silos, an inefficient world in which data is isolated and its benefits are restricted. Data silos are repositories of information that exist in a closed system, often sealed off from the rest of an organization or from other organizations and incompatible with other datasets. Data sharing spans a whole spectrum of possibilities: on one end are data silos, where data remains isolated and unshared, and on the other end are data collaboratives, where data flows freely between organizations with no restrictions on use. The United States needs to move further toward data collaboratives, and doing so will require overcoming these legal, social, technical, and economic barriers. It will take coordinated government action both to enable data sharing by default and to counter pervasive privacy fears. Specifically, policymakers should:
- reform existing data protection laws to reduce legal barriers to data sharing;
- direct key federal agencies to create model data-sharing contracts to simplify legal agreements;
- create data literacy initiatives to help communities understand the benefits of data and how data can be shared securely;
- enable consumers to easily donate their data, particularly in high-impact areas such as health care and education;
- develop data standards in high-impact areas; and
- identify and address instances where fragmented ownership of data prevents compiling valuable datasets.