Data Is Not Oil, Bacon, or Gold: An Actual Measure of Data as an Asset

April 3, 2023

Data is central to today’s economy, yet its economic value is unknown. Traditional economic statistics (and stock market prices) do a poor job of capturing the value of data held by firms. The struggle to understand data leads to bad analogies—that it is the new oil, the new bacon, or the new gold—and to bad government policies, as policymakers can’t make an informed cost-benefit analysis of digital policies, such as privacy, cybersecurity, digital trade, or innovation. The U.S. Bureau of Economic Analysis’s (BEA) new measure of the value of data held by American firms helps address this gap by measuring the number, data intensity, and wages of the data-intensive jobs that turn disparate information in datasets into real economic value.

Data is more challenging to measure than key economic inputs such as land, labor, and capital. Firms use data automatically, constantly, and behind the scenes. More data does not automatically equate to higher value: a large part of the global growth in data is due to massive increases in relatively low-value high-definition video, compared to the relatively small but highly valuable code that comprises an AI model. Furthermore, while individuals may find their own data useful, it often only becomes commercially valuable when aggregated and analyzed with data from others. Data’s value may also be time sensitive and depend on being updated with new data. Specifying data’s value at any given point is difficult.

Statistical agencies like BEA have long grappled with how best to capture the value of data, going back to 2004. In the interim, the BEA has done what it can to analyze the traditional statistical accounts most relevant to the digital economy. For example, in November 2022, the BEA estimated that the U.S. digital economy accounted for $3.7 trillion of gross output from 2005 to 2020 and represented 10.3 percent of GDP and 8 million jobs in 2021. This is highly consequential and deserves more attention, but the figure only hints at the value of data; a more granular measure is needed.

The measurement gap is likely substantive. Goodridge, Haskel, and Edquist’s aptly titled article “We See Data Everywhere Except in the Productivity Statistics” estimated that 43 percent of employment engaged in the capital formation of software and data is unaccounted for in measured own-account software and databases (production performed by a business for its own use) and that this missing piece of capital formation is growing faster than the measured piece. The article broke new ground as the first to provide a cross-country estimate of data assets in European Union countries and an estimate of the effect the economy’s growing data intensity has on productivity growth.

The BEA’s new study focuses on what can genuinely be counted and creates value from data—people. This is intuitive: huge datasets aren’t worth much of anything without skilled people making sense of and using them. It’s the same reason why countries forcing firms to store data locally (a policy known as data localization) is misguided: a data center without skilled people connected to it and using it is wasted.

This post reviews the BEA’s study valuing data as an asset for U.S. firms, provides expanded results, analyzes what this means for the debate around data localization, data flows, and digital trade, and offers recommendations to build on its important results.

Data Is a Large and Growing Asset of Incredible Value to the U.S. Economy

The BEA’s new study shows the growing value of data to the U.S. economy. It estimates that annual current-dollar investment in in-house data assets grew from $84 billion in 2002 to $186 billion in 2021, an average annual growth rate of 4.2 percent. Over this 20-year period, this in-house investment represented 1 percent of business sector value added, 5 percent of investment in private fixed assets, and 10.2 percent of investment in intellectual property (IP) assets.
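As a sanity check on the headline numbers, the growth from $84 billion (2002) to $186 billion (2021) can be verified with a simple compound-growth calculation. This is an illustrative check only; the BEA's own figure may reflect rounding or slightly different underlying values.

```python
# Compound average annual growth rate (CAGR) over a number of periods.
def cagr(start: float, end: float, years: int) -> float:
    """Average annual growth rate implied by start and end values."""
    return (end / start) ** (1 / years) - 1

# $84B in 2002 to $186B in 2021 spans 19 annual growth periods.
growth = cagr(84, 186, 2021 - 2002)
print(f"{growth:.1%}")  # prints "4.3%", close to the reported 4.2 percent
```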

Beyond the headline figures, the report includes a range of other interesting data points for policymakers. Using an experimental price index for data, the study estimates that the average annual growth rate in real data investment over the 20-year period was 7.5 percent, which yields an average annual increase in real business sector value-added growth of 4 basis points (bps) and an increase in growth in real investment in IP products of 31 bps. It’s interesting to note that software as an IP asset category grew faster over this period than data, even though the two are obviously complementary (software transforms and extracts value from data): real data investment growth trails real investment growth in software. The average annual growth rate of real investment in software is 7.7 percent when data-related investments are factored out; including them lowers the rate by 26 bps.

For 2002–2021, the study estimates that cumulative nominal investment in data was $2.6 trillion. In analyzing the data intensity via occupations and wages in different sectors, the study breaks down the nominal investment in data by sector from 2002 to 2021 (via aggregating North American Industry Classification System codes). It shows that while all sectors have invested in data, some invest more than others. It also shows that growth in data investment varies. This would be an interesting figure to watch over time as a measure of how successful (or not) businesses are in using data and digital tools and how successful government programs are in encouraging further use. For example, ideally, the growth in data investments in manufacturing should be higher, given the critical role of robotics and automation in driving competitiveness and productivity.

Table 1: Nominal investment in data assets (2002–2021) and average annual growth in investment in data (2003–2019) by NAICS sector

[Table: investment ($ billions) and average annual growth in investment (%) by NAICS sector, including Agriculture, Forestry, Fishing, and Hunting; Mining, Quarrying, and Oil and Gas Extraction; Wholesale Trade; Retail Trade; Transportation and Warehousing; Finance and Insurance; Real Estate and Rental and Leasing; Professional, Scientific, and Technical Services; Management of Companies and Enterprises; Administration & Support and Waste Management & Remediation Services; Accommodation and Food Services; and Other Services (except Public Administration).]

The BEA’s Methodology and Its Key Processes and Challenges

The BEA study measures the value of own-account (production performed by a business for its own use) data stocks and flows for the U.S. business sector using the “sum-of-costs” method, in which the data’s value is calculated as the cost of its production. Production costs include costs of labor, capital, and intermediate goods. However, the costs of capital and intermediate goods (as well as additional employee costs such as employee benefits) are unavailable. Therefore, to estimate production costs, the authors apply a markup of 2.52 to their estimates of the wage costs for data-related activities—that is, production costs are assumed to be 2.52 times labor costs.

The authors estimate wage costs by industry and year by multiplying the number of employees by the average annual wage and the share of work time spent on data-related activities—which the authors refer to as the “time-use factor”—for each occupation within the industry and year and summing the results. That is, industry i’s production cost of own-account data, C_{i,t}, in year t is calculated as

C_{i,t} = μ × Σ_o (τ_o × w_{o,i,t} × n_{o,i,t})

where μ is the markup factor of 2.52 and o denotes the occupation. Therefore, τ_o denotes occupation o’s time-use factor, w_{o,i,t} denotes the average annual wage for occupation o in industry i and year t, and n_{o,i,t} denotes the number of employees in occupation o in industry i and year t.
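The sum-of-costs calculation can be sketched for a single hypothetical industry-year. The occupation names, wages, employment counts, and time-use factors below are invented for illustration; only the markup of 2.52 and the structure of the formula come from the study.

```python
# Illustrative sum-of-costs estimate of own-account data production
# for one industry-year. All figures are made up except the markup.
MARKUP = 2.52  # assumed ratio of total production costs to wage costs

# occupation -> (time-use factor, average annual wage, number of employees)
occupations = {
    "database administrators": (1.00, 95_000, 1_200),  # landmark occupation
    "financial analysts":      (0.35, 88_000, 4_500),
    "marketing managers":      (0.10, 120_000, 900),
}

# C = mu * sum over occupations of (tau * wage * employees)
data_wage_cost = sum(tau * wage * n for tau, wage, n in occupations.values())
own_account_data_cost = MARKUP * data_wage_cost
print(f"${own_account_data_cost / 1e9:.2f}B")  # prints "$0.66B"
```

The markup converts observable wage costs into an estimate of total production costs, since capital and intermediate-goods costs are not directly observable.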

Employment and wage data came from the Bureau of Labor Statistics Occupational Employment and Wage Statistics database. The authors use job advertisement data from Burning Glass Technologies for 2010–2019 and a machine learning model they developed to calculate occupations’ time-use factors. Specifically, an occupation’s time-use factor is defined as the share of job advertisements for that occupation that list at least one data-related skill, multiplied by a “similarity factor,” which takes a value between 0 and 1. The authors identify 17 “landmark occupations” for which the share of job advertisements mentioning any data-related skills is greater than 50 percent. This group includes occupations such as data entry keyers, computer and information research scientists, and database administrators. An occupation’s similarity factor is then calculated based on how “close” it is to the closest landmark occupation, where closeness is determined by the authors’ machine learning model. By definition, landmark occupations receive a similarity factor of 1.
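The time-use-factor construction can be sketched as follows. The skill list, job-ad contents, and similarity scores are invented for illustration; in the actual study, similarity comes from the authors' machine learning model over Burning Glass job advertisements.

```python
# Minimal sketch of a time-use factor: share of job ads mentioning any
# data-related skill, scaled by similarity to the nearest landmark
# occupation. All inputs below are hypothetical.
DATA_SKILLS = {"sql", "data entry", "data analysis", "etl"}

def time_use_factor(ads: list[set[str]], similarity: float) -> float:
    """Share of ads listing any data-related skill, times the similarity
    factor to the closest landmark occupation (1.0 for landmarks)."""
    with_data_skill = sum(1 for skills in ads if skills & DATA_SKILLS)
    return (with_data_skill / len(ads)) * similarity

# A landmark occupation (similarity = 1.0): most ads mention data skills.
keyer_ads = [{"data entry", "typing"}, {"data entry"}, {"filing"}, {"sql"}]
print(time_use_factor(keyer_ads, similarity=1.0))  # prints 0.75

# A non-landmark occupation, discounted by its similarity factor.
analyst_ads = [{"sql", "excel"}, {"forecasting"}]
print(time_use_factor(analyst_ads, similarity=0.6))  # prints 0.3
```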

The model’s methodology addresses several measurement challenges. First, the authors avoid double counting the value of data, given its overlaps with other assets like software and IP. This issue is not unique to data; for example, the BEA also employs methods to account for overlaps between software and R&D assets. Second, the authors needed a method to distinguish between data used in production and data used in capital formation. Because of a lack of relevant empirical data, the authors assume that half of firms’ data is used in capital formation. Relatedly, for the data processing and hosting industry (NAICS 518), the authors needed to distinguish between purchased data and data produced in-house; here, too, the authors assumed that half of the industry’s data assets were purchased. Third, the authors had to make assumptions about the cost of capital in computing the markup (of 2.52). And finally, because there are no market transactions for in-house data that would reveal prices or depreciation rates, the authors adapted international methods for applying price changes and depreciation to software and databases and applied them to own-account data.

Overall, the study represents a significant step toward a more granular measurement of the value of data. It adapts methods used for similar economic assets, such as intellectual property. However, the authors resorted to three major assumptions that raise concerns that will hopefully be addressed in subsequent studies: 1) that occupation-specific time-use factors were constant throughout the 2002–2021 period; 2) that job advertisements accurately reflect actual job content; and 3) that the markup factor does not vary by time, industry, or occupation.

Policy Implications for Digital Development in Developing Countries: Focus on Enabling Factors, Not Data Centers

For countries and policymakers involved in digital development, the study proves a central point: skilled workers generate economic value from data, meaning the extent of a data-skilled workforce at the firm, industry, regional, and national levels matters much more than simply the number of data centers or the amount of data a firm holds. This is central to the debate about data and digital development—that the value of data comes from its use (not its storage), that aggregating data (rather than using it in discrete form) into what people term “big data” is often where the most value is created, and that businesses can only maximize the value of data when it’s able to move freely across borders (so that businesses can aggregate it with other data for analysis by skilled workers).

Unfortunately, policymakers in a growing number of countries worldwide—developed and developing, democratic and authoritarian—are drawn to the false and costly allure of data nationalism: the notion that the value of data depends on where it’s stored. This focus on the location of data storage is a costly distraction from developing the fundamental enablers that help countries create value from data, such as skills training and education.

Conceptual Reframing of Digital Development and Trade

Some multilateral organizations (like the United Nations Conference on Trade and Development), development agencies (like the World Bank), and many non-governmental organizations (NGOs) struggle with—or outright oppose—helping developing countries develop open and competitive digital economies. Some naturally prefer state-led industrial development and structural transformation (from agriculture to manufacturing to services), which is poorly suited for helping countries learn how to extract value from data. State-directed digital development often leads to digital protectionist industrial policies, including data localization. Of all the enabling factors that are critical to digital development, development agencies should ensure that skills and education are a priority. This BEA study highlights that skilled workers are critical for all countries.

Recommendations for the United States and Other Leading Digital Countries

The United States and other countries that support the development and use of digital technologies and an open global digital economy should do more to better measure the value of data. The more policymakers appreciate the size and role of data, the better the subsequent policies. Or so we hope.

Data and data-related policies hold enormous potential economic and societal significance, given data’s use in a growing range of digital technologies. However, policymakers need both the right framing (forget the bad analogies) and measurement data to better decide how best to support its use.

This BEA study and related studies from Australia and Canada (linked below) lay a solid foundation for U.S. policymakers to achieve this. Below are several ideas.

The Biden administration should not only build on this study and make it a permanent part of statistical processes but also make it a central part of a national strategy to develop statistics for the digital economy. In January 2023, the Biden administration launched a similar national strategy to develop statistics for environmental-economic decisions. The BEA should work toward turning this study into a satellite account, as it does for new types of economic activity (such as the space economy), with the goal of refining the methodology to eventually include it in core national statistical accounts. This is the same path R&D spending followed before becoming part of official U.S. economic statistics. Data should be next.

BEA should follow through on its own idea to research how to quantify the value of data to other stakeholders, such as governments, consumers/households, non-profit institutions, and both big data firms and less data-intensive firms (which are more users than creators of data), and the value of data exchanges between them, such as between the government and U.S. firms, between data-intensive and non-data-intensive firms, and between U.S. firms and the government. This would also help dispel the misconception that data and data flows are only important to “tech” firms.

Similarly, the BEA should expand efforts to address the data divide by closing critical data gaps for underrepresented communities and ensuring quality data. Data can be a fundamental enabler of social good, but only if enough communities can access data and put it to productive use.

BEA and other agencies should explore how they could better analyze the role that cross-border data flows contribute to domestic data value creation and how this value is both generated internally by the firm and delivered back to users elsewhere around the world. Even if this research is initially limited to surveys and case studies, it would provide valuable insights into the underappreciated value of an open and global Internet. This would help dispel the popular zero-sum misconception held by many policymakers (especially in developing countries) that data leaving a country to be processed and delivered back (often as a free service) means lost economic value.

Similarly, core components of this study could be adapted or combined with other measures to develop a better proxy measurement for digital trade. Like many research organizations, ITIF has estimated digital intensity via software usage per worker in each U.S. industry. It uses noncapitalized software expenditures from the 2013 U.S. Census Information and Communication Technology Survey as a proxy for intangible software expenditure per industry. This figure is divided by the number of workers in each corresponding industry as reported by the U.S. Bureau of Labor Statistics (BLS) for the same reference year (2013).

The BEA should work with statistical agencies in like-minded digital trading partners like Australia, Canada, Japan, Singapore, and the United Kingdom to explore ways to capture the value of cross-border collection and use of data. Canada and Australia have done somewhat similar work to this BEA study. The BEA and its counterparts in these countries should explore joint projects to better capture the international extension of their domestic work to capture the value of data.

Policymakers in the United States and like-minded countries should elevate the importance of data-related economic policy. China formally designated data as a factor of production, joining land, labor, capital, and technology. This has led China to create data exchanges and other policies to try and maximize the value of data. While these data exchanges may well be unsuccessful, China is right to elevate the importance of data policy in its strategic economic policymaking. The United States does not even have a strategy for the global digital economy. Policymakers in the United States and elsewhere should put data policy at a similarly strategic level in terms of its economic importance.
