Compustat Data: A Misleading Measure of Corporate Market Power and Market Competition

August 12, 2024

Neo-Brandeisians repeatedly claim that large firms are raising prices, increasing profits, and becoming more dominant. As evidence, these anticorporate proponents consistently point to studies that use Compustat data. However, a thorough examination shows that Compustat data is a poor representation of concentration and economic competition in the United States. As such, policymakers should be wary of using these studies as a basis for legislation aiming to increase competition by targeting large firms and should instead use data from the Economic Census.

Background

Concentration data from the U.S. Census Bureau is limited and lagging, with official data released only twice per decade. For example, the 2022 Economic Census will not be released until 2025. These frequency and timeliness constraints have prompted anticorporate researchers to search for alternative, more convenient sources, such as Compustat data. Compustat provides firm-level financial data annually, from which concentration measures can be computed, and studies that use it are more likely to report rising concentration.

Indeed, these proponents often cite a recent paper by Brauning, Fillat, and Joaquim that uses Compustat data to conclude that the economy was at least 50 percent more concentrated in 2018 than in 2005 and that this increase has led to higher prices. These proponents have also widely cited another study by De Loecker and Eeckhout, which uses Compustat data to show that markups rose from 18 to 67 percent alongside rising concentration from 1980 to 2017. For example, a recent Democratic staff report from the House Small Business Committee repeatedly referenced this paper to show that concentration has risen and is subsequently harming the economy.

Similarly, a report by the Economic Innovation Group also cited the study by De Loecker and Eeckhout to imply that rising market power is resulting in higher markups. Moreover, these proponents have consistently referenced a study by Grullon, Larkin, and Michaely using Compustat data to conclude that over 75 percent of U.S. industries have become more concentrated since the late 1990s. The studies using Compustat data that anticorporate proponents have weaponized are not limited to these three pieces of literature. One report notes that at least 102 studies have used Compustat data from 2010 to 2016.

Challenges With Compustat Data

Compustat datasets have multiple limitations that make it problematic to cite these studies as evidence of rising concentration (or declining competition). The data has at least three problems:

1. It only includes public firms.

2. It likely categorizes firms into incorrect industries.

3. It measures worldwide sales rather than domestic sales.

Inclusion of Public Firms Only

The first issue with Compustat data is the inclusion of only public firms. This is problematic because a large share of private firms compete vigorously with public firms for market share. Indeed, a study by Decker and Williams suggests that private firms comprise more than half of U.S. economic activity and a larger unweighted share of all firms. As a result, any measure of concentration excluding private firms could fail to account for the top firms in an industry, causing a skewed concentration ratio or Herfindahl-Hirschman Index (HHI) that fails to capture the true state of competition.

For example, Cargill, the 14th largest company in the United States, is not included in the dataset because it is privately held. As a result, the concentration ratio for chocolate and confectionery manufacturing from cacao beans (NAICS 311351), an industry in which Cargill operates, will likely be under- or overestimated because Cargill is excluded from the denominator and, potentially, the numerator (if it is one of the top firms, which it likely is). Without Cargill in the denominator, the concentration ratio would likely suggest that the top firms have a larger market share than they actually do and that the industry is more concentrated than it really is. And if Cargill is a top firm, its exclusion from the numerator will make the industry appear less concentrated than it is. This bias from the exclusion of private firms is why Decker and Williams also assert that “the measurement of top firm concentration requires accurate data covering all top firms in an industry as well as accurate measures of total industry sales.”
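The direction of this bias can be illustrated with a minimal sketch. All firm names and sales figures below are hypothetical and chosen only to show the arithmetic; they are not drawn from Compustat or Census data. The sketch computes a four-firm concentration ratio (C4) and an HHI for a full industry and again after dropping the private firm, as a public-firm-only dataset would.

```python
def concentration_ratio(sales, k=4):
    """Share of total industry sales held by the k largest firms."""
    total = sum(sales.values())
    top_k = sorted(sales.values(), reverse=True)[:k]
    return sum(top_k) / total


def hhi(sales):
    """Herfindahl-Hirschman Index: sum of squared percentage market shares."""
    total = sum(sales.values())
    return sum((100 * s / total) ** 2 for s in sales.values())


def drop_private(sales, private_firms):
    """Mimic a public-firm-only dataset by removing private firms."""
    return {f: s for f, s in sales.items() if f not in private_firms}


# Hypothetical industry A: the private firm is one of several large firms.
industry_a = {"PublicA": 30, "PublicB": 20, "PublicC": 15,
              "PublicD": 10, "PublicE": 5, "PrivateX": 20}

# Hypothetical industry B: the private firm dominates fragmented public firms.
industry_b = {"Public1": 10, "Public2": 10, "Public3": 10,
              "Public4": 10, "Public5": 10, "Public6": 10, "PrivateY": 60}

full_a = concentration_ratio(industry_a)
public_a = concentration_ratio(drop_private(industry_a, {"PrivateX"}))
full_b = concentration_ratio(industry_b)
public_b = concentration_ratio(drop_private(industry_b, {"PrivateY"}))

print(f"Industry A C4: all firms {full_a:.3f}, public only {public_a:.3f}")
print(f"Industry B C4: all firms {full_b:.3f}, public only {public_b:.3f}")
print(f"Industry A HHI: all firms {hhi(industry_a):.0f}, "
      f"public only {hhi(drop_private(industry_a, {'PrivateX'})):.0f}")
```

In hypothetical industry A, dropping the private firm raises C4 from 0.85 to about 0.94, overstating concentration. In hypothetical industry B, dropping the dominant private firm lowers C4 from 0.75 to about 0.67, understating it. The bias can run in either direction depending on where the excluded firm ranks.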

Incorrect Categorization of Firms

The second problem is that firms are likely miscategorized because Compustat data does not break down a firm’s activities by industry. Indeed, a study by Keil highlights that Compustat datasets “assign industry codes to the company level by identifying a main line of business. All operating activities in ‘non-core’ industries are treated as being a part of this primary focus of the company.” In other words, all of a multi-industry firm’s sales are attributed to a single industry code.

Incorrectly categorizing firms is problematic because it means that a company operating in multiple lines of business (e.g., an e-commerce store, such as Amazon, that also operates its warehouses) will have its total sales represented in one industry, overstating its true market share and understating competition in that industry. Conversely, because the firm’s other lines of business are not counted in their respective industries, the concentration measures for those industries are also biased. This industry misclassification problem is further exacerbated by the lack of legal mechanisms ensuring that a firm reports the correct industry in which it operates. Case in point: Ford, General Motors (GM), and Chrysler are all vehicle manufacturers. Yet Ford and GM were assigned to the 5-digit automobile and light-duty motor vehicle manufacturing industry, while Chrysler was assigned to the 6-digit automobile manufacturing industry. As a result, concentration measures using this Compustat dataset fail to accurately capture competition in an industry and the overall economy.
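The effect of assigning all sales to a primary industry code can be sketched as follows. The firms, industries, and sales figures are hypothetical, and the two tabulation functions are illustrative stand-ins rather than a description of how Compustat or the Census Bureau actually process data.

```python
from collections import defaultdict

# Hypothetical firms: (name, primary industry, sales by line of business).
FIRMS = [
    ("MultiCo", "retail", {"retail": 60.0, "warehousing": 40.0}),
    ("ShopCo", "retail", {"retail": 40.0}),
    ("StoreCo", "warehousing", {"warehousing": 60.0}),
]


def sales_by_primary_code(firms):
    """Attribute ALL of a firm's sales to its single primary industry."""
    table = defaultdict(dict)
    for name, primary, segments in firms:
        table[primary][name] = sum(segments.values())
    return table


def sales_by_segment(firms):
    """Attribute each line of business to the industry it belongs to."""
    table = defaultdict(dict)
    for name, _primary, segments in firms:
        for industry, sales in segments.items():
            table[industry][name] = sales
    return table


def market_share(table, industry, firm):
    """Firm's share of the industry's total sales in the given tabulation."""
    industry_sales = table[industry]
    return industry_sales.get(firm, 0.0) / sum(industry_sales.values())


primary = sales_by_primary_code(FIRMS)
segment = sales_by_segment(FIRMS)

for industry, firm in [("retail", "MultiCo"), ("warehousing", "StoreCo")]:
    print(f"{firm} share of {industry}: "
          f"primary-code {market_share(primary, industry, firm):.2f}, "
          f"segment-level {market_share(segment, industry, firm):.2f}")
```

Under these assumptions, the hypothetical MultiCo appears to hold about 71 percent of retail under primary-code assignment but 60 percent once its warehousing sales are separated out. StoreCo appears to hold all of warehousing under primary-code assignment, when its segment-level share is 60 percent. Both industries look more concentrated than they are.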

Worldwide Sales Rather Than Domestic Sales

The third issue is that the dataset measures worldwide sales rather than domestic sales. This is problematic because including foreign sales will suggest that the domestic market (measured in sales) is larger than it is, biasing domestic concentration and competition measures. For example, if foreign sales are included in measuring competition in domestic markets, the resulting concentration ratio could show that a few firms making most of their revenue abroad hold a large proportion of the industry’s market share, suggesting low competition in the domestic market. Yet this conclusion would likely be inaccurate because these firms would have very low domestic sales and market share. In this case, competition in the domestic market could be high, but including foreign sales would make it seem otherwise. This is why a study by Bessen asserts, “If one wants to analyze concentration in domestic markets, it can be misleading to use measures based on international sales.”
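The scenario described above can be made concrete with a small sketch. The firms and sales figures are hypothetical: two firms earn most of their revenue abroad, while four firms sell only domestically. The sketch compares the two-firm concentration ratio computed on worldwide sales with the same ratio computed on domestic sales alone.

```python
# Hypothetical firms with sales split into domestic and foreign components.
FIRMS = {
    "GlobalA": {"domestic": 10.0, "foreign": 90.0},
    "GlobalB": {"domestic": 10.0, "foreign": 70.0},
    "LocalC": {"domestic": 20.0, "foreign": 0.0},
    "LocalD": {"domestic": 20.0, "foreign": 0.0},
    "LocalE": {"domestic": 20.0, "foreign": 0.0},
    "LocalF": {"domestic": 20.0, "foreign": 0.0},
}


def top_k_share(sales, k):
    """Share of total sales held by the k largest firms."""
    total = sum(sales.values())
    return sum(sorted(sales.values(), reverse=True)[:k]) / total


# Worldwide sales, as a dataset reporting consolidated revenue would show.
worldwide = {f: s["domestic"] + s["foreign"] for f, s in FIRMS.items()}
# Domestic sales only, which is what matters for domestic competition.
domestic = {f: s["domestic"] for f, s in FIRMS.items()}

c2_worldwide = top_k_share(worldwide, 2)
c2_domestic = top_k_share(domestic, 2)

print(f"Top-2 share on worldwide sales: {c2_worldwide:.2f}")
print(f"Top-2 share on domestic sales:  {c2_domestic:.2f}")
```

With these illustrative numbers, the top two firms appear to hold about 69 percent of the market on worldwide sales but only 40 percent on domestic sales, and the two firms that lead the worldwide ranking are the smallest sellers in the domestic market.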

Conclusion

Due to these limitations, Compustat data is a poor measure of concentration and competition in the U.S. economy. Indeed, concentration measures computed from Compustat data fail to replicate those computed from official Economic Census data, which represents the overall economy. According to a study by the Board of Governors of the Federal Reserve System, “industry-level correlations of top-firm concentration ratios between Compustat and Census data are low, with correlations generally below 0.2 and typically closer to 0.1” for data ranging from 2002 to 2017. More specifically, the correlation coefficients for 6-digit NAICS industries’ C4 ratio (market share of the four largest firms) were at or below about 0.15 for 2002, 2007, 2012, and 2017 (the years Census data is available).

Corroborating this, a study by Bessen also concluded that the correlation coefficient between the four-firm ratio for these two datasets was only 0.196. Even more concerning, the correlation for 6-digit NAICS industries’ HHI was -0.1 for 2017, suggesting that industries that appear more concentrated in one dataset tend to appear slightly less concentrated in the other. As such, concentration measures using Compustat data do not represent competition in the overall economy. Indeed, this is why studies using Compustat data often find large increases in concentration. In contrast, a report by ITIF, using U.S. Census data, found that the average C4 ratio increased by just one percentage point from 2002 to 2017.

Given Compustat data’s limitations as a measure of concentration, policymakers should largely ignore studies of concentration using Compustat data. Both researchers and advocates who rely on this data do so to advance an ideological and political agenda. Instead, policymakers should focus on studies that use the official and much more comprehensive U.S. Economic Census data.
