The Economics of Data, With David Deming

Data is one of the most essential and valuable assets in the world. It impacts everything from the ads we see and the products we buy to national security. Rob and Jackie sat down with David Deming, the Academic Dean and a Professor of Political Economy at the Harvard Kennedy School and the Director of the school’s Malcolm Wiener Center for Social Policy, to discuss the importance of data, data sharing, and ways to protect individual data privacy.




Rob Atkinson: Welcome to Innovation Files. I’m Rob Atkinson, President and Founder of the Information Technology and Innovation Foundation. We’re a DC-based think tank that works on tech policy.

Jackie Whisman: And I’m Jackie Whisman. I head development at ITIF, which I’m proud to say is the world’s top ranked think tank for science and technology policy.

Rob Atkinson: And this podcast, it’s about the kinds of issues we cover at ITIF, from the broad economics of innovation to specific policy and regulatory questions about new technologies. Today we’re going to talk about data and data sharing and how to safely share socially-valuable data, while also safeguarding individual privacy.

Jackie Whisman: Our guest is David Deming, who is the Academic Dean and a Professor of Political Economy at the Harvard Kennedy School and the Director of the school’s Malcolm Wiener Center for Social Policy. His research focuses on higher education, economic inequality, skills, technology, and the future of the labor market. Welcome.

David Deming: Thank you. It’s really great to be here with you both today.

Jackie Whisman: That’s a big bio. You’re busy.

David Deming: I try. Yeah.

Jackie Whisman: So to start, could you tell us a little bit more about what all of that means about your work at the Harvard Kennedy School, particularly as it relates to our topic today, data and privacy?

David Deming: Sure, Jackie. Yeah, so I’m an economist by training. And so I tend to think about benefits and costs, trade-offs, as economists do. My own work is really focused on, as you said, education, skills, technology. And, I guess, it’s kind of thinking about human potential, for lack of a better way to say it. How can we leverage technology to empower people to be more productive in the workplace and to lead happier, more prosperous lives is something I’ve become increasingly interested in, in part because if you look at the way the labor market has changed, so much of it is about the way that technology has changed work. So, you think about the digital revolution, the computerization of the labor market, what are these new technologies good at? They’re good at replacing the kinds of things that people used to do that were repetitive and rote and can be recorded in some sort of formal process, for example, using data.

And what’s left are the things that are kind of hard to record and ambiguous. And so soft skills like creativity, and teamwork, and problem solving become more important, because we’ve got machines to do the other things. And if you kind of just carry that forward, well, first, we had machines kind of replacing these routine mechanical and physical operations. You used to have a shovel to dig a hole and now you got a bulldozer. That’s a mechanical replacement of labor. And now, increasingly, we have digital replacement of people. So where we used to have payroll clerks who would add up a ledger, if you owned a grocery store or something, adding up what you bought, looking at what you sold, making sure you’re keeping the bottom line healthy, now, we have tools like Microsoft Excel spreadsheets, digital recording of data to replace that.

And that data lives on forever and has a record and can be used by people who are running a business to make good decisions about what they should buy, what they should not buy, what they should sell their products for and so on. And you can sort of play out that analogy throughout the labor market. And so the value of data for improving decision making is central. And as we have acquired more data, the ability to use that data well becomes a really important human skill. And so that was kind of my, it’s a long story, Jackie. But that’s my entree into this space. And then I just started to think more about, well, one of the neat things about data is that it’s a kind of renewable resource. It can be replenished. So my use of the data doesn’t crowd out your use of the data.

And so that means all else equal, the more people who can use it, the better off we can all be. But then there are these privacy trade-offs. So that was kind of the genesis of this column that I wrote in the New York Times about balancing the benefits of data sharing against the privacy cost of it.

Jackie Whisman: And in that column, which is actually how we found you, so thanks for being here, in that column for the New York Times “Economic View,” you wrote about a lot of necessary public uses for data. Can you go into those a little bit?

David Deming: Sure. So, I mean, again, just to say that the really neat thing about data is it’s what economists would call non-rival, which means my use of it doesn’t prevent your use of it. Most things are not non-rival. Most things are rival, right? If I’m using a physical tool, that means you can’t use it while I’m using it. It could be a bulldozer to go back to my example, or really anything else, a computer. But I’m a labor economist and so I spend a lot of time looking at data on the state of the economy. If I’m looking at the unemployment data for the recent month, and I’m kind of crunching the numbers to figure out what we think the job market’s going to look like next month, I could be doing that at the same time Rob and Jackie are doing that on their computers. And it doesn’t prevent any of us from doing it at the same time. So we can all use the same data to produce different insights or the same insights. And it’s kind of, as I said, a renewable resource.

And so that’s really important for thinking about economic growth. This is Paul Romer, who a few years ago won the Nobel Prize for the fundamental insight that knowledge is a non-rival good, and that creations of knowledge, what he calls recipes, so new ways of doing things, developing vaccines is one example, are things that can spread immediately. If I figure out a new recipe or a new formula for how to get something done using data, then that can immediately replicate all over the world. So once you figure out the formula for creating, let’s say, an mRNA vaccine, you can distribute that knowledge everywhere. And so the example that I gave in the column was about how the US government led the Human Genome Project by enforcing strong norms about data sharing.

There’s what they call the Bermuda Principles, which was all of the labs who were participating in sequencing the human genome agreed together that they would post their data on a public website within 24 hours of sequencing it. So whenever they sequenced a new gene, they’d post that data for anyone to use. And it was actually that set of norms, that Bermuda Principles commitment, that allowed folks to immediately map the coronavirus genome. Within a few days of it surfacing, we actually had the ingredients to produce a vaccine before it even came to the US. That was how quick the diffusion of that knowledge was. So that’s a case where making a commitment in advance to making data open and available to everyone literally saved millions of lives, in ways that nobody predicted at the time.

So the Bermuda Principles were not about preventing the next pandemic. They were a general norm about the benefits of having data available to everyone, for lots of reasons. No one foresaw this particular use. But you could imagine that that same set of principles will have other benefits many years from now that we can’t foresee. And so that’s just one example of how a commitment to making data available to everyone frees up innovation and the growth of knowledge in a way that can save millions of lives.

Rob Atkinson: I like the fact that you’re thinking about this from an economics perspective, because as much as I’m sometimes critical of economics for being too focused on allocative efficiency and not enough on innovation, it is important to analyze public policy questions by at least looking at the economic issues. It doesn’t have to mean they’re paramount in the decision making, but they at least need to be considered. And we can talk a little bit more about why that isn’t happening as much as I think it should be in Washington.

But one of the keys to me is it seems like this is partly a prisoner’s dilemma issue. If I’m an individual and my data is there and I can share it, and let’s just say there’s a minuscule risk of my data being shared in a way that could do harm to me, I actually think the risks are pretty low if you do de-identification and other things. But let’s just say those are minuscule. So I, as a prisoner in this game, my choice would be to not share my data, but to have everybody else share their data, because then I get almost all the benefits and no risk of cost. And when we look at the debate, it seems that people in the privacy debate are acting as if they’re the prisoner making that choice and not looking at the broader picture. Do you agree with that? And do you have a sense of why and how that plays out?

David Deming: Yeah, so I do agree, Rob. Because data is a public good, and this is something economists have been saying for a long time (it’s why we call it the dismal science), any time something has benefits that are not monetized, it tends to be underprovided, because people are selfish. So that’s exactly the reason you said. It would really be best for me if I didn’t share my data, but everybody else did. It’s similar to debates on vaccination. If everyone else gets vaccinated, then you don’t need to get vaccinated, because of herd immunity. And so it’s the same logic that we’ve been seeing play out in the public sphere. It’s really hard to get people to contribute to public goods, because of the incentive to, as people say, free ride on the participation of others.

And so I think where I might depart from what you might expect an economist to say is, one solution would be to actually monetize it, right? So we could pay people. Some people talk about this as a data dividend. So if your data is being used by companies, you’re paid out for the benefits of that data. I actually wouldn’t go that direction, in part because the benefits for any one individual are likely pretty small. Maybe you’d get paid a couple bucks per year for big companies using your data. Is that really going to move the needle on anything? Probably not. But I think what’s more important is to solve the technological problems with privacy, to increase public trust, to communicate better about the benefits of data sharing, and then to take privacy very seriously. That’s not a political problem. It’s a technological problem.

So if you can convince people that actually privacy can be protected and that there are big social benefits to sharing your data, I think you can convince people to do it as long as everyone else is doing it. And that’s about social norms. So that’s the direction I’d go in. It’s not really a kind of doctrinaire economist direction. I think it’s maybe a kind of thing that an economist who works at a school of public policy, or is used to thinking about public problems, would say. And so, guilty as charged in that respect. Maybe that, to go back to your first question, that’s my Kennedy School influence speaking, is I think about this as a problem we have to solve publicly through norms and technology.

Rob Atkinson: So you mentioned that yeah, you wouldn’t make very much money. David Moschella, who’s a really great IT expert, used to be at Computerworld magazine and has written a number of books, is writing a series for us this year called Defending Digital. And the last piece he wrote was just on this point: that if everybody was able to monetize their data, you’d get almost nothing. It’s minuscule. But one thing, you mentioned norms, and there’s a whole literature on this notion of nudge. Government can nudge you in a certain way. And I think one of the most important areas there from a policy perspective is this question of opt-in versus opt-out. If you give people a choice to opt out, “Hey, do you want to opt out?” I don’t know, 5, 6, 8% of people do it. If you give them a choice to opt in, yeah, 5, 6, 8% of people opt in. And we saw that with Apple’s new policy.

Every time you download an app, you’re asked whether you want to opt in. And it seems to me, that’s one of the ways of getting the norm: at least in our view, not mandating an opt-in standard, but rather, for the people who really care and don’t want to do it, sure, go ahead and feel free to opt out. What are your thoughts on that?

David Deming: I think for the behavioral reasons you suggest, Rob, that could probably work, just because most people don’t pay attention. You still have the free rider problem, in that the people who opt out are still going to get the benefits. But maybe it’s a small enough share that it works okay. I think if it started to get bigger, you’d worry about selection in who opts out. It’s going to make all these predictions from data worse if the people who opt out are really different from the people who opt in. And actually, a good example of that is the polling errors in the 2016 election, where it was well known afterward that there were systematic errors in who responded to the polls and who didn’t, which threw the forecasts off in ways that swung the result relative to expectations. And you’re talking about huge data from pollsters, but it was all biased a little bit in favor of Clinton, in this case.

And so you’ve got lots and lots of data, but if there’s a small amount of bias in it in a close election, it can give you the wrong answer in a predictive sense. And so I do worry a little bit about the opt-out if it gets to be more than a couple of percentage points. So there’s been some proposals to create a federal agency, or to create a sub-agency within an existing agency, that handles data privacy. Senator Kirsten Gillibrand and others have proposed something like that. I like that idea, because I think these issues aren’t going away. So I think you need some body that takes this issue seriously. I would just give them a dual mandate, as I said in the column, to not just think about privacy, but also say, “Hey, look, if we can protect the data, let’s go the extra mile to make it easily available to people, because of the benefits of it,” going back to this conversation about mRNA vaccines.
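Deming’s polling point, that a small amount of bias survives any amount of data, can be made concrete with a little arithmetic. The sketch below is illustrative: the support and response rates are made-up numbers, not actual 2016 polling figures.

```python
def observed_share(true_share, resp_supporter, resp_other):
    """Expected polled vote share when a candidate's supporters respond
    at a different rate than everyone else (nonresponse bias)."""
    responders_for = true_share * resp_supporter
    responders_against = (1.0 - true_share) * resp_other
    return responders_for / (responders_for + responders_against)

# Equal response rates: the poll is unbiased, however it is sampled.
print(observed_share(0.52, 1.00, 1.00))  # 0.52

# Supporters respond just 10% less often: a 52% winner polls below 50%.
# Collecting more responses doesn't fix this; the bias is independent of sample size.
print(observed_share(0.52, 0.90, 1.00))  # ≈ 0.494
```

This is why a biased opt-out population is worse than a small one: averaging over more of the same skewed respondents converges confidently on the wrong answer.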

And then I think some of these questions are really questions that we should resolve as a public. So I have an opinion about how to solve the public goods problem and how to protect data. You have an opinion about it. It’s something we kind of have to agree on as a society, after we’ve had the conversation about the benefits. I think if you ask people now, they tend to only think about the privacy, only the costs of giving my data away. So I would like to do a little more public communication on the benefits and then have [inaudible 00:13:04].

Rob Atkinson: Yeah, absolutely. Daniel Castro leads our Center for Data Innovation and that’s really one of their core missions, is to just show... I mean, we just did a report, or it’s just coming out, I can’t remember, on data use in education, particularly K-12, but also college. It’s really a very powerful way to improve outcomes, just because you have a better idea of what works. By the way, one thing I can’t resist, because I agree with you on this potential opt-out bias, and there’s all this concern, some of it justified, about AI and bias. Well, if you have biased data sets because of opt-out, that can lead to that. But one of the things that some of the laws have proposed is that if you opt out, you have to be given the exact same suite of services as if you opt in, or don’t opt out. To me, that’s a little bit like free riding. You’re not contributing something, but you’re mandated to get the same services. So, I don’t know, thoughts on that?

David Deming: Well, so I guess maybe it’s worth talking a little bit more about what we’re talking about opting into. So I think there’s one set of things is if you’re working with Facebook or another company, should you be able to opt out of the things they do with your data? Yes, absolutely. I guess, that’s not really what I’m talking about. That’s a private company using your data to make a profit. You should have the right to say, “No, you can’t use my data.” But there’s a different thing, which is, let’s say, the US Census, or public agencies that are collecting data to serve a public purpose, that I would prefer to not have an opt-out process in. I would prefer rather to say to people, “We’re going to protect your data. We’re going to use state-of-the-art methods to do that. And we’re going to punish violators of this, with certainty.” So we’re going to take seriously violations of privacy standards.

And the public can feel confident that we’re going to deter people from thinking about stealing their data. And I think that’s something where you might say, “Well, government, it’s so hard to get things done.” But my view is, we have to find a way to do it right, because it’s so important. So I would love for there to basically be a federal agency, or again, an office within an agency, that takes data provision and data protection seriously, across all units of government. And really thinks hard about the technological problem, recruits talented people to think about this, people who have internalized its importance for the public good. I think actually, we could train some folks like that at the Kennedy School. And I’d be excited to do that. So kind of a cohort of people who understand the benefits to the public of data and the importance of protecting privacy. We’re talking about public data, so.

Rob Atkinson: We have some of that now. I agree with you that that’s a broad-based skill that all the agencies should have. But we do have some of that now. For example, in HHS, there’s a team that looks at what you’re doing with data and assesses, statistically, the likelihood of re-identification. And it’ll come down on you and say, “Wait a minute. You can’t use this. If you include this variable, the risks of re-identification go up to unacceptable levels.” So it seems to me that we could just be doing a lot more of that kind of thing and educate people on how to do this.
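The kind of statistical check Rob describes can be sketched with a simple k-anonymity measure: the smallest group of records sharing the same combination of released variables. This is an illustrative stand-in, not HHS’s actual methodology, and the records and variables below are made up.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same combination of
    quasi-identifier values. k = 1 means some record is unique in the release,
    and hence potentially re-identifiable by linking to outside data."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

records = [
    {"age": 34, "zip": "02138"},
    {"age": 34, "zip": "02138"},
    {"age": 34, "zip": "02139"},
    {"age": 51, "zip": "02139"},
    {"age": 51, "zip": "02139"},
]

# Releasing age alone: every record shares its value with at least one other.
print(k_anonymity(records, ["age"]))         # 2

# Add one more variable and one record becomes unique, i.e. re-identifiable.
print(k_anonymity(records, ["age", "zip"]))  # 1
```

A reviewer in the role Rob describes would flag the second release: including the extra variable pushed k below whatever threshold the agency considers acceptable.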

David Deming: Absolutely, Rob. And I think there are, as you say, some agencies that are doing this, in different ways, which is fine for the purposes of experimentation, because this is uncharted waters, in some sense. But I think it’d be better to have an integrated approach. Just to give you one example of a contentious issue, the Census has adopted differential privacy standards for its data that do protect individual privacy, but also come at pretty serious cost, because they basically perturb the microdata in ways that prevent identification, but also lead to non-trivial errors in estimates from any small sample. So if you want to know the unemployment rate by occupation in a metropolitan area and you’ve got perturbed data with relatively small samples, you’re going to get a meaningfully wrong answer.
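The trade-off Deming describes can be sketched with the Laplace mechanism, the basic building block of differential privacy: the same absolute noise that is negligible for a metro-wide count can swamp a small occupation-by-area cell. The epsilon value and counts below are illustrative, not the Census Bureau’s actual parameters or implementation.

```python
import math
import random

def laplace_noise(scale):
    """Draw from Laplace(0, scale) by inverse-CDF sampling, stdlib only."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon):
    """Laplace mechanism for a counting query: sensitivity 1, noise scale 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(42)
epsilon = 0.1  # illustrative privacy budget; smaller means more privacy, more noise

# Identical noise distribution for both queries, very different relative error.
for true_count in (1_000_000, 50):
    noisy = dp_count(true_count, epsilon)
    print(f"true={true_count:>9,}  noisy={noisy:>13,.1f}  "
          f"relative error={abs(noisy - true_count) / true_count:.1%}")
```

Because the noise is calibrated to how much any one person can change the answer, not to the size of the cell, small-sample estimates bear almost all of the accuracy cost, which is exactly where the placement of the trade-off becomes a public choice rather than a purely technical one.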

An agency has just decided, this is where we want to place the trade-off. And maybe that’s right, maybe that’s wrong. But it feels like something we ought to have a public conversation about and not let every agency make their own choice in their own way and kind of in a non-systematic way. And I should just say, my personal belief, not with respect to Census, but in general, is that all the incentives push us toward privacy and not toward public sharing of data. And so that’s why I wrote the column, to try to be a voice on the other direction, to say, “We can’t ignore these huge benefits.”

Jackie Whisman: It seems like another example of this is HIPAA, which everybody is now familiar with. It seems like there’s a lot of things-

Rob Atkinson: Unfortunately, Jackie.

Jackie Whisman: Yeah. I know. But we’ve all just kind of accepted that HIPAA is a great methodology or whatever. And it feels like there’s a lot of information of my own that I could be sharing that is not being shared, that is not particularly interesting or proprietary, but could be helpful for health research.

David Deming: Yeah. It’s a great point, Jackie. If you think about HIPAA, you would say, “Well, do I want to share my sensitive medical history with the hospitals? No, not really. And if I’m the only one sharing it, I definitely don’t want to share it, because there’s no value to just knowing David Deming’s medical history.” But if I share it, everyone else is going to share it too. And then, in terms of predictive analytics, knowing the interaction between somebody’s medical history, the treatment they received, and the outcome would let us increase survival rates from a procedure by 20%, because we know on whom it works and on whom we ought to do more watchful waiting, or whatever, based on data. And if going into bypass surgery you’d have a 20% higher chance of living, or something, sign me up for that!

I don’t know if it’s 20% or if it’s 1%. I don’t know. But the point is, we haven’t made those benefits concrete to people in a way that would get them excited about sharing their data, or even willing to share their data. We just have not had that conversation. So, of course, people are going to say no. We’re not making the benefits clear. So maybe in this conversation we are. But we, as a society, are not, so.

Jackie Whisman: And Rob uses this example a lot when we’re talking about this: he wants his kids to have access to better medication and resources than we have. And if you put it in those terms, if my daughter having access to my medical data when I’m gone helps her live longer, then, of course, I would want that.

David Deming: Yep. That’s very well said. Couldn’t have said it better myself.

Rob Atkinson: Yeah. David, I mean, I really think this is almost the core issue here. Because if you’re a member of Congress, all you’re hearing is data privacy. For example, in Europe they actually say strange things to me all the time. But one of the strange things they say is that privacy is a fundamental human right. And when they were passing GDPR over there, their data protection regulation, one of the groups that opposed it, at least, not the entire bill, but provisions that would’ve made research difficult, was the Swedish Oncologist Association, because they thought it would make it harder for them to do oncology research. And I remember I was in a debate in Brussels with somebody and I said, “Surviving cancer is a fundamental human right, too.” And we haven’t done a good enough job of explaining, as you said, those trade-offs and the balances there.

David Deming: I think that’s right. And I think one of the reasons why is that, for most of us, the way we think about possible violations of our privacy with data is with private companies who are trying to sell us advertising on the web or on our phones. That’s the dominant way in which we experience privacy loss: some company, I logged in to get some discount and they used my email address and they harvested it and somebody stole the password. And it’s not hard to understand why people would come to that conclusion. I think we just have to have a clearer conversation about the benefits to the public of making certain kinds of data available to public agencies, or researchers, or whatever.

It’s very different than allowing private companies to harvest your data and micro-target advertisements to you. The public benefits of that latter thing are basically zero, maybe negative. And so, in a way, at least for me, this comes back to our very slow progress on understanding the value of data for solving public problems. We need more technological and data sophistication in our public problem solving.

Rob Atkinson: I guess, I might differ with you a little bit on that. I mean, I think there’s a difference between a private company who uses your data without your permission to sell it, versus a private company that anonymously just matches you. Hey, I like basketball. And so fine, I get a basketball ad, but they don’t know me. Also, one of the other interesting things: I was in Australia a few years ago and was talking to a big government research center there, and they were actually using data that’s public, like Facebook and Twitter, to solve some really interesting questions.

And I know Facebook has done this, for example. They did something a few years ago where they gave academics access to anonymized data sets. And one of the issues was something about postpartum depression. And that’s about all I remember. But they were able to make some interesting findings and discoveries. And so, again, to me, it’s a question of how do you use it? And are the right restrictions in place to keep it private?

David Deming: That’s a very fair point, Rob, and allow me to make an addendum to what I said before. I don’t think it’s about whether the company who’s using the data is private or public, but rather whether the data is being used to solve a public or private problem. So you can use privately-collected data, in fact, many people do, to solve public problems. And I think that’s great. And your example is that.

Rob Atkinson: David, maybe one last question and that’s, I don’t know how familiar you are with this concept, but I know the UK government, they actually have a national data office. I think it’s in the prime minister’s office, or somewhere there. I met with them and one of the things that they’ve put in place is this notion of data trusts. So it’s the notion of, “Hey, we’re going to put all the healthcare data together. But hey, here are the rules. Here are the procedures. Here are the statistical things you have to do.” And the same thing, for example, with smart cities. I know some places have, “Let’s put all the smart city data together and then we can learn from each other.” That requires government, though, to organize a trust, to put it in place, the sharing mechanisms, the protection mechanism. Just curious what your thoughts are on that.

David Deming: Well, I’m not intimately familiar with that initiative, but from what you say, it sounds like the UK is ahead of the US, at least the federal government, on this. I think there are really, as you say, smart cities is one example. There are promising models at the state and local level of using data to solve problems. Part of this, I think, it will develop over time, because these data have not actually been around and usable for that long. And we’re just beginning to understand how they can be helpful and how they can be harmful. And so part of it is just creating knowledge over time. And younger generations will be more familiar with these techniques and using data and so forth. So I think part of it is that.

But I would like to accelerate that by getting people excited about, frankly, about what problems we can solve with data that are hard to solve without it. And so that’s where again, that’s why I wrote the column. And that’s why I’m delighted to talk to you all about it, is I don’t hear this conversation happening often enough for my taste. Because as somebody who uses data for a living and uses it to understand, not just with research questions, the really important questions, like when I was thinking about where to send my kids to school. I’m looking up data. And lots of people are. And so it just seems like a kind of core competency in the 21st century and therefore, something that governments ought to be good with, as well.

Rob Atkinson: The point you made, I thought, was such an important one, which is we’re still pretty early on in this. And we haven’t fully explored all the technology issues, all the governance issues. And what I worry about is I think the Europeans went too fast, too soon, and they precluded and closed off certain areas. So it’s not like we shouldn’t have regulation, but I think we’ve got to really be careful that as we regulate this area, we put both of those priorities in place, data innovation and use, like you said, and also privacy. And I worry that if we go too fast, too far, we might close off the latter, or the former, I should say.

David Deming: I’m worried about that as well. And I think that the only way out of that is to, again, make the conversation about the benefits of data use just as important as the conversation about the costs of privacy violations. It’s not that privacy’s not important. It’s very important. It’s just not the only thing that matters in this space. And that conversation, the European example, I think, shows that that conversation about privacy has moved much faster and gone much deeper than the conversation about the benefits of data sharing. So time to rebalance.

Rob Atkinson: On that note, David, this was fantastic. Thank you so much for joining us.

David Deming: Oh, it was my pleasure. A lot of fun to talk to you both.

Jackie Whisman: And that’s it for this week. If you liked it, please feel free to grade us and subscribe. Also, email show ideas or questions to [email protected]. You can find the show notes and sign up for our weekly email newsletter on our website, and follow us on Twitter, Facebook and LinkedIn @ITIFdc.

Rob Atkinson: We have more episodes and great guests lined up. New episodes drop every other Monday. So we hope you’ll continue to tune in.
