Data shapes our economic, political, and social ecosystems. However, we have little control over data's effect on those ecosystems, and its influence can distort, manipulate, and even undermine them, leading to undesirable consequences. Like rivers, data affects ecosystems through flows. Analyzing these dataflows reveals how data is used (and misused) and uncovers opportunities to harness its value. Dataflows may generate value, such as when hospitals share patient data to improve care. They can also cause harm, such as when individuals' data is sold to self-serving data brokers. Absent dataflows are equally significant: the lack of sharing among competing actors, such as banks and governments, leaves much potential value unrealized. Despite dataflows' outsized impact on our lives, we have little insight into what drives them and lack integrated means to control them when they are harmful. Controls boil down to regulation (legal instruments), incentives (economic instruments), and privacy-enhancing technologies (technical instruments), which today are developed independently and whose effectiveness we do not truly understand.
My research agenda, which I call Data Ecology, aims to uncover the principles that cause dataflows and to design interventions (technical, economic, and legal) that steer them in a beneficial direction. Given a goal (a desirable outcome) for a data ecosystem (such as a company, city, or government), what interventions should we engineer so that agents' actions lead to that goal? The research line on data ecology includes: i) formalizing this question; ii) designing new interventions (examples below); and iii) evaluating interventions' ability to steer dataflows in diverse data ecosystems. While some literature has explored these questions, data ecology offers a new lens that brings existing work into a common framework to help us advance our understanding.
My group has studied many data ecosystems by applying this dataflow lens, including data sharing and data markets; we use the latter to illustrate some data ecology interventions. Data marketplaces suffer from Arrow's information paradox. Sellers will not release data to buyers before payment (there is no "try-before-you-buy" with non-rival goods such as data), and buyers will not pay before understanding the data's benefits; consequently, few transactions occur, even when they would be beneficial. By applying data ecology's dataflow lens to marketplaces, we identified the uncertainty faced by buyers and sellers as the cause of poor performance, which led us to design a technical intervention to address it: a data escrow. Sellers register their data with the escrow, and buyers delegate computation that signals the data's value. For example, the escrow can train and evaluate an ML model on a seller's dataset and tell the buyer about the performance improvement without revealing the raw data. This escrow intervention reduces uncertainty for sellers and buyers, causing data to flow when beneficial. The data escrow is just one technical intervention; we have combined data escrows with economic incentives to facilitate the formation of data-sharing consortia (e.g., among banks and government agencies) and create beneficial dataflows that do not occur naturally. We are also studying techno-legal interventions in data ecosystems. Looming regulations and a society growing uneasy with the current data ecosystem may force changes soon. And if change is coming, we are better off understanding the effect of interventions on the data ecosystem.
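The escrow's value-signaling step described above can be illustrated with a minimal sketch. It is not the group's implementation: the `DataEscrow` class, its `register` and `value_signal` methods, and the nearest-centroid classifier standing in for "an ML model" are all assumptions made for illustration. The point it shows is only that the buyer receives a single number (the accuracy gain) and never the seller's rows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Tuple[List[float], int]  # (feature vector, class label)


def train_centroids(rows: List[Row]) -> Dict[int, List[float]]:
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def accuracy(model: Dict[int, List[float]], rows: List[Row]) -> float:
    """Fraction of rows whose closest centroid carries the correct label."""
    def predict(features: List[float]) -> int:
        return min(
            model,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(features, model[label])),
        )
    return sum(1 for features, label in rows if predict(features) == label) / len(rows)


@dataclass
class DataEscrow:
    """Hypothetical escrow: holds seller data and releases only value signals."""
    _datasets: Dict[str, List[Row]] = field(default_factory=dict)

    def register(self, seller_id: str, rows: List[Row]) -> None:
        # The seller hands data to the escrow, not to any buyer.
        self._datasets[seller_id] = list(rows)

    def value_signal(self, seller_id: str, buyer_train: List[Row],
                     buyer_test: List[Row]) -> float:
        # Train with and without the seller's data; return only the accuracy gain.
        baseline = accuracy(train_centroids(buyer_train), buyer_test)
        combined = accuracy(
            train_centroids(buyer_train + self._datasets[seller_id]), buyer_test
        )
        return combined - baseline


# Toy, made-up data: one feature, two classes.
escrow = DataEscrow()
escrow.register("seller-a", [([3.0], 0), ([4.5], 0), ([10.0], 1)])
buyer_train = [([0.0], 0), ([6.0], 1)]
buyer_test = [([4.0], 0), ([1.0], 0), ([9.0], 1), ([7.0], 1)]
gain = escrow.value_signal("seller-a", buyer_train, buyer_test)
print(f"accuracy gain from seller-a's data: {gain:+.2f}")
```

A positive gain tells the buyer the data is worth paying for; a real escrow would additionally need to bound what repeated queries can leak about the seller's data, which this sketch does not address.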
While these interventions are valuable in their own right, the ultimate goal of data ecology is to provide a general theory and mechanisms for understanding and controlling dataflows. Data shapes our world, but the final form need not be fixed. Data ecology provides tools to shape it so it is compatible with our values. These tools are more critical than ever as data's influence on our world broadens and intensifies.