Data is changing the fight against corruption. Two stories—one from Panama, the other from Brazil—illustrate how.
We start in Panama. On April 3, 2016, a torrent of press articles about the dodgy tax practices of the world’s wealthiest flooded the media. In a coordinated feat orchestrated by the International Consortium of Investigative Journalists (ICIJ), reporters from 76 countries began publishing stories exposing the ingenious ways the rich hide their wealth from tax authorities. These efforts were based on a massive leak of data from Mossack Fonseca, a Panama-based law firm specializing in wealth management and “tax optimization,” that covered tax-avoidance practices on a global scale spanning nearly four decades. It contained information on the opaque offshore companies, trusts, and foundations in tax havens used by the firm’s clients to hide their wealth—legally acquired or not—including criminal cartels and rogue industries such as illegal arms sales. This was the result of years of secretive investigation by journalists and data scientists.
The trove included 11.5 million documents, or 2.6 terabytes of data, from about 215,000 offshore bank accounts and shell companies. The ICIJ provided reporters the data-mining expertise needed to make the documents transparent. Data-analytics startups such as Linkurious extracted the metadata and helped “connect the dots” through data-visualization tools. Storing the information in the cloud enabled almost 400 reporters worldwide to work together on the same project.
Those behind the leak were motivated by the injustice of rising global inequality that the tax evasion of the wealthiest revealed. “Income inequality is one of the defining issues of our time,” the anonymous whistleblower who leaked the documents wrote in the May 6, 2016, edition of the Munich, Germany, daily newspaper Süddeutsche Zeitung. “It affects all of us, the world over. The debate over its sudden acceleration has raged for years. … Still, questions remain: why? And why now? The Panama Papers provide a compelling answer to these questions: massive, pervasive corruption.” His concern about corruption is echoed by youth worldwide: A recent World Economic Forum survey of millennials in 181 nations revealed that nearly 60 percent see corruption as the most serious issue facing their countries.
Let us turn to Brazil, where open data sparked scandals about tax evasion and kickback schemes at the state oil giant Petrobras that contributed to President Dilma Rousseff’s downfall. In the wave of mass protests and recriminations that have embroiled the country, it is easy to lose sight of the narrowly technical legal case for her August 31, 2016, impeachment. In essence, Rousseff was found guilty of moving funds between government budgets, a maneuver seen by many as using accounting tricks to cover up the true state of public finances. The irregularities were first detected by the NGO Contas Abertas, which reviewed budget data and documents made public under Brazil’s access-to-information laws and facilitated by its two main transparency portals, Orçamento Federal and Portal da Transparência. Contas Abertas tipped the Federal Court of Accounts (Tribunal de Contas da União, or TCU) off to the irregularities. The TCU opened an investigation, and politics took its course. Paradoxically, government-mandated transparency paved the way for uncovering government-engineered deception.
The Panama Papers scandal demonstrates the power of data analytics to uncover corruption in a world flooded with terabytes needing only the computing capacity to make sense of it all. The Rousseff impeachment illustrates how open data can be used to bring leaders to account. Together, these stories show how data, both “big” and “open,” is driving the fight against corruption with fast-paced, evidence-driven, crowd-sourced efforts. Open data can put vast quantities of information into the hands of countless watchdogs and whistle-blowers. Big data can turn that information into insight, making corruption easier to identify, trace, and predict. To realize the movement’s full potential, technologists, activists, officials, and citizens must redouble their efforts to integrate data analytics into policy making and government institutions.
A Worldwide Movement
“Sunlight is said to be the best of disinfectants,” US Supreme Court Justice Louis Brandeis wrote. The global push to unleash the power of data is generating rays. The emerging big data movement is the result of the confluence of structural transformations in our societies allowing data activists to utilize information for greater accountability. These changes include the surge of big data, the parallel increase in computing and analytics capabilities to make sense of it all, and the worldwide push to open government data to public scrutiny. Let us review these trends in turn.
First, the big data movement rises from the global surge of data generated by the private sector and individuals in their daily interactions. The new information-driven economy generates an astonishing wealth of new data every day across the globe. In 2015, there were 3.5 billion Internet users and 4.6 billion people using mobile phones to communicate and transact. According to a 2015 analysis by IBM’s Big Data & Analytics Hub, 2.5 quintillion bytes of data are produced every day and 90 percent of existing data was produced in the past two years alone. According to a 2013 report by McKinsey & Company that reviewed seven major economic sectors, harnessing data can help unlock between $3 trillion and $5 trillion in economic value. The big data market represented $5 billion in 2013 and is expected to grow to more than $50 billion by 2017, according to auditing and consulting firm PricewaterhouseCoopers.
Big data is characterized by volume (datasets are massive), velocity (new data is produced at a high frequency), variety (data has a range of sources and formats), and concerns over veracity (uncertainty often arises about accuracy).1 The big data movement has heretofore focused on extracting and exploiting data about citizens in the new digital economy, in particular to better understand and target them as consumers. By contrast, it has so far been less aggressively applied to analyzing how governments and corporations work. But this is gradually changing in our new digital era.
Second, the big data movement relies on strengthening computing capabilities to analyze data for the common good. Big data is mainly generated by the private sector and individuals, while open data comes from the public sector. In the public sector, the data revolution is fueled by the digital transformation of government worldwide. A first group of state reforms has focused on making government work better through upgrading “back office” functions. These modernization efforts seek to improve the efficiency of bureaucracies and the delivery of public services. They are driven by information technologies and e-government innovations, and they entail automating and digitalizing government processes. A second group of digital government reforms focuses on “front office” functions and aims to make government work for all citizens, thereby recasting the relationship between the state and the people. This includes the integration of public services in single online platforms that enable citizens to do such things as obtaining birth certificates or registering property online. Both sets of processes, in turn, generate a constant stream of data yet to be fully exploited.
Concurrently, governments are gradually opening up their data to public scrutiny. A first wave of freedom of information laws that rose in the late 1960s, marked by the passage of the Freedom of Information Act in the United States in 1966, was based on the citizens’ right to access public information. A second wave marked a transition toward a more proactive disclosure of government information with the adoption of government Web portals in the 1990s. This tide is still washing through countries, with some governments only recently adopting such laws (Paraguay in 2014) and others proceeding to update them (Argentina). It often takes individual courage on the part of reform-minded politicians to push through data-transparency reforms. In late 2013, President Horacio Cartes of Paraguay launched a program to open government through a new access-to-information law, declaring: “What is public must be public.”
Some countries are advancing to a third stage of reforms to improve access to information based on the more stringent principle of “open by default.” In 2009, President Barack Obama launched an ambitious open-government initiative that mandated federal agencies to disclose government information proactively through user-friendly integrated websites where all government services are centralized at a single Web portal. This included a commitment to releasing government databases in open-data format. The “openness” of this data format refers not only to its public availability but also to its “interoperability”—the ability to integrate, combine, and triangulate datasets with readily available software and computing capabilities.2 In May 2013, President Obama also signed an executive order that made open and machine-readable data the new standard for government information. In the anticorruption arena, the real value of open data lies in the ability to interconnect multiple datasets to discern patterns and expose signs of corruption.
Though it holds tremendous promise, opening government data to public scrutiny remains a work in progress. Seventy countries now participate in the Open Government Partnership, launched in 2011, and have committed to opening up their government data. In July 2013, G8 leaders signed the G8 Open Data Charter, which outlines six core open-data principles. However, there remains a frustratingly low level of data openness worldwide. As of 2015, less than 10 percent of global government datasets were open—public, machine-readable, and nonproprietary—according to the Global Open Data Index, an initiative of the Open Knowledge Foundation that tracks open government data in 122 countries.
As Brazil has demonstrated, one crucial area for employing open data to prevent corruption is in the management of public finances, including budgets, taxes, and procurement. The International Budget Partnership’s Open Budget Index reveals large differences and slow progress in governments opening up budget data to public scrutiny, with only 24 of the 102 countries surveyed in 2015 having sufficient levels of budget transparency (a score of 62 or higher on a 100-point scale assessing the public availability of budget information).
Public procurement is a key risk area for corruption: Governments worldwide spend an estimated $9.5 trillion, or 15 percent of global GDP, through contracts every year, according to the Center for Global Development. Yet less than 10 percent of the 120 countries surveyed in the Open Data Index provide quality, timely, machine-readable data on government contracts, the Open Contracting Partnership reports. The World Economic Forum estimates that 10 to 30 percent of the $7 trillion that governments spend annually on construction is lost through corruption.
The Many Faces of Data Analytics
Making big data open cannot, in itself, drive anticorruption efforts. “Without analytics,” a 2014 White House report on big data and individual privacy underscored, “big datasets could be stored, and they could be retrieved, wholly or selectively. But what comes out would be exactly what went in.”
In this context, it is useful to distinguish the four main stages of data analytics to illustrate its potential in the global fight against corruption: Descriptive analytics uses data to describe what has happened in analyzing complex policy issues; diagnostic analytics goes a step further by mining and triangulating data to explain why a specific policy problem has happened, identify its root causes, and decipher underlying structural trends; predictive analytics uses data and algorithms to predict what is most likely to occur, by utilizing machine learning; and prescriptive analytics proposes what should be done to cause or prevent something from happening.
Several applications of data analytics to anticorruption are especially promising. Crowdsourcing, particularly through mobile applications, provides a potent tool for citizens to uncover corruption in its various manifestations. Mobile technology is making public information more accessible to citizens and also provides them a channel to file complaints about public services or to denounce malpractices.
In India, for example, the application “I paid a bribe” helps combat corruption by enabling citizens to report bribery and fraud by officials. Similarly, Colombia’s Transparency Secretariat of the Presidency has developed an app that allows citizens to report “white elephants,” meaning incomplete or overbilled public works. By the end of 2015, it identified 83 such white elephants for a total value of almost $500 million, which led to the initiation of criminal proceedings by law enforcement authorities, the daily newspaper El Tiempo reported.
Neither application, however, allows for two-way interaction between citizens and officials, nor do they open the anonymized raw data for users to conduct their own analyses, perhaps due to privacy concerns. They do, however, generate a wealth of data that fraud investigators and anticorruption agencies can exploit. Experience suggests that such crowdsourcing anticorruption efforts are most effective when linked to or embedded in oversight institutions and law enforcement agencies.
Data-mining techniques are also being deployed to scrutinize public finances more aggressively. Data mining typically refers to machine-learning explorations that dive into the data to discover what patterns exist without a predetermined research question. New York City is leading the way: In 2010, the Comptroller’s Office opened the city’s $70 billion annual budget to public scrutiny by launching an online transparency portal. Checkbook NYC 2.0 aims at putting “government spending at taxpayer’s fingertips.” Citizens can download the data and use data analysis techniques to reveal patterns that might raise red flags, which can create a powerful deterrent. Civic organizations can also use such open data to scrutinize government more effectively. For example, in the United States, OpenSecrets.org, an initiative of the Center for Responsive Politics, tracks the influence of money in politics using the Federal Election Commission’s data to monitor campaign finance. Similarly, the Sunlight Foundation, a nonpartisan organization, uses the tools of civic tech, open data, and data analytics to make government more transparent and accountable at state and federal levels.
In government procurement, big data analytics can also help people scrutinize the efficiency of public contracts, historically vulnerable to cronyism and kickbacks. In 2011, Slovakia introduced legislation to enforce greater transparency in public procurement, and the local chapter of Transparency International used the new open data system to expose serious inefficiencies in hospital procurement. In Georgia, the local chapter of Transparency International launched an open-source procurement monitoring and analytics portal, which extracts and repackages data from the government’s central e-procurement website to enable investigations into suspicious transactions, particularly in noncompetitive contracts. The Czech Republic, Hungary, and Slovenia have launched similar initiatives.
Fraud analytics is helping governments to reduce tax evasion and fraud in social programs by enabling the detection of suspicious transactions. For example, in the United States, the Centers for Medicare and Medicaid Services are using predictive analytics to flag likely instances of reimbursement fraud before claims are paid. The Fraud Prevention System helps identify the highest-risk health-care providers for waste, fraud, and abuse in real time and has already stopped, prevented, or identified $42 billion in fraudulent payments, The Fiscal Times reported. In Brazil, the federal anticorruption body used fraud analytics to compare the list of beneficiaries of Bolsa Familia, the country’s largest social welfare program, with the list of automobile owners in the federal car registry and identified thousands of ineligible beneficiaries. Further digging revealed that wealthy Brazilians were evading taxes by fraudulently registering their cars under the names of Bolsa Familia recipients.
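The kind of list comparison described in the Brazilian case can be pictured as a simple join between two registries on a shared identifier. The sketch below is illustrative only: the records, the field names (`person_id`, `owner_id`, `value`), and the function `flag_vehicle_owners` are hypothetical, and it makes no claim about how the Brazilian authorities actually ran their analysis.

```python
# Hypothetical records; identifiers, names, and values are invented for illustration.
beneficiaries = [
    {"person_id": "A1", "name": "Ana"},
    {"person_id": "B2", "name": "Bruno"},
    {"person_id": "C3", "name": "Carla"},
]
vehicle_registry = [
    {"owner_id": "B2", "vehicle": "sedan", "value": 18000},
    {"owner_id": "B2", "vehicle": "suv", "value": 52000},
    {"owner_id": "Z9", "vehicle": "truck", "value": 30000},
]


def flag_vehicle_owners(beneficiaries, vehicle_registry, min_value=0):
    """Return beneficiaries who also appear as vehicle owners.

    A match is a lead for human review, not proof of ineligibility or fraud.
    """
    # Index the registry by owner so each beneficiary is a single lookup.
    vehicles_by_owner = {}
    for row in vehicle_registry:
        if row["value"] >= min_value:
            vehicles_by_owner.setdefault(row["owner_id"], []).append(row["vehicle"])

    return [
        {"person_id": b["person_id"], "vehicles": vehicles_by_owner[b["person_id"]]}
        for b in beneficiaries
        if b["person_id"] in vehicles_by_owner
    ]


flags = flag_vehicle_owners(beneficiaries, vehicle_registry)
print(flags)
```

As the article's own example shows, a match can point in more than one direction: the registered owner may be ineligible, or someone else may have registered a car in that person's name. That is why the output is best treated as a list of cases to examine rather than a verdict.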
Given the vast number of interactions between taxpayers and the state and the perennial threat of evasion, tax administration is ripe for anticorruption analytics. For example, the Australian Taxation Office is using big data to search through vast amounts of records to find evidence of the use of tax havens and data matching to identify small online retailers that are not meeting their compliance obligations. In Luxembourg, in the aftermath of the LuxLeaks scandal, in which a whistle-blower released data about tax evasion schemes, analytics techniques employed by the city’s then-finance commissioner, David Frankel, helped target tax audits and improve the efficiency of investigations into companies suspected of underpaying or eluding taxes.
Three Ways to Bolster Data-Driven Anticorruption
As news from around the world confirms, data affords an enormous opportunity to aid the fight against corruption. But that promise will not be fully realized without further support. In particular, there are three ways in which data’s anticorruption potential can be maximized worldwide, even in developing countries, where corruption is a perennial plague and data collection and analysis are generally less advanced.
First, improve data quality and coverage | Advanced analytical tools can offer useful insights only insofar as the data input is trustworthy and broad-based. Public discussion of data initiatives tends to emphasize dissemination (open data) and usage (big data) at the expense of production. If we are going to give greater weight to data in decision making, we should care more about the quality of data to start with. Governments need to be able to produce, collect, and disseminate high-quality official statistics and maintain effective administrative registries that generate timely, credible data.
In terms of coverage, the availability of data and the overall level of economic development tend to go hand in hand. The entire data landscape in most developing countries contrasts sharply with that of the developed world: E-government is less advanced, Internet penetration and digital literacy are lower, and in some areas, the electricity grid is unreliable or nonexistent. As a result, less data is produced, and what is produced is not as widely disseminated. This poses several risks of perpetuating patterns of inequality and contributing to social exclusion. First, decisions based largely on data may be biased in favor of areas for which data is available or simply uninformed for areas without data. Second, data-driven accountability is likely to be weak. Citizens without Internet access will be unable to check procurement contracts or school-performance ratings posted online and make decisions. These risks highlight the need to continue efforts to expand broadband access and train citizens in basic computing and Internet-navigation skills, as well as in government oversight tools such as transparency portals.3
Increasing data coverage and openness, particularly for anticorruption purposes, may confront significant political resistance in developing countries, especially those with limited institutional capacity and restricted autonomy for government agencies. Influential actors may see the dissemination of information even as basic as census data as against their interest. This was the case, for example, in Guyana, where concerns over the political consequences of releasing the 2012 census data (revealing significant demographic shifts) before the 2015 presidential elections led the data to be embargoed. More-sensitive information in the fight against corruption, such as income and asset declarations and the details of public sector contracts, will naturally face even stronger resistance. Depending on the risk that influential actors perceive from releasing data, and on the comparative strength of pro-openness political forces, the impact of big and open data may continue to be limited in the places where it is most needed.
Second, build government data analytics capability | In order to realize the potential of big and open data for anticorruption, governments have to build in-house capacity both to generate useful insights and to integrate them into policymaking and policy implementation. In-house technical capacity is especially important for sustainability. While outsourcing to firms may prove an effective solution for specific tasks, the typically proprietary nature of the algorithms and software means that once the firm is gone (or it raises prices to unreachable levels), the government is unable to upgrade, modify, or expand. Governments also struggle to attract and retain data analysts, who have become a scarce commodity even in the private sector. Given an overall dearth of talent, skilled data professionals often sign on with the highest bidder, which is rarely the government.
Government innovation labs in several countries are helping address these challenges. They not only provide an attractive employment opportunity for data professionals interested in public service, but also are advancing the frontiers of big data in government, starting with descriptive analytics (and increasingly moving toward predictive and, in some cases, prescriptive analytics). These labs can help link government complaint and prosecutorial systems with government anticorruption institutions so as to help generate corrective measures. Leading models, which are focused on incubating innovation and utilizing data to improve policies, include Denmark’s and Great Britain’s. In Latin America, Chile, Colombia, Mexico, Brazil, and Uruguay, as well as cities such as Buenos Aires, Mexico City, Quito, and Montevideo, have also created such labs.4
There are also important gaps in the data analytics capabilities of investigative and prosecutorial agencies, which could benefit greatly from the opportunities offered by big data. Anticorruption agencies in particular ought to strengthen their analytics capabilities by establishing anticorruption labs. In Brazil, for example, the Office of the Comptroller General of the Union established the Public Spending Observatory (Observatório da Despesa Pública) in 2008 to help detect suspicious transactions and deter corrupt practices. Procurement expenditure data are cross-checked with other government databases to identify atypical situations that, while not direct evidence of irregularities, warrant further examination.
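One simple way to surface "atypical situations" in procurement data is to compare the unit price each agency paid for the same item against the typical price, using a measure that is not itself distorted by the outlier. The sketch below uses a modified z-score based on the median absolute deviation. The purchase records are invented, and the scaling constant 0.6745 and cutoff 3.5 are conventional choices assumed here for illustration; nothing in it describes the Observatory's actual methods.

```python
from statistics import median

# Hypothetical unit prices paid by different agencies for the same item.
purchases = [
    {"agency": "A", "unit_price": 10.0},
    {"agency": "B", "unit_price": 11.0},
    {"agency": "C", "unit_price": 9.5},
    {"agency": "D", "unit_price": 10.5},
    {"agency": "E", "unit_price": 31.0},
]


def flag_atypical(purchases, threshold=3.5):
    """Return purchases whose unit price is far from the median price.

    Uses a modified z-score: 0.6745 * |price - median| / MAD.
    A flag means "worth a closer look", not evidence of an irregularity.
    """
    prices = [p["unit_price"] for p in purchases]
    center = median(prices)
    mad = median([abs(x - center) for x in prices])
    if mad == 0:
        # At least half the prices equal the median; this score is undefined.
        return []
    return [
        p for p in purchases
        if 0.6745 * abs(p["unit_price"] - center) / mad > threshold
    ]


atypical = flag_atypical(purchases)
print([p["agency"] for p in atypical])
```

In this toy dataset only agency E stands out. In practice, an analyst would still need to check whether the purchase differed in quantity, specification, or delivery terms before drawing any conclusion.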
More fundamentally, to have an impact on the design and implementation of public policies, data analytics must be integrated more directly with the policymaking process so that officials focus on the right issues, select good questions to ask the data, feed responses back into policymaking, and enable anticorruption reforms. While analytics can boost the government’s capacity for oversight, insight, and foresight, it cannot replace the deliberations of experienced policymakers, nor can it improve them without greater integration. As The Economist recently noted, “Algorithms help people make decisions, not make decisions for them.”
To be sure, not every question can be answered via big data analytics alone, and not every policy challenge should. An effective big data anticorruption strategy would determine which policy issues to focus on and what questions to ask the data, with the larger goal of designing the most effective reforms to address the underlying causes of corruption. There is also need for what international development practitioners call a “theory of change”—the map of preconditions, actions, expected results, and associated assumptions necessary to achieve a given goal—to guide research into the data. A central challenge in this regard is to move from largely descriptive analytics to prescriptive analytics, which are more actionable in policy terms. More specifically, anticorruption analytics must be connected with a country’s integrity system—including complaint mechanisms and investigative and prosecutorial institutions—in order to generate corrective measures that can prevent corruption in the first place.
Third, make data analytics more transparent and expand its reach | Any tool requiring a highly specialized skill set runs the risk of being captured by a select few, and big data is no exception. The big data movement needs to “make algorithms accountable,” as ProPublica reporter Julia Angwin demanded in an op-ed in The New York Times, by revealing how decisions are made: where the data comes from, what assumptions underpin the calculations, what weights are assigned to different data points, and what thresholds are used to identify red flags. The algorithms must be subject to checks in order to prevent data from creating or perpetuating bias. Findings from data explorations and extrapolations must also be taken with a significant grain of salt, keeping in mind that correlation—no matter how suggestive—does not establish causality.
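To make the idea of an accountable algorithm concrete, consider a red-flag score whose indicators, weights, and threshold are all published, and which reports how each indicator contributed to the result. Everything in the sketch below is hypothetical: the indicator names, the point values in `WEIGHTS`, the `THRESHOLD`, and the function `score_contract` are assumptions chosen for illustration, not an established scoring standard.

```python
# Published, hypothetical scoring rules: anyone can see what counts and by how much.
WEIGHTS = {
    "single_bidder": 5,          # only one bid was received
    "short_tender_window": 3,    # little time was allowed to prepare bids
    "price_above_benchmark": 2,  # price exceeds a reference price
}
THRESHOLD = 5  # scores at or above this value are flagged for review


def score_contract(indicators):
    """Score one contract and explain the result.

    `indicators` maps each indicator name in WEIGHTS to True or False.
    Returns (score, flagged, contributions), where `contributions` shows
    the points added by each indicator that was present.
    """
    contributions = {
        name: WEIGHTS[name]
        for name, present in indicators.items()
        if present
    }
    score = sum(contributions.values())
    return score, score >= THRESHOLD, contributions


score, flagged, why = score_contract({
    "single_bidder": False,
    "short_tender_window": True,
    "price_above_benchmark": True,
})
print(score, flagged, why)
```

Because the weights and threshold are visible, an affected supplier or an outside watchdog can contest them, for example by arguing that a short tender window deserves fewer points in emergency purchases. That kind of challenge is impossible when the scoring rules are proprietary.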
Furthermore, to be credible, the big data movement’s anticorruption efforts must confront risks in the corporate sector more aggressively. The outrage produced by the Panama Papers has intensified pressure for greater international tax and corporate transparency. Several startup initiatives are embracing this challenge. Open Corporates, a digital platform, is creating an open database of all beneficial owners—the real owners who enjoy the income of the companies without necessarily appearing by name in the ownership title—of all of the registered companies in the world by compiling data made available by governments and companies. The database currently includes information on more than 110 million companies from 115 different jurisdictions.
Additionally, internal auditors of corporations committed to stamping out corruption are increasingly using data analytics to investigate transactions in procurement and payment models, check for anomalies, and identify suspicious transactions, such as illicit financial flows. A 2014 report by Ernst & Young highlights the relevance of forensic technology in managing compliance and mitigating fraud risks in private firms.
Several economic sectors are particularly vulnerable to corruption and therefore offer special opportunities to the big data movement. In the oil, gas, and mining sector, for example, Open Oil, a consultancy, has launched a search engine that allows the tapping of key corporate data from more than 40,000 extractive companies, including ownership, contract, and concession information. These database solutions are gradually changing the environment in which the industry operates. In real estate, another high-risk sector, further steps could be taken by opening up property registries to identify the real beneficiaries of high-value properties.
Realizing Data’s Anticorruption Promise
The data revolution is just starting, as John Doe, the whistleblower behind the Panama Papers, has suggested. Young data activists with pent-up expectations for greater government transparency and accountability are spearheading the big data movement, which promises to dramatically advance the global fight against corruption. However, to realize big data’s full potential and prevent it from becoming another disappointing fad, officials, activists, and citizens must demand that data analytics be fully integrated into countries’ broader anticorruption efforts and remain vigilant that it not be misused to scrutinize the ruled, rather than the rulers.
Disruptive technologies with even greater potential in the anticorruption fight are already emerging. In particular, blockchain technology has a number of promising applications, including tracking payments, securing public registries, and enabling smart contracts. Blockchain is in essence a shared digital ledger with no central authority in which transactions are recorded publicly and updated continuously, unalterable by any one party. It could even resolve the key challenge of beneficial ownership of shell companies to tackle tax avoidance and evasion.
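The "unalterable" quality of such a ledger rests largely on hash chaining: each entry stores a cryptographic fingerprint of the entry before it, so changing an old record breaks every link that follows. The sketch below shows only that one ingredient, using a made-up land-registry example. The function names and record fields are hypothetical, and the sketch leaves out what makes a real blockchain decentralized, namely replication across many parties and a consensus mechanism.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(record, prev_hash):
    """Fingerprint a record together with the hash of the previous entry."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger, record):
    """Add a record, linking it to the entry before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "record": record,
        "prev": prev_hash,
        "hash": entry_hash(record, prev_hash),
    })


def verify(ledger):
    """Recompute every link; return False if any entry was altered."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["record"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append(ledger, {"parcel": "P-001", "owner": "Alice"})
append(ledger, {"parcel": "P-001", "owner": "Bob"})
print(verify(ledger))

# Quietly rewriting an old title is detected on the next verification.
ledger[0]["record"]["owner"] = "Mallory"
print(verify(ledger))
```

A single registry office holding this ledger could still recompute all the hashes after tampering. The protection becomes meaningful when many independent parties hold copies and compare them, which is the part this sketch does not attempt to model.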
For example, land titles and property registration are critical challenges for many developing countries. In Honduras, only 14 percent of the population legally occupies properties and only 30 percent of legally held properties are registered, according to a USAID report. Honduras and Georgia are working with startups to create more transparent and reliable land registries using blockchain technology. In terms of tracking applications, Everledger, a tech startup launched in 2015, aims to stamp out fraud and counterfeiting in the diamond industry with a secure and unalterable certification system using blockchain technology. If successful, this innovation could mark the end of the infamous illegally traded “blood diamonds” from conflict zones.
Despite the big data movement’s promise for fighting corruption, many challenges remain. The smart use of open and big data should focus not only on uncovering corruption, but also on better understanding its underlying causes and preventing its recurrence. Anticorruption analytics cannot exist in a vacuum; it must fit in a strategic institutional framework that starts with quality information and leads to reform. Even the most sophisticated technologies and data innovations cannot prevent what French novelist Théophile Gautier described as the “inexplicable attraction of corruption, even amongst the most honest souls.” Unless it is harnessed for improvements in governance and institutions, data analytics will not have the impact that it could, nor be sustainable in the long run.
This article originally appeared in Stanford Social Innovation Review. Used here for demonstration purposes only.