Can Big Data Improve Economic Measurement?

Brian Sturgess - April 2018


The amount of data generated and stored in the digital age is growing at an accelerating rate, but this is not reflected in official economic statistics. Half a decade ago one estimate put the daily global creation of data at around 2.5 exabytes (2.5 × 10^18 bytes), and it was predicted last year that the total would rise to 163 zettabytes (1.63 × 10^23 bytes) by 2025. To meet this demand, global per-capita information storage capacity has doubled every 40 months since the 1980s [1]. Unfortunately, in the face of this revolution, the tortoise-like evolution of the methods used to generate official economic statistics means that measurements of real economic activity are becoming increasingly irrelevant.

The underlying methodology used to measure concepts such as Gross Domestic Product (GDP), the Consumer Price Index (CPI) and the other price indices required to estimate real growth and productivity is determined by a global standard: the System of National Accounts (SNA). Since the end of the Second World War, apart from some revisions based on experience during the 1960s, these standards have been significantly upgraded only four times: in 1953, 1968, 1993 and 2008. Many of these revisions have attempted to make better estimates of economic activity that is not tangible or easily measurable, such as financial services, public services and government. To compare the real output of these sectors with material economic activities such as the production of shoes, cars and processed food, national accountants have calculated ingenious, although often dubious, output and price indices.

The explosion in the use of the Internet and mobile telephony since the millennium has compounded the measurement issues faced by the SNA, casting further doubt on the validity of official statistics. The main problem in measuring economic activity in the digital age is that many services are offered at a zero or very low price. Conventional price indices cannot capture them, so output is undervalued and, consequently, so are measures of productivity. But economic activity is still occurring, and the problem is greater than trying to estimate the size of what has been termed the ‘gig’ economy. National accountants have not realised that consumers are still paying for these free services with the data about themselves and their habits that they supply to the likes of Facebook, Google, Apple, Amazon and Microsoft. Effectively, the currency of the net giants is data, and until data is valued correctly the use of price indices to deflate nominal national output will produce biased results. It is no accident that the combined stock market value of the big five technology giants is already around US$3 trillion, just less than 20% of the USA’s standard estimate of GDP.
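The deflation step behind this argument can be sketched in a few lines of Python. This is a minimal illustration rather than any statistical office's method: the function names and every number are hypothetical, and the lower "adjusted" deflator simply assumes, as the paragraph above argues, that a price index which valued free digital services would show less inflation than the measured one.

```python
def real_output(nominal, price_index, base=100.0):
    """Deflate nominal output by a price index expressed relative to `base`."""
    return nominal * base / price_index


def real_growth(nominal_0, nominal_1, index_0, index_1):
    """Real growth rate between two periods after deflating each period's output."""
    return real_output(nominal_1, index_1) / real_output(nominal_0, index_0) - 1.0


# Hypothetical figures, invented purely for illustration.
nominal_0, nominal_1 = 1000.0, 1040.0  # nominal output in two successive years

# Measured deflator: prices appear to rise by 2%.
measured = real_growth(nominal_0, nominal_1, 100.0, 102.0)

# Assumed alternative deflator: prices rise by only 1% once free services are valued.
adjusted = real_growth(nominal_0, nominal_1, 100.0, 101.0)

print(f"real growth with measured deflator: {measured:.2%}")
print(f"real growth with adjusted deflator: {adjusted:.2%}")
```

With the same nominal figures, the higher measured deflator yields the lower real growth rate, which is the direction of bias the argument describes.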

The potential to use big data to gain more realistic and timely estimates of economic activity, and to side-step the laborious methods employed by national accountants, is already being explored. Different big datasets now allow almost instantaneous estimates to be made of some aspects of overall economic activity, particularly where distrust of official statistics has been greatest: in Africa and China and, for a while, in Argentina. For example, distrusting the official inflation data in 2006, Professors Cavallo and Rigobon at MIT used online retail price data to construct an alternative price index. This has now been transformed into the Billion Prices Project [2], which collects data to provide daily inflation estimates for 22 economies. Satellite data on the light emitted across Africa has been used to make estimates of GDP [3], but satellites can also monitor transport, freight and construction activity more accurately than surveys and more quickly than accounting data. One company, SpaceKnow [4], produces the China Satellite Manufacturing Index based on 2.2 billion snapshots of 6,000 industrial sites spread out over 500,000 sq. km.
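To make the idea of an index built from online prices concrete, here is a minimal Python sketch. It is not the Billion Prices Project's actual method, which is not described here. It uses a Jevons index (the geometric mean of price relatives for matched products) as a stand-in, and the function name, product identifiers and prices are all hypothetical.

```python
from math import exp, log


def jevons_index(base_prices, current_prices):
    """Jevons index: 100 times the geometric mean of price relatives.

    Only products observed in both periods are used (matched-model approach).
    """
    common = [p for p in base_prices if p in current_prices]
    if not common:
        raise ValueError("no products observed in both periods")
    mean_log_relative = sum(
        log(current_prices[p] / base_prices[p]) for p in common
    ) / len(common)
    return 100.0 * exp(mean_log_relative)


# Hypothetical scraped prices (product id -> price), invented for illustration.
day_0 = {"milk_1l": 1.00, "bread_loaf": 2.00, "rice_1kg": 4.00}
day_1 = {"milk_1l": 1.10, "bread_loaf": 2.00, "rice_1kg": 4.40, "eggs_12": 3.00}

index = jevons_index(day_0, day_1)
print(f"price index on day 1 (day 0 = 100): {index:.2f}")
```

The new product on day 1 is ignored because it has no base-period price. Handling product entry and exit, and weighting categories by expenditure, are the harder parts of any real index and are left out of this sketch.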

These examples show that the potential already exists to use big data to produce more accurate and timely estimates of some aspects of economic activity than the backward-looking quarterly estimates produced by national statistics offices. However, there are a number of problems. The SNA aggregates data in line with received economic models of how economies are composed. Although it suffers serious problems in measuring the digital economy because of the lack of prices, nobody has yet produced a theory to explain how this economy, which is still evolving rapidly, is constructed.

Furthermore, satellite and online price data are publicly available, while the really valuable data about consumers is held by private companies such as social media platforms, online retailers, telecom operators and banks. While there are important security and privacy issues about sharing this data, estimates of economic activity will not be complete unless statistical agencies are given access to it. Professor Diane Coyle, a fellow at the UK’s Office for National Statistics (ONS), believes the government should be given free access to private sector data to make faster and more accurate estimates of GDP for the public good. Given the growing public concern over the market, social and political influence of the net giants, this data might be offered as part of a grand bargain. Alternatively, given the data that governments already hold about private economic activity, garnered by law, and the temptation for politicians to manipulate figures (witness the whole Brexit debate), a more comprehensive distribution of public estimates of economic activity might indeed be a public bad. Sir John Cowperthwaite, Financial Secretary of Hong Kong throughout the 1960s, the period of its economic miracle, banned the publication of all but the basic minimum of economic statistics on the grounds that they were dangerous and subject to distortion. Until these technical and legal issues are addressed, big data will be used only partially, to provide valuable estimates of some aspects of economic activity.

  1. Hilbert, Martin; López, Priscila (2011). "The World's Technological Capacity to Store, Communicate, and Compute Information". Science. 332 (6025): 60–65. doi:10.1126/science.1200970. PMID 21310967.
  2. http://www.thebillionpricesproject.com/
  3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108272/ and http://documents.worldbank.org/curated/en/912151468188369841/pdf/WPS7461.pdf
  4. https://spaceknow.com/