R in the statistical office: The case of UNIDO
V. Todorov, United Nations Industrial Development Organization, Vienna
New Techniques and Technologies for Statistics (NTTS) 2017, Brussels, Belgium, 14-16 March 2017
Todorov (UNIDO), R in UNIDO, NTTS 2017
Outline
1. About UNIDO, UNIDO Statistics and R
2. R for Data Exchange
3. R as a graphical engine: package yearbook
4. Imputation of Key Indicators: package unidocip2
5. REST APIs
6. IO Analysis, WIOD and the package rwiot
7. Industrial statistics for business structure: package indstat
8. Competitive Industrial Performance (CIP) index: package CItools
9. Maintenance of UNIDO databases with R
10. Technical assistance
11. Summary and conclusions
1. About UNIDO, UNIDO Statistics and R
About UNIDO
- Set up in 1966
- Became a specialized agency of the UN in 1985
- Promotes industrialization throughout the developing world
- 168 Member States (as of January 2017)
- Headquarters in Vienna
- Represented in 35 developing countries
About UNIDO Statistics
Service module "Industrial Governance and Statistics":
- monitor, benchmark and analyse industrial performance and capabilities;
- formulate, implement and monitor strategies, policies and programmes that improve the contribution of industry to productivity growth and to the achievement of the Sustainable Development Goals (SDGs);
- UNIDO is a custodian agency for six indicators under Goal 9.
Building capabilities in industrial statistics by providing technical assistance to:
- introduce best-practice methodologies and software systems;
- enhance the quality and consistency of industrial statistics databases.
2. R for Data Exchange
3. R as a graphical engine: package yearbook
4. Imputation of Key Indicators: package unidocip2
Manufacturing and industrial statistics
Industrial development is a driver of structural change, which is key to the process of economic development.
Industrial statistics make it possible to identify and rank the key production sectors, the major economic zones in the country and the major size classes.
Researchers and analysts demand specialized and structural statistics on industry (as well as on other economic sectors) more than ever, to assess the implications of globalization for individual countries. They need:
- synthesized data on world development trends;
- internationally comparable data to assess the growth and structure of one region of the world vis-à-vis others;
- a complete set of data on their field of interest, to avoid measurement discrepancies;
- regular data production, to update or correct policy measures.
Structural statistics for industry: UNIDO databases
UNIDO databases:
- cover the manufacturing sector;
- refer to economic statistics, mainly production- and trade-related, not technological or environmental data;
- include statistical data from annual observations within the quality assurance framework (no experimental or one-time study data);
- contain official data supplied by national statistical offices (NSOs), in accordance with the resolution of the UN Statistical Commission;
- follow the UNIDO Quality Framework (Upadhyaya and Todorov, 2008, 2012).
Further details: http://www.unido.org/index.php?id=1002103
UNIDO databases: summary
INDSTAT (by ISIC and by country):
- Number of establishments
- Number of employees
- Number of female employees
- Wages and salaries
- Gross output
- Value added
- Gross fixed capital formation
- Index numbers of industrial production
MVA (by country):
- GDP at current prices
- GDP at constant prices
- MVA at current prices
- MVA at constant prices
- Population
IDSB (by ISIC and by country):
- Output = Y
- Imports = M
- Exports = X
- Apparent consumption = C, where $C = Y + M - X$
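The IDSB identity for apparent consumption is simple enough to state as code. A minimal Python sketch, assuming all three inputs refer to the same sector, country and year and are in the same currency unit; the function name is illustrative only.

```python
def apparent_consumption(output: float, imports: float, exports: float) -> float:
    """Apparent consumption C = Y + M - X (output + imports - exports)."""
    return output + imports - exports


# Hypothetical figures for one ISIC sector in one country and year
c = apparent_consumption(output=500.0, imports=120.0, exports=200.0)
print(c)  # 420.0
```

A negative result means recorded exports exceed output plus imports, which may point to re-exports or to inconsistencies between the production and trade sources that are worth checking.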
UNIDO Statistics online portal: http://stat.unido.org/
Imputation in international statistics
Survey data (micro):
- multiple variables observed for a sample of observation units from a population at one point in time;
- gaps in the data are classified as item non-response, unit non-response, or variables not included in the survey.
Time series data (macro):
- contain data for multiple time periods;
- contain data for aggregate (macro) units, called sections;
- sections are usually countries;
- variables are usually statistical indicators (such as GDP or MVA).
Imputation INDSTAT: Cross-sectional time series data
There are four different types of time series data structures (Denk and Weber, 2011):
1. Single univariate time series
2. Single multivariate time series
3. Cross-sectional univariate time series
4. Cross-sectional multivariate time series
Missingness patterns: the relevance and applicability of missing data techniques depend on:
1. missing items;
2. missing periods;
3. missing variables; and
4. missing sections (countries).
Imputation INDSTAT: Description of the data set
Variables of interest:
1. GO - Gross output
2. VA - Value added
3. WS - Wages and salaries
4. EMP - Number of employees
Auxiliary variables:
1. IIP - Index of Industrial Production
2. MVA - Manufacturing Value Added (from SNA)
3. IMVA - Index of MVA
4. CPI - Consumer price index
Imputation INDSTAT: Description of the data set
The following variables are not considered:
- GFCF (gross fixed capital formation): its economic relation to GO and VA is too weak;
- EST (number of establishments): too heterogeneous because of differences in definitions.
Imputation INDSTAT: Analysis of the missingness
Package VIM (Visualization and Imputation of Missing Values), an R package (Templ et al., 2010):
- tools for visualizing missing values, useful for exploring the data and the structure of the missing values;
- may help to identify the mechanism generating the missing values.
What to analyze:
- the time series evolution of missingness;
- the multivariate dependence of missingness across the variables.
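The time series evolution of missingness is the proportion of missing values of a variable in each year. A minimal Python sketch of that computation (an illustration, not VIM itself), assuming the panel is stored in long format with one record per country-year and None for a missing value; country-years absent from the records are not counted.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# One observation per (country, year); None marks a missing value.
Record = Tuple[str, int, Optional[float]]


def missing_share_by_year(records: List[Record]) -> Dict[int, float]:
    """Proportion of countries whose value is missing, for each year."""
    total: Dict[int, int] = defaultdict(int)
    missing: Dict[int, int] = defaultdict(int)
    for _country, year, value in records:
        total[year] += 1
        if value is None:
            missing[year] += 1
    return {year: missing[year] / total[year] for year in sorted(total)}


# Hypothetical employment panel for two countries
employment = [
    ("AUT", 2009, 630.0), ("EGY", 2009, None),
    ("AUT", 2010, None), ("EGY", 2010, None),
]
print(missing_share_by_year(employment))  # {2009: 0.5, 2010: 1.0}
```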
INDSTAT: Time series evolution of missingness (main variables)
[Figure: proportion of missing values (0 to 1) by year, 1991-2010, in four panels: Employment, Wages and Salaries, Gross Output, Value Added]
INDSTAT: Time series evolution of missingness (auxiliary variables)
[Figure: proportion of missing values (0 to 1) by year, 1991-2010, in three panels: IIP, CPI, IMVA]
INDSTAT: Multivariate dependence of missingness across variables
[Figure: two aggregation plots showing the proportion of missing values per variable and the frequency of each missingness combination, for Output, IIP and CPI (left) and for Output, IMVA and CPI (right)]
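An aggregation plot summarizes two counts: the proportion of missing values per variable and how often each combination of missing variables occurs. A minimal Python sketch of both counts (an illustration of the idea, not the VIM implementation), assuming each country-year is a dict and that an absent key also counts as missing.

```python
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # one country-year; None (or no key) = missing


def missing_proportions(rows: List[Row], variables: List[str]) -> Dict[str, float]:
    """Share of rows in which each variable is missing."""
    return {v: sum(row.get(v) is None for row in rows) / len(rows) for v in variables}


def missing_combinations(rows: List[Row], variables: List[str]) -> Counter:
    """Frequency of each missingness pattern (True = missing), in `variables` order."""
    return Counter(tuple(row.get(v) is None for v in variables) for row in rows)


rows = [
    {"Output": 1.0, "IIP": None, "CPI": 2.0},
    {"Output": None, "IIP": None, "CPI": 2.0},
    {"Output": 1.5, "IIP": None, "CPI": 2.1},
]
variables = ["Output", "IIP", "CPI"]
print(missing_proportions(rows, variables))
print(missing_combinations(rows, variables).most_common())
```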
Imputation INDSTAT: Deterministic approach based on economic relations
Impute the four variables of interest using the economic relationships between them:
- start by estimating the missing observations of gross output from the available production indexes or from value added;
- then estimate value added, wages and salaries, and employment from past trends in the relationships between output and each of these three variables;
- all of this is done at the level of total manufacturing.
Deterministic approach: algorithm
STEP 1. Imputation of GO using IIP and CPI:
$$\widehat{GO}_t = GO_{t-1}\left(1 + \frac{IIP_{t:0}\,CPI_{t:0} - IIP_{t-1:0}\,CPI_{t-1:0}}{IIP_{t-1:0}\,CPI_{t-1:0}}\right)$$
STEP 2. Imputation of GO using VA and the lagged ratio GO/VA:
$$\widehat{GO}_t = VA_t\,\frac{GO_{t-1}}{VA_{t-1}}$$
STEP 3. Imputation of GO using IMVA and CPI:
$$\widehat{GO}_t = GO_{t-1}\left(1 + \frac{IMVA_{t:0}\,CPI_{t:0} - IMVA_{t-1:0}\,CPI_{t-1:0}}{IMVA_{t-1:0}\,CPI_{t-1:0}}\right)$$
Here a hat denotes an imputed value, and $X_{t:0}$ is the index X for period t relative to base period 0.
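STEPS 1-3 can be read as a cascade for gross output. A minimal Python sketch, under two assumptions not stated on the slide: the steps are tried in order as fallbacks (STEP 2 only if STEP 1 cannot be applied, and so on), and imputation runs forward from the last observed or imputed value. Series are plain lists with None for missing years; the function names are hypothetical, not the unidocip2 API.

```python
from typing import List, Optional

Series = List[Optional[float]]  # one value per year, oldest first; None = missing


def index_growth(idx: Series, cpi: Series, t: int) -> Optional[float]:
    """Growth of the nominal index IDX*CPI from t-1 to t, or None if unavailable."""
    values = (idx[t], idx[t - 1], cpi[t], cpi[t - 1])
    if any(v is None for v in values):
        return None
    base = idx[t - 1] * cpi[t - 1]
    if base == 0:
        return None
    return (idx[t] * cpi[t] - base) / base


def impute_go(go: Series, va: Series, iip: Series, imva: Series, cpi: Series) -> Series:
    """Fill missing gross output forward in time, trying STEP 1, then 2, then 3."""
    out = list(go)
    for t in range(1, len(out)):
        prev = out[t - 1]  # may itself be an imputed value
        if out[t] is not None or prev is None:
            continue
        growth = index_growth(iip, cpi, t)  # STEP 1: IIP and CPI
        if growth is not None:
            out[t] = prev * (1 + growth)
        elif va[t] is not None and va[t - 1]:  # STEP 2: VA and lagged GO/VA
            out[t] = va[t] * prev / va[t - 1]
        else:
            growth = index_growth(imva, cpi, t)  # STEP 3: IMVA and CPI
            if growth is not None:
                out[t] = prev * (1 + growth)
    return out


go = impute_go(
    go=[100.0, None, None],
    va=[40.0, 44.0, 50.0],
    iip=[100.0, 110.0, None],  # IIP missing in the last year, so STEP 2 is used
    imva=[None, None, None],
    cpi=[100.0, 100.0, 100.0],
)
print(go)  # approximately [100.0, 110.0, 125.0]
```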
B. Imputation INDSTAT: Deterministic approach: algorithm II
STEP 4. Imputation of VA using GO and the lagged ratio VA/GO:
$$\widehat{VA}_t = GO_t\,\frac{VA_{t-1}}{GO_{t-1}}$$
STEP 5. Imputation of WS using VA and the lagged ratio WS/VA:
$$\widehat{WS}_t = VA_t\,\frac{WS_{t-1}}{VA_{t-1}}$$
STEP 6. Imputation of EMP using real VA and the lagged ratio EMP/real VA:
$$\widehat{EMP}_t = \frac{VA_t}{CPI_t}\cdot\frac{EMP_{t-1}}{VA_{t-1}/CPI_{t-1}}$$
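STEPS 4-6 share one pattern: carry the previous year's ratio forward and apply it to the current value of a driver variable (GO for VA, VA for WS, real VA for EMP). A minimal Python sketch of that pattern, assuming the driver series has already been completed and that imputation runs forward in time; the function names are hypothetical.

```python
from typing import List, Optional

Series = List[Optional[float]]  # one value per year, oldest first; None = missing


def impute_by_lagged_ratio(target: Series, driver: Series) -> Series:
    """Fill a missing target_t as driver_t * target_{t-1} / driver_{t-1}."""
    out = list(target)
    for t in range(1, len(out)):
        if out[t] is None and out[t - 1] is not None and driver[t] is not None and driver[t - 1]:
            out[t] = driver[t] * out[t - 1] / driver[t - 1]
    return out


def deflate(values: Series, cpi: Series) -> Series:
    """Real values VA_t / CPI_t; None where either input is missing or CPI is zero."""
    return [v / c if v is not None and c else None for v, c in zip(values, cpi)]


# Hypothetical total-manufacturing series after GO has been completed
go = [100.0, 110.0, 121.0]
cpi = [100.0, 105.0, 110.0]
va = impute_by_lagged_ratio([40.0, None, None], go)                   # STEP 4
ws = impute_by_lagged_ratio([20.0, None, None], va)                   # STEP 5
emp = impute_by_lagged_ratio([1000.0, None, None], deflate(va, cpi))  # STEP 6
print(va, ws, emp)
```

Because each ratio is carried forward from the previous year, a chain of imputed years inherits the structure of the last observed year; the steps above do not re-estimate it.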
B. Imputation INDSTAT: Deterministic approach: algorithm III
STEP 7. Imputation at industry level is based on the observed share of each industry in total manufacturing. There are three ways to compute these shares:
- Historical average share: the average share observed over the full history of the series. It does not take into account time variation in the country's industrial structure, and it is sensitive to outliers.
- Historical median share: the median share over the whole history of the series. It is less sensitive to outliers than the average but, like the average, ignores time variation in the country's industrial structure.
- Lagged share: the (possibly imputed) share of the preceding year. It takes the time-varying structure of the economy into account, but it is a less efficient estimate, since it rests on a single observation and is sensitive to an outlier in that observation.
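The three share estimators can be compared side by side. A minimal Python sketch, assuming the total-manufacturing series has already been completed and reading "full history" as all years in which both the industry and the total are observed; the function name and the method labels are hypothetical.

```python
from statistics import mean, median
from typing import List, Optional

Series = List[Optional[float]]  # one value per year, oldest first; None = missing


def impute_industry(industry: Series, total: Series, method: str = "median") -> Series:
    """Impute an industry series from its share in total manufacturing (STEP 7)."""
    if method not in ("average", "median", "lagged"):
        raise ValueError(f"unknown method: {method}")
    # Shares in all years where both the industry and the total are observed
    shares = [v / tot for v, tot in zip(industry, total) if v is not None and tot]
    out = list(industry)
    for t in range(len(out)):
        if out[t] is not None or not total[t]:
            continue
        share = None
        if method == "average" and shares:
            share = mean(shares)
        elif method == "median" and shares:
            share = median(shares)
        elif method == "lagged" and t > 0 and out[t - 1] is not None and total[t - 1]:
            share = out[t - 1] / total[t - 1]  # previous year, possibly imputed
        if share is not None:
            out[t] = share * total[t]
    return out


industry = [10.0, 20.0, 60.0, None]
total = [100.0, 100.0, 100.0, 100.0]
for m in ("average", "median", "lagged"):
    print(m, impute_industry(industry, total, m)[-1])  # about 30, 20 and 60
```

The example makes the trade-off visible: the outlying 60 pulls the average up and fully determines the lagged share, while the median ignores it.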
Imputation INDSTAT: Deterministic approach: Example 1: Egypt
Imputation of all missing values using IIP and CPI
Imputation INDSTAT: Deterministic approach: Example 2: imputation by industry
5. REST APIs
Accessing international statistical databases with R
Economic studies (e.g. competitiveness analysis or benchmarking) often need data from several different sources. Many international organizations maintain statistical databases covering particular types of data:
- COMTRADE, UNCTAD and WTO for international trade data;
- World Development Indicators (WDI) from the World Bank;
- World Economic Outlook (WEO) and International Financial Statistics (IFS) from the International Monetary Fund (IMF);
- industrial statistics databases (INDSTAT) from UNIDO;
- and many more.
Some of these organizations already provide an application programming interface (API) for accessing the data. How can these APIs be used from R?
World Development Indicators (WDI)
- A comprehensive collection of cross-country comparable development indicators, compiled from officially recognized international sources.
- Contains more than 1300 time series for more than 200 economies, covering more than 50 years.
- The R package WDI makes it easy to search and download data from the WDI; it is available from CRAN.
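Under the hood, an API of this kind is a set of URL patterns. For illustration, the Python sketch below builds a query URL for the World Bank's public API (version 2), which the R package WDI also relies on. The URL layout and parameters reflect our understanding of that public API and should be checked against the World Bank documentation; NV.IND.MANF.ZS (manufacturing, value added, % of GDP) is used only as an example indicator code. The sketch builds the URL without sending the request.

```python
from typing import List
from urllib.parse import urlencode

WB_API = "https://api.worldbank.org/v2"  # World Bank public API, version 2 (assumed layout)


def wdi_url(countries: List[str], indicator: str, start: int, end: int,
            per_page: int = 1000) -> str:
    """Build a query URL for one WDI indicator, several countries and a year range."""
    query = urlencode(
        {"date": f"{start}:{end}", "format": "json", "per_page": per_page},
        safe=":",  # keep the year range readable as 2000:2006
    )
    return f"{WB_API}/country/{';'.join(countries)}/indicator/{indicator}?{query}"


url = wdi_url(["AUT", "EGY"], "NV.IND.MANF.ZS", 2000, 2006)
print(url)
# The URL could then be fetched, e.g. with urllib.request.urlopen(url), and the JSON parsed.
```

In R, the same download is a single call to WDI() with country, indicator, start and end arguments, which hides the URL construction and JSON parsing.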
UNIDO REST API
- dblist(): returns the list of all available data sets (currently 9)
- dbinfo(db): returns information about the content of a data set: countries, variables, years, ISIC codes
- dbdata(db, ...): retrieves data from data set db
Example:

```r
## Print the names of all available data sets
for (db in dblist())
    print(dbinfo(db = db)$dbname)
## [1] "INDSTAT 2 2016, ISIC Revision 3"
## [1] "INDSTAT 4 2016, ISIC Revision 3"
## [1] "INDSTAT 4 2016, ISIC Revision 4"
## [1] "IDSB 2016, ISIC Revision 3"
## [1] "IDSB 2016, ISIC Revision 4"
## [1] "MINSTAT 2016 ISIC Revision 3"
## [1] "MINSTAT 2016 ISIC Revision 4"
## [1] "MVA 2016"
## [1] "CIP 2016"
```
UNIDO REST API: retrieving data

```r
## Country 100, variable 20, ISIC 15, years 2000-2006, from the first data set
dbdata(db = dblist()[[1]], ct = 100, variable = 20, from = 2000, to = 2006, isic = 15)
##   country variable isic isiccomb year     value
## 1     100       20   15     NULL 2000 233600844
## 2     100       20   15     NULL 2001 232982867
## 3     100       20   15     NULL 2002 236882397
## 4     100       20   15     NULL 2003 350320309
## 5     100       20   15     NULL 2004 452031922
## 6     100       20   15     NULL 2005 547604073
## 7     100       20   15     NULL 2006 642608400
```
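The dbdata() call can be read as a filter over long-format records with the columns shown in its output (country, variable, isic, year, value). The Python sketch below reproduces that filter on a local list to make the query semantics explicit. It is not a client for the UNIDO REST API, whose endpoints are not shown here; the function name is hypothetical, the isiccomb column is omitted, and treating from and to as inclusive is an assumption consistent with the 2000-2006 output.

```python
from typing import Dict, List, Optional

Record = Dict[str, int]  # keys: country, variable, isic, year, value


def select(records: List[Record], ct: int, variable: int, year_from: int,
           year_to: int, isic: Optional[int] = None) -> Dict[int, int]:
    """Mimic the dbdata() filter locally: return {year: value} for matching rows."""
    return {
        r["year"]: r["value"]
        for r in records
        if r["country"] == ct and r["variable"] == variable
        and year_from <= r["year"] <= year_to  # bounds assumed inclusive
        and (isic is None or r["isic"] == isic)
    }


values = [233600844, 232982867, 236882397, 350320309,
          452031922, 547604073, 642608400]  # from the dbdata() output
records = [{"country": 100, "variable": 20, "isic": 15, "year": 2000 + i, "value": v}
           for i, v in enumerate(values)]
print(select(records, ct=100, variable=20, year_from=2000, year_to=2006, isic=15))
```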
6. IO Analysis, WIOD and the package rwiot
7. Industrial statistics for business structure: package indstat
8. Competitive Industrial Performance (CIP) index: package CItools
Competitive Industrial Performance (CIP) index
The Competitive Industrial Performance (CIP) index, developed by UNIDO, benchmarks industrial performance at the country level. Unlike other currently available competitiveness indices, it provides a unique cross-country benchmarking and ranking of industrial performance based on a selected set of quantitative industrial performance indicators. Rankings for 144 countries (2016) are provided at the global and regional levels and for different country groupings. Governments can thus compare their country's competitive industrial performance with relevant comparators: not only countries in the same region but also countries at the same stage of economic or industrial development across the globe.
More at stat.unido.org
Competitive Industrial Performance (CIP) index (2)
The CIP index combines 3 dimensions of industrial performance, comprising 8 indicators, into a single measure (number of indicators in parentheses):
1. Capacity to produce and export manufactures (2)
2. Structural change towards manufactures and technology-intensive sectors (4)
3. Impact on world MVA and on world manufactures (2)
Only quantitative indicators are considered.
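Combining several indicators into one score follows the usual composite-indicator recipe: normalize each indicator, then aggregate within and across dimensions. The Python sketch below uses min-max normalization and equal-weight arithmetic means purely for illustration; this is a generic technique, not necessarily the normalization, weighting or aggregation used for the official CIP index, and the indicator names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

Table = Dict[str, Dict[str, float]]  # indicator -> country -> raw value


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale one indicator to [0, 1] across countries."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:  # constant indicator: carries no ranking information
        return {c: 0.0 for c in values}
    return {c: (v - lo) / (hi - lo) for c, v in values.items()}


def composite_index(data: Table, dimensions: Dict[str, List[str]]) -> Dict[str, float]:
    """Equal-weight mean of indicators within each dimension, then across dimensions."""
    norm = {ind: min_max(vals) for ind, vals in data.items()}
    countries = list(next(iter(data.values())))  # assumes all indicators cover the same countries
    return {
        c: mean(mean(norm[ind][c] for ind in inds) for inds in dimensions.values())
        for c in countries
    }


data = {
    "mva_per_capita": {"X": 1.0, "Y": 3.0, "Z": 2.0},
    "share_world_mva": {"X": 10.0, "Y": 0.0, "Z": 10.0},
}
dims = {"capacity": ["mva_per_capita"], "impact": ["share_world_mva"]}
scores = composite_index(data, dims)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)  # Z scores 0.75, X and Y 0.5 each
```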
Competitive Industrial Performance (CIP) index (2)
CIP Ranking
CIP profile I
CIP profile II
9. Maintenance of UNIDO databases with R
Data screening
Data screening
[Figure: indicator value $x_t$ over time $t$ (2004-2012) compared with a start value $x^*_S$ and two bands around it]
- start value: $x^*_S$
- relevant change: $x^*_S \pm \delta$
- significant change: $x^*_S \pm \left(\delta + 2\sqrt{s_x^2 + s_{x^*_S}^2}\right)$
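The two bands translate into a simple rule for each new observation. A minimal Python sketch, assuming that s_x and s_xs are the standard errors of $x_t$ and of the start value $x^*_S$, that the square root in the significance band applies to the sum of the two variances, and that a change must strictly exceed a band to count; the function name is hypothetical.

```python
from math import sqrt


def classify_change(x_t: float, x_s: float, delta: float, s_x: float, s_xs: float) -> str:
    """Compare x_t with the start value x*_S using the relevance and significance bands."""
    diff = abs(x_t - x_s)
    significance_band = delta + 2 * sqrt(s_x ** 2 + s_xs ** 2)
    if diff > significance_band:
        return "significant change"
    if diff > delta:
        return "relevant change"
    return "no relevant change"


for x in (28.5, 29.5, 30.5):
    print(x, classify_change(x, x_s=28.0, delta=1.0, s_x=0.3, s_xs=0.4))
```

With these hypothetical inputs the significance band is 1.0 + 2 × 0.5 = 2.0, so 28.5 shows no relevant change, 29.5 a relevant change and 30.5 a significant change.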
10. Technical assistance
11. Summary and conclusions
Challenges
- Awareness of the importance of computation in official statistics
- Staff: limited resources
- Rapid release cycle of R
- Package dependencies
- Regular support
- Training and training materials
- IT infrastructure and support