This section outlines in detail the construction of the Open Data Barometer rankings, including details of the primary and secondary data used.
The methodology used in this second edition of the Open Data Barometer broadly replicates that used in 2013. However, as part of work towards Common Assessment Methods on Open Data, future versions of the Barometer are likely to include additional components to look further at data use and impacts.
The sub-indexes, components and overall ranking in the ODB draw on three kinds of data:
Peer-reviewed expert survey responses - between June and September 2014 we included a series of questions in the Web Index expert survey, asking country specialists to respond to a number of detailed questions about the open data situation in a specific country (see below for the list of questions in the survey). Each question invited a response on a 0 - 10 scale, with detailed scoring guidance provided. Researchers also provided detailed citations for all scores. Responses were peer-reviewed, re-scored by researchers where required, and cross-checked by the research coordination team.
For the construction of sub-components and sub-indexes, scores were normalised using z-scores for each question. This converts the 0 - 10 score into a measure of how far above or below the mean (in standard deviations) any given answer is. Normalisation gives us the ability to compare how well countries are doing relative to one another, and makes the measurements more robust to marginal alterations in scoring guidance year-on-year. The mean and standard deviation values from 2013 were used, in order that the z-scores are comparable between the two years of data.
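As an illustration of the normalisation described above, the following is a minimal Python sketch rather than the Barometer's actual analysis code. The function name, the sample answers, and the use of a population standard deviation are all assumptions made for the example; the text does not say whether a sample or population standard deviation was used.

```python
from statistics import mean, pstdev

def zscores_with_baseline(scores, baseline_mean, baseline_sd):
    """Express each score as standard deviations above or below a fixed baseline mean."""
    return [(s - baseline_mean) / baseline_sd for s in scores]

# Hypothetical 2013 answers to one 0-10 survey question (not real Barometer data).
answers_2013 = [2, 4, 6, 8]
mu_2013 = mean(answers_2013)
sd_2013 = pstdev(answers_2013)  # assumption: population standard deviation

# Hypothetical 2014 answers, normalised against the 2013 baseline so that
# z-scores from the two years sit on the same scale.
answers_2014 = [5, 7, 9]
z_2014 = zscores_with_baseline(answers_2014, mu_2013, sd_2013)
```

Because the baseline is held fixed, a country whose raw score is unchanged between years keeps the same z-score even if other countries improve.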
Detailed dataset assessments - between August and October 2014 a team of technical specialists investigated the availability of 15 kinds of data within each country, and answered a 10-point checklist with respect to the qualities of data provided. This small group of technical experts each assessed one or more countries, drawing upon source material provided by country experts in the expert survey. These assessments were peer-reviewed and subjected to a detailed review by a team of three technical reviewers.
For the Barometer Ranking, an aggregation logic and weightings were applied to the checklist results (see below) to generate a score between 0 and 100. These scores were not individually normalised, to allow clear comparison between the different datasets in the Barometer, but the aggregated index of dataset availability (the Implementation Sub-Index) was normalised using z-scores to bring it onto the same scale as other questions prior to inclusion in overall Index calculations.
Secondary data - in order to complement the expert survey data for the ODB in the Readiness section of the Barometer, we draw on five secondary indicators, each selected on the basis of theory and their ability to measure important aspects of readiness not covered in our survey. Four of these are based on independent expert surveys (by the World Economic Forum, Freedom House and the United Nations Department of Economic and Social Affairs) and one is based on World Bank collated data on internet penetration.
For the Barometer Rankings, these variables are each normalised using the same approach as for our peer-reviewed expert survey data (z-scores based on 2013 mean and standard deviation).
The Barometer builds upon a tripartite structure with three sub-indexes, each containing three components. The weightings of these in the aggregated Open Data Barometer score and ranking are shown in brackets.
|Sub-index||Component 1||Component 2||Component 3|
|Readiness (Primary & secondary data)||Government (1/3)||Entrepreneurs & business (1/3)||Citizens & civil society (1/3)|
|Implementation||Accountability dataset cluster (1/3)||Innovation dataset cluster (1/3)||Social policy dataset cluster (1/3)|
|Impact||Political (1/3)||Economic (1/3)||Social (1/3)|
This structure is based on the idea that:
The first edition Barometer incorrectly reported the sub-indexes as equally weighted on page 37. The first edition weights were: Readiness (1/5); Implementation (3/5); Impact (1/5) (i.e. 60% of the overall ranking was based on implementation). In the second edition 50% of the ranking is based on implementation, with the rest split 25% to readiness and 25% to impact.
The higher weighting of implementation in the first two editions of the Open Data Barometer reflects the focus, in this pilot phase of the project, on exploring progress towards open data implementation and impact over time, and judgements on the relative strength of the primary data collected in each year. The small reduction in the weighting of implementation from the first to the second edition reflects the direction of travel in the Barometer towards assessing use and impact, whilst seeking to maintain comparability of rankings between the first and second editions and ensuring that score changes between the two editions can be clearly explained by changes in the underlying variables.
The Open Data Barometer measures readiness through three components focussing on: Government; Citizens and Civil Society; and Entrepreneurs and Business. We are not measuring readiness to start an open government data initiative, but rather readiness to secure positive outcomes from such an initiative. As such, we include measures relating to the existence of open data, and a range of interventions that support engagement with and re-use of open data.
Each of these groups is important for a successful OGD initiative. As Tim Berners-Lee has observed, open data “has to start at the top, it has to start in the middle and it has to start at the bottom” [1]. Policies and portals are just one component of an effective open data agenda. In carrying out qualitative Open Data Readiness assessments across a number of countries from 2010 to 2013, the Web Foundation developed a six-dimensional framework for looking at the Political, Organisational, Legal, Social, Economic and Technical context within a country in order to understand factors that may facilitate or inhibit the development of an OGD initiative, and the successful use of open data [2]. These six dimensions have informed the selection of indicators in the readiness section of the Open Data Barometer.
In selecting indicators we have also drawn upon findings from the Open Data in Developing Countries (ODDC) research project, which have highlighted an important relationship between open data policies and the Right to Information, and the importance of complementing open data release with robust protection for citizens' personal data. These two issues are represented in the Barometer by indicators on Right to Information and Data Protection laws. The experience of the Open Data Institute in delivering training and capacity building for the economic re-use of data also informed the design of our indicator on training availability. There were a number of further aspects of readiness we would have liked to include in this section, such as the quality of government record keeping [3] and the statistical capacity of governments. However, we could not locate comprehensive secondary indicators, nor design simple expert survey questions adequate to capture these. We continue to seek approaches that would allow us to include these in future Barometer studies.
The variables used in the readiness sub-index, along with their variable names [4], are:
Entrepreneurs and businesses
Citizen and Civil Society
To ensure variables collected on different scales are comparable, and that the ODB second edition data is comparable to 2013 data, all variables in the readiness sub-index are normalised using z-scores with the 2013 mean and standard deviations prior to aggregation. For presentation, variables are scaled on a 0 – 100 scale.
The 2012 Web Index asked researchers ‘To what extent are there government data on [X] on the web in your country?’, covering trade data, budget and spend, health sector performance, educational performance, transport data and schedules, census, national map, tax return, government service contact details, and crime, followed by a separate question on the extent of accessibility of these datasets (taken together) as open data. In the 2013 Open Data Barometer expert survey we modified this approach, asking researchers to complete a detailed checklist for each of 15 categories of data. This method is maintained for this second edition of the Open Data Barometer. The 10 checklist questions are shown below, along with details of the qualitative data researchers were asked to provide in justification for each answer. We refined this process further in 2014 as described in the changes section below.
In many cases where machine-readable open data was not available (question c), researchers provided additional answers with respect to the non machine-readable data published by governments (e.g. providing details on whether PDF census information is up to date or not). This information is valuable for building an understanding of different patterns of information and data management within governments, but should not generally feature in a score that measures the availability of open data. Therefore, we apply a validation logic to the original survey data gathered from the Barometer survey to ensure that, after questions a and b, we are measuring only the properties of machine-readable datasets. The exception to this is timeliness data (g): where even the non machine-readable data is out of date, in this edition we deduct 5 points from the dataset score. This is to ensure that instances where there have been no updates to the data since 2013, in whatever format, and where updates might reasonably have been anticipated, are suitably downgraded in the overall score.
Following validation, we weight the checklist responses, awarding the value in the weight column of the table below for answers of ‘Yes’. The weighting is designed to emphasise the four questions (c, d, e, f) which pick out key aspects of the Open Definition (OKF, 2006). A positive score on these variables is also used to calculate a binary ‘Is Open Data’ variable, which is used in presenting dataset listings and in selected summary statistics.
|Q||Question||Weight||Chaining Logic||Qualitative data collected|
|a||Does the data exist?||5||None||Description of data; Agency responsible; Reasons for non-collection|
|b||Is it available online from government in any form?||10||IF a = No THEN 0 ELSE (IF b = Yes THEN 10 ELSE 0)||URL; Limits on data published; Policies preventing publication|
|c||Is the dataset provided in machine-readable formats?||15||IF b = No THEN 0 ELSE (IF c = Yes THEN 15 ELSE 0)||URL; File formats|
|d||Is the machine-readable data available in bulk?||15||IF c = No THEN 0 ELSE (IF d = Yes THEN 15 ELSE 0)||URL|
|e||Is the dataset available free of charge?||15||IF c = No THEN 0 ELSE (IF e = Yes THEN 15 ELSE 0)||Details of charging regimes|
|f||Is the data openly licensed?||15||IF c = No THEN 0 ELSE (IF f = Yes THEN 15 ELSE 0)||URL; License details|
|g||Is the dataset up to date? Logic: lose 5 points if any form of the data is outdated. Gain 10 points if the machine-readable data is timely.||10||IF g = No THEN -5 ELSE (IF (c = Yes AND g = Yes) THEN 10 ELSE 0)||Last update date; Frequency of updates|
|h||Is the publication of the dataset sustainable?||5||IF c = No THEN 0 ELSE (IF h = Yes THEN 5 ELSE 0)||Evidence of sustainability|
|i||Was it easy to find information about this dataset?||5||IF c = No THEN 0 ELSE (IF i = Yes THEN 5 ELSE 0)||Notes on discoverability|
|j||Are (linked) data URIs provided for key elements of the data?||5||IF c = No THEN 0 ELSE (IF j = Yes THEN 5 ELSE 0)||URL of linked data publication|
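The weighting and chaining logic in the table above can be sketched in Python as follows. This is an illustrative reading of the table, not the Barometer's own code (the analysis was carried out in R). The function and variable names are hypothetical, and two interpretations are assumed: answers are cascaded so that b only counts when a is Yes and c only counts when b is Yes, and the 5 point timeliness deduction is applied only to data that is available online in some form.

```python
# Weights taken from the checklist table; they sum to 100.
WEIGHTS = {"a": 5, "b": 10, "c": 15, "d": 15, "e": 15, "f": 15,
           "g": 10, "h": 5, "i": 5, "j": 5}
OUTDATED_PENALTY = -5

def score_dataset(answers):
    """Return (score, is_open_data) for a dict mapping 'a'..'j' to True (Yes) / False (No)."""
    # Assumed validation cascade: each step only counts if the previous one holds.
    exists = answers["a"]
    online = exists and answers["b"]
    machine_readable = online and answers["c"]

    points = {q: 0 for q in WEIGHTS}
    if exists:
        points["a"] = WEIGHTS["a"]
    if online:
        points["b"] = WEIGHTS["b"]
    if machine_readable:
        points["c"] = WEIGHTS["c"]
        # d, e, f, h, i, j describe properties of the machine-readable data only.
        for q in "defhij":
            if answers[q]:
                points[q] = WEIGHTS[q]

    # Timeliness (g): credit needs machine-readable data; the deduction applies
    # to outdated data in any published format (assumed to mean "online").
    if answers["g"] and machine_readable:
        points["g"] = WEIGHTS["g"]
    elif not answers["g"] and online:
        points["g"] = OUTDATED_PENALTY

    # 'Is Open Data': positive scores on all four Open Definition questions.
    is_open_data = all(points[q] > 0 for q in "cdef")
    return sum(points.values()), is_open_data

# Hypothetical examples.
fully_open = {q: True for q in WEIGHTS}
pdf_only_outdated = {**{q: False for q in WEIGHTS}, "a": True, "b": True}
```

Under these assumptions, `score_dataset(fully_open)` gives a score of 100 with the open data flag set, while `score_dataset(pdf_only_outdated)` gives 10 (5 + 10 - 5) with the flag unset.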
The following table shows the categories of data covered in the technical survey, along with a brief definition of each. These definitions were carefully designed to avoid creating a strong bias against states who have less advanced internal systems for managing data, and to be able to capture cases where states are making an effort to share the data that they do have. We also sought to gather information about where data is managed federally rather than nationally, to avoid penalising countries with a federal system, although recognising that from the perspective of a data re-user, nationally aggregated data may be much more useful than separate non-standardised federal datasets.
By putting forward categories of data, rather than specific named datasets, we allowed researchers to exercise judgement as to the extent to which countries were making data of this kind available, whilst also sourcing specific examples of datasets that fit into these categories in different countries, and generating a rich collection of qualitative information about the reasons that certain data may or may not be available in different countries, and the extent to which certain datasets tend to exist at national or federal levels. This qualitative data will feed into future iterations of the Open Data Barometer design.
The wording of a number of definitions was refined in 2014 to align more closely with those used in the separate Open Data Index project undertaken by Open Knowledge, which uses an alternative crowdsourced methodology to gather data on 10 datasets across a number of countries. As a number of the operational definitions of variables, and categories, are lined up across these two independent data sources, this should allow for cross-validation and work to assess how far definitive judgements of dataset openness can be rendered through the methodologies adopted in both studies. The aligned definitions are indicated with *.
|Variable name||Short Name||Long name||Description|
|ODB.2013.D1||Map *||Mapping data||A detailed digital map of the country provided by a national mapping agency and kept updated with key features such as official administrative borders, roads and other important infrastructure. Please look for maps of at least a scale of 1:250,000 or better (1cm = 2.5km).|
|ODB.2013.D2||Land||Land ownership data||A dataset that provides national level information on land ownership. This will usually be held by a land registration agency, and usually relies on the existence of a national land registration database.|
|ODB.2013.D4||Stats *||National statistics||Key national statistics such as demographic and economic indicators (GDP, unemployment, population, etc), often provided by a National Statistics Agency. Aggregate data (e.g. GDP for whole country at a quarterly level, or population at an annual level) is considered acceptable for this category.|
|ODB.2013.D5||Budget *||Detailed budget data||National government budget at a high level (e.g. spending by sector, department etc). Budgets are government plans for expenditure, (not details of actual expenditure in the past which is covered in the spend category).|
|ODB.2013.D6||Spend||Government spend data||Records of actual (past) national government spending at a detailed transactional level; at the level of month to month government expenditure on specific items (usually this means individual records of spending amounts under $1m or even under $100k). Note: A database of contracts awarded or similar is not sufficient for this category, which refers to detailed ongoing data on actual expenditure. [In final review, this category was extended to allow cases where detailed quarterly data was provided, as very few cases of transaction level spending data were located. This varies from the Open Data Census which maintained a tight definition on transactional level spend.]|
|ODB.2013.D7||Company *||Company registration data||A list of registered (limited liability) companies in the country including name, unique identifier and additional information such as address, registered activities. The data in this category does not need to include detailed financial data such as balance sheet etc.|
|ODB.2013.D8||Legislation||Legislation data||The constitution and laws of a country.|
|ODB.2013.D9||Transport||Public transport timetable data||Details of when and where public transport services such as buses and rail services are expected to run. Please provide details for both bus and rail services if applicable. If no national data is available, please check and provide details related to the capital city. [This qualification for capital cities differs from the Open Data Index, which only looks for national level data.]|
|ODB.2013.D10||Trade||International trade data||Details of the import and export of specific commodities and/or balance of trade data against other countries.|
|ODB.2013.D11||Health||Health sector performance data||The performance of health services in a country has a significant impact on the welfare of citizens. Look for ongoing statistics generated from administrative data that could be used to indicate performance of specific services, or the healthcare system as a whole. Health performance data might include: Levels of vaccination; Levels of access to health care; Health care outcomes for particular groups; Patient satisfaction with health services.|
|ODB.2013.D12||Education||Primary and secondary education performance data||The performance of education services in a country has a significant impact on the welfare of citizens. Look for ongoing statistics generated from administrative data that could be used to indicate performance of specific services, or the education system as a whole. Performance data might include: Test scores for pupils in national examinations; School attendance rates; Teacher attendance rates. Simple lists of schools do not qualify as education performance data.|
|ODB.2013.D13||Crime||Crime statistics data||Annual returns on levels of crime and/or detailed crime reports. Crime statistics can be provided at a variety of levels of granularity, from annual returns on levels of crime, to detailed real-time crime-by-crime reports published online and geolocated, allowing the creation of crime maps.|
|ODB.2013.D14||Environment||National environmental statistics data||Data on one or more of: carbon emissions, emission of pollutants (e.g. carbon monoxides, nitrogen oxides, particulate matter etc.), and deforestation. Please provide links to sources for each if available.|
|ODB.2013.D15||Elections *||National election results data||Results by constituency / district for almost all national electoral contests over the last ten years.|
|ODB.2013.D16||Contracting||Public contracting data||Details of the contracts issued by the national government.|
To generate three sub-components in the Implementation sub-index we cluster these datasets into three groups, based on a qualitative analysis of the common ways in which these categories of data are used. As previously discussed, these clusters are not mutually exclusive. It is within the nature of open data that a dataset can be used for multiple purposes – and a single dataset might have applications across innovation, improving policy, and increasing accountability. However, for simplicity of presentation and analysis we place each dataset in only one cluster. Further work is needed to refine these clusters in future analysis, and readers are encouraged to explore different groupings of datasets in remixing our research.
|Innovation||Social policy||Accountability|
|Data commonly used in open data applications by entrepreneurs, or with significant value to enterprise.||Data useful in planning, delivering and critiquing social policies & with the potential to support greater inclusion and empowerment.||Data central to holding governments and corporations to account. Based on the ‘Accountability Stack’.|
|Map Data, Public Transport Timetables, Crime Statistics, International Trade Data, Public Contracts||Health Sector Performance, Primary or Secondary Education Performance Data, National Environment Statistics, Detailed Census Data||Land Ownership Data, Legislation, National Election Results, Detailed Government Budget, Detailed Government Spend, Company Register|
In order to maintain the ability to compare scores from one dataset to another, individual variables in this sub-index are not normalised prior to aggregation. However, the implementation sub-index score is z-score normalised prior to calculation of the final Barometer score, and then rescaled to 0 – 100 for presentation.
Recognising the early stage of open data developments around the world, we sought to develop an approach to capture stories of impact, and to be able to compare the relative strength of impact these indicated across different categories of impact, and across different countries. Our approach was to treat online, mainstream media and academic publications about open data impacts as a proxy for existence of impacts, with researchers asked to score the extent of impact on a 0 – 10 scale. Scoring guidance outlined that the highest scores should only be given for peer-reviewed studies showing impact, and emphasised the importance of sources making a direct connection between open data and observed impacts. For scores over 5 researchers were asked to cite at least two separate examples in the given category.
The six questions asked in this section, organised by sub-component, were:
These variables are all normalised using z-scores prior to aggregation.
To calculate each component an average of the variables in that component is taken. The average of components is used to generate each sub-index.
The weighted average of the sub-indexes is used to generate the overall Open Data Barometer score.
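The aggregation steps above can be sketched as follows. This is a simplified Python illustration with hypothetical names and values; it assumes all inputs are already on a common scale and leaves out the normalisation applied at each stage. The 25-50-25 weights are those stated for the second edition.

```python
from statistics import mean

# Second edition sub-index weights (Readiness-Implementation-Impact).
SUB_INDEX_WEIGHTS = {"readiness": 0.25, "implementation": 0.50, "impact": 0.25}

def aggregate(country):
    """country: {sub_index: {component: [variable scores]}} -> (sub-index scores, overall score)."""
    sub_scores = {}
    for sub_index, components in country.items():
        # Component = average of its variables; sub-index = average of its components.
        component_scores = [mean(variables) for variables in components.values()]
        sub_scores[sub_index] = mean(component_scores)
    # Overall score = weighted average of the three sub-indexes.
    overall = sum(SUB_INDEX_WEIGHTS[name] * score for name, score in sub_scores.items())
    return sub_scores, overall

# Hypothetical, already-normalised values for one country.
example_country = {
    "readiness": {"government": [1.0, 0.0], "business": [0.5], "civil_society": [0.5, 0.5]},
    "implementation": {"accountability": [1.0], "innovation": [1.0], "social_policy": [1.0]},
    "impact": {"political": [0.0], "economic": [0.0], "social": [0.0]},
}
sub_scores, overall = aggregate(example_country)
```

For this example the sub-index scores are 0.5, 1.0 and 0.0, giving an overall score of 0.25 × 0.5 + 0.50 × 1.0 + 0.25 × 0.0 = 0.625.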
For consistency, the normalised scores for all the sub-indexes, and the readiness and impacts components, have been rescaled to a 0 - 100 range using the formula [(x - min)/(max - min)]*100 prior to presentation. This means that a score of 100 on these components and sub-indexes illustrates the highest scoring country across the 86 included in the Barometer Global ranking. It does not mean that a score of 100 is perfect.
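The rescaling formula above, [(x - min)/(max - min)]*100, can be written as a short Python function. The function name and the sample z-scores are hypothetical and serve only to illustrate the arithmetic.

```python
def rescale_0_100(values):
    """Apply [(x - min) / (max - min)] * 100 to every value in the list."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot rescale when all values are identical")
    return [(x - lowest) / (highest - lowest) * 100 for x in values]

# Hypothetical normalised (z-score) values for four countries.
z_values = [-1.0, 0.0, 1.0, 3.0]
rescaled = rescale_0_100(z_values)
```

Here the lowest value maps to 0 and the highest to 100, which shows why a score of 100 marks the top-scoring country in the sample rather than a perfect result.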
All scores in a study of this kind are subject to a margin of error. To offer an indicative comparison between countries we offer a ranking based on rounding each country's overall ODB score to its integer value (no decimal places), and placing countries in order of score. This ranking, and each of the other scores, should be treated as the starting point for exploration, rather than a definitive judgement on each country's open data readiness, implementation and impacts.
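A ranking of this kind might be computed as in the Python sketch below. The text does not specify the rounding rule or how ties are handled, so two assumptions are made here: scores are rounded half up, and countries with the same rounded score share a rank. Country names and scores are hypothetical.

```python
def rank_countries(scores):
    """scores: {country: overall score} -> list of (rank, country, rounded score)."""
    # Assumption: round half up; valid for the non-negative 0-100 scores used here.
    rounded = {country: int(score + 0.5) for country, score in scores.items()}
    # Highest score first; alphabetical order within equal scores for stable output.
    ordered = sorted(rounded.items(), key=lambda item: (-item[1], item[0]))

    ranking = []
    previous_score, previous_rank = None, 0
    for position, (country, score) in enumerate(ordered, start=1):
        # Assumption: tied countries share the rank of the first country in the tie.
        rank = previous_rank if score == previous_score else position
        ranking.append((rank, country, score))
        previous_score, previous_rank = score, rank
    return ranking

# Hypothetical overall scores.
example_scores = {"Country A": 71.4, "Country B": 70.6, "Country C": 55.2, "Country D": 92.0}
example_ranking = rank_countries(example_scores)
```

In this example Country A and Country B both round to 71 and share second place, which illustrates how rounding treats small score differences as within the margin of error.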
Whilst the ultimate goal of the Open Data Barometer is to understand and increase open data impact, at present our method offers only a rough proxy measure of impact, through the publication of media or academic stories on impact. An analysis of the data in, and between, years suggests this method offers a useful heuristic for the extent of impact, but does have a relatively high risk of false negatives, when research does not locate stories of impact, and false positives, when media incorrectly attribute impacts to open data, or report arguments for potential benefits as actual impacts and benefits. Scores on the impact variables also lack a normal distribution, being heavily skewed towards zero. As a result, we judged it was not yet possible to give impact the highest weight in our overall rankings.
Similarly, on theoretical grounds, whilst some variables within the readiness sub-index do reflect explicit actions on open data, such as those addressing the presence of initiatives, and support for innovation, other variables within this sub-index are capturing elements of wider context in the country. In seeking to measure progress towards being able to secure impacts of open data, having readiness alone is not enough: this readiness should be translated into action.
This is the basis for the 25-50-25 (Readiness-Implementation-Impact) weightings in the final Open Data Barometer score.
Future editions will draw upon updated indicators and methodologies in order to further the robustness of impact measurement, and to introduce a stronger focus on data use. This provides the basis for a gradual shift in this edition towards a marginally lower weighting of implementation, creating space for new variables, whilst offering the opportunity to keep some degree of comparability across indexes in future years also.
When making comparisons between 2013 and 2014 data it is important to be aware of minor methodological changes. Whilst we have made every effort to keep indicators consistent, learning from the 2013 process has led to a number of minor adaptations.
In 2013, a dedicated survey took place for the Open Data Barometer, combining context, impact and technical dataset assessment questions in one, and taking place between July and October 2013. Learning from this process suggested that different skill sets were required for the context and impact assessment, and the technical assessment, and so these processes were split in 2014.
In 2014, data collection for context and impact was included within the Web Index 2014 Expert Survey (which uses exactly the same methodology for expert survey as the Barometer), with data extracted following the Web Index peer-review process, and subjected to additional independent validation by the Open Data Barometer research team. Data collection for this component of the study took place from June to September 2014, with validation in September 2014. The assessments focussed on events in the 12 months to June 2014.
The full detailed dataset technical assessment was carried out by a separate small team of assessors, based on initial information provided through the 2014 Web Index survey about likely national data sources. Three members of the core Open Data Barometer research team reviewed and validated all technical assessments. Data collection for this component of the study took place from August to October 2014, with judgements focussing on data available up until the end of October 2014.
The 2014 survey also included a number of additional requests for supporting information, and effort was made to ensure these were provided in ways suitable for public release.
One additional dataset was added to the technical assessment (Public Contracts), bringing the total number of datasets assessed to 15. Public Contracts is included in the ‘Innovation & Economic Growth’ implementation sub-component, based on the potential role of transparent contracting data in creating a more competitive landscape in public procurement.
The operational definitions for a number of datasets in the technical assessment were updated to align, or maintain alignment, with those used in the separate and independent Open Data Index produced by Open Knowledge. The datasets affected included: Mapping, National Statistics, Detailed budget, Detailed data on government spend, Company Registration and Elections. The definitions for the Environment and Public Transportation categories are partially aligned, but with some minor differences. The changes were minor in each case, but took place to support a move towards common assessment methods, and to support third-party comparisons of the two datasets. Whilst the Open Data Barometer uses paid expert researchers, Open Knowledge’s Index adopts a crowdsourced method.
In 2014, datasets which are available in any form, but which are judged not to be up to date, have 5 points subtracted from their 0 - 100 score. Datasets which are judged to be up to date still receive +10 points on this score.
This change reflects the fact that a number of datasets which were out of date in 2013 remain so in this year's survey, and offering the same score in 2014 would not reflect the further drop in the timeliness of this data.
The weightings were adjusted as described above.
Analysis was carried out via R, with a parallel check of calculations using Google Spreadsheets.
[1] Hogge, B. (2010). Open Data Study. Transparency and Accountability Initiative. http://www.transparency-initiative.org/wp-content/uploads/2011/05/open_data_study_final.pdf
[2] Grewal, A., Iglesias, C., Alonso, J. M., Boyera, S., & Bratt, S. (2011). Open Government Data - Feasibility Study in Ghana; Alonso, J. M., Boyera, S., Grewal, A., Iglesias, C., & Pawelke, A. (n.d.). Open Government Data: Readiness Assessment Indonesia.
[3] Thurston, A. C. (2012). Trustworthy Records and Open Data. The Journal of Community Informatics, 8(2). http://ci-journal.net/index.php/ciej/article/view/951/952
[4] Primary data variable names reflect the year they were first introduced to the study. E.g. ODB.2013.C.INIT reflects that this variable was first introduced in 2013.