Happy to announce the formal start of a new project. COINr!
….and what is that, you may ask. Well, let me begin with a bit of background.
Building composite indicators requires quite a number of steps, including denomination, imputation, normalisation, weighting, aggregation, multivariate analysis, sensitivity analysis, visualisation and others besides.
Depending on what you are doing, and your level of experience, many of these steps are not hugely difficult. Aggregation, for example, is often a case of taking a weighted average. However, when you put all the steps together, they add up to quite a large number of calculations in which it is easy to make a mistake somewhere.
Some steps, even on their own, are also not straightforward: sensitivity analysis is one good example. Performing a proper global sensitivity analysis of your composite indicator requires parameterising each part of the construction process, and rebuilding the index many times according to an experimental design. Specific weighting methods can also be complicated, as can imputation. Getting meaningful visualisations can also be a chore.
Some tools already exist to help. The COIN Tool is one example. This does a heroic job of providing a fairly flexible framework for composite indicator development in Excel. However, it is easy to reach its limitations: there are limits on the numbers of units and indicators, and the number of aggregation levels. You cannot really have time-dependent indexes, and although there is some treatment of uncertainties, a proper sensitivity analysis is not possible. These limitations are a consequence of using Excel - it is great for certain tasks (and I have a ranty blog post in mind at some point for mindless critics of Excel), but you have to recognise its limits.
There is also a recently launched web-based tool called MCDA Index Tool. This is mostly focused on multi-criteria decision analysis, and doesn’t include different levels of aggregation. Nonetheless, for the purposes of MCDA, and certain types of indexes, it is a very interesting tool.
In Matlab, there are some packages addressing specific parts of index development. The CIAO package uses a nonlinear regression and optimisation approach to tune weights to agree with expert opinions, accounting for the complexities of correlation (full disclosure: I am a co-author/developer of this). But despite how brilliant it all is, it only deals with weights, and no one really likes paying the huge Matlab licence fees.
In R there is in fact a package for composite indicator development, called “compind”. This has some fairly sophisticated tools for weighting, particularly relating to data envelopment analysis approaches, as well as a number of aggregation functions. It is also encoded into a very nice Shiny web app. It does however have a number of practical limitations, namely that it can only handle one level of aggregation, and excludes a number of key steps such as multivariate analysis and sensitivity analysis.
One step beyond
Enter a new package written in R, which has the working title of “COINr” (COmposite INdicators in R).
COINr is an attempt to build a truly comprehensive package for composite indicator development, which includes:
- Full flexibility on number of indicators, units, aggregation levels
- Possibility of grouping and time dependency
- Interactive visualisation and analysis of index structure and indicators, including multivariate analysis
- A range of imputation options
- A range of data treatment options for outliers
- A range of normalisation options
- A range of aggregation functions
- A range of weighting approaches, including automatic statistical weighting
- Full global uncertainty and sensitivity analysis
- Automatic generation of unit reports via R Markdown
And other features that I can’t remember off the top of my head. In short, the idea is to provide a framework that leads you from the data entry stage all the way through to the results and analysis, with a range of options at each stage.
COINr is also organised in a somewhat “object-oriented” fashion. Functions are designed to interact with one another: a single object (a structured list in R) contains all data, parameters, and methodological details relating to the analysis. Therefore all functions operate on this object, and update it. This means that building the index is a very simple procedure, and the object contains a full record of the analysis. Nonetheless, all functions can also be used on stand-alone data frames, outside of the “COINrverse”.
Where it’s at
I started COINr back in September, and it has its own Github repository here. It is still a jumble of functions without being a fully organised package. It also has no documentation yet, and I need to run checks on pretty much everything. Also there are many functions to build yet and to modify further. In summary, its not yet very useful to anyone.
That said, I’m pretty pleased with where it is at at the moment, and I feel like it will end up being a Good Thing. I am now developing this with the European Commission’s Competence Centre for Composite Indicators and Scoreboards, and we should have a beta version ready by….. hopefully spring time, depending on other work. It’s a good project to work on and I am learning a lot by doing. I will update here from time to time. Peace.