[wdmmg-discuss] CRA 2010: description and questions
apt1002 at goose.minworks.co.uk
Thu Aug 12 10:11:18 BST 2010
My 2p is below...
On Wed, 11 Aug 2010, Anna Powell-Smith wrote:
> I've just started some work on the WDMMG data store, and Lisa and I
> have been looking at the data for CRA 2010, preparatory to loading it
> into the store.
> Here's what we've discovered, and some questions for discussion
> (Alistair, Will?):
> **The data**
> CRA 2010 consists of *two* spreadsheets, both in the CKAN package at
> As well as now being in two spreadsheets, it has also become slightly
> less granular.
> The two tables both show total spending, but classify it differently:
> by region and by sub-function. Table 9 classifies spending items by
> country, 9 regional areas, and COFOG 1 (e.g. England, East Midlands,
> Social protection). Table 10 classifies its items by country, COFOG 1
> and COFOG 2 (e.g. England, Social protection, Old age).
Oh, how awkward! PESA is structured like this too, and we hoped the CRA
would fix it.
> Both have just over 20,000 rows and show data from 2004-5 onwards. The
> Treasury claims the tables are consistent. One more difference between
> the two: Table 9 has projected spending for 2010-11, but Table 10 does
> not.
> **Differences from CRA 2009**
> Last year every item was classified by both region and sub-function,
> so the data seems to have become less granular overall. Basically, you
> can now classify it either by region, or by sub-function, but not
> both.
> A few minor differences: we have gained Treasury classifications of
> spending, which seem to be analogous to COFOG but not identical. The
> 'CG or LG' column (central or local government) has gained a third
> option and is now 'CG, LG or PC' (public corporation - like the Met
> Office & World Service).
That sounds good. We should definitely have a key for this column.
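To make that concrete, here is a rough sketch of what such a key could look
like. The three codes and their meanings come from Anna's description above;
the names SECTOR_KEY and sector_label are made up for illustration and are
not part of the data store.

```python
# Hypothetical key for the 'CG, LG or PC' column. The codes and their
# meanings are taken from the description of CRA 2010 above; the
# dictionary and function names are assumptions for illustration only.
SECTOR_KEY = {
    "CG": "Central government",
    "LG": "Local government",
    "PC": "Public corporation",
}


def sector_label(code):
    """Return the label for a sector code, rejecting unknown codes."""
    normalised = code.strip().upper()
    if normalised not in SECTOR_KEY:
        raise ValueError("Unknown sector code: %r" % (code,))
    return SECTOR_KEY[normalised]
```

Rejecting unknown codes at load time would at least tell us early if the
column gains a fourth option next year.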
> Lisa says that the 'unknown' fields that caused problems last time are
> less problematic this time - I don't know much about this, but it
> sounds like good news.
> 1. Do we load this as one slice or two, given the two ways of
> classifying data? One slice seems feasible, but messy (I guess you
> just have columns for both region and cofog2, and always leave one of
> them null, and have a lot of potentially duplicate rows and fairly
> complex queries). Advice appreciated.
The general rule for a slice is that it should not double-count spending.
Therefore, it would be wrong to put both tables in one slice.
Having said that, if I look forward to the day when the data store holds
pre-computed aggregates of the spending, we would expect the aggregates of
the two tables to overlap significantly. Not sure how to handle this. To
allow for inconsistencies in the data (e.g. rounding errors) it could be
argued that it should remain in two slices, despite the overlap...
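
One way to find out how consistent the two tables really are would be to
aggregate each of them up to the dimensions they share and compare the
totals within a tolerance. The following is only a sketch under assumptions:
the field names (country, cofog1, year, amount), the tolerance, and the
sample rows are all invented, and the real spreadsheets will need their own
column mapping.

```python
from collections import defaultdict


def aggregate(rows, dims):
    """Sum the 'amount' field of rows, grouped by the given dimensions."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["amount"]
    return dict(totals)


def find_mismatches(table9, table10,
                    dims=("country", "cofog1", "year"), tolerance=0.5):
    """Return {group: (table9_total, table10_total)} for every group whose
    totals differ by more than the tolerance. A group missing from one
    table is treated as a total of zero there."""
    a = aggregate(table9, dims)
    b = aggregate(table10, dims)
    mismatches = {}
    for key in sorted(set(a) | set(b)):
        x, y = a.get(key, 0.0), b.get(key, 0.0)
        if abs(x - y) > tolerance:
            mismatches[key] = (x, y)
    return mismatches


# Invented rows, purely to show the shape of the data assumed above.
table9 = [
    {"country": "England", "region": "East Midlands",
     "cofog1": "Social protection", "year": "2008-09", "amount": 10.2},
    {"country": "England", "region": "London",
     "cofog1": "Social protection", "year": "2008-09", "amount": 20.1},
]
table10 = [
    {"country": "England", "cofog1": "Social protection",
     "cofog2": "Old age", "year": "2008-09", "amount": 18.0},
    {"country": "England", "cofog1": "Social protection",
     "cofog2": "Unemployment", "year": "2008-09", "amount": 12.4},
]
```

With these rows the totals differ by about 0.1, so
find_mismatches(table9, table10) reports nothing at the default tolerance
but flags the group if the tolerance is tightened to 0.05. If a check like
this came back clean on the real data, the case for treating the shared
aggregates as a single set would be stronger.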
> 2. We now have actual 2009-10 spending to compare with last year's
> projected spending (though unfortunately less granular). I'm thinking
> of adding a projected/actual key to the data store to deal with this
> (it seems to be a common issue with spending data), unless anyone
> objects. Also, do we want to do anything with this comparison?
> 3. On a related note, should I load in the data from past years in CRA
> 2010? Or do we assume that this would just duplicate CRA 2009?
I think you should load all of the CRA 2010, including historical data.
> 4. Finally, I'll add the Treasury classifications as a new key, unless
> anyone objects.
> best wishes