Don McIntosh has come to the party again by contributing a new blog post on how we are enabling our customers to disseminate detailed information while protecting the privacy of individuals. In the context of being providers of Official statistics, making data more available, and making governments more transparent, we show that it *can* be done - you *can* release data.
We are currently engaging with three customers and developing new requirements around the area of privacy protection on their data. For two of the three, the main goal is to deliver more detailed, useful data to their customers without compromising privacy concerns. The other key goals are around reducing the risk of accidentally releasing sensitive data (a goal of increasing importance given the Gov 2.0 fueled demand for more open data), and reducing costs associated with the application of privacy protection. I thought I'd write a short note to summarise our work in this area of late.
We have an API plugin architecture for applying disclosure control. Basically, you can build your own modules that do things like adjust, conceal, and/or annotate cell values based on certain rules, or reject a query if it's deemed too sensitive for whatever reason. You can also record query details and use them to monitor for potential privacy intrusions.
The work we are looking at doing in relation to current customer requests includes the following:
- Implementing plugins with customised rounding and concealment rules. This is straight forward work as far as our current architecture is concerned, and helps our customers with these requirements to implement rules that maximise the data they can make available. For one customer, we have written a plugin that will suppress numbers less than a certain value, and any related totals. So for example, if you were suppressing all numbers in a table less than or equal to 3, a simple table would show suppression of that cell, plus any totals containing that cell. The example table demonstrates how a returned table would look. By suppressing the totals, you are preventing someone from back-calculating a value that has been suppressed.
- Allowing custom selection of different rule combinations for testing and more advanced use of disclosure control. This is useful especially where you have a few in-house specialists who are authorised to be more lenient in terms of what rules need to be applied when responding to ad hoc information requests.
- Extending confidentiality to apply to the output of calculations (SuperSTAR field derivations). For example, you might have a function that in some cases returns "..C" instead of a real value for certain cells as per the example above. Confidentiality can be extended to work with derived data. For example, it would be useful for determining a statistical mean or median and concealing the result if there was less than a certain number of contributors.
We are really keen to hear from our customers and other interested parties. If you have some recent experience in using confidentiality in SuperSTAR or elsewhere, or would like to give us any kind of related feedback, please do feel free to leave a comment or contact us directly.