Wednesday, November 11, 2009
New Blog Location
Please update your bookmarks to www.spacetimeresearch.com/blog.html
Sunday, November 1, 2009
Protecting confidentiality - some real life examples
Don McIntosh has come to the party again with a new blog post on how we are enabling our customers to disseminate detailed information while protecting the privacy of individuals. In the context of official statistics, more available data, and more transparent government, we show that it *can* be done - you *can* release data.
We are currently working with three customers on new requirements for privacy protection of their data. For two of the three, the main goal is to deliver more detailed, useful data to their customers without compromising privacy. The other key goals are reducing the risk of accidentally releasing sensitive data (a goal of increasing importance given the Gov 2.0-fuelled demand for more open data) and reducing the costs associated with applying privacy protection. I thought I'd write a short note to summarise our recent work in this area.
We have an API plugin architecture for applying disclosure control. Basically, you can build your own modules that do things like adjust, conceal, and/or annotate cell values based on certain rules, or reject a query if it's deemed too sensitive for whatever reason. You can also record query details and use them to monitor for potential privacy intrusions.
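To give a feel for what such a module can do, here is a minimal sketch of a concealment rule in Python. The class and method names are illustrative assumptions for this post, not the actual SuperSTAR plugin interfaces:

```python
# Illustrative sketch only - not the actual SuperSTAR plugin API.
class Cell:
    """A single cell in a query result."""
    def __init__(self, value):
        self.value = value
        self.annotation = None

    def conceal(self, marker):
        self.value = None          # hide the underlying count
        self.annotation = marker   # show a marker such as "..C" instead

class SmallCellRule:
    """Conceal any cell whose value is at or below a threshold."""
    def __init__(self, threshold=3, marker="..C"):
        self.threshold = threshold
        self.marker = marker

    def apply(self, cell):
        if cell.value is not None and cell.value <= self.threshold:
            cell.conceal(self.marker)
        return cell

# Applying the rule to each cell of a result:
rule = SmallCellRule()
cells = [Cell(12), Cell(2), Cell(9)]
for c in cells:
    rule.apply(c)
# Cell(2) now carries the "..C" annotation instead of its value.
```

A real module would be registered with the server and invoked as results are produced; the point is simply that the rule logic itself can be small and self-contained.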
The work we are looking at doing in relation to current customer requests includes the following:
- Implementing plugins with customised rounding and concealment rules. This is straightforward work as far as our current architecture is concerned, and helps customers with these requirements to implement rules that maximise the data they can make available. For one customer, we have written a plugin that suppresses numbers less than a certain value, and any related totals. So for example, if you were suppressing all numbers in a table less than or equal to 3, a returned table would show that cell concealed, plus any totals containing that cell (the sketch after this list shows the idea). By suppressing the totals, you prevent someone from back-calculating a value that has been suppressed.
- Allowing custom selection of different rule combinations for testing and more advanced use of disclosure control. This is especially useful where you have a few in-house specialists who are authorised to apply more lenient rules when responding to ad hoc information requests.
- Extending confidentiality to apply to the output of calculations (SuperSTAR field derivations). For example, it would be useful to compute a statistical mean or median and conceal the result (returning "..C" instead of the real value, as above) if there were fewer than a certain number of contributors.
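To make the suppression example concrete, here is a small sketch of primary and consequential suppression on a one-way table, together with a concealing mean for the derived-value case. This is illustrative Python, not our plugin code, and the threshold and marker are example values only:

```python
# Illustrative Python, not our plugin code.
THRESHOLD = 3
MARKER = "..C"

counts = {"Group A": 12, "Group B": 2, "Group C": 9}
total = sum(counts.values())

# Primary suppression: conceal cells at or below the threshold.
display = {k: (MARKER if v <= THRESHOLD else v) for k, v in counts.items()}

# Consequential suppression: if any contributing cell was concealed,
# conceal the total too, otherwise the hidden value could be recovered
# as total minus the sum of the visible cells.
display["Total"] = MARKER if MARKER in display.values() else total

print(display)
# {'Group A': 12, 'Group B': '..C', 'Group C': 9, 'Total': '..C'}

# The same idea extends to derived values: conceal a mean (or median)
# when there are fewer than a minimum number of contributors.
def safe_mean(values, min_contributors=4):
    if len(values) < min_contributors:
        return MARKER
    return sum(values) / len(values)
```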
We are really keen to hear from our customers and other interested parties. If you have some recent experience in using confidentiality in SuperSTAR or elsewhere, or would like to give us any kind of related feedback, please do feel free to leave a comment or contact us directly.
Wednesday, October 21, 2009
Why APIs are important for Gov2.0
I was at the Gov 2.0 conference recently, where APIs were a hot topic. So we asked our Director of Product Planning, Don McIntosh, to write an article about what APIs are and why they're important. This is what he has to say about APIs.
With social applications, there is a clear and obvious use that everyone can understand, and the staggering traffic volumes for these sites make the topic all the more compelling. But what about open data and APIs? Why should we pay them any attention and how do we benefit from them?
An API is an Application Programming Interface. Web-based APIs, sometimes referred to as Web services, are growing at a phenomenal rate. Basically, instead of information being presented in a predetermined manner through Web pages, APIs allow other applications (iPhone apps, Websites, MS Windows applications…) to extract specific chunks of information and combine it with other information in all kinds of ways to serve a specific purpose. Jim Ericson from Information Management blogged about this, including a good description of how Web services get used.
So, they’re useful, widely used, accessible even to non-programming types, and becoming more popular by the day - but what in particular makes them so important in a Gov 2.0 context? I’d summarise it by saying that it’s about making it possible (and easy) for those outside of government to present statistics in a context that is meaningful and useful for them, and that can help facilitate informed discussion and decision making. If I want to provide a service to help people decide where to live, I could combine census statistics such as occupation, income, and age and mash them up with information about the location of shopping centres, pubs, etc. from a different service. I could achieve the same by gathering all the data into a database and building my service on top, but by accessing the data through an API, my information can remain current, and my queries can be run by calls to the API, saving me the complexity and resources required to process the data myself. I can also leverage other services such as Google Maps to present results. And of course, thanks to mashup platforms, this kind of application might just be something that a (non-programmer) individual builds to satisfy their own interest. Either way, it makes it much more possible for people to take government information and use it in ways that government may never have chosen to do.
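As a purely hypothetical illustration of that mashup idea, the sketch below queries a made-up statistics API and a made-up amenities service and combines the results. Every URL and response field here is invented for the example:

```python
# Hypothetical mashup - every URL and response field is invented.
import json
from urllib.request import urlopen

def fetch_json(url):
    with urlopen(url) as resp:
        return json.load(resp)

# Census statistics by suburb from a (made-up) official-statistics API.
stats = fetch_json("https://stats.example.gov/api/census?fields=income&by=suburb")

# Amenity locations from a different (made-up) service.
places = fetch_json("https://places.example.com/api/amenities?type=pub")

# Count pubs per suburb, then rank suburbs by income and pub count.
pubs = {}
for p in places["results"]:
    pubs[p["suburb"]] = pubs.get(p["suburb"], 0) + 1

ranked = sorted(
    stats["results"],
    key=lambda row: (row["median_income"], pubs.get(row["suburb"], 0)),
    reverse=True,
)
for row in ranked[:10]:
    print(row["suburb"], row["median_income"], pubs.get(row["suburb"], 0))
```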
An API can facilitate innovation, and help automate services that other organizations may provide based on the data. It can also provide transparency by not colouring the data in any particular way, but leaving it open to others to render analysis of the data in their own way. On the other hand, if representing the data in certain ways is useful in promoting an organization’s mission, then it might be best to concentrate on delivering the appropriate views and/or viewing tools for the data. Or in some cases, it might make sense to do both.
Gartner analyst Andrea Di Maio noted that separating data from its source, with no clear way for consumers to understand its lineage or quality, runs a great risk of it being misused, or deliberately doctored to represent the “facts” that best suit the application builder. What does this mean for the organization providing the data? Providers of official statistics go to great lengths to defend against this possibility, yet by providing data through APIs they may in some way increase the risk of it happening. Perhaps one way to look at it is to realise that this can happen anyway, without APIs. And it is probably unreasonable to expect a provider to do more than publish accurate quality information alongside their data (and even make it queryable through the API) so that users can make informed choices about what constitutes valid use of the data.
Many statistical agencies run “remote access data laboratory” services to give researchers the ability to perform detailed analyses on their data. These typically involve manual checking processes to ensure that researchers’ queries do not breach data privacy laws by identifying individuals in the data (something that is very easy to do, even when data has been anonymised). A provider would need to determine what privacy risks are posed by making the data available through an API, and ensure that appropriate safeguards are put in place.
An API call results in some amount of processing. Depending on the specifics, such as the type of query and the volume of data, the level of computing resources required can be quite significant. In the beginning, one option may be to limit API use to a few specific applications, and expand that over time. Alternatively, the API could impose certain limits for any single user. This is the approach that Twitter uses to manage the enormous demand it generates.
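For what it’s worth, per-user limits of this kind are often implemented with a token bucket. Here is a minimal sketch, with arbitrary example numbers rather than anything Twitter (or we) actually use:

```python
# A minimal token-bucket limiter; 150 per hour is just an example figure.
import time

class TokenBucket:
    def __init__(self, rate_per_hour=150, capacity=150):
        self.rate = rate_per_hour / 3600.0   # tokens replenished per second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False     # caller should tell the user to back off

# One bucket per API key:
buckets = {}
def allow_request(api_key):
    return buckets.setdefault(api_key, TokenBucket()).allow()
```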
Update: a wordle.net tag cloud of this post
Tuesday, October 13, 2009
SuperSTAR Goodies - 6.7 Release progress
Since transitioning to a fully agile process, we now run fortnightly iterations. From time to time, we will share the outcomes of an iteration and keep you all up to date.
Some of the key items that came out of this iteration were:
1) Record View in SuperWEB2 - we have implemented our first two user stories:
"As a SW RecordVIEW user, I want a way of seeing all the unit records that relate to a crosstab table so that I can understand the detail behind the crosstabulation".
"As a SW RecordVIEW user, I want filtered view of the unit records that relate to the cells in a crosstab table I choose so that I can focus on specific areas of interest"
We have implemented RecordView using GWT in the RESTful style. GWT gives us a rich internet application user experience, and using REST means it is easy for other clients, such as SuperVIEW, to consume the RecordView service.
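To illustrate what consuming such a service might look like, here is a hedged sketch of a client fetching unit records over REST. The endpoint path and parameter names are assumptions for the purpose of the example, not the actual SuperWEB2 interface:

```python
# Hedged sketch of a RecordView-style REST client; the endpoint path
# and parameters are assumptions, not the real SuperWEB2 interface.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def fetch_records(base_url, table_id, cell_ids, fields, offset=0, limit=50):
    """Fetch one page of the unit records behind selected crosstab cells."""
    params = urlencode({
        "cells": ",".join(cell_ids),   # which crosstab cells to expand
        "fields": ",".join(fields),    # which unit-record fields to return
        "offset": offset,              # paging: where this page starts
        "limit": limit,                # paging: maximum records per page
    })
    url = f"{base_url}/tables/{table_id}/records?{params}"
    with urlopen(url) as resp:
        return json.load(resp)

# e.g. fetch_records("https://example.com/recordview", "census2006",
#                    ["r2c3"], ["age", "occupation"])
```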
2) Aggregated mapping for SuperWEB2
“As a SW2 user, I want to have a faster mapping experience so that I can be more productive”.
The Mapping team have done some great work to improve the performance of our mapping solution in SuperWEB2. They have developed an ArcGISMap widget which allows SuperWEB2 to communicate directly with ArcGIS Server via a REST interface. This means much faster zoom and pan performance with maps.
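As a rough illustration, a direct REST request to ArcGIS Server for a rendered map image looks something like the sketch below. The host and service names are placeholders, and the exact calls our widget makes are not shown here; the export operation itself is part of the standard ArcGIS Server REST API:

```python
# Placeholder host and service names, for illustration only.
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({
    "bbox": "144.5,-38.2,145.5,-37.5",  # area of interest (xmin,ymin,xmax,ymax)
    "size": "800,600",                  # output image size in pixels
    "format": "png",
    "f": "image",                       # return the rendered image directly
})
url = "https://gis.example.com/arcgis/rest/services/Demo/MapServer/export?" + params

with urlopen(url) as resp:
    image_bytes = resp.read()           # rendered map, ready to display
```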
3) SuperCROSS Local Annotations Refactor – we are making good progress to get the Annotations working correctly again in SuperCROSS and are on track with our plans.
4) Automated testing – we have also made good progress in automating the testing of SuperCROSS and SuperWEB2.
Sunday, October 4, 2009
Record VIEW Functionality in SuperWEB2 - comments welcomed
A guest blog from Don McIntosh, our product manager for SuperSTAR. Please feel free to give us comments or feedback so we can incorporate it into our product development while the work is in progress.
What I wanted to cover in this post is a brief summary of what we are planning for RecordVIEW, as well as a few features that might come in a later release. I wanted to write about this now while we are developing it so that our customers and partners have an opportunity to comment and hopefully improve on the end result. Another thing we’ll do is provide a link to a test instance to let you play around with it once we have it up and running.
The first step for RecordVIEW is actually to cover off much of the functionality we had in the original SuperWEB. That means identifying some cells, switching to the RecordVIEW tab, choosing what fields to report on, and then downloading to XLS or CSV. The major addition for the first release, in comparison to what was in the original SuperWEB, will be ease of use. The experience will be a lot more immersive, with fewer pauses for server updates and a richer UI. Click on a cell, choose RecordVIEW, and then choose what fields to view. You can choose all fields, or start with none and add a select few. You can also sort the results, and selectively filter the fields you’re interested in viewing. One other key feature I’ll mention is that the results of the RecordVIEW are transparently paginated, so if you have a very long list, the browser isn’t waiting a long time to update; it simply adds more as you scroll down.
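For the curious, the transparent pagination behaves roughly like the sketch below: the client only asks the server for the next page when the user scrolls near the end of what has already been loaded. The names and page size are illustrative, not our actual implementation:

```python
# Illustrative names and page size; not our actual implementation.
PAGE_SIZE = 50

class RecordList:
    def __init__(self, fetch_page):
        self.fetch_page = fetch_page   # callable: (offset, limit) -> list
        self.records = []
        self.exhausted = False

    def on_scroll_near_end(self):
        """Called by the UI as the user approaches the bottom of the list."""
        if self.exhausted:
            return
        page = self.fetch_page(offset=len(self.records), limit=PAGE_SIZE)
        if len(page) < PAGE_SIZE:
            self.exhausted = True      # the server has no more records
        self.records.extend(page)
```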
We are of course very aware that for some datasets, RecordVIEW is not appropriate, due to the sensitive nature of the data. We will keep this simple: if there is confidentiality enabled for a database, then no RecordVIEW. Other permission functionality will remain unchanged from the earlier version.
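In pseudocode terms, the rule really is as simple as it sounds; the attribute and permission names below are made up for illustration:

```python
# Attribute and permission names are made up for illustration.
def record_view_allowed(database, user):
    if database.confidentiality_enabled:
        return False    # no RecordVIEW at all on confidentialised data
    return user.has_permission("recordview", database)
```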
Thursday, October 1, 2009
We're in the cloud! SuperWEB available now
I'm really excited to announce that we aim to be among the first companies to host applications on the Apps.gov website.
Cloud computing services reduce costs by cutting the expense of purchasing and maintaining servers, while simultaneously improving service scalability to manage peaks and troughs in usage. US Federal CIO Vivek Kundra says that besides encouraging better collaboration among agencies, he expects cloud services to reduce energy consumption because agencies will be able to share IT infrastructure.
Space-Time Research is responding to the recent US Federal Government request for proposal for applications to be hosted via the Apps.gov website. The Apps.gov Storefront is managed by the US GSA (General Services Administration) and SuperSTAR software is already available for purchase through the GSA e-Library.
Space-Time Research cloud offerings
In September, Space-Time Research initiated a cloud offering by hosting SuperWEB Software as a Service (SaaS) on the Amazon EC2 cloud service. SuperWEB is currently being assessed for inclusion on the Apps.gov website. Once certified, SuperWEB SaaS will be available to buy as a small, medium, large or extra-large implementation on a pay-by-month basis. At the end of October, SuperVIEW will be production-ready and available via a Google App Engine hybrid cloud service.
More about Apps.gov
Apps.gov is managed by the GSA development team, which is led by Casey Coleman, GSA’s CIO. In the article Kundra's great experiment: Government apps 'store front' opens for business, Coleman says: “Through Apps.gov, GSA can take on more of the procurement processes upfront, helping agencies to better fulfill their missions by implementing solutions more rapidly,”
“We will also work with industry to ensure cloud-based solutions are secure and compliant, to increase efficiency by reducing duplication of security processes throughout government.”