Wednesday, November 11, 2009

New Blog Location

The Customer Quality and Success blog has been moved to the Space-Time Research website.

Please update your bookmarks to www.spacetimeresearch.com/blog.html

Sunday, November 1, 2009

Protecting confidentiality - some real life examples

Don McIntosh has come to the party again by contributing a new blog post on how we are enabling our customers to disseminate detailed information while protecting the privacy of individuals. As providers of tools for official statistics, in a climate of demand for more open data and more transparent government, we show that it *can* be done - you *can* release data.

We are currently engaging with three customers and developing new requirements in the area of privacy protection for their data. For two of the three, the main goal is to deliver more detailed, useful data to their customers without compromising privacy. The other key goals are reducing the risk of accidentally releasing sensitive data (a goal of increasing importance given the Gov 2.0 fueled demand for more open data), and reducing the costs associated with applying privacy protection. I thought I'd write a short note to summarise our recent work in this area.

We have an API plugin architecture for applying disclosure control. Basically, you can build your own modules that do things like adjust, conceal, and/or annotate cell values based on certain rules, or reject a query if it's deemed too sensitive for whatever reason. You can also record query details and use them to monitor for potential privacy intrusions.

The work we are looking at doing in relation to current customer requests includes the following:

  • Implementing plugins with customised rounding and concealment rules. This is straightforward work as far as our current architecture is concerned, and helps customers with these requirements implement rules that maximise the data they can make available. For one customer, we have written a plugin that suppresses numbers less than a certain value, along with any related totals. So, for example, if you were suppressing all numbers in a table less than or equal to 3, a simple table would show suppression of that cell, plus any totals containing that cell. The example table demonstrates how a returned table would look. By suppressing the totals, you prevent someone from back-calculating a value that has been suppressed.

  • Allowing custom selection of different rule combinations for testing and more advanced use of disclosure control. This is useful especially where you have a few in-house specialists who are authorised to be more lenient in terms of what rules need to be applied when responding to ad hoc information requests.

  • Extending confidentiality to apply to the output of calculations (SuperSTAR field derivations). For example, you might have a function that in some cases returns "..C" instead of a real value for certain cells, as per the example above. This would also be useful for derived data such as a statistical mean or median, where the result should be concealed if there were fewer than a certain number of contributors.

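To make the suppression rule in the first bullet concrete, here is a minimal sketch of primary suppression with consequential suppression of totals. This is illustrative only - it is not the SuperSTAR plugin API, and all names in it are hypothetical:

```python
# Illustrative sketch: conceal small cells and any totals that contain them.
# NOT the SuperSTAR plugin API; all names here are hypothetical.

CONCEALED = "..C"  # marker used in place of a suppressed value

def suppress_small_cells(table, threshold=3):
    """table: rows of counts, where the last column and last row are totals.
    Returns a new table with inner cells <= threshold concealed, plus the
    row, column, and grand totals that contain those cells."""
    n_rows, n_cols = len(table), len(table[0])
    out = [row[:] for row in table]
    hit_rows, hit_cols = set(), set()
    # Pass 1: conceal small inner cells and remember their row/column.
    for r in range(n_rows - 1):
        for c in range(n_cols - 1):
            if table[r][c] <= threshold:
                out[r][c] = CONCEALED
                hit_rows.add(r)
                hit_cols.add(c)
    # Pass 2: conceal the totals that contain a suppressed cell, so the
    # hidden value cannot be back-calculated from the margins.
    for r in hit_rows:
        out[r][n_cols - 1] = CONCEALED
    for c in hit_cols:
        out[n_rows - 1][c] = CONCEALED
    if hit_rows or hit_cols:
        out[n_rows - 1][n_cols - 1] = CONCEALED  # grand total (conservative)
    return out

table = [
    [10,  2, 12],
    [ 8,  9, 17],
    [18, 11, 29],
]
for row in suppress_small_cells(table):
    print(row)
```

A real implementation would need more sophisticated secondary suppression (concealing the grand total alone is not always sufficient, and sometimes additional inner cells must be concealed), but the sketch shows the basic cell-plus-totals idea.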
We are really keen to hear from our customers and other interested parties. If you have some recent experience in using confidentiality in SuperSTAR or elsewhere, or would like to give us any kind of related feedback, please do feel free to leave a comment or contact us directly.

Wednesday, October 21, 2009

Why APIs are important for Gov2.0

I was at the Gov 2.0 conference in Canberra earlier in the week and found that compared to the talk around social engagement through Twitter and Facebook, the whole concept of open data and APIs took a back seat for much of the event. APIs were mentioned by speakers, but I did not get any sense that the majority of the attendees were thinking about APIs and mash-up-ability of data as much as I do. I also wasn't sure that everyone knew what an API was, or why you would want one.

So we asked our Director of Product Planning, Don McIntosh to write an article about what APIs are, and why they're important. This is what he has to say about APIs.

With social applications, there is a clear and obvious use that everyone can understand, and the staggering traffic volumes for these sites make the topic all the more compelling. But what about open data and APIs? Why should we pay them any attention and how do we benefit from them?

An API is an Application Programming Interface. Web-based APIs, sometimes referred to as Web services, are growing at a phenomenal rate. Basically, instead of information being presented in a predetermined manner through Web pages, APIs allow other applications (iPhone apps, Websites, MS Windows applications…) to extract specific chunks of information and combine them with other information in all kinds of ways to serve a specific purpose. Jim Ericson from Information Management blogged about this, and he included a good description of how Web services get used:

Now think of all the thousands of iPhone apps and how they amalgamate all kinds of Web services. You open your commuter traffic app, it calls on traffic information services, Google maps, a weather forecast and maybe an ad for public transportation. One browser app, many (API) calls.

Jim also mentioned how prominent APIs are becoming. For many popular websites, the network traffic generated by APIs actually exceeds the direct Web traffic. And that’s expected to continue. Perhaps even more interesting is the fact that these days, you don’t even need to be a programmer to use Web APIs. If you have played with Yahoo Pipes, or similar mashup tools, you know what I mean. Basically, these tools are empowering end users to create their own custom applications. Just drag and drop – no coding required.

So, they’re useful, widely used, accessible even to non-programming types, and becoming more popular by the day - but what in particular makes them so important in a Gov 2.0 context? I’d summarise it by saying that it’s about making it possible (and easy) for those outside of government to present statistics in a context that is meaningful and useful for them, and that can help facilitate informed discussion and decision making. If I want to provide a service to help people decide where to live, I could combine census statistics such as occupation, income, and age, and mash them up with information about the location of shopping centres, pubs, etc. from a different service. I could achieve the same by gathering all the data into a database and building my service on top, but by accessing the data through an API, my information can remain current, and my queries can be run by calls to the API, saving me from the complexities and resources required to process the data myself. I can also leverage other services such as Google maps to present results. And of course, thanks to mashup platforms, this kind of application might just be something that a (non-programmer) individual does to satisfy their own interest. Either way, it makes it much more possible for people to take government information and use it in ways that government may never have chosen to do.

From a data provider’s perspective, there are many things to consider when looking at providing APIs for direct data access and querying.

1. API vs other means

An API can facilitate innovation, and help automate services that other organizations may provide based on the data. It can also provide transparency: by not colouring the data in any particular way, it leaves others free to analyse the data in their own way. On the other hand, if representing the data in certain ways is useful in promoting an organization’s mission, then it might be best to concentrate on delivering the appropriate views and/or viewing tools for the data. Or in some cases, it might make sense to do both.

2. Risk of abuse.

Gartner analyst Andrea diMaio noted that separating data from its source, with no clear way to let consumers understand its lineage or quality, runs a great risk of the data being misused, or deliberately doctored to represent the “facts” that best suit the application builder. What does this mean for the organization providing the data? Providers of official statistics go to great lengths to defend against this possibility, yet by providing data through APIs, they may in some way increase the risk of this happening. Perhaps one way to look at it is to realise that this can happen anyway, without APIs. And it is probably unreasonable to expect a provider to do more than provide accurate quality information alongside their data (and even make it queryable through the API) so that users can make informed choices about what constitutes valid use of the data.

3. Data Privacy Protection

Many statistical agencies have “remote access data laboratory” services to give researchers the ability to perform detailed analyses on their data. There are typically manual checking processes in this, to ensure that researchers’ queries do not breach data privacy laws by identifying individuals from the data (something that is very easy to do, even when data has been anonymized). A provider would need to determine what privacy risks are posed by making the data available through an API, and ensure that appropriate safeguards are put in place.

4. Resources

An API call results in some amount of processing. Depending on the specifics, such as the type of query and the volume of data, the level of computing resources required can be quite significant. In the beginning, one option may be to limit API use to a few specific applications, and expand that over time. Alternatively, the API could impose certain limits for any single user. This is the approach that Twitter uses to manage the enormous demand it generates.
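A per-user limit like the one just described can be sketched as a rolling-window counter. This is a minimal illustration of the general technique, not how Twitter (or any SuperSTAR product) actually implements it; all names are hypothetical:

```python
# Minimal sketch of per-user API rate limiting with a rolling time window.
# Illustrative only; names and limits are hypothetical.
import time
from collections import defaultdict, deque

class RateLimiter:
    def __init__(self, max_calls=150, window_seconds=3600):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = defaultdict(deque)  # user id -> timestamps of recent calls

    def allow(self, user_id, now=None):
        """Return True if this call is within the user's quota."""
        now = time.monotonic() if now is None else now
        q = self.calls[user_id]
        # Drop timestamps that have fallen out of the rolling window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_calls:
            return False  # over quota: the server would reject the request
        q.append(now)
        return True
```

On the server side, a rejected call would typically be answered with an error telling the client when it may retry, which keeps heavy users from starving the shared computing resources.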

Update: a wordle.net tag cloud of this post



Tuesday, October 13, 2009

SuperSTAR Goodies - 6.7 Release progress

We would like to share the progress of some of the good stuff we have been doing in SuperSTAR development towards our 6.7 release.

Since transitioning to a fully agile process, we now run fortnightly iterations. From time to time, we will share the outcomes of an iteration and keep you all up to date.

Some of the key items that came out of this iteration were:
1) Record View in SuperWEB2 - we have implemented our first two user stories:
"As a SW RecordVIEW user, I want a way of seeing all the unit records that relate to a crosstab table so that I can understand the detail behind the crosstabulation".
"As a SW RecordVIEW user, I want a filtered view of the unit records that relate to the cells I choose in a crosstab table, so that I can focus on specific areas of interest."

We have implemented RecordView using GWT in the RESTful style. GWT gives us a rich Internet client user experience, and using REST means that it is easy for other clients, such as SuperView, to consume the RecordView service.

2) Aggregated mapping for SuperWEB2
“As a SW2 user, I want to have a faster mapping experience so that I can be more productive”.

The Mapping team have done some great work to improve the performance of our mapping solution in SuperWEB2. They have developed an ArcGISMap widget which allows SuperWEB2 to communicate directly with the ArcGIS Server via a REST interface. This means much faster zoom and pan performance with maps.

3) SuperCROSS Local Annotations Refactor – we are making good progress to get the Annotations working correctly again in SuperCROSS and are on track with our plans.

4) Automated testing – we have also made good progress in automating the testing of SuperCROSS and SuperWEB2.

If you have any questions regarding our progress on the 6.7 release, or about any SuperSTAR product, please do not hesitate to contact us at support@spacetimeresearch.com


Sunday, October 4, 2009

Record VIEW Functionality in SuperWEB2 - comments welcomed


A guest blog from Don McIntosh, our product manager for SuperSTAR. Please feel free to give us comments or feedback so that we can incorporate it into our product development while work is still under way.

What I wanted to cover in this post is a brief summary of what we are planning for RecordVIEW, as well as a few features that might come in a later release. I wanted to write about this now while we are developing it so that our customers and partners have an opportunity to comment and hopefully improve on the end result. Another thing we’ll do is provide a link to a test instance to let you play around with it once we have it up and running.

RecordVIEW is a key feature of SuperWEB - and one that is currently lacking in SuperWEB2. It gives users the ability to drill down into the records that contribute to any cell in a table and view other attributes of those records. We find that customers use it for a variety of reasons. Two of the most common are identification of individuals in interesting sub-populations, and data validation. An example of the former is “give me the list of names of all students who scored above 95% in the English test”. An interesting point is that almost all the time, the records extracted via RecordVIEW need to be subsequently fed into another system for the user to complete their task. That’s a useful one for us to keep in mind, because perhaps we can add much more value by allowing some kind of direct integration between the RecordVIEW action and other systems.

The first step for RecordVIEW is actually to cover off much of the functionality we had in the original SuperWEB. That means identifying some cells, switching to the RecordVIEW tab, choosing what fields to report on, and then downloading to XLS or CSV. The major addition for the first release, in comparison to what was in the original SuperWEB, will be the ease of use. The experience will be a lot more immersive, with fewer pauses for server updates and a richer UI. Click on a cell, choose RecordVIEW, and then choose what fields to view. You can choose all fields, or start with none and add a select few. You can also sort the results, and selectively filter which fields you’re interested in viewing. One other key feature I’ll mention is that the results of the RecordVIEW are transparently paginated, so if you have a very long list, the browser isn’t waiting a long time to update it; it simply adds more as you scroll down.
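The transparent pagination described above amounts to the client fetching records one page at a time, requesting the next page only when the user scrolls near the end of the list. A minimal sketch of the idea (the names and page size are hypothetical, not the RecordVIEW API):

```python
# Sketch of transparent pagination: fetch one page at a time, and fetch the
# next page only when the user scrolls near the bottom of the list.
# Illustrative only; fetch_page stands in for a call to a record service.

PAGE_SIZE = 50

def fetch_page(records, offset, limit=PAGE_SIZE):
    """Stand-in for a server call: return one page of unit records."""
    return records[offset:offset + limit]

class LazyRecordList:
    def __init__(self, records):
        self._source = records  # in reality, a remote record service
        self.loaded = []        # records the UI has rendered so far

    def on_scroll_to_bottom(self):
        """Called by the UI when the user nears the end of the list."""
        page = fetch_page(self._source, offset=len(self.loaded))
        self.loaded.extend(page)
        return len(page)  # 0 means there are no more records to load
```

The browser never waits for the full result set; each scroll event pulls in at most one more page.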

We are of course very aware that for some datasets, RecordVIEW is not appropriate, due to the sensitive nature of the data. We will keep this simple: if there is confidentiality enabled for a database, then no RecordVIEW. Other permission functionality will remain unchanged from the earlier version.

Other key features we will consider later on include cell selection from other views, such as areas on a chart or map. Also, as I mentioned earlier on, we’d like to explore how RecordVIEW output might be more tightly integrated into a workflow that involves taking sets of records to feed into another application for further processing, or viewing them in a certain way.

It would be interesting to hear about some usage scenarios or feature ideas for RecordVIEW from our customers. We may be able to incorporate some scenarios in our acceptance testing, and hopefully learn about some ways to make this feature smarter and more in line with users’ core needs.

RecordVIEW will be available in the Release 6.5 November service pack.

Thursday, October 1, 2009

We're in the cloud! SuperWEB available now

I'm really excited to announce that we aim to be among the first companies to host applications on the Apps.gov website.

To get there, we needed to get SuperWEB up into the cloud, and this week, we hosted our first application on the Amazon EC2 cloud. Yesterday, I got my first Amazon bill - $10 / day so far and we uploaded a lot of data!

Background:
Vivek Kundra, the US Federal Chief Information Officer, has launched the new Apps.gov Storefront to enable US Federal Government agencies to buy cloud computing services as easily as a consumer can acquire a Gmail or Facebook account.

Cloud computing services reduce costs by reducing the need to purchase and maintain servers, while simultaneously improving service scalability to manage peaks and troughs in usage. Kundra says that besides encouraging better collaboration among agencies, he expects cloud services to reduce energy consumption because agencies will be able to share IT infrastructure.

Space-Time Research is responding to the recent US Federal Government request for proposal for applications to be hosted via the Apps.gov website. The Apps.gov Storefront is managed by the US GSA (General Services Administration) and SuperSTAR software is already available for purchase through the GSA e-Library.

Space-Time Research cloud offerings

In September, Space-Time Research initiated a cloud offering by hosting SuperWEB Software as a Service (SaaS) on the Amazon EC2 cloud service. SuperWEB is currently being assessed for inclusion in the Apps.gov website. Once certified, SuperWEB SaaS will be available to buy as a small, medium, large or extra large implementation on a pay-by-month basis.

At the end of October, SuperVIEW will be production-ready and available via a Google App Engine hybrid cloud service.

More about Apps.gov

Apps.gov is managed by the GSA development team, which is led by Casey Coleman, GSA’s CIO. In the article Kundra's great experiment: Government apps 'store front' opens for business, Coleman says:

“Through Apps.gov, GSA can take on more of the procurement processes upfront, helping agencies to better fulfill their missions by implementing solutions more rapidly,”

“We will also work with industry to ensure cloud-based solutions are secure and compliant to increase efficiency by reducing duplication of security processes throughout government."


Tuesday, September 22, 2009

My shortest blog ever

My three favourite sites at the moment:

As we enable public intelligence and data provision, and we're an Australian-based company, I have to keep on top of this every day. I love how fast ideas are moving.

For all goodness in quality and testing management. If I ever have a question or problem to solve and I'm stuck, I go here. Good for inspiration and great ideas.

Just launched by the US Government and we're going to be on it soon with a cloud provision of SuperWEB. Any US Government Agency will be able to buy us through this process. Super-excited about this one.

Jo
