Wednesday, November 11, 2009
New Blog Location
Please update your bookmarks to www.spacetimeresearch.com/blog.html
Sunday, November 1, 2009
Protecting confidentiality - some real life examples
Don McIntosh has come to the party again by contributing a new blog post on how we are enabling our customers to disseminate detailed information while protecting the privacy of individuals. In the context of providing official statistics, making data more available, and making governments more transparent, we show that it *can* be done - you *can* release data.
We are currently working with three customers to develop new requirements around privacy protection for their data. For two of the three, the main goal is to deliver more detailed, useful data to their customers without compromising privacy. The other key goals are reducing the risk of accidentally releasing sensitive data (a goal of increasing importance given the Gov 2.0 fueled demand for more open data), and reducing the costs associated with applying privacy protection. I thought I'd write a short note summarising our recent work in this area.
We have an API plugin architecture for applying disclosure control. Basically, you can build your own modules that do things like adjust, conceal, and/or annotate cell values based on certain rules, or reject a query if it's deemed too sensitive for whatever reason. You can also record query details and use them to monitor for potential privacy intrusions.
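To make this more concrete, here is a minimal sketch of what such a plugin might look like. The interface, class, and method names are illustrative only - this is not the actual SuperSTAR API - but they show the three hooks described above: rejecting a sensitive query, adjusting or concealing a cell value, and recording query details for monitoring.

```java
// Hypothetical sketch of a disclosure-control plugin - illustrative names only.

/** Raised when a query is deemed too sensitive to run. */
class QuerySensitivityException extends Exception {
    QuerySensitivityException(String message) { super(message); }
}

/** Placeholder for a tabulation query (fields, filters, requesting user...). */
class Query {
    String userId;
    int dimensionCount;
}

interface DisclosureControlPlugin {
    /** Inspect a query before it runs; throw to reject it outright. */
    void validateQuery(Query query) throws QuerySensitivityException;

    /** Adjust, conceal, or annotate a cell value after tabulation;
     *  returns the text to display (e.g. "..C" for a concealed cell). */
    String processCell(double value);

    /** Record query details for later privacy-intrusion monitoring. */
    void auditQuery(Query query);
}

/** Example rule set: reject overly detailed queries, conceal small counts. */
class SmallCellPlugin implements DisclosureControlPlugin {
    public void validateQuery(Query query) throws QuerySensitivityException {
        if (query.dimensionCount > 4)
            throw new QuerySensitivityException("Query too detailed to release");
    }
    public String processCell(double value) {
        return value <= 3 ? "..C" : String.valueOf((long) value);
    }
    public void auditQuery(Query query) {
        System.out.println("Query run by " + query.userId); // stand-in for real audit logging
    }
}
```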
The work we are looking at doing in relation to current customer requests includes the following:
- Implementing plugins with customised rounding and concealment rules. This is straightforward work as far as our current architecture is concerned, and helps our customers with these requirements to implement rules that maximise the data they can make available. For one customer, we have written a plugin that suppresses numbers at or below a certain threshold, and any related totals. So for example, if you were suppressing all numbers in a table less than or equal to 3, a returned table would show that cell suppressed, plus any totals containing that cell (there is a code sketch of this rule after this list). By suppressing the totals, you prevent someone from back-calculating a value that has been suppressed.
- Allowing custom selection of different rule combinations, for testing and for more advanced use of disclosure control. This is especially useful where you have a few in-house specialists who are authorised to be more lenient about which rules need to be applied when responding to ad hoc information requests.
- Extending confidentiality to apply to the output of calculations (SuperSTAR field derivations). You might have a function that returns "..C" instead of a real value for certain cells, as in the example above. Extended to derived data, this would be useful for determining a statistical mean or median and concealing the result if there were fewer than a certain number of contributors.
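As promised above, here is a small sketch of the suppression rule from the first item, applied to a single row of counts. This is illustrative code, not the actual plugin we wrote, but the logic is the same: hide any cell at or below the threshold, and hide the total whenever one of its contributing cells is hidden.

```java
import java.util.Arrays;

// Sketch of the "suppress small cells plus related totals" rule.
public class SuppressionExample {

    static final long THRESHOLD = 3;        // suppress values <= 3
    static final String CONCEALED = "..C";  // marker for hidden cells

    /** Returns the display values for a row of counts, plus its total. */
    static String[] suppressRow(long[] counts) {
        long total = Arrays.stream(counts).sum();
        boolean anySuppressed = false;

        String[] out = new String[counts.length + 1];
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] <= THRESHOLD) {
                out[i] = CONCEALED;
                anySuppressed = true;
            } else {
                out[i] = Long.toString(counts[i]);
            }
        }
        // If any cell was hidden, the total must be hidden as well; otherwise
        // a reader could back-calculate the suppressed value from it.
        out[counts.length] = anySuppressed ? CONCEALED : Long.toString(total);
        return out;
    }

    public static void main(String[] args) {
        // Counts 12, 2, 9 -> displayed as: 12  ..C  9  ..C
        System.out.println(String.join("  ", suppressRow(new long[]{12, 2, 9})));
    }
}
```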
We are really keen to hear from our customers and other interested parties. If you have some recent experience in using confidentiality in SuperSTAR or elsewhere, or would like to give us any kind of related feedback, please do feel free to leave a comment or contact us directly.
Wednesday, October 21, 2009
Why APIs are important for Gov2.0
I was at the Gov 2.0 conference recently, so we asked our Director of Product Planning, Don McIntosh, to write an article about what APIs are and why they're important. This is what he has to say about APIs.
With social applications, there is a clear and obvious use that everyone can understand, and the staggering traffic volumes for these sites make the topic all the more compelling. But what about open data and APIs? Why should we pay them any attention and how do we benefit from them?
An API is an Application Programming Interface. Web-based APIs, sometimes referred to as Web services, are growing at a phenomenal rate. Basically, instead of information being presented in a predetermined manner through Web pages, APIs allow other applications (iPhone apps, Websites, MS Windows applications...) to extract specific chunks of information and combine it with other information in all kinds of ways to serve a specific purpose. Jim Ericson from Information Management blogged about this, and he included a good description of how Web services get used.
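For readers who prefer an example: here is a tiny illustration of consuming a web API, using Java's built-in HTTP client. The endpoint and parameters are invented for the example - the point is that the response is raw data (JSON here) that any application can parse and recombine.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Fetch a chunk of data from a (hypothetical) statistics API and print it.
public class ApiCallExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(
                        "https://stats.example.gov/api/population?region=3105&year=2009"))
                .header("Accept", "application/json")
                .build();

        // The body is plain JSON: a website, phone app, or desktop tool can
        // all consume the same call and present the data however they like.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```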
So, they’re useful, widely used, accessible even to non-programming types, and becoming more popular by the day - but what in particular makes them so important in a Gov 2.0 context? I’d summarise it by saying that it’s about making it possible (and easy) for those outside of government to present statistics in a context that is meaningful and useful for them, and that can help facilitate informed discussion and decision making. If I want to provide a service to help people decide where to live, I could take census statistics such as occupation, income, and age and mash them up with information about the location of shopping centres, pubs, and so on from a different service. I could achieve the same by gathering all the data into a database and building my service on top, but by accessing the data through an API, my information remains current, and my queries can be run by calls to the API, saving me from the complexities and resources required to process the data myself. I can also leverage other services such as Google Maps to present results. And of course, thanks to mashup platforms, this kind of application might just be something that a non-programmer builds to satisfy their own interest. Either way, it makes it much more possible for people to take government information and use it in ways that government may never have chosen to do.
An API can facilitate innovation, and help automate services that other organizations may provide based on the data. It can also provide transparency by not colouring the data in any particular way, but leaving it open to others to render analysis of the data in their own way. On the other hand, if representing the data in certain ways is useful in promoting an organization’s mission, then it might be best to concentrate on delivering the appropriate views and/or viewing tools for the data. Or in some cases, it might make sense to do both.
Gartner analyst Andrea diMaio noted that separating data from its source and having no clear way to let consumers understand its lineage or quality runs a great risk of it being misused, or deliberately doctored to represent the “facts” that best suit the application builder. What does this mean to the organization providing the data? Providers of official statistics go to great lengths to defend against this possibility yet by providing data through APIs, they may in some way increase the risk of this happening. Perhaps one way to look at it is to realise that this can happen anyway, without APIs. And it is probably unreasonable to expect a provider to do more than provide accurate quality information alongside their data (and even make it queryable through the API) so that users can make informed choices about what constitutes valid use of the data.
Many statistical agencies have “remote access data laboratory” services to give researchers the ability to perform detailed analyses on their data. There are typically manual checking processes in this, to ensure that researchers’ queries do not breach data privacy laws by identifying individuals from the data (something that is very easy to do, even when data has been anonymized). A provider would need to determine what privacy risks are posed by making the data available through an API, and ensure that appropriate safeguards are put in place.
An API call results in some amount of processing. Depending on the specifics, such as the type of query and the volume of data, the level of computing resources required can be quite significant. In the beginning, one option may be to limit API use to a few specific applications, and expand that over time. Alternatively, the API could impose certain limits for any single user. This is the approach that Twitter uses to manage the enormous demand it generates.
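As a sketch of how a per-user limit might be enforced, here is a simple fixed-window request counter. This is a generic illustration - it is not how Twitter implements its limits, and a production service would likely use something more refined, such as a token bucket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Per-user, fixed-window rate limiting: each user gets a quota of API
// calls per hour; calls beyond the quota are refused until the next hour.
public class RateLimiter {
    private final int maxRequestsPerHour;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    public RateLimiter(int maxRequestsPerHour) {
        this.maxRequestsPerHour = maxRequestsPerHour;
    }

    /** Returns true if this call is within the user's quota. */
    public boolean allow(String userId) {
        long hour = System.currentTimeMillis() / 3_600_000L; // current hour
        Window w = windows.compute(userId, (id, old) ->
                (old == null || old.hour != hour) ? new Window(hour) : old);
        return w.count.incrementAndGet() <= maxRequestsPerHour;
    }

    private static final class Window {
        final long hour;
        final AtomicInteger count = new AtomicInteger();
        Window(long hour) { this.hour = hour; }
    }
}
```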
Update: a wordle.net tag cloud of this post
Tuesday, October 13, 2009
SuperSTAR Goodies - 6.7 Release progress
Since transitioning to a fully agile process, we now run fortnightly iterations. From time to time, we will share the outcomes of an iteration and keep you all up to date.
Some of the key items that came out of this iteration were:
1) Record View in SuperWEB2 - we have implemented our first two user stories:
"As a SW RecordVIEW user, I want a way of seeing all the unit records that relate to a crosstab table so that I can understand the detail behind the crosstabulation".
"As a SW RecordVIEW user, I want filtered view of the unit records that relate to the cells in a crosstab table I choose so that I can focus on specific areas of interest"
We have implemented RecordView using GWT in a RESTful style. GWT gives us a rich internet application user experience, and using REST means that it is easy for other clients, such as SuperVIEW, to consume the RecordView service.
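To give a flavour of what "RESTful" means here, below is a rough sketch of what a RecordView resource might look like using JAX-RS annotations. The paths, parameters, and payload are illustrative guesses, not the shipped interface - the point is that any client that can issue an HTTP GET can consume the service.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;

// Hypothetical RESTful resource: the unit records behind one cell of a
// crosstab, with field selection and pagination via query parameters.
@Path("/recordview")
public class RecordViewResource {

    @GET
    @Path("/{tableId}/cell/{cellId}")
    @Produces("application/json")
    public String recordsForCell(@PathParam("tableId") String tableId,
                                 @PathParam("cellId") String cellId,
                                 @QueryParam("fields") String fields,
                                 @QueryParam("offset") int offset,
                                 @QueryParam("limit") int limit) {
        // A real implementation would ask the SuperSTAR engine for the
        // records behind this cell; here we return an empty page.
        return "{\"tableId\":\"" + tableId + "\",\"offset\":" + offset
                + ",\"records\":[]}";
    }
}
```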
2) Aggregated mapping for SuperWEB2
“As a SW2 user, I want to have a faster mapping experience so that I can be more productive”.
The Mapping team have done some great work to improve the performance of our mapping solution in SuperWEB2. They have developed an ArcGISMap widget which allows SuperWEB2 to communicate directly with the ArcGIS Server via a REST interface. This means much faster zoom and pan performance with maps.
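For those curious what "directly via a REST interface" looks like: an ArcGIS Server map service exposes its operations as plain URLs. The sketch below builds an export request for a rendered map image (the host and service names are placeholders). Because it is a plain HTTP GET, the widget can fetch a fresh image on every zoom or pan without routing through SuperWEB2 itself.

```java
// Building an ArcGIS Server REST "export" request for a map image.
// Host and service name are placeholders for illustration.
public class ArcGisExportUrl {
    public static void main(String[] args) {
        String url = "http://gisserver.example.com/arcgis/rest/services"
                + "/Census/MapServer/export"
                + "?bbox=144.5,-38.2,145.5,-37.5" // area of interest (lon/lat)
                + "&size=800,600"                 // image size in pixels
                + "&format=png&f=image";          // return a PNG directly
        System.out.println(url);
    }
}
```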
3) SuperCROSS Local Annotations Refactor – we are making good progress to get the Annotations working correctly again in SuperCROSS and are on track with our plans.
4) Automated testing – we have also made good progress in automating the testing of SuperCROSS and SuperWEB2.
Sunday, October 4, 2009
Record VIEW Functionality in SuperWEB2 - comments welcomed
A guest blog from Don McIntosh, our product manager for SuperSTAR. Please feel free to give us comments or feedback so we can incorporate it into our product development while the work is underway.
What I wanted to cover in this post is a brief summary of what we are planning for RecordVIEW, as well as a few features that might come in a later release. I wanted to write about this now while we are developing it so that our customers and partners have an opportunity to comment and hopefully improve on the end result. Another thing we’ll do is provide a link to a test instance to let you play around with it once we have it up and running.
The first step for RecordVIEW is actually to cover off much of the functionality we had in the original SuperWEB. That means identifying some cells, switching to the RecordVIEW tab, choosing what fields to report on, and then downloading to XLS or CSV. The major addition for the first release, in comparison to the original SuperWEB, will be ease of use. The experience will be a lot more immersive, with fewer pauses for server updates and a richer UI. Click on a cell, choose RecordVIEW, and then choose what fields to view. You can choose all fields, or start with none and add a select few. You can also sort the results, and selectively filter the fields you’re interested in viewing. One other key feature I’ll mention is that the results of the RecordVIEW are transparently paginated, so if you have a very long list, the browser isn’t waiting a long time to update it; it simply adds more as you scroll down.
We are of course very aware that for some datasets, RecordVIEW is not appropriate, due to the sensitive nature of the data. We will keep this simple: if there is confidentiality enabled for a database, then no RecordVIEW. Other permission functionality will remain unchanged from the earlier version.
Thursday, October 1, 2009
We're in the cloud! SuperWEB available now
I'm really excited to announce that we aim to be among the first companies to host applications on the Apps.gov website.
Cloud computing services reduce costs by cutting the expense of purchasing and maintaining servers, while simultaneously improving scalability to manage peaks and troughs in usage. US Federal CIO Vivek Kundra says that besides encouraging better collaboration among agencies, he expects cloud services to reduce energy consumption because agencies will be able to share IT infrastructure.
Space-Time Research is responding to the recent US Federal Government request for proposal for applications to be hosted via the Apps.gov website. The Apps.gov Storefront is managed by the US GSA (General Services Administration) and SuperSTAR software is already available for purchase through the GSA e-Library.
Space-Time Research cloud offerings
In September, Space-Time Research initiated a cloud offering by hosting SuperWEB Software as a Service (SaaS) on the Amazon EC2 cloud service. SuperWEB is currently being assessed for inclusion on the Apps.gov website. Once certified, SuperWEB SaaS will be available to buy as a small, medium, large or extra large implementation on a pay-by-month basis. At the end of October, SuperVIEW will be production-ready and available via a Google App Engine hybrid cloud service.
More about Apps.gov
Apps.gov is managed by the GSA development team, which is led by Casey Coleman, GSA’s CIO. In the article Kundra's great experiment: Government apps 'store front' opens for business, Coleman says: “Through Apps.gov, GSA can take on more of the procurement processes upfront, helping agencies to better fulfill their missions by implementing solutions more rapidly,”
“We will also work with industry to ensure cloud-based solutions are secure and compliant to increase efficiency by reducing duplication of security processes throughout government."
Tuesday, September 22, 2009
My shortest blog ever
Thursday, September 17, 2009
Bug Safaris - a different way to find bugs
Here is a post from Adrian Mirabelli - a Customer Quality engineer at STR. The idea for a bug safari came out of a presentation at the ANZ Test Board Conference in March 2009.
Bug safaris at Space-Time Research
For release 6.5, the STR quality team introduced “bug safaris” as a way to effectively and quickly find software bugs.
A bug safari involves most of the organisation - including development, design, and management - in locating bugs. Test cases or scripts are not necessarily provided, but guidance should be given. Specific areas are targeted, and interruptions are minimised to increase effectiveness. Note that a bug safari can be held multiple times over a release.
Planning is the key!
At the beginning of a bug safari, the quality manager invites the participants to a planning session or “kickoff”. The purpose of the kickoff is to define:
- The objectives of the session – including communicating what is being done; all participants should be very clear about this by the conclusion of the session
- Areas to test and who will do it – this is important to ensure coverage and no wasteful duplication
- Configuration required and who will do it
- Test cases or documentation required and who will do it – the structure of these products should be agreed, for example, is it a checklist or a matrix that is filled out on-the-fly?
- Some ideas of how to test – do something unusual or non-typical, test boundary values
- Duration of session
- Method for reporting issues and bugs
Typically the system configuration and documentation will be done by the testing team with the help of technical resources if required. Login information is distributed in advance. The quality manager needs to decide how to report results including submission of bug reports, and therefore plays a crucial role in this testing.
At the agreed date and time, the testing itself is performed, typically for between 45 minutes and two hours. This session is generally intense in nature, as the mission is to find problems. The system testers are usually assigned to a product and work with the participants to help identify issues and troubleshoot problems. They can also be actively testing the system, depending on what was agreed at the kickoff session.
At the conclusion of the session, results are tabulated and any bugs found are raised in the incident tracking system.
Within the next couple of days at the absolute latest, a debriefing is held with the participants including the system testers. The quality manager reports on bugs found, and discussion is held regarding:
- The perceived level of success of the bug safari
- What can be improved for next time
- What worked well this time
- General feelings and sentiments
- Required actions and action owners.
Why not just use structured tests?
Procedural test cases, which follow a step-by-step test script, are excellent for communicating to the wider audience how you are testing and to obtain buy-in and feedback from stakeholders. In my experience, however, you can find bugs by looking around the software, not just looking at the expected results of the test case. Further, bugs are found when testing certain sequences of data, mouse clicks, configuration, operating systems and more, and it is expensive to write test cases for all these combinations.
Why involve people outside the testing team?
You and I are testing software every day. Just by using software you are testing whether it satisfies your need and your purpose. Everyone interacts with software differently, and is likely to try things out in various and different ways, some typical and some strange, so it is good to have such testing sessions to really verify the software is “fit-for-purpose”. It also gives the opportunity for fresh eyes to look and question the software, and test out other important elements such as usability and compatibility. It also increases the participants’ knowledge of the software, whilst testing the accuracy of the configuration and documentation, including the quality of test harnesses and pre-defined scripts.
What are the benefits?
Bug safaris are a form of “exploratory testing” with more tangible results. The results can easily be reported on charts or whiteboards and transferred to the test management and tracking system.
We allow the participants freedom of thought in executing tests. In this way we can find new bugs, as new combinations of tests are exercised. Quality therefore improves as we address and fix these bugs based on their priority. Participants are encouraged to investigate any strange behaviour they find, perform further tests, and ask questions.
Involving everyone in testing, and not just the test team, improves the visibility of the test organisation and the importance of testing, whilst sharing the ownership of “quality” with all the people involved in the development of the software from concept to implementation.
By performing such tests, we can capture and report many useful metrics, for example:
1. Number of bugs or issues found per session
2. Number of sessions run
3. Areas and combinations covered in the session
4. Time required to configure
5. Time required to test
6. Time required to investigate issues
The key is that people work together and discuss openly the software and what it does.
What are the challenges?
This method of testing is still relatively new; it is not a perfect method, nor a substitute for traditional testing methods. The key is to balance the proportion of structured versus unstructured testing, whilst ensuring that the results are captured sufficiently. Tracking might, for example, require the participant to complete a spreadsheet, matrix or running sheet.
What is being done in future?
STR will run bug safaris in future releases. Bug safaris have been shown to find bugs - important bugs - and are continuing to win favour in the testing industry as their true benefits are realised. Introducing bug safaris has the advantage of requiring no major cultural or system changes and no expensive start-up costs.
Monday, August 24, 2009
Our Quality Vision (and Addressing Our Quality Past)
- Timely, relevant software that works!
- Performance, stability and resiliency focus.
- Deliver releases of SuperSTAR that are perceived within STR and by our partners and customers as better than the previous release.
The customers reported bugs, their severity, and their own priority via our normal support channel (email to support@spacetimeresearch.com). We regularly triaged the reported bugs, and communicated via conference call with each customer to advise what we intended to do, or to discuss concerns.
- Integration and configuration issues were ironed out during the pre-release phase.
- Customer-focused testing found issues we would never have found ourselves.
- The end delivery held no surprises.
- We delivered on time to those customers and met their deadlines.
- Involve more customers in pre-release testing.
- Collect more sample databases from customers.
- Collect reference data sets from customers so we can validate our statistical routines.
- Use client test beds for complex or unusual environments.
- Open up our change management and support processes so customers can track issues they are interested in.
Wednesday, August 19, 2009
Open Data Initiative - Free SuperVIEW hosting of data
- Is Cost and Time Efficient — Reduces the workload on your data analysts and researchers.
- Provides Data that is Complete — Why compromise by providing a subset of the data? Maximize the ability of the public to self-serve data of personal interest.
- Provides Data as a Service — Now you can provide a new online data service to the public.
- Protects the Relevance of Your Brand — Provide an engaging and rewarding experience for the public. This reinforces the relationship of trust they have in your organization.
- Delivers Data Integrity — Have confidence that the public are seeing the right numbers, graphs, and maps, and reaching the correct interpretation and understanding behind those numbers.
- Delivers Data Responsiveness — Minimize the time between data collection and data dissemination to ensure maximum relevancy of the data to the audience.
- Creates Communities of Users — Ensure the online experience can be captured and shared by the public in collaborative environments from blogs to Twitter.
Q. Are there any restrictions on the free service?
A. This is a free service and as such it has business-model restrictions for customers - they cannot charge a fee for access to their created sites. The sites must be public, and must not sit behind authentication or payment gateways. We have a paid service available that overcomes these restrictions, but the free service is a good way to test drive the technology and the dissemination approach initially. Alternatively, customers can purchase a paid SuperVIEW software license and implement their own business model around a deployed SuperVIEW.
Q. What about confidentiality?
A. No confidentiality capabilities are offered with the free SuperVIEW. The Open Data Initiative hosts all data in the Cloud, so by its nature the data provided should not contain confidential information. We can provide a confidential Cloud-based service using our Hybrid connector, but that becomes a paid solution engagement.
Q. How do statistical boundaries get loaded?
A. We will detail this in the data collection process over the next week with people who sign up to our early adopter program, but we think it will be along the lines of providing us a shapefile (with some size limits - i.e. pre-simplified and for particular areas) or KML.
Q. How does the application get integrated with the data provider's website?
A. Option 1 -> provide a link that takes the user from the data provider's website to the Open Data Initiative website.
Option 2 -> use an IFRAME to embed the Open Data Initiative hosted site into their website.
Friday, August 7, 2009
My favourite Cloud Reading List / Resources
- Amazon EC2 Service Level Agreement --> http://aws.amazon.com/ec2-sla/
- Amazon FAQs --> http://aws.amazon.com/ec2/faqs/#What_does_your_Amazon_EC2_Service_Level_Agreement_guarantee
- Amazon Elastic Compute Cloud (Amazon EC2) --> http://aws.amazon.com/ec2/
- Amazon Elastic Compute Cloud (EC2) Running IBM --> http://aws.amazon.com/ibm/
SuperSTAR and Cloud - nutting through the details of Google App Engine
- For clients who already have a SuperSTAR infrastructure, external web hosting can sometimes be difficult to arrange. This is an easy and inexpensive way to get around it.
- Clients can take advantage of the scalability offered and handle peak loads without having to buy massive servers.
- Of course, there are others. They're in last week's blog.
- It's true that there is no SLA for the Google App Engine. I reckon this rules out half of our customers straight away. Especially those who are data providers like the Australian Bureau of Statistics and want to reliably provide access to data and analytical tools to the world 24/7. Other customers, such as those who use our software for internal or researcher use, or those who are just starting out with SuperVIEW, might take this risk on board and try it out.
- It's really difficult to work out what it would actually cost. Everything is costed by usage per day, and there's the option of a free service that then jumps into a paid service, or a paid service that gets more expensive as your user base grows. What we did work out was that it would be free for most SuperVIEW applications up to 2,000 user sessions per day. After that, it would cost approximately $300 USD per month to add an additional 1,000 users. (So, for example, a site averaging 5,000 sessions a day - 3,000 over the free tier - would come to roughly $900 USD a month at that rate.)
- I am preparing a white paper for our sales team and customers on what we offer now and intend to offer in the future. This will have enough technical details to be able to talk to project owners / sponsors, but IT representatives will need more detail.
- I'm finding out as much as I can about what governs our customer decisions now. I'm keen to get help on that because it's really hard to find out.
- I'm talking to Gartner analysts to get their take. Particularly as the article I'm sent most often is one that Gartner wrote about the security concerns of cloud and what to watch out for.
- I'm talking to a real cloud provider - Telstra, an Australian company that is going to host all of Visy Recycling's applications, which is a significant move.
- I'm going to ask questions of the Australian gov2.0 taskforce and see what they think about it.
- I'm going to keep reading the articles I get sent every day.
Tuesday, July 28, 2009
Cloud Computing Services at Space-Time Research
I have been doing a lot of reading about cloud computing and concerns over security of data. In case you hadn't noticed, cloud computing is a hot topic and IT magazines and blogs are overflowing with articles. Kundra is talking about it (Kundra courts the risk of innovation -- Government Computer News ), Gov 2.0 and Data.gov encourage it, and some US city departments are investigating moving all their services into a cloud (L.A. weighs plan to replace computer software with Google service - Los Angeles Times )
At first I wondered what all the fuss was about - it's only third party hosting of applications after all, and it's already been done – A LOT. Over the last few weeks I've delved a bit deeper, and discovered that my understanding of the technology, and options available, was limited. There are a number of different ways applications can be hosted or delivered via a cloud, and putting your application on a separate server housed at an external provider, which is what we do for some of our existing clients, is a very simple but expensive way to do it. I've since discovered there are other ways that might be better.
I have worked at and with large organizations over the last 20 years, and I understand why the idea of moving applications into a cloud is attractive. Sometimes it can be nearly impossible for a business unit within an organization to get a server or space on a server to host applications. And if you can get one, for some organizations, it can cost up to hundreds of thousands of dollars even if the server itself only cost a few thousand. Here we have an opportunity to get rid of one of the major stumbling blocks in putting a new application (particularly a web-application) out there.
The potential benefits of cloud computing are clear:
- It can be MUCH cheaper. We've worked out that a basic SuperVIEW application could be hosted for under sixty dollars a month (depending on the number of users, etc.). This compares with an external hosting service cost of $1500 AUD per month for a dedicated server.
- It removes constraints imposed by IT departments, or even harder to deal with, IT Service Providers. The approvals to host applications on internal servers can be onerous.
- It offers the ability to scale up or down, particularly when there is an initial peak load. I’m hoping that when we launch 2011 census data online with the Australian Bureau of Statistics, we can use cloud resources to cope with our initial peak loads.
- As the hardware and infrastructure are already available, it can be very quick to deploy an application and start using it. No more waiting for the server to be ready.
The major considerations are:
- Some cloud services offerings won't tell you or guarantee where your data is stored and this makes some organizations nervous.
- The technology and the different options available are new and don’t necessarily follow strict government security procedures. I figure that by the time some government organizations are ready to launch an application, it will be sorted out.
- Working out your optimal pricing can be a little tricky - it's a bit like a mobile phone plan and if you don't know how your system is going to be used, it can be hard to work out which is the most cost-effective model.
We have recently come up with a couple of cloud offerings for our SuperVIEW software that offer the best of both worlds for SuperSTAR customers. Our customers have given us some direct feedback that they are very interested in cloud models for hosting web applications, but they would like to keep their data in-house. This is not simply an issue of security; all of our customers have substantial data management systems in place, either fully in-house, or connected to privately outsourced data centres. Having the data for SuperVIEW hosted in-house ensures that the provider retains full ownership and does not have to extend its data management policies to address the differences that cloud computing would introduce.
Our HYBRID model fits this bill. The SuperVIEW application is hosted on a cloud provided by the Google App Engine. Via a secure data connector developed by STR, the application connects to a customer's existing SuperSTAR database housed internally. Encrypted, aggregated data is returned to the web application for analysis and visualization in the SuperVIEW web client. Because SuperSTAR databases are read-only, and cannot be manipulated by SQL or other programs, the raw data is secure and is not vulnerable to alteration or attack.
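In outline, the connector is a small in-house service that answers aggregate queries from the cloud application. The sketch below is a conceptual illustration only - the names and transport are placeholders, and a real deployment would add TLS, authentication, and the encryption described above.

```java
import java.io.OutputStream;
import java.net.InetSocketAddress;
import com.sun.net.httpserver.HttpServer;

// Conceptual sketch of a hybrid data connector: the cloud-hosted SuperVIEW
// posts an aggregate query here; only aggregated results leave the building,
// while unit records stay behind the firewall.
public class HybridConnectorSketch {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8443), 0);
        server.createContext("/aggregate", exchange -> {
            // 1. Authenticate the calling cloud application (omitted here).
            byte[] query = exchange.getRequestBody().readAllBytes();
            // 2. Run the aggregate query against the in-house database.
            byte[] result = runAggregateQuery(query);
            // 3. Return only the aggregated (and, in practice, encrypted) cells.
            exchange.sendResponseHeaders(200, result.length);
            try (OutputStream body = exchange.getResponseBody()) {
                body.write(result);
            }
        });
        server.start();
    }

    /** Stand-in for the call into the read-only SuperSTAR database. */
    private static byte[] runAggregateQuery(byte[] query) {
        return "{\"cells\":[]}".getBytes();
    }
}
```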
We also want to offer the ability to experience the whole SuperSTAR application in a cloud using a different service provider. Currently, we provide fully hosted dedicated-server solutions, and over the next month we are working out who best to source these services from in a more distributed environment. There are some customers who will always want to keep their data management tools in-house, but others may want to migrate the whole solution to a cloud. We expect to be able to provide a hybrid or fully cloud-based SuperSTAR service to customers with the next release of our software, in the next month or so.
Until next time,
Jo