The critical importance of testing

It should go without saying, but good testing is absolutely critical to producing technology that benefits your organization. As a former insurer CIO, I can attest that the group most often squeezed to meet a deliverable date is the quality assurance team, which has the incredible responsibility of ensuring accuracy. Perhaps our organization was unique, but having also spent years at vendors doing implementations, I see it happen over and over. Projects run late, and rather than adjusting the date or the project content, testing gets squeezed, often with unintended consequences.

The corollary to poor testing is “Day 2.” This is code in the IT community for “we didn’t get it done, but will let someone else worry about it later.” This is compounded in the Life insurance industry, as “later” can sometimes be measured in years. This means that the team that built the code may be long gone by the time Day 2 rolls around, or worse, the fact that there are open items will be lost in the history or lore of the organization.

The reason I bring this topic up is an article I read recently that dramatically shows the challenges of inadequate testing. For the nerds amongst our readership, one of the classic fails of programming is an overflow. For the non-nerds, this just means you have a counter in your program that isn’t big enough. Rather than continuing to count up, it wraps around and starts over (back to zero, or, for a signed counter, to a large negative number). The results can be major to a program, and it is an error that might not be discovered for years. Remember, you’re counting something, so reaching this limit can take a long time.

The latest programming error to experience an overflow? The Boeing 787 Dreamliner. It seems that you need to reboot the plane every 248 days or risk the plane falling from the sky. Measured differently, 248 days is 2^31 hundredths of a second, which is exactly when a signed 32-bit counter ticking every hundredth of a second would overflow. Now I have no idea what they’re counting, but apparently it is pretty important. Should testing have caught this? Of course. Did it? Apparently not.
So the next time you want to cut testing short, remember this post and ask yourself, “Should we?” You may not have a plane falling from the sky, but you could have a hidden calculation error that costs your company. Take a small error, multiply it over time, and it quickly becomes a very, very large error. A potentially catastrophic financial error.
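For readers who want to see the overflow arithmetic, here is a minimal Python sketch. Python’s own integers never overflow, so the helper `add_int32` (a name made up for this example) imitates the wraparound of a signed 32-bit, two’s-complement counter. The sketch assumes one tick every hundredth of a second, which matches the 248-day figure. It is an illustration only, not the aircraft’s actual code.

```python
# Illustration only: imitates a signed 32-bit counter ticking every hundredth
# of a second. This is NOT the aircraft's actual code; the tick rate is an
# assumption chosen to match the 248-day figure.

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1
TICKS_PER_SECOND = 100  # assumed: one tick per hundredth of a second
SECONDS_PER_DAY = 86_400


def add_int32(a: int, b: int) -> int:
    """Add like a 32-bit two's-complement register: wrap instead of growing."""
    return (a + b - INT32_MIN) % 2**32 + INT32_MIN


def days_until_overflow(ticks_per_second: int) -> float:
    """Days a counter starting at zero can run before it exceeds INT32_MAX."""
    return (INT32_MAX + 1) / ticks_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{days_until_overflow(TICKS_PER_SECOND):.2f} days")  # 248.55 days
    counter = add_int32(INT32_MAX, 1)  # one tick past the limit
    print(counter)  # -2147483648: the counter has wrapped to its most negative value
```

An unsigned 32-bit counter would wrap to zero instead, but only after 2^32 ticks. At the same rate, that is roughly 497 days.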

Robotics, bots and chocolate teapots

Increasingly in operational efficiency and automation circles we’re hearing about bots and robotics. As a software engineer in days past and a recovering enterprise architect, I have given up biting my tongue and repeatedly note that “we have seen it all before.” I’ve written screen scrapers that get data out of screens, written code to drive terminal applications, and even hunted around user interfaces to find buttons to press. The early price comparison websites, over a decade ago, used these techniques to make their comparisons. These techniques work for a while but are desperately fragile when someone changes the name of a button, a screen or a screen flow. However, they can help. I recall a manager a while ago lamenting that ‘the solution’ was about as useful as a chocolate teapot. A useful 10 minutes hunting for this video of a chocolate teapot holding boiling water for one whole pot of tea made the point for me. Sometimes all you need is one pot of tea.
Tea poured from a chocolate tea pot

So it’s not new, and some bots may be fragile. With my “efficiency of IT spend” hat on (the one typically worn by enterprise architects), stitching automation together by having software do what people do is an awful solution. But as a pragmatist, I know that sometimes it’s good enough.

Things have moved on. Rather than a physical machine running this with a ghost apparently operating the mouse and keyboard, we have virtual machines, and monitoring of them is a lot better than it used to be. Further, machine learning and artificial intelligence libraries are now robust enough to add meaningfully smart or learning bots to the mix that can do a bit more than rote button pressing and screen reading. In fact, this is all reminiscent of the AI dream of multi-agent systems and distributed artificial intelligence, where autonomous agents collaborated on learning and problem-solving tasks, amongst other things. Replacing teams of humans working on tasks with teams of bots directly aligns with this early vision. The way these systems are now stitched together owes much to the recent work on service-oriented architecture, component orchestration and modern approaches to monitoring distributed, Internet-scale applications.

For outsourcers it makes a great deal of sense. The legacy systems are controlled and unlikely to change, and the benefits are quick. If these bots do break, one team can look after many bots across the estate and fix them swiftly. It may not be as elegant as SOA purists would like, but it helps outsourcers automate and achieve their objectives.

The language frustrates me, though, even if “bots” is better than “chocolate teapots”. I’ve heard “bot” used to mean a chunk of code to run, a machine learning model, and a virtual machine running the code. I’ve even heard discussions comparing the number of staff saved to the number of bots in play. I can well imagine operations leads in the future including bot efficiency in their KPIs.
Personally, I’d rather we discussed them for what they are (virtual desktops, screen scraper components, regression models, decision trees, code, bits of SQL where appropriate, etc.) than bucket them together, but perhaps I’m too close to the technology. In short, bots may not be a well-defined term, but the collection it describes is another useful and increasingly robust set of tools to add to the architect’s toolkit.
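To illustrate why screen-driven automation is so fragile, here is a minimal Python sketch of a label-driven bot step. It uses only the standard library’s `html.parser`. The page markup, the “Submit claim” label and the `find_button` helper are all hypothetical. Real automation tools are far more sophisticated; this only shows the principle of matching on what appears on screen.

```python
# Hypothetical bot step: locate a button by its visible label, the way a
# screen scraper or UI-driving script might. Markup and labels are made up.
from html.parser import HTMLParser


class ButtonFinder(HTMLParser):
    """Collects the visible text and id of every <button> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.buttons: dict[str, str] = {}  # label -> id
        self._current_id = None            # id of the <button> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._current_id = dict(attrs).get("id", "")

    def handle_data(self, data):
        if self._current_id is not None and data.strip():
            self.buttons[data.strip()] = self._current_id

    def handle_endtag(self, tag):
        if tag == "button":
            self._current_id = None


def find_button(page_html: str, label: str) -> str:
    """Return the id of the button with this label, or fail if the label changed."""
    finder = ButtonFinder()
    finder.feed(page_html)
    if label not in finder.buttons:
        raise LookupError(f"button {label!r} not found; the screen has changed")
    return finder.buttons[label]


old_screen = '<form><button id="b1">Submit claim</button></form>'
new_screen = '<form><button id="b1">Submit</button></form>'  # someone renamed the button

print(find_button(old_screen, "Submit claim"))  # b1
try:
    find_button(new_screen, "Submit claim")
except LookupError as err:
    print(err)  # the bot breaks although nothing functional changed
```

The underlying transaction is identical in both screens, yet the renamed label stops the bot. This is the kind of break a team looking after many bots has to spot and fix quickly.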

The Blame Game

Kathleen Sebelius resigned on Monday, and I’m betting that she is hoping that her next role does not include a major IT project. As Secretary of the Department of Health and Human Services (HHS), Sebelius was responsible for overseeing the rollout of HealthCare.gov, the troubled website whose launch was tainted by serious technical problems.

Initially the technical problems were thought to be confined to scalability, given the large amount of traffic. But it turned out there were a variety of other issues. The site rejected valid passwords, served up blank drop-down menus, and crashed repeatedly. There were challenges with the database and issues with integration, and after millions of visits on the first day, only six people got all the way through. New contractors were brought in to fix the problems, which added to the cost overruns. The cost ceiling began at $93M and was raised to $292M. Today it is estimated that the site has cost around $500M. To be fair, this was an extremely complex project. It involved 6 complex systems, 55 contractors working in parallel and 5 government agencies. The site is used in 36 states and covers 4,500 insurance plans from 300 insurers.

There were a number of contributing factors to the technical problems. No single contractor managed the entire project, and there was a lack of coordination across the multiple vendors. There were a number of last-minute changes, and the project was managed using a waterfall methodology, which can make it difficult to respond to changes quickly. Testing was inadequate. Not only did the system not perform according to design, but it didn’t scale to the level anticipated. Clearly they knew what the load would be, but the load testing didn’t meet the capacity plan.

However, Sebelius had little direct oversight of the project and certainly wasn’t responsible for the day-to-day project management. The website design was managed and overseen by the Centers for Medicare and Medicaid Services, which directly supervised the construction of the federal website. Regardless, Sebelius is likely updating her resume today and considering alternatives.

What does this mean for a CIO? If you’re going through a large-scale project (and many carriers are), you won’t know everything that the project manager knows, even though your neck is on the line if the project fails. Large-scale projects require a different level of management than day-to-day operations.

Areas to focus on include:

· Set realistic time frames. Don’t underestimate the amount of time it will take to implement the project. A lot of carriers want to hear that a policy admin system can be implemented in 6 to 12 months. There are some examples of that being true, but it’s more likely that your solution will take longer. Plan carefully and add contingencies. If you end up with a choice between launching late or launching with less functionality than was initially planned, you’re usually better off taking the time to do it right. People are much less forgiving of a poorly executed project than a late one.

· Manage the project with multiple, aligned work streams. Large projects generally require multiple work streams. We often see carriers divide the project into streams such as data, workflow, rating, documents, etc. This allows the team to focus their efforts. However, you have to continuously check that the streams are aligned. Communication across multiple work streams is critical.

· Communication is a key success factor for large projects, yet it is often an afterthought or, worse, not planned at all. Communication across project teams is necessary to ensure the functionality is aligned as planned. Communication is also critical when it comes to managing scope creep. When the team clearly understands the priorities, they’re better able to make tradeoffs early on. Clearly setting expectations around the deliverables is an important piece of communication, and so is continuing to manage those expectations as the project moves forward. This matters most when you face optimistic delivery dates, changing requirements, or staffing constraints.

· Focus on the worst-case scenario. Be skeptical when all is going smoothly. Insist on regular checks on the project and take red flags seriously. Realistic monitoring of project progress will help identify issues early on, as will analysis of the underlying factors driving the use of contingency. Don’t just look backwards at what has occurred; focus on readiness for future stages. Some carriers benefit from having third parties come in and conduct project health checks, looking objectively across the project for subtle indicators of potential issues.

In the end, Sebelius is responsible for the results of the implementation and her resignation should be seen as a red flag for carriers in similar situations.  Take a look at the governance you’ve put in place for your large projects.  Now may be a good time to consider adding some additional oversight. 

Failing is the first step of Innovation

Most cultures reward and encourage success. Growing up, we learn quickly to highlight our brilliance and downplay our mistakes. The corporate world we operate in only reinforces that.

Do you remember how many times you had to fall off your bike before you could ride it successfully? You had to fail before you got it right. Had you been laughed at or criticised in those moments of lying on the asphalt, would you ever have picked yourself up and tried again?

Being an innovator or supporting innovation in your organisation requires a willingness to embrace failure. And I mean really embrace it. It means allocating money to ideas that have no apparent business case. It means creating spaces for staff to experiment with no fear of retribution (think Google’s “20% time”). It means falling off your bike, again, scraping your knees, in the knowledge that at some point, something might come of the endeavour. From failure comes huge learning for the people and the organisation. Failure uncovers a gold mine of information that you had no access to before you started.

Many readers will nod sagely at what appears to be obvious. But my experience tells me that most of us go out of our way to avoid failure. We have been institutionalised to only ever succeed (or at least be seen as succeeding) at every step and in every endeavour. Don’t believe me? Here’s my challenge for you: write your own failure CV. Look back on your career, and write up each failure and what you learnt from it. My own experience and that of friends shows me that putting pen to paper to outline our failures is remarkably hard because of our conditioning. And this is symptomatic of our unwillingness to see failure as a necessary stepping stone to innovation.

I remain convinced that we need to face failure head on. Failure is good. Embracing failure will allow us as leaders to unlock the innovation potential that currently lies dormant in our companies.

Our competition for “Innovation in 6 words” is still open. We have received over 110 submissions so far! You can check out your competition here. To take part in the discussion, join our LinkedIn discussion group (Innovation is…) devoted to the topic. To participate in the challenge, e-mail your definition to Erica Ferguson at using the subject line “Innovation is” along with your contact information. We will be announcing the winning entries during our Innovation & Insight (I&I) Day on February 27, 2013. Regular readers of our blog know that I&I is a flagship Celent event. As always, it will host a variety of Celent and non-Celent speakers and will be a great opportunity to network with industry peers. If you’d like to see the full agenda and learn more details, please visit our registration site.

Improving Business Analysis from the Bottom Up

As part of a research effort into building business analysis skills, I was talking this week with a manager of a business analyst department at a major U.S. bank. He described their approach to improving requirements collection. It struck me as an effective and practical method that I want to pass along.

Many of the models for building business analysis skills are top-down initiatives, planned and executed as part of a wider improvement program. These are often driven from a learning and development department or a special training area within the IT organization (see the Celent Report Building a Better Business Analyst – Transforming the Enterprise). This bank’s approach was more “bottom-up” and grew out of a focus on their software testing process. They improved the rigor of their business requirements documentation by automating their test scripting, test planning and test case development. Under their revised methodology, business analysts must gather requirements in a structured manner that can be automatically uploaded into the test automation software. That software then generates test scripts, plans and cases. This brings consistency, control and structure to what was previously a very ad hoc process.
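The bank’s tools and requirement format weren’t described, so the following is only a minimal Python sketch of the idea. It assumes a hypothetical requirement record with an ID, a precondition, an action and an expected result. The sketch shows how structured requirements can be turned mechanically into traceable test cases, and how incomplete requirements can be rejected at upload time.

```python
# Hypothetical structured-requirement format and generator; the bank's real
# tooling and field names were not described in the post.
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str
    precondition: str
    action: str
    expected: str


@dataclass(frozen=True)
class GeneratedTestCase:
    case_id: str             # traceable back to the requirement ID
    steps: tuple[str, ...]
    expected: str


def generate_test_case(req: Requirement) -> GeneratedTestCase:
    """Turn one structured requirement into a test case, rejecting incomplete input."""
    if not (req.precondition and req.action and req.expected):
        # Rejecting incomplete requirements at upload is what forces precision.
        raise ValueError(f"{req.req_id}: requirement is missing a field")
    return GeneratedTestCase(
        case_id=f"TC-{req.req_id}",
        steps=(f"Given {req.precondition}", f"When {req.action}"),
        expected=f"Then {req.expected}",
    )


req = Requirement(
    req_id="REQ-042",
    precondition="a customer with a closed account",
    action="the customer requests a wire transfer",
    expected="the transfer is rejected with reason 'account closed'",
)
case = generate_test_case(req)
print(case.case_id)  # TC-REQ-042
```

The design choice is deliberate: free-text requirements cannot pass through the generator, so the discipline comes from the day-to-day tooling rather than from a separate training program.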

This practical development approach is valuable in that it delivers skill and process enhancements as part of the day-to-day activities of software development. For those looking for a different strategy for improving requirements gathering, getting there through automated testing may help. For those that have also taken this same approach, I would be interested in knowing what the results have been.

Software Application Testing in Insurance, Part IV: Best Practices

Previous posts about testing:
Topic 1: Automated Testing
Topic 2: Getting Test Data
Topic 3: Test Environment

Topic 4: Best Practices in Application Testing

It’s been a while since I’ve blogged about this subject, but I think the previous three posts could use some follow-up. After talking about the need for testing, getting the test data, and setting up a test environment, we need to actually do the testing.

The best practices for software application testing are not easy to set up and maintain. They often require one or two full-time IT resources with a test development specialty (often called a Software Test Engineer, or STE). Not many technology vendors follow these best practices, let alone insurers, and the expense and effort might not be practical or possible for every insurance company. However, by understanding the best practices, an insurer can at least take steps in the right direction.

The ideal scenario is an automated system that is able to clean itself to a standard state before and after each test. For example, if the QA team wants to run manual tests for software application ABC, they would run a script that would:
  • Drop all the tables in the database, recreate the tables for ABC, and populate them with the same set of starting data.
  • Clear out any existing application code, then get and install the latest application code from development.
  • In extreme cases, reinstall the entire test server OS from scratch, though that is likely unnecessary for this level of application testing.
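The database part of such a reset script might look like the following minimal Python sketch. It uses SQLite from the standard library so it runs anywhere. The tables and seed rows for application ABC are hypothetical. A real script would point at the shared test database server and would also redeploy the latest application build.

```python
# Hypothetical reset script for application "ABC": drop everything, rebuild
# the schema, reload identical seed data. SQLite stands in for the test DB.
import sqlite3

SCHEMA = [
    "CREATE TABLE policy (policy_no TEXT PRIMARY KEY, holder TEXT, premium REAL)",
    "CREATE TABLE claim (claim_no TEXT PRIMARY KEY, policy_no TEXT, amount REAL)",
]

SEED_POLICIES = [
    ("P-1001", "A. Smith", 1200.0),
    ("P-1002", "B. Jones", 850.0),
]


def reset_database(conn: sqlite3.Connection) -> None:
    """Drop every table, recreate the schema, and load the same seed data."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for name in tables:
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')
    for ddl in SCHEMA:
        conn.execute(ddl)
    conn.executemany("INSERT INTO policy VALUES (?, ?, ?)", SEED_POLICIES)
    conn.commit()


conn = sqlite3.connect(":memory:")
reset_database(conn)                # team ABC starts from the standard state
conn.execute("DELETE FROM policy")  # ...a test session changes the data...
reset_database(conn)                # ...and the next session starts clean again
print(conn.execute("SELECT COUNT(*) FROM policy").fetchone()[0])  # 2
```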
Later, if a different QA team wants to run manual tests for software application XYZ, they would run a similar process. Both teams are guaranteed a stable, repeatable base from which to begin their testing. Because the database and the application are recreated each time the tests are run, there is no need to maintain the database in any particular way, and multiple users can work with the same servers.

Preparing to run the Automated System Tests should work in a similar manner, with the reminder that the order of tests shouldn’t matter. That means it might be necessary to rebuild the database between some automated system tests.

In the case of Unit Tests, the ideal test environment should go one step further. Each unit test (and in many cases each automated system test) should contain its own code to add the data it needs to the database. Since unit tests are intended to be very focused, the typical test needs just a few rows of data. After the test is complete, it should clean out the data it just added. Developers write these tests, so the IT resources devoted to test development should provide an application-specific API that helps developers do this quickly.

It’s a lot of effort, and each IT group needs to decide how far it is willing to go to achieve the best possible test practices. As with the previous topics, however, once the initial setup is complete, following the best practices becomes easy and natural.
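Here is a minimal Python sketch of a unit test that inserts only the rows it needs and removes them afterwards. It uses the standard `unittest` module’s `setUp` and `tearDown` hooks. `FixtureData` stands in for the application-specific helper API mentioned above. It, the `policy` table and `total_premium` are all hypothetical.

```python
# Hypothetical per-test data helper plus a unit test that uses it.
# SQLite in memory stands in for the shared test database.
import sqlite3
import unittest

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (policy_no TEXT PRIMARY KEY, premium REAL)")


class FixtureData:
    """Hypothetical helper API: adds rows for one test and deletes exactly those rows."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        self.created: list[str] = []

    def add_policy(self, policy_no: str, premium: float) -> None:
        self.db.execute("INSERT INTO policy VALUES (?, ?)", (policy_no, premium))
        self.created.append(policy_no)

    def cleanup(self) -> None:
        self.db.executemany("DELETE FROM policy WHERE policy_no = ?",
                            [(p,) for p in self.created])
        self.db.commit()
        self.created.clear()


def total_premium(db: sqlite3.Connection) -> float:
    """Code under test (hypothetical): sum of all premiums on file."""
    return db.execute("SELECT COALESCE(SUM(premium), 0) FROM policy").fetchone()[0]


class TotalPremiumTest(unittest.TestCase):
    def setUp(self) -> None:
        self.data = FixtureData(conn)
        self.data.add_policy("P-1", 100.0)  # only the rows this test needs
        self.data.add_policy("P-2", 250.0)

    def tearDown(self) -> None:
        self.data.cleanup()  # leave the database exactly as we found it

    def test_sums_premiums(self) -> None:
        self.assertEqual(total_premium(conn), 350.0)


if __name__ == "__main__":
    unittest.main(argv=["total-premium-test"], exit=False)
```

Because `tearDown` runs even when the assertion fails, a broken test still cleans up after itself and cannot poison the next one.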

Software Application Testing in Insurance, Part III: Test Environment

Previous posts about testing:
Topic 1: Automated Testing
Topic 2: Getting Test Data

Topic 3: Clean, Repeatable Test Data and Test Environment

I’ve got a short but important point to add to the previous software application testing topics. A clean test environment makes the difference between valid results and a frustrated QA team. Running automated or manual tests against the same database over and over cannot guarantee the same results each time, because that database changes with each test. I see many insurers who load a database once a week and then have their manual test teams run through a series of scripts every day. When results vary, it’s impossible to tell whether the cause is a bug or simply different data in the database than on the previous attempt.

It is very important to have a stable, repeatable environment in which to run tests. If a test fails, it’s important to know that the test is failing because of a problem in the code and not because a previous test broke the system. Without a way to easily generate test data and to guarantee a clean environment, different groups that need to test will likely end up clashing in their attempts to use the test environment. In a future post I will talk about testing best practices, including more details on creating clean test environments.
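The following minimal Python sketch shows the problem. The check cancels the first active policy and expects exactly one active policy to remain. Run twice against a shared database, it passes and then fails with no code change. Run against a freshly built database each time, it passes both times. The table, data and function names are hypothetical, and SQLite stands in for the shared test database.

```python
# Hypothetical example of a test whose result depends on what earlier runs
# left behind in a shared database.
import sqlite3


def fresh_database() -> sqlite3.Connection:
    """Build the same known starting data every time."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE policy (policy_no TEXT PRIMARY KEY, status TEXT)")
    db.executemany("INSERT INTO policy VALUES (?, ?)",
                   [("P-1", "active"), ("P-2", "active")])
    return db


def check_cancel_first_active_policy(db: sqlite3.Connection) -> bool:
    """Cancel the first active policy; pass if exactly one active policy remains."""
    row = db.execute("SELECT policy_no FROM policy WHERE status = 'active' "
                     "ORDER BY policy_no LIMIT 1").fetchone()
    db.execute("UPDATE policy SET status = 'cancelled' WHERE policy_no = ?", (row[0],))
    active = db.execute("SELECT COUNT(*) FROM policy WHERE status = 'active'").fetchone()[0]
    return active == 1


shared = fresh_database()
print([check_cancel_first_active_policy(shared) for _ in range(2)])            # [True, False]
print([check_cancel_first_active_policy(fresh_database()) for _ in range(2)])  # [True, True]
```

The second failure on the shared database is not a bug in the code under test. It is exactly the “different data than on the previous attempt” problem.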

Software Application Testing in Insurance, Part II: Getting Test Data

Previous posts about testing:
Topic 1: Automated Testing

Topic 2: Getting Test Data

Bad test data can mean that the best tests fail to predict real-world problems. While many test topics apply to all industries, insurance carriers face some unique issues when it comes to getting good test data. Due to HIPAA and other industry regulations, using real data for testing is a gray area, and the test team does not necessarily need real data to do their jobs. It’s a difficult task to take real data and “clean” it for testing. It’s also a difficult task to generate good test data from scratch, though this is really the best solution.

An insurer should take the time to have a developer create a small application or utility that generates test data specifically for the application being tested. This utility should generate random data but follow a set of rules to keep the data within the bounds of reality. It should intentionally create “edge cases” that might stress the system and reveal errors. It should be easily adjusted to create small data sets for simple tests and very large data sets for performance and scalability tests. While it may take a few days to implement this utility, it will save a lot of time later. Instead of struggling for half a day every time tests need to be run (a common complaint), the work is done once, up front. Unfortunately, since every software application has different data needs, this kind of utility will likely have to be written separately (or at least significantly rewritten) for each new application that needs to be tested.

90% or more of tests should be run against a very small set of data. Running tests against a huge database is unnecessary, slows down the tests themselves, and complicates things. Most tests are meant to verify very specific issues, and there is no reason the database needs to contain more than the bare minimum of data.

Only a very small number of tests need to be run against a large database. Many insurers simply copy over their entire real-world database and then run tests against it. This not only creates security issues but also makes the job harder for development and quality assurance teams.
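A minimal Python sketch of such a utility follows. The `Policy` fields and the rules (holder ages 18 to 85, premiums from 100 to 25,000) are illustrative assumptions, not any real product’s rules. A fixed random seed keeps the data repeatable from run to run. The edge cases come first, so even a tiny data set includes them. The `count` argument scales the same generator from a unit test’s handful of rows up to a performance test.

```python
# Hypothetical test data generator: random but rule-bounded records,
# deliberate edge cases, and a size knob. Fields and rules are assumptions.
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Policy:
    policy_no: str
    holder_age: int
    premium: float
    start: date


EDGE_CASES = [
    Policy("EDGE-MIN", 18, 100.00, date(2000, 1, 1)),      # youngest, cheapest
    Policy("EDGE-MAX", 85, 25_000.00, date(2000, 1, 1)),   # oldest, most expensive
    Policy("EDGE-LEAP", 40, 1_000.00, date(2024, 2, 29)),  # leap-day start date
]


def generate_policies(count: int, seed: int = 1) -> list[Policy]:
    """Return `count` records: edge cases first, then seeded random ones within the rules."""
    rng = random.Random(seed)        # fixed seed: identical data on every run
    policies = EDGE_CASES[:count]    # slicing copies, so EDGE_CASES is never modified
    for i in range(count - len(policies)):
        policies.append(Policy(
            policy_no=f"P-{i:07d}",
            holder_age=rng.randint(18, 85),
            premium=round(rng.uniform(100, 25_000), 2),
            start=date(2000, 1, 1) + timedelta(days=rng.randrange(365 * 25)),
        ))
    return policies


small = generate_policies(10)      # most tests need only a handful of rows
larger = generate_policies(10_000)  # raise the count for the few performance tests
```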

Software Application Testing in Insurance, Part I: Automated Testing

In my last post I talked about the vendor proof of concept as a way for insurers to avoid buying “a lemon”. The same risk management needs to apply when an insurer develops software internally, and that’s achieved with strong testing practices. Software testing is a huge topic, and I’m going to split the discussion up over a series of blog posts. There are a lot of great books and articles out there about software testing, but in these posts I’ll try to give it an insurance industry spin. You might run into different issues depending on whether you are building web portals, data-heavy applications, server utilities, or mainframe applications, though in general the same methodologies still apply.

Topic 1: Automated Testing

From the lowest to the highest level, test cases can be broken into the following three categories:
1. Unit Tests: Code-based test cases that developers write to test their own code. Typically these are written using developer tools such as the open source JUnit (for Java code).
2. Automated System Tests: Test scripts that can be run against the entire system, and can be created by developers or test teams. These are typically written with applications that are specifically geared to help write tests for web-based, Windows-based, or Java-based applications.
3. Manual Tests: Test scripts that are manually executed by a test or QA team.

The biggest testing issue I see at insurance companies is that there are too many manual test processes and not enough automated testing. Manual tests are important, but they are slow, difficult to reproduce reliably, costly, and not scalable. Plus, it’s much more likely that previously fixed bugs will pop up again without being noticed. With a full suite of Automated System Tests and Unit Tests, large changes can be made to a software application with confidence.
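JUnit is for Java. To keep one language throughout, here is the same idea as a minimal sketch using Python’s built-in `unittest` module. `monthly_installment` is a hypothetical piece of insurance logic, and each test method checks one specific behavior of it.

```python
# Hypothetical code under test plus developer-written unit tests.
import unittest


def monthly_installment(annual_premium: float, installment_fee: float = 0.0) -> float:
    """Split an annual premium into 12 payments and add a per-payment fee."""
    if annual_premium < 0:
        raise ValueError("premium cannot be negative")
    return round(annual_premium / 12 + installment_fee, 2)


class MonthlyInstallmentTest(unittest.TestCase):
    def test_even_split(self):
        self.assertEqual(monthly_installment(1200.0), 100.0)

    def test_fee_is_added(self):
        self.assertEqual(monthly_installment(1200.0, installment_fee=2.5), 102.5)

    def test_negative_premium_rejected(self):
        with self.assertRaises(ValueError):
            monthly_installment(-1.0)


if __name__ == "__main__":
    unittest.main(argv=["monthly-installment-test"], exit=False)
```

Each test runs in milliseconds and needs no test environment at all. That is why a large suite of them can be rerun after every change.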
Many developers complain about writing unit tests or skip this step, claiming the responsibility lies with the QA team, or saying they will take care of it later (and then never do). The more unit tests written, the easier it gets to write them, and the less time will be spent fixing bugs later. And remember, the later a bug is discovered and fixed, the costlier it becomes, especially when the bug isn’t found until the system is in production. To increase the automated test footprint on a system built using only manual testing, start requiring new unit tests for all bug fixes going forward. Each time the test group discovers a bug, an Automated System Test or a Unit Test should be written to reproduce that bug before the developer fixes the code. Once the code is changed, the team can verify that the bug has been fixed by running the new test case again and seeing that it now passes. This also creates a repeatable automated test that will catch the bug if it is ever reintroduced.
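As a minimal Python sketch of that workflow, suppose (hypothetically) the test group reports that an age calculation overstates a policyholder’s age on the day before their birthday. The developer first writes `test_day_before_birthday`, which fails against the buggy code. After the fix it passes, and it stays in the suite to catch the bug if it ever returns. The `age_on` function and the bug are both made up for illustration.

```python
# Hypothetical regression test written to reproduce a reported bug before the fix.
import unittest
from datetime import date


def age_on(birth: date, on: date) -> int:
    """Age in whole years on a given date.

    Hypothetical bug fixed here: the original version returned
    on.year - birth.year, overstating the age of anyone whose birthday
    had not yet arrived that year.
    """
    before_birthday = (on.month, on.day) < (birth.month, birth.day)
    return on.year - birth.year - before_birthday  # bool counts as 0 or 1


class AgeRegressionTest(unittest.TestCase):
    def test_day_before_birthday(self):
        # Written first to reproduce the bug: failed on the old code, passes after the fix.
        self.assertEqual(age_on(date(1980, 6, 15), date(2020, 6, 14)), 39)

    def test_on_birthday(self):
        self.assertEqual(age_on(date(1980, 6, 15), date(2020, 6, 15)), 40)


if __name__ == "__main__":
    unittest.main(argv=["age-regression-test"], exit=False)
```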