Failure Testing Cook Book: Avoiding a Recipe for Disaster

Ever wonder what would happen if your favorite website’s database went down? Would anyone care? After all, if the database goes down you’ve got bigger problems. Would you expect to receive a friendly message instead of an ugly error code? If you answered yes, congratulations! You’ve thought about failure testing.

What is failure testing?

Failure testing determines a system’s reliability and how each component of the system reacts when the modules it depends on are unavailable. This type of testing offers numerous benefits, particularly for “Software as a Service” (SaaS) applications that have 24/7 uptime requirements, including:

  1. Identifying all internal and external system dependencies.
  2. Documenting system-failure points and their effect on the system.
  3. Documenting “Standard Operating Procedures” for monitoring and troubleshooting production issues.
  4. Determining whether or not the system behaves as designed.
  5. Identifying acceptable user experiences when system components go down.

When is the best time to perform failure testing? Test planning needs to happen early in the “Software Development Life Cycle” (SDLC) because identifying failure points often uncovers additional testing needs. Whenever possible, perform failure testing after the application has been functionally tested. This minimizes the chances of running into defects that block failure testing and/or major code changes that would require re-testing.

Working through an example

The following example uses the AddressBook application to walk through the process of identifying test cases and get a sense of what failure testing is all about.

[Diagram: the AddressBook application and its dependencies]

In this diagram, the AddressBook application has the following dependencies:

  • Three internal services: AddUpdateDelete, Email, and ImportExport.
  • One external service: Third-Party Email.
  • A database used for storing a contact’s address.

How do all of these pieces work together? Take a look at the table below for a summary of AddressBook functionality and how it utilizes the internal and external services.

| AddressBook Feature | Service Calls | On Success | On Failure |
|---|---|---|---|
| Add a new contact to the address book. | AddUpdateDelete | Display message in UI indicating: “<Name> has been inserted into your address book.” | Display message in UI indicating: “<Name> was not inserted into your address book. <Reason>.” Output failure messages to AddressBook and AddUpdateDelete service log files. |
| Update an existing contact in the address book. | AddUpdateDelete | Display message in UI indicating: “<Name>’s address has been updated.” | Display message in UI indicating: “<Name>’s address has not been updated. <Reason>.” Output failure messages to AddressBook and AddUpdateDelete service log files. |
| Delete an existing contact from the address book. | AddUpdateDelete | Display message in UI indicating: “<Name> was removed from your address book.” | Display message in UI indicating: “Unable to remove <Name> from your address book. <Reason>.” Output failure messages to AddressBook and AddUpdateDelete service log files. |
| Send email to an existing contact in the address book. | Email | Display message in UI indicating: “Your email has been delivered to <Name>.” | Display message in UI indicating: “Unable to deliver email to <Name>. <Reason>.” Output failure messages to AddressBook and Email service log files. |
| Import contacts into the address book using a user-supplied source and credentials. | ImportExport, Third-Party Email | Display message in UI indicating: “Your contacts have been imported from <Source> to your address book.” | Display message in UI indicating: “Unable to import contacts from <Source> to your address book. <Reason>.” Output failure messages to AddressBook and ImportExport service log files. |
| Export contacts from the address book to a file. | ImportExport | Display message in UI indicating: “Your contacts have been exported to: <File>.” | Display message in UI indicating: “Unable to export contacts to: <File>. <Reason>.” Output failure messages to AddressBook and ImportExport service log files. |

Note: If there is an issue interacting with the Third-Party Email service during an import, the <Reason> noted above highlights that problem.

Identifying test cases

Once we understand how the AddressBook works, we can identify test cases because we know:

  • Each service call is a possible failure point.
  • Which action in our application invokes each service call.
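The two points above can be sketched in a few lines of Python. This is only an illustration: the feature names are shorthand for the rows of the summary table, the function name `failure_scenarios` is made up for this example, and the import feature is assumed to call both ImportExport and Third-Party Email. The database is also a dependency, but the table does not map it to individual features, so it is left out here.

```python
from collections import defaultdict

# Feature -> services it calls, as read from the AddressBook summary table.
FEATURE_SERVICES = {
    "add contact": ["AddUpdateDelete"],
    "update contact": ["AddUpdateDelete"],
    "delete contact": ["AddUpdateDelete"],
    "send email": ["Email"],
    "import contacts": ["ImportExport", "Third-Party Email"],
    "export contacts": ["ImportExport"],
}


def failure_scenarios(feature_services):
    """Group features by the service they call: one failure scenario per service."""
    by_service = defaultdict(list)
    for feature, services in feature_services.items():
        for service in services:
            by_service[service].append(feature)
    return dict(by_service)


# Each service is a failure point; the listed features are the actions to exercise.
for service, features in failure_scenarios(FEATURE_SERVICES).items():
    print(f"Bring down {service}; exercise: {', '.join(features)}")
```

Inverting the table this way gives one candidate test case per service, along with the user actions that should be exercised while that service is down.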

OK, let’s pinpoint the AddUpdateDelete service failure scenarios. We want to assess how AddressBook responds when the AddUpdateDelete service is unavailable. We do this by adding, updating, and deleting contacts in our address book, since those actions make calls to the AddUpdateDelete service. The approach is as follows:

  • Write a test script that performs repeated add, update, and delete operations and runs throughout the failure test. Use a tool like JMeter to simulate light load on the AddUpdateDelete service.
  • Get an understanding of the user experience when the service goes down by using the user interface to manually add, update, and delete contacts in the AddressBook.
  • Watch application log files to ensure that the application is writing correct messages when it makes calls to the AddUpdateDelete service.
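The test script itself would be built in a tool like JMeter. Purely to show its shape, here is a rough Python stand-in. Everything in it is hypothetical: `StubClient` is an in-memory substitute for a real AddUpdateDelete client, and `run_light_load` is a made-up name for the loop that repeats the three operations and counts the outcomes.

```python
import time


class StubClient:
    """Hypothetical in-memory stand-in for an AddUpdateDelete client."""

    def __init__(self):
        self.available = True
        self.contacts = {}

    def _require_service(self):
        if not self.available:
            raise ConnectionError("AddUpdateDelete service is unavailable")

    def add(self, name, address):
        self._require_service()
        self.contacts[name] = address

    def update(self, name, address):
        self._require_service()
        self.contacts[name] = address

    def delete(self, name):
        self._require_service()
        self.contacts.pop(name, None)


def run_light_load(client, cycles, pause_seconds=0.0):
    """Repeat add, update, and delete for `cycles` rounds and tally the outcomes."""
    tally = {"passed": 0, "failed": 0}
    for i in range(cycles):
        name = f"contact-{i}"
        operations = (
            lambda: client.add(name, "1 Main St"),
            lambda: client.update(name, "2 Main St"),
            lambda: client.delete(name),
        )
        for operation in operations:
            try:
                operation()
                tally["passed"] += 1
            except ConnectionError:
                tally["failed"] += 1
        time.sleep(pause_seconds)  # a pause between rounds keeps the load light
    return tally
```

While the service is up, every operation should land in `passed`. Once it is brought down, the same loop should start reporting entries in `failed`, which mirrors what the JMeter results are expected to show.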

Taking all of this into account, our test case for this service might look something like this:

| Action | Verification |
|---|---|
| Run AddUpdateDelete test script via JMeter. | Verify script is running successfully. Verify AddressBook logs contain no errors. Verify AddUpdateDelete logs contain no errors. |
| Run manual test that adds, updates, and deletes contacts from the address book. | All actions performed are successful with proper UI success messaging. |
| Bring AddUpdateDelete service down (script still running). | Verify AddUpdateDelete service has stopped. Verify AddUpdateDelete log file indicates it was brought down. Verify AddressBook logs contain proper failure messages indicating AddUpdateDelete service is unavailable. |
| Evaluate AddUpdateDelete test script results. | You should start to see AddUpdateDelete test script failures in JMeter. Verify proper failure messages are returned. Verify proper failure messages are output to AddressBook log file. Verify that there are no unexpected exceptions in the AddressBook log file. |
| Perform manual tests: add, update, and delete contacts. | Verify the UI displays the proper failure messages. |
| Bring AddUpdateDelete service back up. | Verify AddUpdateDelete logs indicate it was successfully started up and contain no errors. Verify that the AddressBook logs indicate AddUpdateDelete service is available again. Verify the AddUpdateDelete test script is passing again in JMeter. |
| Perform manual tests: add, update, and delete contacts. | All actions performed are successful again with proper UI success messaging. |
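Parts of this test case could also be automated. The sketch below walks the same up, down, up sequence for the add operation only, using Python's `unittest`. It rests on several assumptions: `FakeAddUpdateDelete` is a made-up in-memory substitute for the real service, `add_contact` is a hypothetical piece of AddressBook code, and the message wording follows the summary table with the name and reason placement assumed.

```python
import logging
import unittest

log = logging.getLogger("AddressBook")


class FakeAddUpdateDelete:
    """Hypothetical in-memory stand-in for the service, with an up/down switch."""

    def __init__(self):
        self.up = True
        self.contacts = {}

    def add(self, name, address):
        if not self.up:
            raise ConnectionError("AddUpdateDelete service is unavailable")
        self.contacts[name] = address


def add_contact(service, name, address):
    """Return the UI message; log a failure message when the service is down."""
    try:
        service.add(name, address)
    except ConnectionError as exc:
        log.error("add failed: %s", exc)
        return f"{name} was not inserted into your address book. {exc}."
    return f"{name} has been inserted into your address book."


class AddUpdateDeleteFailureTest(unittest.TestCase):
    def test_down_then_up(self):
        service = FakeAddUpdateDelete()

        # Baseline: the service is up and the action succeeds.
        self.assertIn("has been inserted", add_contact(service, "Ann", "1 Main St"))

        # Service down: expect a friendly UI message and an AddressBook log entry.
        service.up = False
        with self.assertLogs("AddressBook", level="ERROR") as captured:
            message = add_contact(service, "Bob", "2 Main St")
        self.assertIn("was not inserted", message)
        self.assertIn("unavailable", captured.output[0])

        # Service back up: the same action succeeds again.
        service.up = True
        self.assertIn("has been inserted", add_contact(service, "Bob", "2 Main St"))
```

Saved to a file, this can be run with `python -m unittest`. Update and delete would follow the same pattern, and a real test would stop and start the actual service rather than flip a flag.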

The first test case is complete! Now repeat the same process for each of the other services identified as failure points in the AddressBook application. In most cases, the scenarios will be the same. You just need to make adjustments for the service under test and write a new test script that makes the appropriate service calls.

So, what’s the recipe for success?

No application is immune from downtime. Even big-time players like Google and Amazon have faced outages. In early July 2013, Google services such as YouTube™, Gmail™, and Google Drive™ went down for approximately 1 hour. In early August 2013, Amazon’s website had a rare 15-minute outage that cost the online retailer an estimated $2.4 million!

The recipe for success consists of:

  • Thoroughly planning and executing failure testing.
  • Understanding an application’s points of failure.
  • Documenting how to troubleshoot and recover from them.

Has your failure testing uncovered a potential recipe for disaster? Share it in the comment section below.

Comments

  1. Software testing is a set of tools, techniques, and methods that assess the quality and performance of software. Techniques for finding problems in software vary widely, ranging from the ingenuity of the staff who execute the tests to automated tools that help reduce the burden and time cost of this activity. But knowing every software testing technique is of little use if a program has no documentation, the code is unclear, or the steps for planning and development have not been followed. The IEEE dictionary defines testing as follows:

  2. “It is an activity in which a system or one of its components is executed under pre-specified circumstances, the results are observed and recorded, and an evaluation of some aspect is performed” [IEEE, 1990]. Testing, therefore, is the process of executing a program in order to find errors or failures.
