Using JSON Schema to Validate Web Service Requests


Validating input is one of those programming necessities that we sometimes like to put off. It is much easier to take on faith that the input is correct than to spend many lines of code guarding against all the ways it can be wrong. This leap-of-faith approach may make coding faster and easier, but it does not make for robust software.

Several years ago a co-worker showed me that it is far better to practice “safe programming” by guarding against bad input. In the long term this approach works out better for both the customers and the developers who must maintain the code.

This is especially true when writing web services. Web services have to deal with incoming data from POST, PUT, or PATCH requests, usually in the popular JavaScript Object Notation (JSON) format.

JSON validation approaches

Let’s look at a few approaches one can take to validate the incoming JSON. Not all of them are as effective as one would hope:

  1. Create a key/value pair object map using the JSON input.
    This approach uses a Jackson ObjectMapper, and each key/value pair is validated individually. This works well if the input is a flat list of key/value pairs that are all more or less of the same type. However, if the incoming JSON has a complex, nested hierarchy or you have many different data types to validate, this approach can be cumbersome and inefficient.
  2. Create a data-transfer object (DTO) or a domain object from the JSON input.
    This approach also uses an ObjectMapper to build the object, to which you then apply code to validate that the object is correct. The main drawback is that it fails rather severely when the JSON-to-object mapping itself fails. For example, if an enumeration (enum) value in the incoming payload is wrong and won’t map to the object’s enum property, the result is an internal server error rather than a clean validation failure. I’ve seen this approach work when the input is subtly wrong, but when the input is wildly incorrect the failure can be severe.
  3. Create custom code for validation.
    This approach takes the JSON input as a string, which is parsed using custom code you have built for validation. While this approach is plausible, creating your own JSON parsing and validation routines is complex and typically self-defeating, especially if the input format contract changes for that web service. It can be so problematic that you should stay as far away from it as possible in practice.
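
To make the "cumbersome" point of the first approach concrete, here is a minimal sketch of checking a parsed payload key by key. It is written in Python rather than Java purely for brevity, and the payload shape and the `validate_order` helper are hypothetical, not taken from any real service.

```python
import json

def validate_order(payload):
    """Hand-written checks for a hypothetical nested order payload."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    errors = []
    customer = payload.get("customer")
    if not isinstance(customer, dict):
        errors.append("customer must be an object")
    elif not isinstance(customer.get("email"), str):
        errors.append("customer.email must be a string")
    items = payload.get("items")
    if not isinstance(items, list):
        errors.append("items must be an array")
    else:
        for i, item in enumerate(items):
            if not isinstance(item, dict):
                errors.append(f"items[{i}] must be an object")
                continue
            qty = item.get("quantity")
            # bool is a subclass of int in Python, so exclude it explicitly
            if not isinstance(qty, int) or isinstance(qty, bool):
                errors.append(f"items[{i}].quantity must be an integer")
    return errors

good = json.loads('{"customer": {"email": "a@example.com"}, "items": [{"quantity": 2}]}')
bad = json.loads('{"customer": {}, "items": [{"quantity": "2"}]}')
print(validate_order(good))
print(validate_order(bad))
```

Even for two levels of nesting, every new field means another branch, and the rules live in code rather than in a document that clients can read.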

Validating JSON using JSON schema

A better approach to validating JSON input is the use of a JSON schema. Similar to the XML schema, which is written in pure XML format for validating XML, JSON schema is written in pure JSON format for validating JSON. In fact, if you have ever written an XML schema, then JSON schema should be quite familiar to you.

So what exactly is JSON schema?  JSON schema is:

  • A way of describing your existing data format
  • Clear, human- and machine-readable documentation
  • A means of complete structural validation, useful for
    • Automated testing
    • Validating client-submitted data

The JSON schema specification

Before diving into a tutorial and exploring JSON schema syntax, it is worth describing the current state of the JSON schema specification. The JSON schema spec is currently in its 4th draft version, having undergone 3 prior revisions.  The spec is not in a final form, but there are many language implementations available that support the spec in its 4th draft revision.

JSON schema validators

There are almost 30 validator implementations written for Java, Ruby, C, C++, Python, JavaScript, Perl and other languages. At Constant Contact we are mostly a Java shop, and the validator from Francis Galiegue (aka fge) works quite well for us. It is popular with Java developers and is available on GitHub as fge/json-schema-validator.

To interactively execute any of the tutorial examples below, I recommend using the online validator at http://json-schema-validator.herokuapp.com/, which implements the fge/json-schema-validator code from GitHub.

You’ll find that using this validator makes playing with a schema and trying out various validation scenarios quite simple. It eliminates the need to write application code that loads the validator and then executes the schema against the input. This really simplifies the entire process of trying out a schema.

Let’s build a schema

Let’s first look at a simple example validating “Hello, World” using a JSON schema.

Example 1

JSON input

The data that we want to validate is:

"Hello, World"

JSON Schema

This schema will do the job:

{
    "type" : "string"
}

How to test the schema

Now use the JSON schema validator at http://json-schema-validator.herokuapp.com/ to test the schema.

  1. Paste the JSON input into the data panel.
  2. Paste the schema into the schema panel.
  3. Click VALIDATE. The Validation results panel shows a success message and no error messages.

This schema validates a string and rejects anything other than a string. To narrow down what gets validated even more, let’s add schema constraints to the string type for minimum and maximum length:

{
    "type" : "string", "minLength": 12, "maxLength": 12
}

This schema rejects any string that is not exactly 12 characters long ("Hello, World" has 12 characters). By adding a pattern constraint to the string type, we can use a regular expression to require that the string matches exactly:

{
    "type" : "string", "pattern": "^Hello, World$"
}
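
To make the semantics of these three schemas concrete, here is a minimal Python sketch of what a validator does with the string keywords. The `validate_string` helper is hypothetical and handles only `type`, `minLength`, `maxLength` and `pattern`; it is not the fge validator, and Python's regular-expression dialect differs in places from the one JSON schema specifies.

```python
import json
import re

def validate_string(instance, schema):
    """Check a parsed JSON value against a few string keywords."""
    if not isinstance(instance, str):
        # The string keywords only constrain strings; "type" rejects the rest.
        return ["expected a string"] if schema.get("type") == "string" else []
    errors = []
    if len(instance) < schema.get("minLength", 0):
        errors.append("shorter than minLength")
    if "maxLength" in schema and len(instance) > schema["maxLength"]:
        errors.append("longer than maxLength")
    # A pattern is not implicitly anchored, hence search() rather than fullmatch()
    if "pattern" in schema and re.search(schema["pattern"], instance) is None:
        errors.append("does not match pattern")
    return errors

data = json.loads('"Hello, World"')
print(validate_string(data, {"type": "string", "minLength": 12, "maxLength": 12}))
print(validate_string("Hello", {"type": "string", "minLength": 12, "maxLength": 12}))
print(validate_string("Hello, World!", {"type": "string", "pattern": "^Hello, World$"}))
```

Because patterns are searched rather than matched in full, the `^` and `$` anchors in the schema above are what make the match exact.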

Schema to validate an email address

We don’t need to create a complex regular expression (regexp) to validate an email address such as the one in the following JSON input:

{
    "email_address": "somebody@example.com"
}
 

Instead, we can just call upon the format constraint on the string type to solve the problem in a snap.

{
    "type": "object",
    "properties": {
        "email_address": { "type": "string", "format": "email" }
    },
    "required": [ "email_address" ],
    "additionalProperties": false
}

Note that we’ve introduced a couple of new schema keywords here:

  • object – a type indicating a key/value list of properties, in this case “email_address” being the key and “somebody@example.com” being the property value.
  • required – lists the object properties that must be present.
  • additionalProperties – indicates whether properties other than those listed under properties are allowed for that object.
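
The two object-level keywords can be sketched in a few lines of Python. The `validate_object` helper below is hypothetical and deliberately partial: it checks only `required` and the boolean form of `additionalProperties`, and does not validate the property values themselves.

```python
def validate_object(instance, schema):
    """Apply only 'required' and boolean 'additionalProperties' to an object."""
    if not isinstance(instance, dict):
        return ["expected an object"]
    errors = []
    for name in schema.get("required", []):
        if name not in instance:
            errors.append(f"missing required property: {name}")
    if schema.get("additionalProperties", True) is False:
        allowed = schema.get("properties", {})
        for name in instance:
            if name not in allowed:
                errors.append(f"unexpected property: {name}")
    return errors

schema = {
    "type": "object",
    "properties": {"email_address": {"type": "string", "format": "email"}},
    "required": ["email_address"],
    "additionalProperties": False,
}
print(validate_object({"email_address": "somebody@example.com"}, schema))
print(validate_object({}, schema))
print(validate_object({"email_address": "somebody@example.com", "nickname": "sb"}, schema))
```

With `additionalProperties` set to false, a misspelled or unexpected key is reported instead of being silently ignored.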

In addition to email, the format constraint on the string type supports the following formats:

  • date-time – Date representation, as defined by RFC 3339, section 5.6
  • hostname – Internet host name, see RFC 1034, section 3.1
  • ipv4 – IPv4 address, according to the dotted-quad ABNF syntax defined in RFC 2673, section 3.2
  • ipv6 – IPv6 address, as defined in RFC 2373, section 2.2
  • uri – A uniform resource identifier (URI), according to RFC 3986
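
To give a feel for what a format check involves, here is a rough Python sketch of three of these formats using only the standard library. The helper names are hypothetical, and the checks are approximations of the RFC rules rather than what any particular schema validator implements.

```python
import ipaddress
from datetime import datetime

def is_ipv4(value):
    """Approximate the ipv4 format using the standard ipaddress module."""
    try:
        ipaddress.IPv4Address(value)
        return isinstance(value, str)
    except ValueError:
        return False

def is_ipv6(value):
    """Approximate the ipv6 format using the standard ipaddress module."""
    try:
        ipaddress.IPv6Address(value)
        return isinstance(value, str)
    except ValueError:
        return False

def is_date_time(value):
    """Approximate the date-time format: an ISO timestamp with a UTC offset."""
    try:
        # fromisoformat() before Python 3.11 does not accept a trailing "Z"
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return parsed.tzinfo is not None  # a bare date or local time is rejected

print(is_ipv4("192.168.0.1"), is_ipv4("999.1.1.1"))
print(is_ipv6("::1"), is_ipv6("192.168.0.1"))
print(is_date_time("1981-12-07T00:00:00.000Z"), is_date_time("1981-12-07"))
```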

In addition to the string type, JSON schema supports the following data types natively:

  • integer
  • number
  • boolean
  • null
  • array
  • object
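
As a rough illustration of how these type names line up with parsed data, the following Python sketch maps each one to the value that the standard `json` module produces. The `json_type_matches` helper is hypothetical; the one subtlety worth noting is that Python treats booleans as integers, so they have to be excluded by hand.

```python
import json

def json_type_matches(instance, type_name):
    """Return True if a value parsed by json.loads() has the named schema type."""
    if type_name == "integer":
        return isinstance(instance, int) and not isinstance(instance, bool)
    if type_name == "number":
        return isinstance(instance, (int, float)) and not isinstance(instance, bool)
    checks = {
        "string": str,
        "boolean": bool,
        "null": type(None),
        "array": list,
        "object": dict,
    }
    return isinstance(instance, checks[type_name])

print(json_type_matches(json.loads("23"), "integer"))
print(json_type_matches(json.loads("true"), "integer"))
print(json_type_matches(json.loads("23.5"), "number"))
print(json_type_matches(json.loads('{"a": [1, 2]}'), "object"))
```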

Developers have the freedom to create arbitrarily complex custom data types by using composition of the native types. This provides great power and flexibility in constructing a rock-solid contract for incoming JSON web service requests. Instead of writing voluminous application code to validate the request, we specify the validation rules in clear, human readable JSON schema that also doubles as documentation for the web service data contract.

Example 2

Consider the following, more complex example:

Data input

The following input is a somewhat contrived military service record:

{
    "first_name": "John",
    "last_name": "Doe",
    "rank": "corporal",
    "serial_number": "5AB78-TR-9",
    "age": 23,
    "date_of_birth": "1981-12-07T00:00:00.000Z"
}

We can validate it using the following JSON schema:

{
    "type": "object",
    "properties": {
        "first_name": { "type": "string", "minLength": 2, "maxLength": 20 },
        "last_name": { "type": "string", "minLength": 2, "maxLength": 20 },
        "rank": { "enum": [ "private", "corporal", "sergeant", "lieutenant", "captain", "major", "colonel", "general" ] },
        "serial_number": { "type": "string", "pattern": "^[0-9][A-Z][A-Z][0-9][0-9]-[A-Z][A-Z]-[0-9]$" },
        "age": { "type": "integer", "minimum": 18, "maximum": 65 },
        "date_of_birth": { "type": "string", "format": "date-time" }
    },
    "required": [ "first_name", "last_name", "rank", "serial_number" ],
    "additionalProperties": false
}

Pasting the above into the Heroku-hosted validator shows that the input data is valid. The schema uses:

  • the integer type for age, with minimum and maximum bounds
  • the enum keyword, which validates rank against a list of allowed values
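
For readers who prefer to experiment offline, here is a minimal Python sketch that checks the service record against the Example 2 schema. The `validate` function is a hypothetical, hand-rolled subset of a validator, not the fge implementation: it ignores the format keyword and supports only the keywords this schema uses. The serial-number pattern in this copy is anchored with ^ and $, since an unanchored pattern would also accept extra leading or trailing characters.

```python
import json
import re

SCHEMA = json.loads("""
{
  "type": "object",
  "properties": {
    "first_name": {"type": "string", "minLength": 2, "maxLength": 20},
    "last_name": {"type": "string", "minLength": 2, "maxLength": 20},
    "rank": {"enum": ["private", "corporal", "sergeant", "lieutenant",
                      "captain", "major", "colonel", "general"]},
    "serial_number": {"type": "string",
                      "pattern": "^[0-9][A-Z][A-Z][0-9][0-9]-[A-Z][A-Z]-[0-9]$"},
    "age": {"type": "integer", "minimum": 18, "maximum": 65},
    "date_of_birth": {"type": "string", "format": "date-time"}
  },
  "required": ["first_name", "last_name", "rank", "serial_number"],
  "additionalProperties": false
}
""")

RECORD = json.loads("""
{
  "first_name": "John", "last_name": "Doe", "rank": "corporal",
  "serial_number": "5AB78-TR-9", "age": 23,
  "date_of_birth": "1981-12-07T00:00:00.000Z"
}
""")

TYPES = {"string": str, "integer": int, "object": dict}

def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means valid."""
    expected = TYPES.get(schema.get("type"))
    # bool is a subclass of int in Python, so it never counts as a match here
    if expected and (not isinstance(instance, expected) or isinstance(instance, bool)):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: not one of the allowed values")
    if isinstance(instance, str):
        if len(instance) < schema.get("minLength", 0):
            errors.append(f"{path}: too short")
        if "maxLength" in schema and len(instance) > schema["maxLength"]:
            errors.append(f"{path}: too long")
        if "pattern" in schema and not re.search(schema["pattern"], instance):
            errors.append(f"{path}: does not match pattern")
    if isinstance(instance, (int, float)) and not isinstance(instance, bool):
        if "minimum" in schema and instance < schema["minimum"]:
            errors.append(f"{path}: below minimum")
        if "maximum" in schema and instance > schema["maximum"]:
            errors.append(f"{path}: above maximum")
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing {name}")
        for name, value in instance.items():
            if name in props:
                errors.extend(validate(value, props[name], f"{path}.{name}"))
            elif schema.get("additionalProperties", True) is False:
                errors.append(f"{path}: unexpected property {name}")
    return errors

print(validate(RECORD, SCHEMA))
print(validate(dict(RECORD, rank="admiral", age=17), SCHEMA))
```

Returning a list of errors rather than stopping at the first one lets a web service report every problem in a request at once.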

Try experimenting with both the input and the schema itself. If you get into syntactic trouble with either and want to know what the problem is, use the JSON lint web page to validate the JSON formatting.

We’ve only scratched the surface of what is possible using JSON schema to validate JSON requests. 

There are several excellent tutorials available to get you more acquainted with these concepts. A good starting point is the excellent guide given here:

Have any additional questions? Ask us in the comments!

Comments

  1. Darren Broderick says:

    Hey, great article, is it possible to schema validate duplicate key values? Eg. Someone puts 2 firstName elements into the json payload.
