JSON Schema Validation in Ruby
You expect input from the outside world in JSON format and want to make sure it has all the properties you expect? Great, say hello to JSON Schema. In this post I’ll go over defining a schema and using it to validate JSON input in a Ruby program using the json_schemer gem.
Modeling the Input
Input is rarely validated for its own sake; rather, we check that the input fits some kind of model. This model can specify the presence or absence of attributes, allowed values, or integrity constraints across objects.
As an example, let’s model events for a calendar. Events should have a title and a date. With these we can look into a calendar and quickly see what happens and when. Further, we can add a longer description and a time. If the event has a time, it can also have an end time, but the end time is not allowed if the (start) time is not set.
(You could also model this differently, e.g. not require a title, but instead a location, or not use an end time but a duration attribute.)
For example, both objects below should be valid. The first has the minimal number of attributes, the second has all of them:
{
  "title": "Christmas",
  "date": "2019-12-24"
}
{
  "title": "Christmas Dinner",
  "date": "2019-12-24",
  "description": "We meet at my Mom's house and enjoy the food.",
  "time": "18:00",
  "end-time": "23:00"
}
On the other hand, the following documents should not be accepted. The first document has two errors: it lacks the title attribute, and the date is not in a date format. The second document’s attributes are fine by themselves; however, the end-time is not allowed without a time.
{
  "address": "Carthage",
  "date": "Friday"
}
{
  "title": "Christmas Dinner",
  "date": "2019-12-24",
  "end-time": "23:00"
}
Expressing as JSON Schema
A JSON Schema description is an object that basically looks like this:
{
  "$id": "https://example.com/calendar-event.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Calendar Event",
  "type": "object",
  "properties": { ... }
}
The $id attribute contains a unique identifier for the schema, and $schema indicates which draft of JSON Schema it follows (here draft-07). With title we can provide a readable title for the thing we actually want to model, and type is, well, the type of the thing. Finally, properties contains a description of the attributes of the object.
The keys of properties are the attribute names of the described object, and the values describe what the attribute values should look like. The title attribute is of type string. We also don’t want empty titles, which we enforce by setting the minimum length to 1. For documentation, we also add a description; and voilà:
  "title": {
    "type": "string",
    "minLength": 1,
    "description": "The title of this event."
  }
For the date, the object should also contain a string. However, we can leverage the format property defined by JSON Schema, which allows only dates in the form of YYYY-MM-DD and not arbitrary strings:
  "date": {
    "type": "string",
    "format": "date",
    "description": "The date of this event."
  }
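To get a feeling for what the date format accepts, here is a rough plain-Ruby sketch of the same rule: the string must be shaped like YYYY-MM-DD and name a real calendar day. The helper iso_date? is a hypothetical name for illustration; it is not part of json_schemer, and the gem's own check may differ in detail.

```ruby
require "date"

# Hypothetical helper (not part of json_schemer): true only for strings
# shaped like YYYY-MM-DD that also name a real calendar day.
def iso_date?(value)
  return false unless value.is_a?(String)

  match = /\A(\d{4})-(\d{2})-(\d{2})\z/.match(value)
  return false unless match

  year, month, day = match.captures.map(&:to_i)
  Date.valid_date?(year, month, day)
end

puts iso_date?("2019-12-24") # true
puts iso_date?("Friday")     # false
puts iso_date?("2019-02-30") # false, February has no 30th
```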
As the description is optional, we just define a type for it.
  "description": {
    "type": "string",
    "description": "A description of the event"
  }
The time should look like HH:MM (the pattern below also accepts a single-digit hour such as 7:05). There is also a format for time, but it includes seconds and time zones, which is too much for this example. Another option is to define a regular expression that the string must match. This is done with pattern:
  "time": {
    "type": "string",
    "pattern": "^([0-9]|[01][0-9]|2[0-3]):[0-5][0-9]$",
    "description": "When the event starts."
  }
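If you want to try the pattern before putting it into a schema, you can test it directly in Ruby. This is only a sketch: the constant name and the sample strings are my own, and I swapped ^ and $ for \A and \z, because in Ruby ^ and $ match at every line break, whereas JSON Schema patterns follow the JavaScript (ECMA 262) regex dialect.

```ruby
# The pattern from the schema, anchored with \A and \z so that it has
# to match the whole string in Ruby.
TIME_PATTERN = /\A([0-9]|[01][0-9]|2[0-3]):[0-5][0-9]\z/

samples = ["18:00", "7:05", "08:30", "24:00", "18:60", "midnight"]
samples.each do |sample|
  puts "#{sample.inspect}: #{TIME_PATTERN.match?(sample)}"
end
```

The first three samples match; "24:00", "18:60", and "midnight" do not.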
And "end-time" works the same.
Now we have defined the properties of an event object. This means that an object containing, e.g., "time": "midnight" is not a valid event. A missing title attribute, however, is not covered yet. For this, we additionally need to list the required attributes:
  "required": ["title", "date"]
The last thing we wanted is that the end-time is allowed only if there is a time. This is done using a dependency. Dependencies are defined as “if this attribute is present, this list of other attributes must be present as well”:
  "dependencies": {
    "end-time": ["time"]
  }
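The dependency rule is easy to restate in plain Ruby, which may help to see what the validator checks for us. The helper dependencies_met? and the constant are hypothetical names for this sketch, not something json_schemer provides.

```ruby
# Hypothetical helper mirroring the "dependencies" rule: for every key
# that is present in the document, all keys it depends on must be
# present as well.
def dependencies_met?(document, dependencies)
  dependencies.all? do |key, needed|
    !document.key?(key) || needed.all? { |other| document.key?(other) }
  end
end

DEPENDENCIES = { "end-time" => ["time"] }.freeze

with_start    = { "time" => "18:00", "end-time" => "23:00" }
without_start = { "end-time" => "23:00" }

puts dependencies_met?(with_start, DEPENDENCIES)    # true
puts dependencies_met?(without_start, DEPENDENCIES) # false
```

Note that a document without any end-time trivially satisfies the rule, which is exactly what we want for the minimal event.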
Using the Schema with Ruby and json_schemer
We just feed the raw JSON Schema to JSONSchemer.schema, which is the entry point of the json_schemer gem:
require "json_schemer"
schema = JSONSchemer.schema(File.read("ex1_schema.json"))
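The file ex1_schema.json read above is not reproduced in full in this post. As a sketch, the fragments from the previous section can be assembled into one schema like this. The constant names are my own, and the description text for end-time is an assumption, since the post only says it works the same as time.

```ruby
require "json"

# Shared by "time" and "end-time".
HOUR_MINUTE = "^([0-9]|[01][0-9]|2[0-3]):[0-5][0-9]$"

EVENT_SCHEMA = {
  "$id" => "https://example.com/calendar-event.schema.json",
  "$schema" => "http://json-schema.org/draft-07/schema#",
  "title" => "Calendar Event",
  "type" => "object",
  "properties" => {
    "title" => { "type" => "string", "minLength" => 1, "description" => "The title of this event." },
    "date" => { "type" => "string", "format" => "date", "description" => "The date of this event." },
    "description" => { "type" => "string", "description" => "A description of the event" },
    "time" => { "type" => "string", "pattern" => HOUR_MINUTE, "description" => "When the event starts." },
    # The description text for "end-time" is an assumption.
    "end-time" => { "type" => "string", "pattern" => HOUR_MINUTE, "description" => "When the event ends." }
  },
  "required" => ["title", "date"],
  "dependencies" => { "end-time" => ["time"] }
}.freeze

# Print the schema as JSON, e.g. to save it as ex1_schema.json.
puts JSON.pretty_generate(EVENT_SCHEMA)
```

As far as I know, JSONSchemer.schema also accepts such a hash with string keys directly, so writing the file is optional.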
Testing whether a document is valid against the schema is done with the valid? method:
valid_doc = JSON.parse '{
  "title": "Christmas",
  "date": "2019-12-24"
}'
schema.valid? valid_doc
# => true
invalid_doc = JSON.parse '{
  "time": "20:00"
}'
schema.valid? invalid_doc
# => false
If you also want to tell your user which part of the document is faulty, you can use the validate method. This returns an enumerator that contains all errors. Each error in the enumerator is a hash with, among others, a data_pointer that indicates where the error is and a type that says what went wrong.
unless schema.valid?(document)
  puts "================================="
  puts "Document not valid:"
  schema.validate(document).each do |v|
    puts "- error type: #{v["type"]}"
    puts "  data: #{v["data"]}"
    puts "  path: #{v["data_pointer"]}"
  end
end
For the first invalid example from above, this snippet outputs
Document not valid:
- error type: required
  data: {"address"=>"Carthage", "date"=>"Friday"}
  path:
- error type: format
  data: Friday
  path: /date
The first error comes from the requirement that the title attribute be present. As this is validated on the entire object, the whole object appears in the data field of the error, and the path is an empty string (pointing to the root of the object).
The second error occurs because the string Friday found at the /date path does not match the desired format.
To provide nicer errors to our users we can format them like this:
def nice_error verr
  case verr["type"]
  when "required"
    "Path '#{verr["data_pointer"]}' is missing keys: #{verr["details"]["missing_keys"].join ', '}"
  when "format"
    "Path '#{verr["data_pointer"]}' is not in required format (#{verr["schema"]["format"]})"
  when "minLength"
    "Path '#{verr["data_pointer"]}' is not long enough (min #{verr["schema"]["minLength"]})"
  else
    "There is a problem with path '#{verr["data_pointer"]}'. Please check your input."
  end
end
# validate a series of objects (valid and invalid are arrays of parsed documents)
(valid + invalid).each do |d|
  unless schema.valid?(d)
    puts "Document not valid:"
    schema.validate(d).each do |v|
      puts "- #{nice_error v}"
    end
  end
end
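If you want to show errors next to the affected input fields, it can help to group them by path first. The sketch below does that on two hand-written stand-ins shaped like the errors printed earlier; with json_schemer you would pass schema.validate(document).to_a instead. The helper name errors_by_path and the "(root)" label are my own choices.

```ruby
# Hand-written stand-ins shaped like the two errors printed earlier.
SAMPLE_ERRORS = [
  { "type" => "required", "data_pointer" => "", "data" => { "address" => "Carthage", "date" => "Friday" } },
  { "type" => "format", "data_pointer" => "/date", "data" => "Friday" }
].freeze

# Hypothetical helper: maps each path to the list of error types found
# there, using "(root)" for the empty pointer.
def errors_by_path(errors)
  errors
    .group_by { |error| error["data_pointer"].empty? ? "(root)" : error["data_pointer"] }
    .transform_values { |group| group.map { |error| error["type"] } }
end

errors_by_path(SAMPLE_ERRORS).each do |path, types|
  puts "#{path}: #{types.join(', ')}"
end
```

For the sample errors this prints one line for the root object (required) and one for /date (format).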
Conclusion
This was a quick start for when you want to make sure that the JSON documents you are reading fulfill certain properties. You can look at the schema, Ruby code, and valid examples 1 and 2 as well as invalid examples 1, 2, 3, and 4.
If you want to explore more of JSON Schema’s possibilities, look at this page. If you are not a Ruby person, the JSON Schema website has an extensive list of libraries in other languages.