Let me code again...

Welcome to ijokarumawak's blog :)

NiFi, JoltTransformJSON jolted me

Recently, a question was posted on NiFi user mailing list about how to transform a JSON into a desired format. I tried to answer the question using JoltJsonTransform processor. However it was not an easy process for me to figure out what Jolt spec will do the expected transformation. Eventually, I understood how it works, and feel I should share it.

Sample input and expected JSONs were shared as follows. As you can see, it requires join operation on elements in the root array by the value of ID:

Input JSON:
[
  { "ID": "100",
    "PersonalDetails": [
      { "name": "leo",
        "age": "30",
        "address": "Us" }
    ] },
  { "ID": "100",
    "OfficeDetails": [
      { "Company": "lys",
        "salary": "2000",
        "designation": "manager" }
    ] },
  { "ID": "101",
    "PersonalDetails": [
      { "name": "karo",
        "age": "24",
        "address": "Newyork" }
    ] },
  { "ID": "101",
    "OfficeDetails": [
      { "Company": "lys",
        "salary": "1000",
        "designation": "Operator" }
    ] }
]
Output JSON:
[
  {
    "ID": "100",
    "OfficeDetails": [
      { "Company": "lys",
        "salary": "2000",
        "designation": "manager" }
    ],
    "PersonalDetails": [
      { "name": "leo",
        "age": "30",
        "address": "Us" }
    ] },
  {
    "ID": "101",
    "OfficeDetails": [
      { "Company": "lys",
        "salary": "1000",
        "designation": "Operator" }
    ],
    "PersonalDetails": [
      { "name": "karo",
        "age": "24",
        "address": "Newyork" }
    ] }
]

Here is the Jolt spec I came up with, to do the transformation:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "PersonalDetails": "@(1,ID).PersonalDetails",
        "OfficeDetails": "@(1,ID).OfficeDetails"
      }
    }
  },
  {
    "operation": "shift",
    "spec": {
      "*": {
        "$": "[#2].ID",
        "@.OfficeDetails": "[#2].OfficeDetails",
        "@.PersonalDetails": "[#2].PersonalDetails"
      }
    }
  }
]

Let me try to explain how it works. There are two shift operations, the first one does followings:

  1. Jolt will traverse elements in the input object, if there is a matching spec, then apply it to write it into the output JSON
  2. The first * matches each element in the root array
  3. Then PersonalDetails matches PersonalDetails element within the element hit at previous step, and execute the mystical @(1,ID).PersonalDetails. If I explain it with plain English, it may goes like:

    Copy the value of input PersonalDetails element (which is an array) to the output PersonalDetails element, as a child of an element which is one of the root output element, named with the value of ID attribute that exists one level up in the input JSON.

  4. Do the same thing with OfficeDetails as PersonalDetails

Wow, in just a simple instruction like that, so many things are going on! After the first shift operation, the intermediate JSON will look like the left intermediate JSON below. Elements are joined by ID already.

Intermediate output:
{

  "100": {
    "OfficeDetails": [
      { "Company": "lys",
        "designation": "manager",
        "salary": "2000" } ],
    "PersonalDetails": [
      { "address": "Us",
        "age": "30",
        "name": "leo" } ]
  },
  "101": {
    "OfficeDetails": [
      { "Company": "lys",
        "designation": "Operator",
        "salary": "1000" } ],
    "PersonalDetails": [
      { "address": "Newyork",
        "age": "24",
        "name": "karo" } ]
  }

}
Final output:
[ {
    "ID": "100",
    "OfficeDetails": [
      { "Company": "lys",
        "salary": "2000",
        "designation": "manager" }
    ],
    "PersonalDetails": [
      { "name": "leo",
        "age": "30",
        "address": "Us" }
    ] },
  {
    "ID": "101",
    "OfficeDetails": [
      { "Company": "lys",
        "salary": "1000",
        "designation": "Operator" }
    ],
    "PersonalDetails": [
      { "name": "karo",
        "age": "24",
        "address": "Newyork" }
    ] } ]

There are still two differences from the expected output, the root object should be an array, and the ID should be an attribute rather than object key. Now, let’s see how the second shift operation works. The second shift operation looked like this:

{"*": {
    "$": "[#2].ID",
    "@.OfficeDetails": "[#2].OfficeDetails",
    "@.PersonalDetails": "[#2].PersonalDetails"
  }}
  1. The first * matches with each key/value pair in the root object
  2. $ matches with the key (i.e. “100” and “101”), then copy it to [#2].OfficeDetails. What does it mean? Again with plain English:

    Make the output root object an array. Copy the captured key as ‘ID’ attribute in the output object within the output root array.

  3. @.OfficeDetails matches with the OfficeDetails element in the value. Copy it to the output object OfficeDetails.
  4. Do the same thing with PersonalDetails.

I came up with more complex Jolt spec using three shift operations to do the same thing at first (you can see it in the ML thread), but it can be this simple in the end.

There are so many examples at the bottom of Jolt Transform Demo web page, and you should definitely check it before writing complex Jolt spec.

Hope this helps, see you next time!