Advanced JSON manipulation in PostgreSQL

————————————- DO !!!!!!! NOT !!!!!! REVIEW THIS —————————————————

This contains the second part of the blog… still to be written, probably it’ll go in a new blog

Tabulate the data

In all the above queries, we extracted each field, one by one. But we could do a mass extraction using the jsonb_to_record syntax below:

select x.id, ev.title
from
    restaurant_data
    cross join lateral jsonb_to_record(jsonb_data)
        as x(
            id int,
            name text,
            has_table_booking boolean,
            user_rating jsonb,
            currency text,
            event jsonb
            )
    cross join lateral jsonb_array_elements(jsonb_data -> 'zomato_events') ze
    cross join lateral jsonb_to_record(ze -> 'event')
        as ev(
            display_date text,
            date_added timestamp,
            start_date date,
            photos jsonb,
            description text,
            title text
            )
limit 2;

All we had to do is to define the subitems we wanted to extract and related data type. The below is the results:

    id    |            name            | has_table_booking |                                            user_rating                                             | currency
----------+----------------------------+-------------------+----------------------------------------------------------------------------------------------------+----------
   308322 | Hauz Khas Social           | t                 | {"votes": "7931", "rating_text": "Very Good", "rating_color": "5BA829", "aggregate_rating": "4.3"} | Rs.
 18037817 | Qubitos - The Terrace Cafe | t                 | {"votes": "778", "rating_text": "Excellent", "rating_color": "3F7E00", "aggregate_rating": "4.5"}  | Rs.
(2 rows)

Edit the JSON documents

Analysing the data is only part of the journey, we might want to edit the JSON documents. As example, let’s review the types of cuisine:

select
    jsonb_data ->> 'id' id,
    jsonb_data ->> 'name' name,
    jsonb_data ->> 'cuisines' cuisines
from restaurant_data
limit 5;

The following is the result… noticing anything suspicious?

   id    |            name            |                       cuisines
---------+----------------------------+------------------------------------------------------
308322   | Hauz Khas Social           | Continental, American, Asian, North Indian
18037817 | Qubitos - The Terrace Cafe | Thai, European, Mexican, North Indian, Chinese, Cafe
312345   | The Hudson Cafe            | Cafe, Italian, Continental, Chinese
307490   | Summer House Cafe          | Italian, Continental
18241537 | 38 Barracks                | North Indian, Italian, Asian, American
(5 rows)

What? The cuisines is just a concatenated list and not an array? let’s fix that! Extract as array with the string_to_array function:

select
    jsonb_data ->> 'id' id,
    jsonb_data ->> 'name' name,
    string_to_array(jsonb_data ->> 'cuisines', ', ') cuisines
from restaurant_data
limit 5
;

The string_to_array function divides the data contained in the jsonb_data ->> 'cuisines' item using the , divisor.

   id    |            name            |                      cuisines
---------+----------------------------+-----------------------------------------------------
308322   | Hauz Khas Social           | {Continental,American,Asian,"North Indian"}
18037817 | Qubitos - The Terrace Cafe | {Thai,European,Mexican,"North Indian",Chinese,Cafe}
312345   | The Hudson Cafe            | {Cafe,Italian,Continental,Chinese}
307490   | Summer House Cafe          | {Italian,Continental}
18241537 | 38 Barracks                | {"North Indian",Italian,Asian,American}
(5 rows)

We can update our data in place with the following query using the jsonb_set function:

update restaurant_data
set jsonb_data = jsonb_set(jsonb_data, '{cuisines}', to_jsonb(string_to_array(jsonb_data ->> 'cuisines', ', ')));

The jsonb_set function replaces the {cuisines} item within jsonb_data with the array created after dividing the concatenated list with string_to_array. If we check again the data after the update:

select
    jsonb_data ->> 'id' id,
    jsonb_data ->> 'name' name,
    jsonb_data -> 'cuisines' cuisines
from restaurant_data
limit 5
;

We can notice that now cuisines is properly stored as array

   id    |               name               |                               cuisines
---------+----------------------------------+-----------------------------------------------------------------------
308322   | Hauz Khas Social                 | ["Continental", "American", "Asian", "North Indian"]
18037817 | Qubitos - The Terrace Cafe       | ["Thai", "European", "Mexican", "North Indian", "Chinese", "Cafe"]
307490   | Summer House Cafe                | ["Italian", "Continental"]
303960   | Manhattan Brewery & Bar Exchange | ["Finger Food", "American", "Continental", "North Indian", "Italian"]
18279449 | HotMess                          | ["North Indian", "Mediterranean", "Asian", "Fast Food"]
(5 rows)

Since now we have an array, we can use the array functions like ?&, to filter only restaurants combining all the cuisines we might want to find, as examples Mexican, Thai, North Indian.

select
    jsonb_data ->> 'id' id,
    jsonb_data ->> 'name' name,
    jsonb_data ->> 'cuisines' cuisines
from restaurant_data
where jsonb_data -> 'cuisines' ?& '{Mexican, Thai, "North Indian"}'
limit 5
;

The ?& will check if in the cuisines item we have values matchin all the choices {Mexican, Thai, "North Indian"} in our array. The results:

   id    |            name            |                                  cuisines
---------+----------------------------+----------------------------------------------------------------------------
18037817 | Qubitos - The Terrace Cafe | ["Thai", "European", "Mexican", "North Indian", "Chinese", "Cafe"]
70092    | Kaidi Kitchen              | ["Italian", "Mexican", "Chinese", "Thai", "North Indian"]
101884   | Mamu's Infusion            | ["Italian", "Chinese", "Mexican", "Thai", "North Indian"]
111895   | 650 - The Global Kitchen   | ["Chinese", "Italian", "North Indian", "Mexican", "Mediterranean", "Thai"]
3200034  | Tomato's                   | ["North Indian", "Mughlai", "Mexican", "Thai"]
(5 rows)

Perform major editing, add/remove columns

Sometimes a substitution is not enough, and we might want to perform a major editing in the JSON structure. As example we might want to move away from latitude and longitude and include a field lat_long containing an array with both values. The following query does exactly that:

select jsonb_pretty(
    (jsonb_data -> 'location')::jsonb - '{latitude, longitude}'::text[] ||
    jsonb_build_object(
        'lat_long',
        ARRAY[
            (jsonb_data -> 'location' ->> 'latitude'),
            (jsonb_data -> 'location' ->> 'longitude')
            ]
        )
    )
from restaurant_data
limit 1;

The - operator removes from jsonb_data -> 'location' the latitude and longitude items passed as array of text ('{latitude, longitude}'::text[]). The pipe || operator allows us to add to the JSONB document, with the jsonb_build_object building a JSONB item with lat_long as field name and the array composed by latitude and longitude as value. The result is in line with what we wanted to achieve.

jsonb_pretty
----------------------------------------------------------
{                                                       +
    "city": "New Delhi",                                +
    "address": "9-A & 12, Hauz Khas Village, New Delhi",+
    "city_id": 1,                                       +
    "zipcode": "",                                      +
    "lat_long": [                                       +
        "28.5542851000",                                +
        "77.1944706000"                                 +
    ],                                                  +
    "locality": "Hauz Khas Village",                    +
    "country_id": 1,                                    +
    "locality_verbose": "Hauz Khas Village, New Delhi"  +
}
(1 row)