Advanced JSON manipulation in PostgreSQL
————————————- DO !!!!!!! NOT !!!!!! REVIEW THIS —————————————————
This contains the second part of the blog… still to be written, probably it’ll go in a new blog
Tabulate the data
In all the above queries, we extracted each field, one by one. But we could do a mass extraction using the jsonb_to_record syntax below:
select x.id, ev.title
from
restaurant_data
cross join lateral jsonb_to_record(jsonb_data)
as x(
id int,
name text,
has_table_booking boolean,
user_rating jsonb,
currency text,
event jsonb
)
cross join lateral jsonb_array_elements(jsonb_data -> 'zomato_events') ze
cross join lateral jsonb_to_record(ze -> 'event')
as ev(
display_date text,
date_added timestamp,
start_date date,
photos jsonb,
description text,
title text
)
limit 2;
All we had to do is to define the subitems we wanted to extract and related data type. The below is the results:
id | name | has_table_booking | user_rating | currency
----------+----------------------------+-------------------+----------------------------------------------------------------------------------------------------+----------
308322 | Hauz Khas Social | t | {"votes": "7931", "rating_text": "Very Good", "rating_color": "5BA829", "aggregate_rating": "4.3"} | Rs.
18037817 | Qubitos - The Terrace Cafe | t | {"votes": "778", "rating_text": "Excellent", "rating_color": "3F7E00", "aggregate_rating": "4.5"} | Rs.
(2 rows)
Edit the JSON documents
Analysing the data is only part of the journey, we might want to edit the JSON documents. As example, let’s review the types of cuisine:
select
jsonb_data ->> 'id' id,
jsonb_data ->> 'name' name,
jsonb_data ->> 'cuisines' cuisines
from restaurant_data
limit 5;
The following is the result… noticing anything suspicious?
id | name | cuisines
---------+----------------------------+------------------------------------------------------
308322 | Hauz Khas Social | Continental, American, Asian, North Indian
18037817 | Qubitos - The Terrace Cafe | Thai, European, Mexican, North Indian, Chinese, Cafe
312345 | The Hudson Cafe | Cafe, Italian, Continental, Chinese
307490 | Summer House Cafe | Italian, Continental
18241537 | 38 Barracks | North Indian, Italian, Asian, American
(5 rows)
What? The cuisines is just a concatenated list and not an array? let’s fix that! Extract as array with the string_to_array function:
select
jsonb_data ->> 'id' id,
jsonb_data ->> 'name' name,
string_to_array(jsonb_data ->> 'cuisines', ', ') cuisines
from restaurant_data
limit 5
;
The string_to_array function divides the data contained in the jsonb_data ->> 'cuisines' item using the , divisor.
id | name | cuisines
---------+----------------------------+-----------------------------------------------------
308322 | Hauz Khas Social | {Continental,American,Asian,"North Indian"}
18037817 | Qubitos - The Terrace Cafe | {Thai,European,Mexican,"North Indian",Chinese,Cafe}
312345 | The Hudson Cafe | {Cafe,Italian,Continental,Chinese}
307490 | Summer House Cafe | {Italian,Continental}
18241537 | 38 Barracks | {"North Indian",Italian,Asian,American}
(5 rows)
We can update our data in place with the following query using the jsonb_set function:
update restaurant_data
set jsonb_data = jsonb_set(jsonb_data, '{cuisines}', to_jsonb(string_to_array(jsonb_data ->> 'cuisines', ', ')));
The jsonb_set function replaces the {cuisines} item within jsonb_data with the array created after dividing the concatenated list with string_to_array.
If we check again the data after the update:
select
jsonb_data ->> 'id' id,
jsonb_data ->> 'name' name,
jsonb_data -> 'cuisines' cuisines
from restaurant_data
limit 5
;
We can notice that now cuisines is properly stored as array
id | name | cuisines
---------+----------------------------------+-----------------------------------------------------------------------
308322 | Hauz Khas Social | ["Continental", "American", "Asian", "North Indian"]
18037817 | Qubitos - The Terrace Cafe | ["Thai", "European", "Mexican", "North Indian", "Chinese", "Cafe"]
307490 | Summer House Cafe | ["Italian", "Continental"]
303960 | Manhattan Brewery & Bar Exchange | ["Finger Food", "American", "Continental", "North Indian", "Italian"]
18279449 | HotMess | ["North Indian", "Mediterranean", "Asian", "Fast Food"]
(5 rows)
Since now we have an array, we can use the array functions like ?&, to filter only restaurants combining all the cuisines we might want to find, as examples Mexican, Thai, North Indian.
select
jsonb_data ->> 'id' id,
jsonb_data ->> 'name' name,
jsonb_data ->> 'cuisines' cuisines
from restaurant_data
where jsonb_data -> 'cuisines' ?& '{Mexican, Thai, "North Indian"}'
limit 5
;
The ?& will check if in the cuisines item we have values matchin all the choices {Mexican, Thai, "North Indian"} in our array. The results:
id | name | cuisines
---------+----------------------------+----------------------------------------------------------------------------
18037817 | Qubitos - The Terrace Cafe | ["Thai", "European", "Mexican", "North Indian", "Chinese", "Cafe"]
70092 | Kaidi Kitchen | ["Italian", "Mexican", "Chinese", "Thai", "North Indian"]
101884 | Mamu's Infusion | ["Italian", "Chinese", "Mexican", "Thai", "North Indian"]
111895 | 650 - The Global Kitchen | ["Chinese", "Italian", "North Indian", "Mexican", "Mediterranean", "Thai"]
3200034 | Tomato's | ["North Indian", "Mughlai", "Mexican", "Thai"]
(5 rows)
Perform major editing, add/remove columns
Sometimes a substitution is not enough, and we might want to perform a major editing in the JSON structure. As example we might want to move away from latitude and longitude and include a field lat_long containing an array with both values. The following query does exactly that:
select jsonb_pretty(
(jsonb_data -> 'location')::jsonb - '{latitude, longitude}'::text[] ||
jsonb_build_object(
'lat_long',
ARRAY[
(jsonb_data -> 'location' ->> 'latitude'),
(jsonb_data -> 'location' ->> 'longitude')
]
)
)
from restaurant_data
limit 1;
The - operator removes from jsonb_data -> 'location' the latitude and longitude items passed as array of text ('{latitude, longitude}'::text[]). The pipe || operator allows us to add to the JSONB document, with the jsonb_build_object building a JSONB item with lat_long as field name and the array composed by latitude and longitude as value. The result is in line with what we wanted to achieve.
jsonb_pretty
----------------------------------------------------------
{ +
"city": "New Delhi", +
"address": "9-A & 12, Hauz Khas Village, New Delhi",+
"city_id": 1, +
"zipcode": "", +
"lat_long": [ +
"28.5542851000", +
"77.1944706000" +
], +
"locality": "Hauz Khas Village", +
"country_id": 1, +
"locality_verbose": "Hauz Khas Village, New Delhi" +
}
(1 row)