it’s a universal truth that one of the primary jobs of backend developers is drinking from the firehose of json the frontends are constantly spraying at us. normally, we pick apart all those key/value pairs and slot them into the neatly-arranged columns of our db, keeping our database structure nice and normal. but sometimes there are good reasons to just store a whole json object or array as an undifferentiated blob.
in the dark, old days, this json would go in a text
column, which was fine if all we needed to do was store it and return it. but if we needed to do something more complex like, extract certain values from that json or, god forbid, use one in a WHERE
clause, things could get out of hand pretty quickly.
fortunately, modern mysql supports JSON
as a native data type and offers a whole host of useful json functions, allowing us to work with json data in a way that’s almost pleasurable. in this post we’re going to go over extracting values from json and using json data in WHERE
clauses.
creating a JSON
column
let’s start by creating a table with a column of type JSON
.
CREATE TABLE `some_data` ( `id` int unsigned NOT NULL AUTO_INCREMENT, `name` varchar(200) NOT NULL, `some_json` json NULL, `some_text` text NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
the test data we’re going to be using is the json representation of bratmobile’s ‘kiss and ride’ single.
{ "artist": "Bratmobile", "title": "Kiss and Ride", "format": "single", "year": 1992, "label": "Kill Rock Stars", "tracks": { "A": [ { "title": "Kiss and Ride", "duration_seconds": 88 } ], "B": [ { "title": "No You Don't", "duration_seconds": 105 }, { "title": "Queenie", "duration_seconds": 79 } ] } }
we’ll insert that json into both a native JSON
type column as well as into a TEXT
column. we’ll come back to that TEXT
column later.
INSERT INTO some_data VALUES( null, 'some name', -- as json '{ "artist": "Bratmobile", "title": "Kiss and Ride", "format": "single", "year": 1992, "label": "Kill Rock Stars", "tracks": { "A": [ { "title": "Kiss and Ride", "duration_seconds": 88 } ], "B": [ { "title": "No You Don\'t", "duration_seconds": 105 }, { "title": "Queenie", "duration_seconds": 79 } ] } }', -- as string '{ "artist": "Bratmobile", "title": "Kiss and Ride", "format": "single", "year": 1992, "label": "Kill Rock Stars", "tracks": { "A": [ { "title": "Kiss and Ride", "duration_seconds": 88 } ], "B": [ { "title": "No You Don\'t", "duration_seconds": 105 }, { "title": "Queenie", "duration_seconds": 79 } ] } }' );
extracting some data from that json
now that we have some json in our JSON
column, we can work on selecting out individual values. we do this with mysql’s JSON_EXTRACT
function.
JSON_EXTRACT
takes two arguments: the column and the path of the element you want to select. paths are separated by a dot, kind of the same way directory paths are separated by a /
, and the top level of the path is denoted by $
. if you’ve ever struggled with jq
, you should feel comfortable struggling with this.
in this example, we select the artist
element from the top level of our json object.
SELECT JSON_EXTRACT(some_json, '$.artist') as artist FROM some_data WHERE id = 1; +--------------+ | artist | +--------------+ | "Bratmobile" | +--------------+
mysql also provides the ->
operator as a shorthand for JSON_EXTRACT
to help make our queries a little cleaner.
SELECT some_json -> '$.artist' as artist FROM some_data WHERE id = 1; +--------------+ | artist | +--------------+ | "Bratmobile" | +--------------+
an annoyance we notice immediately with these results is that they’re quote-enclosed. we don’t want this. nobody wants this. let’s remove them with the function JSON_UNQUOTE
SELECT JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.artist')) as artist FROM some_data WHERE id = 1; +------------+ | artist | +------------+ | Bratmobile | +------------+
the shorthand ->
operator we used before can be modified to ->>
to automatically includ JSON_UNQUOTE
. using this operator is the same as enclosing a call to JSON_EXTRACT
in JSON_UNQUOTE
:
SELECT some_json ->> '$.artist' as artist FROM some_data WHERE id = 1; +------------+ | artist | +------------+ | Bratmobile | +------------+
dealing with arrays
getting single elements from the top level is great, but we’re also going to want to deal with arrays.
we can reference arrays in the path we pass to JSON_EXTRACT
by using an index inside square brackets. this is pretty familiar stuff. for instance, in our sample data we have an object called tracks
that contains two arrays keyed as A
and B
; these are the songs on side ‘a’ and ‘b’. those A
and B
elements are arrays of song objects. if we wanted to get the first song of side B
, we would construct our JSON_EXTRACT
call like so:
SELECT JSON_EXTRACT(some_json, '$.tracks.B[0].title') as side_b_song_one FROM some_data WHERE id = 1; +-----------------+ | side_b_song_one | +-----------------+ | "No You Don't" | +-----------------+
looking at the path argument, we can see we start at the top of the hierarchy with $
, then reference the tracks
object. next is the array B
. we indicate we want the first element of this array with [0]
, then continue our path by giving the key title
. the result is the title of the first song on side b.
if we want all the songs on side b, we can replace that index with the wildcard *
. this returns a json array of titles.
SELECT JSON_EXTRACT(some_json, '$.tracks.B[*].title') as side_b FROM some_data WHERE id = 1; +-----------------------------+ | side_b | +-----------------------------+ | ["No You Don't", "Queenie"] | +-----------------------------+
we could also use the **
wildcard to do this job.
SELECT JSON_EXTRACT(some_json, '$.tracks.B**.title') as side_b FROM some_data WHERE id = 1; +-----------------------------+ | side_b | +-----------------------------+ | ["No You Don't", "Queenie"] | +-----------------------------+
it’s important to note that **
actually means all the paths between the prefix and the suffix. this means that we can use it to get all of the title
elements under tracks
, regardless of whether there in the array keyed A
or B
like so:
SELECT JSON_EXTRACT(some_json, '$.tracks**.title') as side_a_and_side_be FROM some_data WHERE id = 1; +----------------------------------------------+ | side_a_and_side_be | +----------------------------------------------+ | ["Kiss and Ride", "No You Don't", "Queenie"] | +----------------------------------------------+
in that example, **
matches both A
and B
. very powerful stuff.
we can also use ranges when defining our array indexes. for instance, if we wanted to get all the title
s on side B
between position zero and one, we could use the index [0 to 1]
:
SELECT JSON_EXTRACT(some_json, '$.tracks.B[0 to 1].title') as side_b_first_two FROM some_data WHERE id = 1; +-----------------------------+ | side_b_first_two | +-----------------------------+ | ["No You Don't", "Queenie"] | +-----------------------------+and if all we want is the last element of an array, mysql allows us to use the literal word
last
:SELECT JSON_EXTRACT(some_json, '$.tracks.B[last].title') as side_b_last_track FROM some_data WHERE id = 1; +-------------------+ | side_b_last_track | +-------------------+ | "Queenie" |
using this in WHERE
clauses
extracting data from json is useful but, really, we could just select the whole column and run the data through json_decode
in our controller. the real money is using this in WHERE
clauses. let’s look:SELECT JSON_EXTRACT(some_json, '$.title') as title FROM some_data WHERE JSON_EXTRACT(some_json, '$.year') = 1992; +-----------------+ | title | +-----------------+ | "Kiss and Ride" | +-----------------+
here we just extracted year
from our json column and used it in a WHERE
clause.
a kludge to simulate WHERE IN
clauses
if we want to write a WHERE IN
clause against an array in our json object, things get a bit trickier. our initial ideal might be to simply try using IN
on an array extracted from our json:
-- this absolutely DOES NOT WORK SELECT JSON_EXTRACT(some_json, '$.title') as title FROM some_data WHERE "No You Don't" IN JSON_EXTRACT(some_json, '$.tracks**.title');
no dice. the problem here, of course, is that WHERE IN
expects a list of values, not a json array. there’s a type mismatch.
the solution is to use MEMBER OF
.
this construct behaves kind of like a function and kind of like an operator. it takes it’s left value and tests if it resides in the json array that is passed to it as an argument.
SELECT JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.artist')) as artist, JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.title')) as title FROM some_data WHERE "Queenie" MEMBER OF(CAST(JSON_EXTRACT(some_json, '$.tracks**.title') AS json)); +------------+---------------+ | artist | title | +------------+---------------+ | Bratmobile | Kiss and Ride | +------------+---------------+
the thing to note here is the argument to MEMBER OF
. we have extracted all the title
values from our json columns and mashed them together into one array. that ‘array’ isn’t really an array, though; it’s a string. and not even a json string, just a string string.
since MEMBER OF
requires json-style data as an argument, we need to call CAST
to turn the string of our array into an actual json array.
this solution is a bit messy, and if i am kludging up the wrong tree here and there is a better, more elegant way to go about this, i am very much open to input. however, this construct does work.
wait. this works on TEXT
columns too?
the secret is that all of the stuff we’ve covered here also works on TEXT
columns. if we run that WHERE IN
simulation select against our some_text
column, it works just fine:
SELECT JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.artist')) as artist, JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.title')) as title FROM some_data WHERE "Queenie" MEMBER OF(CAST(JSON_EXTRACT(some_text, '$.tracks**.title') AS json)); +------------+---------------+ | artist | title | +------------+---------------+ | Bratmobile | Kiss and Ride | +------------+---------------+
so why bother with the JSON
column type at all? well, most importantly, when we use the JSON
type we get automatic validation on insert or update. we can put invalid json into TEXT
columns and mysql will never complain; it will just wait for us to be disappointed at select time.
secondly, the native JSON
type is far more efficient. can we use extracted json values in a five table join? yes. will it be painfully slow if we’re using TEXT
? also yes. native JSON
will be much faster.
finally, there will almost certainly be new json functions in future versions of mysql (including, hopefully, a more-sane way to do WHERE IN
s) and there is no guarantee that they will behave well with TEXT
. bottom line: if you know you’re putting json in a column, make it a JSON
column.
co-founder of fruitbat studios. cli-first linux snob, metric evangelist, unrepentant longhair. all the music i like is objectively horrible. he/him.