mysql: using json data and not hating it

it’s a universal truth that one of the primary jobs of backend developers is drinking from the firehose of json the frontends are constantly spraying at us. normally, we pick apart all those key/value pairs and slot them into the neatly-arranged columns of our db, keeping our database structure nice and normal. but sometimes there are good reasons to just store a whole json object or array as an undifferentiated blob.

in the dark, old days, this json would go in a text column, which was fine if all we needed to do was store it and return it. but if we needed to do something more complex like, extract certain values from that json or, god forbid, use one in a WHERE clause, things could get out of hand pretty quickly.

fortunately, modern mysql supports JSON as a native data type and offers a whole host of useful json functions, allowing us to work with json data in a way that’s almost pleasurable. in this post we’re going to go over extracting values from json and using json data in WHERE clauses.

this column can fit so much schema-less data in it

creating a JSON column

let’s start by creating a table with a column of type JSON.

CREATE TABLE `some_data` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(200) NOT NULL,
  `some_json` json NULL,
  `some_text` text NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

the test data we’re going to be using is the json representation of bratmobile’s ‘kiss and ride’ single.

{
  "artist": "Bratmobile",
  "title": "Kiss and Ride",
  "format": "single",
  "year": 1992,
  "label": "Kill Rock Stars",
  "tracks": {
    "A": [
      {
        "title": "Kiss and Ride",
        "duration_seconds": 88
      }
    ],
    "B": [
      {
        "title": "No You Don't",
        "duration_seconds": 105
      },
      {
        "title": "Queenie",
        "duration_seconds": 79
      }
    ]
  }
}

we’ll insert that json into both a native JSON type column as well as into a TEXT column. we’ll come back to that TEXT column later.

INSERT INTO some_data VALUES(
  null,
  'some name',
  -- as json
  '{ "artist": "Bratmobile", "title": "Kiss and Ride", "format": "single", "year": 1992, "label": "Kill Rock Stars", "tracks": { "A": [ { "title": "Kiss and Ride", "duration_seconds": 88 } ], "B": [ { "title": "No You Don\'t", "duration_seconds": 105 }, { "title": "Queenie", "duration_seconds": 79 } ] } }',
  -- as string
  '{ "artist": "Bratmobile", "title": "Kiss and Ride", "format": "single", "year": 1992, "label": "Kill Rock Stars", "tracks": { "A": [ { "title": "Kiss and Ride", "duration_seconds": 88 } ], "B": [ { "title": "No You Don\'t", "duration_seconds": 105 }, { "title": "Queenie", "duration_seconds": 79 } ] } }'
);

extracting some data from that json

now that we have some json in our JSON column, we can work on selecting out individual values. we do this with mysql’s JSON_EXTRACT function.

JSON_EXTRACT takes two arguments: the column and the path of the element you want to select. paths are separated by a dot, kind of the same way directory paths are separated by a /, and the top level of the path is denoted by $. if you’ve ever struggled with jq, you should feel comfortable struggling with this.

in this example, we select the artist element from the top level of our json object.

SELECT  JSON_EXTRACT(some_json, '$.artist') as artist
FROM    some_data
WHERE   id = 1;
+--------------+
| artist       |
+--------------+
| "Bratmobile" |
+--------------+

mysql also provides the -> operator as a shorthand for JSON_EXTRACT to help make our queries a little cleaner.

SELECT  some_json -> '$.artist' as artist
FROM    some_data
WHERE   id = 1;
+--------------+
| artist       |
+--------------+
| "Bratmobile" |
+--------------+

an annoyance we notice immediately with these results is that they’re quote-enclosed. we don’t want this. nobody wants this. let’s remove them with the function JSON_UNQUOTE

SELECT  JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.artist')) as artist
FROM    some_data
WHERE   id = 1;
+------------+
| artist     |
+------------+
| Bratmobile |
+------------+

the shorthand -> operator we used before can be modified to ->> to automatically includ JSON_UNQUOTE. using this operator is the same as enclosing a call to JSON_EXTRACT in JSON_UNQUOTE:

SELECT  some_json ->> '$.artist' as artist
FROM    some_data
WHERE   id = 1;
+------------+
| artist     |
+------------+
| Bratmobile |
+------------+

dealing with arrays

getting single elements from the top level is great, but we’re also going to want to deal with arrays.

we can reference arrays in the path we pass to JSON_EXTRACT by using an index inside square brackets. this is pretty familiar stuff. for instance, in our sample data we have an object called tracks that contains two arrays keyed as A and B; these are the songs on side ‘a’ and ‘b’. those A and B elements are arrays of song objects. if we wanted to get the first song of side B, we would construct our JSON_EXTRACT call like so:

SELECT  JSON_EXTRACT(some_json, '$.tracks.B[0].title') as side_b_song_one
FROM    some_data
WHERE   id = 1;
+-----------------+
| side_b_song_one |
+-----------------+
| "No You Don't"  |
+-----------------+

looking at the path argument, we can see we start at the top of the hierarchy with $, then reference the tracks object. next is the array B. we indicate we want the first element of this array with [0], then continue our path by giving the key title. the result is the title of the first song on side b.

if we want all the songs on side b, we can replace that index with the wildcard *. this returns a json array of titles.

SELECT  JSON_EXTRACT(some_json, '$.tracks.B[*].title') as side_b
FROM    some_data
WHERE   id = 1;
+-----------------------------+
| side_b                      |
+-----------------------------+
| ["No You Don't", "Queenie"] |
+-----------------------------+

we could also use the ** wildcard to do this job.

SELECT  JSON_EXTRACT(some_json, '$.tracks.B**.title') as side_b
FROM    some_data
WHERE   id = 1;
+-----------------------------+
| side_b                      |
+-----------------------------+
| ["No You Don't", "Queenie"] |
+-----------------------------+

it’s important to note that ** actually means all the paths between the prefix and the suffix. this means that we can use it to get all of the title elements under tracks, regardless of whether there in the array keyed A or B like so:

SELECT  JSON_EXTRACT(some_json, '$.tracks**.title') as side_a_and_side_be
FROM    some_data
WHERE   id = 1;
+----------------------------------------------+
| side_a_and_side_be                           |
+----------------------------------------------+
| ["Kiss and Ride", "No You Don't", "Queenie"] |
+----------------------------------------------+

in that example, ** matches both A and B. very powerful stuff.

we can also use ranges when defining our array indexes. for instance, if we wanted to get all the titles on side B between position zero and one, we could use the index [0 to 1]:

SELECT  JSON_EXTRACT(some_json, '$.tracks.B[0 to 1].title') as side_b_first_two
FROM    some_data
WHERE   id = 1;
+-----------------------------+
| side_b_first_two            |
+-----------------------------+
| ["No You Don't", "Queenie"] |
+-----------------------------+
and if all we want is the last element of an array, mysql allows us to use the literal word last:
SELECT  JSON_EXTRACT(some_json, '$.tracks.B[last].title') as side_b_last_track
FROM    some_data
WHERE   id = 1;
+-------------------+
| side_b_last_track |
+-------------------+
| "Queenie"         |

using this in WHERE clauses

extracting data from json is useful but, really, we could just select the whole column and run the data through json_decode in our controller. the real money is using this in WHERE clauses. let’s look:
SELECT  JSON_EXTRACT(some_json, '$.title') as title
FROM    some_data
WHERE   JSON_EXTRACT(some_json, '$.year') = 1992;
+-----------------+
| title           |
+-----------------+
| "Kiss and Ride" |
+-----------------+

here we just extracted year from our json column and used it in a WHERE clause.

a kludge to simulate WHERE IN clauses

if we want to write a WHERE IN clause against an array in our json object, things get a bit trickier. our initial ideal might be to simply try using IN on an array extracted from our json:

-- this absolutely DOES NOT WORK
SELECT  JSON_EXTRACT(some_json, '$.title') as title
FROM    some_data
WHERE   "No You Don't" IN JSON_EXTRACT(some_json, '$.tracks**.title');

no dice. the problem here, of course, is that WHERE IN expects a list of values, not a json array. there’s a type mismatch.

the solution is to use MEMBER OF.

this construct behaves kind of like a function and kind of like an operator. it takes it’s left value and tests if it resides in the json array that is passed to it as an argument.

SELECT  JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.artist')) as artist,
        JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.title')) as title
FROM    some_data
WHERE   "Queenie" MEMBER OF(CAST(JSON_EXTRACT(some_json, '$.tracks**.title') AS json));
+------------+---------------+
| artist     | title         |
+------------+---------------+
| Bratmobile | Kiss and Ride |
+------------+---------------+

the thing to note here is the argument to MEMBER OF. we have extracted all the title values from our json columns and mashed them together into one array. that ‘array’ isn’t really an array, though; it’s a string. and not even a json string, just a string string.

since MEMBER OF requires json-style data as an argument, we need to call CAST to turn the string of our array into an actual json array.

this solution is a bit messy, and if i am kludging up the wrong tree here and there is a better, more elegant way to go about this, i am very much open to input. however, this construct does work.

wait. this works on TEXT columns too?

the secret is that all of the stuff we’ve covered here also works on TEXT columns. if we run that WHERE IN simulation select against our some_text column, it works just fine:

SELECT  JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.artist')) as artist,
        JSON_UNQUOTE(JSON_EXTRACT(some_json, '$.title')) as title
FROM    some_data
WHERE   "Queenie" MEMBER OF(CAST(JSON_EXTRACT(some_text, '$.tracks**.title') AS json));
+------------+---------------+
| artist     | title         |
+------------+---------------+
| Bratmobile | Kiss and Ride |
+------------+---------------+

so why bother with the JSON column type at all? well, most importantly, when we use the JSON type we get automatic validation on insert or update. we can put invalid json into TEXT columns and mysql will never complain; it will just wait for us to be disappointed at select time.

secondly, the native JSON type is far more efficient. can we use extracted json values in a five table join? yes. will it be painfully slow if we’re using TEXT? also yes. native JSON will be much faster.

finally, there will almost certainly be new json functions in future versions of mysql (including, hopefully, a more-sane way to do WHERE INs) and there is no guarantee that they will behave well with TEXT. bottom line: if you know you’re putting json in a column, make it a JSON column.

Posted by: grant horwood

co-founder of fruitbat studios. cli-first linux snob, metric evangelist, unrepentant longhair. all the music i like is objectively horrible. he/him.

Leave a Reply