the cold, hard truth is that the reason your web app or api is running slow is because of your database calls. you can migrate everything to frakenphp or put compression in your http server if you want, i’m not going to stop you, but if you want to shave whole seconds off your response times, you should take a hard look at all that janky sql and yolo orm queries you wrote.
the good news is that mysql
has a tool to find and log your slow queries. it’s called the ‘slow query log’, basically the perfect name, and it comes pre-installed. let’s get it turned on, set up, and look at the output.
configuring slow query log the quick way
the slow query log comes turned off by default, a totally-unwarranted vote of confidence in our query-crafting skills by the mysql dev team. fortunately, we can turn it on simply by setting the global variable slow_query_log
to ON
.
at our mysql
prompt:
SET @@GLOBAL.slow_query_log = 'ON';
note here that we’re using the GLOBAL
version of this variable; we’re turning on the slow query log for all the databases on our mysql server.
once logging is on, we’re going to want to set exactly where that log file should be:
SET @@GLOBAL.slow_query_log_file = '/path/to/file.log';
that file, of course, should be writable by your mysql
user. if the file doesn’t exist, mysql
will do you the favour of creating it.
finally, we’re going to define exactly how slow ‘slow’ is. this is done by setting the variable long_query_time
to the maximum number of seconds a query can take before being flagged as ‘slow’.
SET @@GLOBAL.long_query_time = 1.0;
note here that this value is a float and fractions of seconds are perfectly fine. the default value is 10.0
, which is ridiculous.
once we’ve set those variables, we’re logging. no restart required.
if we want to check our slow query configuration with can show our variables with these commands:
SHOW GLOBAL VARIABLES LIKE "slow_%"; SHOW GLOBAL VARIABLES LIKE "long_query_time";
configuring slow query log the more-permanent way
setting variables on the command line doesn’t persist across restarts however. if we want to make the slow query a log a more permanent thing, we’re going to need add our directives to the config file.
mysql config file locations and layout vary from distro to distro pretty dramatically. what we want to find the the [mysqld]
configuration block. some grepping may be required.
if we’re running a later version of ubuntu, we’ll find some slow query log defaults already set and commented out:
# Here you can see queries with especially long duration # slow_query_log = 1 # slow_query_log_file = /var/log/mysql/mysql-slow.log # long_query_time = 2 # log-queries-not-using-indexes
all we have to do is uncomment them and set the values we want.
analyzing output with some shell kludgery
the whole point of doing this is to get mysql to write our shameful queries to a log that we can read later, so let’s look at some example logfile data:
/usr/sbin/mysqld, Version: 8.0.34-0ubuntu0.22.04.1 ((Ubuntu)). started with: Tcp port: 3306 Unix socket: /var/run/mysqld/mysqld.sock Time Id Command Argument # Time: 2024-05-02T17:31:27.113225Z # User@Host: root[root] @ localhost [] Id: 17086 # Query_time: 0.022704 Lock_time: 0.000004 Rows_sent: 1 Rows_examined: 2007 use a_test_database; SET timestamp=1714671087; SELECT count(*) FROM large_table lt LEFT JOIN other_table ot ON lt.created_at > ot.created_at; # Time: 2024-05-02T17:31:47.753776Z # User@Host: root[root] @ localhost [] Id: 17086 # Query_time: 0.037307 Lock_time: 0.000009 Rows_sent: 1 Rows_examined: 2007 SET timestamp=1714671107; SELECT count(*) FROM large_table lt LEFT JOIN other_table ot ON lt.created_at > ot.created_at; # Time: 2024-05-02T17:32:43.277762Z # User@Host: root[root] @ localhost [] Id: 17086 # Query_time: 0.033879 Lock_time: 0.000006 Rows_sent: 1 Rows_examined: 2007 SET timestamp=1714671163; SELECT count(*) FROM large_table lt LEFT JOIN other_table ot ON lt.created_at > ot.created_at;
there’s a lot of good data here, stuff we want like the query time and how many rows were examined and a full writing of the query itself, but the format is… not great.
if we want to do something like find the longest-running query (that would be useful!), we basically have to scroll through the entire log looking for the biggest number. that’s fine if there’s ten entries or so, but if we run this on production, racking up fifty thousand log writes, we’re in trouble. to get our slowest queries, we’re going to need to do a little awk
/sed
kludgery:
$ tail +4 /path/to/file.log | grep -v "^# [^Query_time]" | awk '{printf("%s ", $0)}' | sed -e 's/# /\n/g' | sort -r | head -1
if we run that, we get the output:
Query_time: 0.037307 Lock_time: 0.000009 Rows_sent: 1 Rows_examined: 2007 SET timestamp=1714671107; SET timestamp=1714671087; SELECT count(*) FROM large_table lt LEFT JOIN other_table ot on lt.created_at > ot.created_at;
that is our slowest query; a piece of casual violence where we joined on a datetime using a greater-than. we should fix that.
of course it’s rude to present a six-command pipeline without explaining what it does or how it does it, so let’s go over that little script
tail +4 /path/to/file.log
the raw log output has a bunch of header data at the top. it’s useful stuff, but not anything we’re particularly interested in right now. we strip off the first bunch of lines by calling tail +4
here.
grep -v "^# [^Query_time]"
at the top of each query entry there’s some comments that start with “#”. the only one of those we want is Query_time
, so we use grep
to remove all lines that start with #
except those that contain the string ‘Query_time’.
awk '{printf("%s ", $0)}'
ultimately, we want to have all the data about one query on its own line. doing that is a two-step process and the first step is stripping out all line endings and putting the whole file into one line. that’s what this little dollop of awk
does.
sed -e 's/# /\n/g'
now that our whole file is one line, we split it into one line per query. since we want `# Query_time` to be first in each line to make sorting easier, we split on `#`
sort -r
at last, we can sort. since the query time is the first variable content in each line, it’s what we’re sorting on absent duplicate values. we use the -r
switch here to reverse the the sort, putting larger numbers first.
head -1
this just makes it so we only see the first line.
which is what we want.