My personal and professional life

2019-06-05

TREE explain format in MySQL 8.0.16

I have omitted an interesting new feature in latest MySQL release 8.0.16, which I covered last month. It is mentioned in the release notes (see Functionality Added or Changed and scroll down or search for TREE) and also in the documentation (see Obtaining Execution Plan Information). As it turns out new TREE format for the explain plan was added.

Until now there were two options for format of the explain plans generated with EXPLAIN statement:
  • TRADITIONAL - the old tabular explain plan suitable for console;
  • JSON - the explain plan in JSON format used by MySQL Workbench to display nice graphics in Visual Explain, which I use in presentations.
Whoever is familiar with the traditional tabular output knows that it's not indented and sometimes it's hard to understand the order of operations. Now there is the option to use TREE format, which prints the "generated iterator tree" with the operations indented to right similarly to Oracle execution plans. This should help users to understand better how query execution is actually set up, but it's still experimental, so both the syntax and output may change in future.

Let's take the following two queries that generate the same result - minimal, average and maximum salaries for the departments in the organization based on the DEPT and EMP schema (see my adaptation for MySQL and EMP and DEPT script). The second query is possible only with MySQL 8.0.14 and later (see New development features in MySQL 8.0.14).

Query 1: Using LEFT OUTER JOIN and derived table

SELECT D.dname, DT.min_sal, DT.avg_sal, DT.max_sal
  FROM dept D
       LEFT OUTER JOIN
       (SELECT E.deptno,
               MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
          FROM emp E
         GROUP BY E.deptno
       ) AS DT
       ON DT.deptno = D.deptno;

Query 2: Using LATERAL derived table

SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
  FROM dept D,
       LATERAL
       (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
          FROM emp E
         WHERE E.deptno = D.deptno
       ) AS LDT;


The tabular execution plans of the two queries (with columns partitions, possible_keys and filtered removed to shorten the output) and the extended output are as follows:

Query 1

+----+-------------+------------++-------++-------------+---------+-------------------+------++-------+
| id | select_type | table      || type  || key         | key_len | ref               | rows || Extra |
+----+-------------+------------++-------++-------------+---------+-------------------+------++-------+
|  1 | PRIMARY     | D          || ALL   || NULL        | NULL    | NULL              |    4 || NULL  |
|  1 | PRIMARY     | <derived2> || ref   || <auto_key0> | 5       | dept_emp.D.deptno |    2 || NULL  |
|  2 | DERIVED     | E          || index || fk_deptno   | 5       | NULL              |   14 || NULL  |
+----+-------------+------------++-------++-------------+---------+-------------------+------++-------+
3 rows in set, 1 warning (0.01 sec)


+-------+------+-----------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                             |
+-------+------+-----------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `dept_emp`.`D`.`dname` AS `dname`... left join (
                 /* select#2 */ select `dept_emp`.`E`.`deptno` AS `deptno`,min(`dept_emp`.`E`.`sal`) AS `min_sal`... |
+-------+------+-----------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

Query 2

+----+-------------------+------------++------++-----------+---------+-------------------+------++----------------------------+
| id | select_type       | table      || type || key       | key_len | ref               | rows || Extra                      |
+----+-------------------+------------++------++-----------+---------+-------------------+------++----------------------------+
|  1 | PRIMARY           | D          || ALL  || NULL      | NULL    | NULL              |    4 || Rematerialize (<derived2>) |
|  1 | PRIMARY           | <derived2> || ALL  || NULL      | NULL    | NULL              |    2 || NULL                       |
|  2 | DEPENDENT DERIVED | E          || ref  || fk_deptno | 5       | dept_emp.D.deptno |    1 || NULL                       |
+----+-------------------+------------++------++-----------+---------+-------------------+------++----------------------------+
3 rows in set, 2 warnings (0.0006 sec)

+-------+------+-----------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                             |
+-------+------+-----------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1                       |
| Note  | 1003 | /* select#1 */ select `dept_emp`.`D`.`dname` AS `dname`, ... join lateral (
                 /* select#2 */ select min(`dept_emp`.`E`.`sal`) AS `min_sal`...                                     |
+-------+------+-----------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)


That's not nice, right? The additional information printed with SHOW WARNINGS only marks the involved queries, so you could more easily identify them in the plan (corresponding to first column id). For the second query there is also a note about how the reference from the second query was resolved in the first query.

Now, here is how the new execution plan outputs look like:

Query 1

+-----------------------------------------------------------------+
| EXPLAIN                                                         |
+-----------------------------------------------------------------+
| -> Nested loop left join
    -> Table scan on D
    -> Index lookup on DT using <auto_key0> (deptno=d.deptno)
        -> Materialize
            -> Group aggregate: min(e.sal), avg(e.sal), max(e.sal)
                -> Index scan on E using fk_deptno                |
+-----------------------------------------------------------------+
1 row in set (0.0025 sec)

Query 2

+----------------------------------------------------------------------+
| EXPLAIN                                                              |
+----------------------------------------------------------------------+
| -> Nested loop inner join
    -> Invalidate materialized tables (row from D)
        -> Table scan on D
    -> Table scan on LDT
        -> Materialize (invalidate on row from D)
            -> Aggregate: min(e.sal), avg(e.sal), max(e.sal)
                -> Index lookup on E using fk_deptno (deptno=d.deptno) |
+----------------------------------------------------------------------+
1 row in set, 1 warning (0.0007 sec)
Note (code 1276): Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1


That's nice, right? You see all the operations that would be executed and the indentation helps understand how they're nested or in other words which is executed before others. I will have to see how this new format would evolve, but even now I think it would be quite useful for complex queries, so it's very good they added it.

Thank you MySQL developers!

No comments: