New development features in MySQL 8.0.14
With the release of MySQL 8.0.14 Oracle keeps the already established practice of introducing new development features even with maintenance releases, which usually include only small improvements and mostly bug fixes. I took a look at 8.0.14's release notes, the post The MySQL 8.0.14 Maintenance Release is Generally Available by Geir Hoydalsvik and of course the manual, experimented and here below are my picks related to development.
Lateral derived tables
Before MySQL 8.0.14 it wasn't possible for derived tables to refer to (depend on) columns of preceding tables in the same FROM clause. Now this restriction is removed with the addition of LATERAL keyword (see Lateral derived tables). The LATERAL keyword means that the derived table depends on the previous table on its left side. You could have more than one LATERAL derived table in a query and each one would depend only on the previous table(s) or derived table(s). Lateral derived tables are so called "for each" loop of SQL and this makes possible some operations that were otherwise not possible or less efficient.
Here's an example. Let's say you want to calculate the minimum, average and maximum salaries for each department in the organization. Previously you would have to write it like this:
SELECT D.dname, DT.min_sal, DT.avg_sal, DT.max_sal
FROM dept D
LEFT OUTER JOIN
(SELECT E.deptno, MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
GROUP BY E.deptno
) AS DT
ON DT.deptno = D.deptno;
So use a derived table DT to calculate min/avg/max salary for all departments from emp table and then join with the dept table to produce the following result:
+------------+---------+-------------+---------+
| dname | min_sal | avg_sal | max_sal |
+------------+---------+-------------+---------+
| ACCOUNTING | 1300.00 | 2916.666667 | 5000.00 |
| RESEARCH | 800.00 | 2175.000000 | 3000.00 |
| SALES | 950.00 | 1566.666667 | 2850.00 |
| OPERATIONS | | | |
+------------+---------+-------------+---------+
4 rows in set (0.0014 sec)
The derived table is completely independent of the other joined table as it could generate result by itself (i.e. it doesn't depend on the column values of the other table). The explain plan for this query is given on the right and it confirms that the result set of the derived table is first materialized, so it could be joined to the other table.
SELECT D.dname, DT.min_sal, DT.avg_sal, DT.max_sal
FROM dept D
LEFT OUTER JOIN
(SELECT E.deptno, MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
GROUP BY E.deptno
) AS DT
ON DT.deptno = D.deptno;
So use a derived table DT to calculate min/avg/max salary for all departments from emp table and then join with the dept table to produce the following result:
+------------+---------+-------------+---------+
| dname | min_sal | avg_sal | max_sal |
+------------+---------+-------------+---------+
| ACCOUNTING | 1300.00 | 2916.666667 | 5000.00 |
| RESEARCH | 800.00 | 2175.000000 | 3000.00 |
| SALES | 950.00 | 1566.666667 | 2850.00 |
| OPERATIONS | | | |
+------------+---------+-------------+---------+
4 rows in set (0.0014 sec)
The derived table is completely independent of the other joined table as it could generate result by itself (i.e. it doesn't depend on the column values of the other table). The explain plan for this query is given on the right and it confirms that the result set of the derived table is first materialized, so it could be joined to the other table.
Another approach would be to use sub queries in the SELECT clause like this:
SELECT D.dname,
(SELECT MIN(E.sal) FROM emp E WHERE E.deptno = D.deptno) AS min_sal,
(SELECT AVG(E.sal) FROM emp E WHERE E.deptno = D.deptno) AS avg_sal,
(SELECT MAX(E.sal) FROM emp E WHERE E.deptno = D.deptno) AS max_sal
FROM dept D;
which won't be efficient (imagine a sales table and thousands of salesman if you want to evaluate their salaries) as three queries would have to do the job of one. It's not possible to use a single sub query to calculate all the necessary values in SELECT, because such sub queries need to be scalar. Such a query would result in error Error Code: 1241. Operand should contain 1 column(s) if you try it.
SELECT D.dname,
(SELECT MIN(E.sal) FROM emp E WHERE E.deptno = D.deptno) AS min_sal,
(SELECT AVG(E.sal) FROM emp E WHERE E.deptno = D.deptno) AS avg_sal,
(SELECT MAX(E.sal) FROM emp E WHERE E.deptno = D.deptno) AS max_sal
FROM dept D;
which won't be efficient (imagine a sales table and thousands of salesman if you want to evaluate their salaries) as three queries would have to do the job of one. It's not possible to use a single sub query to calculate all the necessary values in SELECT, because such sub queries need to be scalar. Such a query would result in error Error Code: 1241. Operand should contain 1 column(s) if you try it.
If you try to connect the derived table to the other table with a query like the following one:
SELECT D.dname, DT.min_sal, DT.avg_sal, DT.max_sal
FROM dept D,
(SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
WHERE E.deptno = D.deptno
) AS DT;
you'll get Error Code: 1054. Unknown column 'D.deptno' in 'where clause', because D table is not known to the derived table. The query is illegal in SQL-92, but in SQL-1999 it becomes legal if the derived table it's proceeded by the LATERAL keyword:
SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
FROM dept D,
LATERAL
(SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
WHERE E.deptno = D.deptno
) AS LDT;
and produces the following result:
+------------+---------+-------------+---------+
| dname | min_sal | avg_sal | max_sal |
+------------+---------+-------------+---------+
| ACCOUNTING | 1300.00 | 2916.666667 | 5000.00 |
| RESEARCH | 800.00 | 2175.000000 | 3000.00 |
| SALES | 950.00 | 1566.666667 | 2850.00 |
| OPERATIONS | NULL | NULL | NULL |
+------------+---------+-------------+---------+
4 rows in set (0.1182 sec)
As clear from the explain plan graph on the right in this case there's no grouping, but MySQL gives higher query cost, because access to the derived table is by full table scan. The more interesting information is however into the tabular explain plan (column partitions is intentionally hidden):
+----+-------------------+------------++------+---------------+-----------+---------+-------------------+------+----------+----------------------------+
| id | select_type | table || type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------------+------------++------+---------------+-----------+---------+-------------------+------+----------+----------------------------+
| 1 | PRIMARY | D || ALL | NULL | NULL | NULL | NULL | 4 | 100 | Rematerialize (<derived2>) |
| 1 | PRIMARY | <derived2> || ALL | NULL | NULL | NULL | NULL | 2 | 100 | NULL |
| 2 | DEPENDENT DERIVED | E || ref | fk_deptno | fk_deptno | 5 | dept_emp.D.deptno | 4 | 100 | NULL |
+----+-------------------+------------++------+---------------+-----------+---------+-------------------+------+----------+----------------------------+
3 rows in set, 2 warnings (0.0010 sec)
Note (code 1276): Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1
Note (code 1003): /* select#1 */ select `dept_emp`.`d`.`dname` AS `dname`,`ldt`.`min_sal` AS `min_sal`,`ldt`.`avg_sal` AS `avg_sal`,`ldt`.`max_sal` AS `max_sal` from `dept_emp`.`dept` `d` join lateral (/* select#2 */ select min(`dept_emp`.`e`.`sal`) AS `min_sal`,avg(`dept_emp`.`e`.`sal`) AS `avg_sal`,max(`dept_emp`.`e`.`sal`) AS `max_sal` from `dept_emp`.`emp` `e` where (`dept_emp`.`e`.`deptno` = `dept_emp`.`d`.`deptno`)) `ldt`
There are two new information and an additional note. The plan clearly shows that the derived table E (derived2) is DEPENDENT on the other table and that its rematerialized for each row of D (see EXPLAIN extra information). This is the reason for which lateral derived tables are also known as the "for each" loop of SQL. The note informs about how the external reference in the derived table was resolved.
SELECT D.dname, DT.min_sal, DT.avg_sal, DT.max_sal
FROM dept D,
(SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
WHERE E.deptno = D.deptno
) AS DT;
you'll get Error Code: 1054. Unknown column 'D.deptno' in 'where clause', because D table is not known to the derived table. The query is illegal in SQL-92, but in SQL-1999 it becomes legal if the derived table it's proceeded by the LATERAL keyword:
SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
FROM dept D,
LATERAL
(SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
FROM emp E
WHERE E.deptno = D.deptno
) AS LDT;
and produces the following result:
+------------+---------+-------------+---------+
| dname | min_sal | avg_sal | max_sal |
+------------+---------+-------------+---------+
| ACCOUNTING | 1300.00 | 2916.666667 | 5000.00 |
| RESEARCH | 800.00 | 2175.000000 | 3000.00 |
| SALES | 950.00 | 1566.666667 | 2850.00 |
| OPERATIONS | NULL | NULL | NULL |
+------------+---------+-------------+---------+
4 rows in set (0.1182 sec)
As clear from the explain plan graph on the right in this case there's no grouping, but MySQL gives higher query cost, because access to the derived table is by full table scan. The more interesting information is however into the tabular explain plan (column partitions is intentionally hidden):
+----+-------------------+------------++------+---------------+-----------+---------+-------------------+------+----------+----------------------------+
| id | select_type | table || type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------------+------------++------+---------------+-----------+---------+-------------------+------+----------+----------------------------+
| 1 | PRIMARY | D || ALL | NULL | NULL | NULL | NULL | 4 | 100 | Rematerialize (<derived2>) |
| 1 | PRIMARY | <derived2> || ALL | NULL | NULL | NULL | NULL | 2 | 100 | NULL |
| 2 | DEPENDENT DERIVED | E || ref | fk_deptno | fk_deptno | 5 | dept_emp.D.deptno | 4 | 100 | NULL |
+----+-------------------+------------++------+---------------+-----------+---------+-------------------+------+----------+----------------------------+
3 rows in set, 2 warnings (0.0010 sec)
Note (code 1276): Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1
Note (code 1003): /* select#1 */ select `dept_emp`.`d`.`dname` AS `dname`,`ldt`.`min_sal` AS `min_sal`,`ldt`.`avg_sal` AS `avg_sal`,`ldt`.`max_sal` AS `max_sal` from `dept_emp`.`dept` `d` join lateral (/* select#2 */ select min(`dept_emp`.`e`.`sal`) AS `min_sal`,avg(`dept_emp`.`e`.`sal`) AS `avg_sal`,max(`dept_emp`.`e`.`sal`) AS `max_sal` from `dept_emp`.`emp` `e` where (`dept_emp`.`e`.`deptno` = `dept_emp`.`d`.`deptno`)) `ldt`
There are two new information and an additional note. The plan clearly shows that the derived table E (derived2) is DEPENDENT on the other table and that its rematerialized for each row of D (see EXPLAIN extra information). This is the reason for which lateral derived tables are also known as the "for each" loop of SQL. The note informs about how the external reference in the derived table was resolved.
Of course MySQL Workbench (even upgraded to 8.0.14 as well) is again not familiar with the new syntax (see my previous post New development features in MySQL 8.0.13), because it doesn't properly color the new keyword and shows error in SQL editor just after it. I reported this as bug 94012, but I have no much hope as 90620, 92900 and 92908 were verified yet still open. Version numbers doesn't mean much nowadays :-)
Please, note that it's possible even to connect to the outer table if the derived table is in a sub query (see an example in WL#461).
JSON aggregation functions can now be used as window functions
It is now possible to use JSON_ARRAYAGG and JSON_OBJECTAGG aggregate functions as window functions by using the OVER clause (see Window Function Concepts and Syntax). This makes all (except COUNT(DISTINCT) and GROUP_CONCAT) of the aggregate functions possible for use as window functions after bitwise AND/OR/XOR functions were made so with MySQL 8.0.12. Here's an example:
SELECT E.ename, E.sal,
AVG(E.sal) OVER dw AS avg_sal,
JSON_OBJECTAGG(D.dname, E.sal) OVER dw AS dept_sal
FROM emp E,
dept D
WHERE E.deptno = D.deptno
WINDOW dw AS (PARTITION BY D.deptno);
+--------+---------+-------------+------------------------+
| ename | sal | avg_sal | dept_sal |
+--------+---------+-------------+------------------------+
| CLARK | 2450.00 | 2916.666667 | {"ACCOUNTING": 1300.0} |
| KING | 5000.00 | 2916.666667 | {"ACCOUNTING": 1300.0} |
| MILLER | 1300.00 | 2916.666667 | {"ACCOUNTING": 1300.0} |
| SMITH | 800.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| JONES | 2975.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| SCOTT | 3000.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| ADAMS | 1100.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| FORD | 3000.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| ALLEN | 1600.00 | 1566.666667 | {"SALES": 950.0} |
| WARD | 1250.00 | 1566.666667 | {"SALES": 950.0} |
| MARTIN | 1250.00 | 1566.666667 | {"SALES": 950.0} |
| BLAKE | 2850.00 | 1566.666667 | {"SALES": 950.0} |
| TURNER | 1500.00 | 1566.666667 | {"SALES": 950.0} |
| JAMES | 950.00 | 1566.666667 | {"SALES": 950.0} |
+--------+---------+-------------+------------------------+
14 rows in set (0.0021 sec)
It's important to note that MySQL does not permit duplicate keys in JSON data type, so in a window without ordering function JSON_OBJECTAGG would return the last value for the key, which may not be deterministic.
SELECT E.ename, E.sal,
AVG(E.sal) OVER dw AS avg_sal,
JSON_OBJECTAGG(D.dname, E.sal) OVER dw AS dept_sal
FROM emp E,
dept D
WHERE E.deptno = D.deptno
WINDOW dw AS (PARTITION BY D.deptno);
+--------+---------+-------------+------------------------+
| ename | sal | avg_sal | dept_sal |
+--------+---------+-------------+------------------------+
| CLARK | 2450.00 | 2916.666667 | {"ACCOUNTING": 1300.0} |
| KING | 5000.00 | 2916.666667 | {"ACCOUNTING": 1300.0} |
| MILLER | 1300.00 | 2916.666667 | {"ACCOUNTING": 1300.0} |
| SMITH | 800.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| JONES | 2975.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| SCOTT | 3000.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| ADAMS | 1100.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| FORD | 3000.00 | 2175.000000 | {"RESEARCH": 3000.0} |
| ALLEN | 1600.00 | 1566.666667 | {"SALES": 950.0} |
| WARD | 1250.00 | 1566.666667 | {"SALES": 950.0} |
| MARTIN | 1250.00 | 1566.666667 | {"SALES": 950.0} |
| BLAKE | 2850.00 | 1566.666667 | {"SALES": 950.0} |
| TURNER | 1500.00 | 1566.666667 | {"SALES": 950.0} |
| JAMES | 950.00 | 1566.666667 | {"SALES": 950.0} |
+--------+---------+-------------+------------------------+
14 rows in set (0.0021 sec)
It's important to note that MySQL does not permit duplicate keys in JSON data type, so in a window without ordering function JSON_OBJECTAGG would return the last value for the key, which may not be deterministic.
X Protocol improvements
According to the release notes data is now always converted to utf8mb4 character set (using the utf8mb4_general_ci collation). The other notable improvement is the support for "prepare functionality". The release notes provide no reference about this new functionality, but Geir Hoydalsvik has mentioned WL#9270 in his article, so I believe this is about preparing CRUD operations (see Preparing CRUD Statements). A simple example in JavaScript would be the following:
MySQL Shell 8.0.14
Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.
MySQL JS > \connect user@localhost
Creating a session to 'user@localhost'
Your MySQL connection id is 22 (X protocol)
Server version: 8.0.14 MySQL Community Server - GPL
MySQL localhost:33060+ ssl JS > \use test
Default schema `test` accessible through db.
MySQL localhost:33060+ ssl test JS > var usr = db.createCollection('users')
MySQL localhost:33060+ ssl test JS > usr.add({name:"User1",age:15})
Query OK, 1 item affected (0.0108 sec)
MySQL localhost:33060+ ssl test JS > usr.add({name:"User2",age:17})
Query OK, 1 item affected (0.0138 sec)
MySQL localhost:33060+ ssl test JS > usr.add({name:"User3",age:20})
Query OK, 1 item affected (0.0105 sec)
MySQL localhost:33060+ ssl test JS > usr.add({name:"User4",age:19})
Query OK, 1 item affected (0.0137 sec)
MySQL localhost:33060+ ssl test JS > usr.add({name:"User5",age:16})
Query OK, 1 item affected (0.0118 sec)
MySQL localhost:33060+ ssl test JS > usr.find()
[
{"_id": "00005c46e7e5000000000000000a","age": 15,"name": "User1"},
{"_id": "00005c46e7e5000000000000000e","age": 17,"name": "User2"},
{"_id": "00005c46e7e5000000000000000f","age": 20,"name": "User3"},
{"_id": "00005c46e7e50000000000000010","age": 19,"name": "User4"},
{"_id": "00005c46e7e50000000000000011","age": 16,"name": "User5"}
]
5 documents in set (0.0006 sec)
MySQL localhost:33060+ ssl test JS > var fcmd = usr.find('age >= :page')
MySQL localhost:33060+ ssl test JS > fcmd.bind('page', 18)
[
{"_id": "00005c46e7e5000000000000000f","age": 20,"name": "User3"},
{"_id": "00005c46e7e50000000000000010","age": 19,"name": "User4"}
]
2 documents in set (0.0006 sec)
MySQL localhost:33060+ ssl test JS > fcmd.bind('page', 16)
[
{"_id": "00005c46e7e5000000000000000e","age": 17,"name": "User2"},
{"_id": "00005c46e7e5000000000000000f","age": 20,"name": "User3"},
{"_id": "00005c46e7e50000000000000010","age": 19,"name": "User4"},
{"_id": "00005c46e7e50000000000000011","age": 16,"name": "User5"}
]
4 documents in set (0.0003 sec)
MySQL localhost:33060+ ssl test JS > fcmd.bind('page', 17)
[
{"_id": "00005c46e7e5000000000000000e","age": 17,"name": "User2"},
{"_id": "00005c46e7e5000000000000000f","age": 20,"name": "User3"},
{"_id": "00005c46e7e50000000000000010","age": 19,"name": "User4"}
]
3 documents in set (0.0004 sec)
The example creates a collection of users with their names and ages, then prints the whole collection. The interesting part starts with the line highlighted in yellow. It prepares a statement using a named parameter (anonymous parameters with ? are not supported by the X protocol), but does not execute it. The execution happens after a value is bind to the parameter and this could be done many times producing different results. It's interesting that in the general log the first bind actually executes a query, then there's prepare and then follow executes:
Query SELECT doc FROM `test`.`users` -> usr.find()
Query SELECT doc FROM `test`.`users` WHERE (JSON_EXTRACT(doc,'$.age') >= 18) -> fcmd.bind('page', 18)
Prepare SELECT doc FROM `test`.`users` WHERE (JSON_EXTRACT(doc,'$.age') >= ?)
Execute SELECT doc FROM `test`.`users` WHERE (JSON_EXTRACT(doc,'$.age') >= 16) -> fcmd.bind('page', 16)
Execute SELECT doc FROM `test`.`users` WHERE (JSON_EXTRACT(doc,'$.age') >= 17) -> fcmd.bind('page', 17)
Using prepared statements for repeatedly executed statements could lead do performance improvements, because of the saved parse time, so this is something you should consider if you need to improve the performance of your applications and scripts.
Spatial improvements
The function ST_Distance now accepts an optional third parameter specifying the unit for the returned value. The possible values are defined in INFORMATION_SCHEMA.ST_UNITS_OF_MEASURE table along with conversion factor relative to the basic unit metre, which is the default. Here is an example for calculation of the distance between Sofia and Sydney in kilometers and nautical miles in SRID 4326:
SELECT ST_Distance(ST_PointFromText('POINT( 42.69751 23.32415)', 4326),
ST_PointFromText('POINT(-33.86667 151.20000)', 4326)) / 1000 dist_km;
+--------------------+
| dist_km |
+--------------------+
| 15431.933058990671 |
+--------------------+
1 row in set (0.0023 sec)
SELECT ST_Distance(ST_PointFromText('POINT( 42.69751 23.32415)', 4326),
ST_PointFromText('POINT(-33.86667 151.20000)', 4326), 'nautical mile') dist_nm;
+------------------+
| dist_nm |
+------------------+
| 8332.57724567531 |
+------------------+
1 row in set (0.0008 sec)
SELECT ST_Distance(ST_PointFromText('POINT( 42.69751 23.32415)', 4326),
ST_PointFromText('POINT(-33.86667 151.20000)', 4326)) / 1000 dist_km;
+--------------------+
| dist_km |
+--------------------+
| 15431.933058990671 |
+--------------------+
1 row in set (0.0023 sec)
SELECT ST_Distance(ST_PointFromText('POINT( 42.69751 23.32415)', 4326),
ST_PointFromText('POINT(-33.86667 151.20000)', 4326), 'nautical mile') dist_nm;
+------------------+
| dist_nm |
+------------------+
| 8332.57724567531 |
+------------------+
1 row in set (0.0008 sec)
This wraps up my review. There is of course lot more in MySQL 8.0.14 not only for developers, so I encourage to explore and find out more. The references at the beginning of this article are a good starting point.
No comments:
Post a Comment