My personal and professional life

2019-07-31

New features in MySQL 8.0.17

Last Tuesday (July 27-th), Oracle released MySQL 8.0.7 following the three months release cycle introduced last year with 8 series. This is the next "maintenance" release introducing a few new features as well as depreciating some non-standard ones, so here are my picks starting with the ones I consider important for developers.

Multi-valued indexes

With the help of generated columns (introduced with MySQL 5.7.6) and functional indexes (introduced with MySQL 8.0.13 for which I wrote before) it become possible to index data in complex column values like JSON. But in JSON you could have scalars as well as arrays, so searching into arrays with the help of an index wasn't possible. Multi-valued indexes come to solve this by allowing multiple records in the index to point to the same data record. Such indexes are created like any other functional index and used automatically by the optimizer when possible. Let's see an example - a register for translators with their spoken languages.

CREATE TABLE translators (
  id INT AUTO_INCREMENT,
  jdata JSON,

  PRIMARY KEY(id)
);

  • Case 1 - array of strings
Let's create some data:

INSERT INTO translators (jdata)
VALUES ('{"name": "T1", "langs": ["English", "French", "Spanish"]}'),
       ('{"name": "T2", "langs": ["English", "Spanish"]}'),
       ('{"name": "T3", "langs": ["French", "Spanish"]}');


Then, let's query the translators that speak English using the new MEMBER OF operator:

SELECT id, jdata->>'$.name', jdata->'$.langs'
  FROM translators
 WHERE 'English' MEMBER OF (jdata->'$.langs');


Or the new function JSON_OVERLAPS like this:

SELECT id, jdata->>'$.name', jdata->'$.langs'
  FROM translators
 WHERE JSON_OVERLAPS(jdata->'$.langs', '["English"]');


Both queries are leading to the same result:

+----+------------------+----------------------------------+
| id | jdata->>'$.name' | jdata->'$.langs'                 |
+----+------------------+----------------------------------+
|  1 | T1               | ["English", "French", "Spanish"] |
|  2 | T2               | ["English", "Spanish"]           |
+----+------------------+----------------------------------+
2 rows in set (0.00 sec)


Given the data as expected the queries return T1 and T2, but not T3. However, these queries are doing a full table scan, so their performance would degrade with the accumulation of data in the table.

Execution plan without index

Fortunately, it's now possible to add a multi-valued index like this:

ALTER TABLE translators
  ADD INDEX idx_langs_arr ((CAST(jdata->'$.langs' AS CHAR(8) ARRAY)));

It's a functional index into which it's necessary to use function CAST with the new ARRAY keyword. With the index the execution plan of the SELECT queries above become respectively:
Execution plan of MEMBER OF with indexExecution plan of JSON OVERLAPS with index

  • Case 2 - array of objects
It's slightly different for arrays of objects, but only for the JSONPath expression. Let's create some data (after cleaning Case 1):

INSERT INTO translators (jdata)
VALUES ('{"name": "T1", "langs": [{"lang": "English"}, {"lang": "French"}, {"lang": "Spanish"}]}'),
       ('{"name": "T2", "langs": [{"lang": "English"}, {"lang": "Spanish"}]}'),
       ('{"name": "T3", "langs": [{"lang": "French"}, {"lang": "Spanish"}]}');


Then, let's query the translators that speak English the same two ways:

SELECT id, jdata->>'$.name', jdata->'$.langs[*].lang'
  FROM translators
 WHERE 'English' MEMBER OF (jdata->'$.langs[*].lang');


SELECT id, jdata->>'$.name', jdata->'$.langs[*].lang'
  FROM translators
 WHERE JSON_OVERLAPS(jdata->'$.langs[*].lang', '["English"]');


Just like in the first case the queries use full table scan, which could now easily be changed by adding a multi-valued index like this:

ALTER TABLE translators
  ADD INDEX idx_langs_obj ((CAST(jdata->'$.langs[*].lang' AS CHAR(8) ARRAY)));


Notice the slightly different syntax of the JSONPath expression. For the index to work it's of course necessary to use the same expression in WHERE clause as in the index definition. Developers that prefer to keep data directly into JSON columns should be happy by this new feature, as it makes it possible to index not only scalar variables, but also arrays.

JSON

Apart from the already mentioned new standard operator MEMBER OF for searching values in JSON arrays there are three new functions JSON_OVERLAPS, JSON_SCHEMA_VALID and JSON_SCHEMA_VALIDATION_REPORT.
  • Function JSON_OVERLAPS compares two JSON documents and returns true "if the two document have any key-value pairs or array elements in common". Like MEMBER OF and JSON_CONTAINS function JSON_OVERLAPS may benefit from multi-value indexes.
  • Functions JSON_SCHEMA_VALID and JSON_SCHEMA_VALIDATION_REPORT are added in regards to JSON Schema support. The first one validates a JSON document against a JSON schema and returns true if valid otherwise false, so its usable as a CHECK Constraint. The second one would provide insights into validation errors in the form of JSON document.

Cast to FLOAT, DOUBLE and REAL

Functions CAST and CONVERT now could cast to floating point data types such as FLOAT, DOUBLE and REAL. Let's try it with an incorrect value:

SELECT CAST('1.23.34' AS FLOAT) cast_res, CONVERT('1.23.34', FLOAT) conv_res;

+--------------------+--------------------+
| cast_res           | conv_res           |
+--------------------+--------------------+
| 1.2300000190734863 | 1.2300000190734863 |
+--------------------+--------------------+
1 row in set, 2 warnings (0.00 sec)

There are two warnings, so let's see them with show warnings:

+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: '1.23.34' |
| Warning | 1292 | Truncated incorrect DOUBLE value: '1.23.34' |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)

Casting to DOUBLE and REAL produced different result and the same warnings:

SELECT CAST('1.23.34' AS DOUBLE) cast_res, CONVERT('1.23.34', DOUBLE) conv_res;
SELECT CAST('1.23.34' AS REAL) cast_res, CONVERT('1.23.34', REAL) conv_res;


+----------+----------+
| cast_res | conv_res |
+----------+----------+
|     1.23 |     1.23 |
+----------+----------+
1 row in set, 2 warnings (0.00 sec)

CLONE command

For many years MySQL administrators had to dump masters, transfer the dump over network and load it into replicas in order to initialize their state (see Copy a Database from one Server to Another). I'm not that much into database administration, but it was cumbersome and annoying procedure especially in case of unrecoverable replication errors where I had to re-initialize the state of replicas, so I considered it a burden. Why I could not just "clone" the master after wiping the replica? Well, the new version makes it possible to easily create new instances or re-initialize existing by implementing native provisioning into the server with the CLONE command. It's made possible by the new MySQL Clone Plugin. You could find more about this by reading the following articles:
It's the focus of this release and I'm sure it's going to be a game changer for organizations using extensively MySQL replication.

Deprecations

This is a list of features that are depreciated with this release and would be removed in future releases:
  • Function FOUND_ROWS and query modifier SQL_CALC_FOUND_ROWS. Documentation suggests using COUNT(*) to find the number of rows.
  • Numeric data type attributes:
    • Integer display width. There is now warning "1681 Integer display width is deprecated and will be removed in a future release." if you try to create tables with INT(11) for example. I personally have such definitions as provided by mysqldump and MySQL Workbench, so I'll have to remove them all, before the statements are rejected with syntax error in future.
    • The ZEROFILL attribute. I never used it anyway.
    • The UNSIGNED attribute for FLOAT, DOUBLE, and DECIMAL data types. I haver used it as well.
    • AUTO_INCREMENT support for FLOAT and DOUBLE data types. Anyone?
  • The syntax  FLOAT(M,D) and DOUBLE(M,D) for specifying number of digits for floating point types.
  • Logical operators && (double ampersand), whish is synonym for AND, || (double pipe), which is synonym for OR and ! (exclamation mark), which is synonym for NOT. I'm not sure I ever used any of these even in daily queries, because I find using ANDOR and NOT a lot more expressive.
  • BINARY keyword for specifying _bin collations. I never used this as well.
Developers should consider getting rid of these nonstandard features to avoid unpleasant surprises in future. I'll personally review my projects and correct as soon as possible.

Bug fixes

In regards to CHECK constraints I found two bugs in the prevoius release and one of them was fixed (see bug #95189 CHECK constraint comparing columns is not always enforced with UPDATE queries). For the full example please, see bug_95189_test_case.sql script, so let's just verify it:

SET binlog_format = 'STATEMENT';
SET binlog_row_image = 'minimal';

UPDATE tst SET end_date = '2019-04-20' WHERE id = 1;
/* Error Code: 3819. Check constraint 'chk_dat' is violated. */

So it's fixed, because as expected the UPDATE query fails even with the specific binlog settings. The other one (see bug #95192 CHECK constraint comparing column with default value is not enforced) would have to wait.

That wraps up my review of new features in MySQL 8.0.17 maintenance release. I hope new releases bring more features for developers.

1 comment:

Anonymous said...

Thank you!!!