Georgi Sotirov's Blog

My personal and professional life

2020-02-29

Migrations to Git and GitHub continue

Last year I started migrating from CVS to Git and GitHub (see First migrations from CVS to Git), something I had been willing to start back in 2018 (see Migration to Git and GitHub), but had planned for even longer. Well, it proved to be a tough task and I'm still migrating, because many of my CVS repositories required corrections so their history could be properly transferred into Git. During the migration I was surprised how negligent I had been with my sources: I found uncommitted changes from as far back as 10 years ago, corrupted revision control files and various history inconsistencies. After migrating some projects, I immediately started working on updates and fixes, so my work already continues on GitHub. I also enabled continuous integration for some projects using Travis CI and GitHub Actions.

I'm currently migrating my Slackware packages' build scripts, a collection of over 300 shell scripts and related files organized as separate repositories in dedicated directories, but under the same root. In CVS this was perfectly fine, but it does not produce good history in Git, because the version tags and branches (e.g. FFmpeg-3_4_7 or MySQL-5_5) are specific to a single build script and used only by the files in its respective directory. I thus decided to migrate these separately and then combine them into another repository with submodules. There were many problems with this migration as well.

These were some of the problems:
  • tags or branches with similar names (e.g. TEST-123 and TEST_123). Those are easy to correct - I just delete the wrong tag/branch;
  • misplaced tags or branches. Since in CVS I tagged and branched as necessary, some tags and branches were put on different files at different times, which in Git history resulted in commits with the message "This commit was manufactured by cvs2git to create tag 'TEST-123'" or "This commit was manufactured by cvs2git to create branch 'TEST-123'", with files being added, deleted or modified to adjust history for the tag or branch respectively. I'm fixing such problems by reordering the problematic commits, which is easy (just change the time in the revision control file in the repository) but requires time to review and understand the reason;
  • unused files not deleted in version history. These also produced the commits with the messages mentioned in the previous bullet point. I'm fixing them by deleting the unused files with a past date, so such files are not considered by subsequent commits;
  • files belonging to a branch, but committed to the main trunk instead. It's similar to the above. In some cases there are files that should exist only on some branch (e.g. patches fixing issues with specific software versions), but were committed to the main trunk. As in CVS there is no big difference between tags and branches, it was enough just to tag the changes in order to check them out properly. I'm moving these from the main trunk into a branch, which means deleting the revisions from the main trunk and recreating them on the branch;
  • spelling errors. Since I mostly committed to CVS on the command line without a spell checker, there were many commit messages with spelling errors that irritate my eyes. I'm correcting them after the initial migration to Git, so I can check all messages at once.
Two utilities proved to be of great help after migration - licensee and github-linguist. I use the first to check whether the license is detected properly. I had some problems (see issues 361 and 392) with some of my projects, so I now check before pushing to GitHub. The second is useful for fine-tuning language detection (e.g. I'd like to see just SQL and not PLSQL, PLpgSQL, SQLPL or TSQL, which linguist would detect although all my published SQL sources are for MySQL. And for me SQL is code, not data).
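Such linguist overrides go into a .gitattributes file in the repository root. A minimal sketch (the patterns are just illustrations, not taken from my actual repositories):

```
# Classify SQL sources as SQL and count them as code, not data
*.sql linguist-language=SQL
*.sql linguist-detectable
```

After pushing, the repository's language bar should then show SQL for these files.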

Anyway, this post is to note that my migrations to Git continue. I made great progress last year by migrating 28 repositories, and this year I have already migrated 114 more. So hopefully by the end of the year I'll be free from using CVS (or use it only by exception). Initially I planned to migrate everything, but now I'm considering skipping some low-value projects and examples, which I could migrate later if necessary.

2020-01-17

MySQL 8.0.19 suite released

MySQL 8.0.19 suite, including Community Server, Router and Shell, was released on Monday (January 13). The focus of these releases is the new InnoDB ReplicaSet functionality, which simplifies the management of classic MySQL replication with a single primary and multiple secondaries, in the same way InnoDB Cluster simplifies setting up Group Replication. You can read more about it in the article The all new MySQL InnoDB ReplicaSet. Apart from this there are some other improvements, which you can read about below.

SQL syntax

The server now understands the more general and standard ALTER TABLE … DROP/ALTER CONSTRAINT syntax, about which I tweeted earlier. It has been a feature request since MySQL 4.1, more than 15 years ago. You can find more in the Foreign Keys and Other Constraints section of the ALTER TABLE article in the documentation.
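For illustration, here is a minimal sketch of the new syntax (table and constraint names are made up):

```sql
CREATE TABLE orders (
  id  INT PRIMARY KEY,
  qty INT,
  CONSTRAINT chk_qty CHECK (qty > 0)
);

-- The standard spellings now understood by the server:
ALTER TABLE orders ALTER CONSTRAINT chk_qty NOT ENFORCED;
ALTER TABLE orders DROP CONSTRAINT chk_qty;
```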

Table value constructors are another improvement towards the standard, allowing generation of rows with the VALUES statement (not to be confused with the VALUES keyword for INSERT statements). Due to a name conflict with the VALUES function it's necessary to use the verbose form with the ROW constructor. This basically means the standalone statement VALUES ROW(1), ROW(2), ROW(3) is now properly understood by the server. The VALUES statement can also be used in unions, joins, as a derived table and in other cases where you could employ SELECT. See the VALUES statement article in the documentation for more details.
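A quick sketch of these uses (the values are arbitrary; note that MySQL names the columns of a VALUES table column_0, column_1 and so on):

```sql
-- Standalone:
VALUES ROW(1, 'one'), ROW(2, 'two'), ROW(3, 'three');

-- As a derived table, just like a SELECT:
SELECT column_0, column_1
  FROM (VALUES ROW(1, 'one'), ROW(2, 'two')) AS v
 ORDER BY column_0 DESC;
```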

The new TABLE statement implements another standard feature - explicit table clauses. TABLE t is just like SELECT * FROM t, so it can be used in unions, with INTO, as an IN subquery and elsewhere where SELECT could be employed.
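A minimal sketch (the table names are made up and t is assumed to have an id column):

```sql
TABLE t;                      -- same result as SELECT * FROM t
TABLE t ORDER BY id LIMIT 3;  -- ORDER BY and LIMIT are allowed
TABLE t1 UNION TABLE t2;      -- usable in unions too
```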

An interesting new addition is the LIMIT clause for recursive CTEs, which allows debugging of expressions that generate too many results. The idea is simply to limit the result of a recursive CTE to some reasonable number (e.g. 10, 100, etc.), so you can see the generated output and understand how and why it's generated that way. This new feature is covered in the article A new, simple way to figure out why your recursive CTE is running away and in the Limiting Common Table Expression Recursion section of WITH (Common Table Expressions) in the documentation.
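A minimal sketch of the debugging technique: the CTE below has no termination condition at all, so the new LIMIT in the recursive part is what stops it:

```sql
WITH RECURSIVE seq AS (
  SELECT 1 AS n
  UNION ALL
  SELECT n + 1 FROM seq LIMIT 10  -- stop after 10 rows total instead of erroring out
)
SELECT * FROM seq;
```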

It is now possible to use an alias for the new row (and optionally its columns) in the INSERT ... ON DUPLICATE KEY UPDATE statement, by following the VALUES or SET clause with the AS keyword. I guess this is the first step towards removing the VALUES function (see above) for better standards support. You can find more in the article for the ON DUPLICATE KEY UPDATE statement.
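A sketch of the new aliasing (table and column names are made up); the alias refers to the row that would have been inserted, replacing the old VALUES(col) spelling:

```sql
-- Row alias:
INSERT INTO counters (id, cnt) VALUES (1, 10) AS new
    ON DUPLICATE KEY UPDATE cnt = counters.cnt + new.cnt;

-- Row and column aliases:
INSERT INTO counters (id, cnt) VALUES (1, 10) AS new(i, c)
    ON DUPLICATE KEY UPDATE cnt = cnt + c;
```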

Datatype changes

The TIMESTAMP and DATETIME types can now include timezone information as an offset (e.g. +02:00 or -08:00) appended to the value. The details are in the article The DATE, DATETIME, and TIMESTAMP Types in the documentation.
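A small sketch (my own example); upon insertion the value is converted from the given offset to the session time_zone:

```sql
CREATE TABLE events (ts TIMESTAMP, dt DATETIME);
INSERT INTO events
VALUES ('2020-01-17 10:30:00+02:00', '2020-01-17 10:30:00-08:00');
```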

The YEAR(4) data type (i.e. with explicit display width) and the (undocumented) UNSIGNED attribute for YEAR are now deprecated and will be removed in a future version. This follows the other deprecations of display widths and type attributes in 8.0.17, about which I wrote before, so developers should check and correct any SQL scripts using the deprecated types and attributes. With this release, statements that print data type definitions (e.g. SHOW CREATE TABLE) no longer show the display width for integers, with the exceptions of TINYINT(1) and the ZEROFILL attribute.

Other changes in server

User-defined functions (UDFs) can now determine the character set and collation of string arguments, as I noticed earlier. You can find more about this in the article A Tale of UDFs with Character Sets.

There are now new standard INFORMATION_SCHEMA views related to roles like ADMINISTRABLE_ROLE_AUTHORIZATION, APPLICABLE_ROLES, ENABLED_ROLES, ROLE_COLUMN_GRANTS, ROLE_ROUTINE_GRANTS and ROLE_TABLE_GRANTS.

The InnoDB engine now supports efficient sampling of data for histogram statistics. Sampling no longer requires a full table scan, which could be costly for large tables, but instead samples data pages from the clustered index. I should try this at the first opportunity.
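When I try it, it would presumably look like this (table, column and bucket count are just illustrations):

```sql
ANALYZE TABLE emp UPDATE HISTOGRAM ON sal WITH 32 BUCKETS;

-- The collected histogram can be inspected afterwards:
SELECT * FROM information_schema.COLUMN_STATISTICS
 WHERE TABLE_NAME = 'emp' AND COLUMN_NAME = 'sal';
```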

The JSON_SCHEMA_VALID function now provides detailed information about the reasons for validation failures through SHOW WARNINGS.
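A small sketch of how this looks (schema and document are made up):

```sql
-- The document is missing the required "name" member, so validation fails:
SELECT JSON_SCHEMA_VALID('{"type": "object", "required": ["name"]}',
                         '{"id": 1}');
SHOW WARNINGS;  -- now explains why the document is invalid
```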

Other notable changes include the possibility to lock user accounts after FAILED_LOGIN_ATTEMPTS consecutive failed logins for PASSWORD_LOCK_TIME days (see Failed-Login Tracking and Temporary Account Locking), compression in the X protocol (see Connection Compression with X Plugin) and many more, as you can find in the release notes.

Bugs fixed in server

The previously reported problem with EXPLAIN ANALYZE not executing and reporting on subqueries in the select list has been fixed (see bug 97296).

Workbench

As I read in the release notes of the new version that "The following MySQL server language features are now supported: value references in INSERT statements, new options (TLS version, cipher suite, compression, and privilege check) in CHANGE MASTER TO replication statements, random passwords in CREATE USER and ALTER USER statements, EXPLAIN ANALYZE, and binary collation names.", I was hopeful that the application is finally catching up with the server, but that was until I reviewed the new SQL syntax (see above).

Two of the bugs I reported previously were fixed - 97416 (for EXPLAIN ANALYZE) and 97281 (for ANALYZE TABLE ... UPDATE HISTOGRAM), but I opened two new ones for the TABLE (see bug 98263) and VALUES (see bug 98266) statements. Such problems should be obvious, so it's still strange to me how they keep appearing.

There is a new menu item "Copy Row (with names, tab separated)" for copying headers and rows of queried data with tabs as separators that makes it easy to move results into Excel or LibreOffice Calc.

Happy using and experimenting with the new MySQL releases!

2020-01-09

Using xrdp on various Linuxes and a problem on Fedora

I've been using various Linuxes in virtual machines for more than 15 years in order to keep my skills sharp. It's simply not enough to know just one Linux distribution; you have to know at least the major ones (e.g. Debian-based, RPM-based or ones based on other package management systems). So apart from Slackware, I also regularly use Debian, Ubuntu, Fedora, OpenSuSE, CentOS and more recently Oracle Linux (for Oracle XE). Over the years, I have also experimented with Gentoo, FreeBSD, OpenBSD, NetBSD and others.
What I immediately need on any of these is secure shell (SSH), of course, so I can type commands remotely. However, I also like to work and play with GUI software, so I also need remote graphical access. Over the years I used VNC and NoMachine, but I never liked either for various reasons. What I have always missed on Linux was easy-to-use access like RDP, with real session management and all the other goodies, that is lightweight and robust (e.g. NoMachine also manages sessions, but it encodes a video of the desktop, which makes the picture blurry, and it frequently stops working on my Linux virtual machines after upgrades).
So, I discovered xrdp, which is "an open source RDP server". It's easy to install, configure and run on Debian, Ubuntu and OpenSuSE and it just works. The installation for Fedora requires some additional steps, which are described in README.Fedora by xrdp package's maintainer Bojan Smojver. By default, xrdp on Fedora works only with VNC sessions, but I already said I do not like it, so I prefer using Xorg like with my other Linuxes. I'll summarize the steps here:
  1. Install the necessary packages: dnf install xrdp xorgxrdp
  2. Edit /etc/xrdp/xrdp.ini and enable Xorg sessions;
  3. Enable access by anybody or just specific users, e.g. echo "allowed_users = anybody" >> /etc/X11/Xwrapper.config (see also bug 1450720);
  4. Enable port 3389 on the firewall, e.g. firewall-cmd --zone=dmz --add-port=3389/tcp
  5. Point your RDP client to the Linux machine and have fun.
Well, that was until yesterday. After upgrading my virtual machine to Fedora 31, the RDP connection stopped working with Xorg sessions, although VNC sessions were working fine. The xrdp log at /var/log/xrdp-session.log suggested that the Xorg session was crashing just after it started, so I was getting a blank screen.

[20200109-10:12:53] [INFO ] setpriv --no-new-privs Xorg :10 -auth .Xauthority -config xrdp/xorg.conf -noreset -nolisten tcp -logfile .xorgxrdp.%s.log
[20200109-10:12:53] [CORE ] waiting for window manager (pid 4902) to exit
[20200109-10:12:57] [CORE ] window manager (pid 4902) did exit, cleaning up session
[20200109-10:12:57] [INFO ] calling auth_stop_session and auth_end from pid 4877

Investigating the reason, I noticed the following in the journal (e.g. with journalctl -r):

Jan 09 10:12:55 fedora systemd[4881]: Failed to start GNOME Shell on X11.
Jan 09 10:12:55 fedora systemd[4881]: gnome-shell-x11.service: Failed with result 'protocol'.
Jan 09 10:12:55 fedora gnome-shell[5157]:   Current serial number in output stream:  359
Jan 09 10:12:55 fedora gnome-shell[5157]:   Serial number of failed request:  358
Jan 09 10:12:55 fedora gnome-shell[5157]:   Minor opcode of failed request:  7 (RRSetScreenSize)
Jan 09 10:12:55 fedora gnome-shell[5157]:   Major opcode of failed request:  139 (RANDR)
Jan 09 10:12:55 fedora gnome-shell[5157]: X Error of failed request:  BadMatch (invalid parameter attributes)
Jan 09 10:12:55 fedora xrdp-sesman[4870]: rdpRRScreenSetSize: not allowing resize
Jan 09 10:12:55 fedora xrdp-sesman[4870]: rdpRRScreenSetSize: width 1688 height 932 mmWidth 447 mmHeight 247
Jan 09 10:12:55 fedora gnome-shell[5157]: Xlib:  extension "DPMS" missing on display ":10.0".
Jan 09 10:12:55 fedora xrdp-sesman[4870]: rdpRRGetInfo:

So, it was actually gnome-shell that was failing. Initially, I found bug 1575792, which seemed similar, but it wasn't resolved. Finally, the problem turned out to be the latest version of xorgxrdp, 0.2.12 (see issue #156). I simply had to downgrade to xorgxrdp 0.2.11 by forcefully installing RHEL 8's xorgxrdp package like this:

rpm --nodeps -i xorgxrdp-0.2.11-1.el8.x86_64.rpm

And things are again back to normal.

Update 2020-01-14: Of course, it's a good idea to exclude the xorgxrdp package from upgrades (see Excluding Packages From Transactions), or simply echo "excludepkgs=xorgxrdp" >> /etc/dnf/dnf.conf as a one-line command.

Happy using your favorite Linux distro!

2019-12-25

Bugzilla in Bulgarian on GitHub

Exactly 14 years ago on this day I committed the localization of Bugzilla 2.18.1 in Mozilla-BG project repository on OpenFMI. I have considered using Bugzilla as bug tracker for my home server since the spring of 2005 and when I was reviewing the project I noticed that there is no Bulgarian localization, so I decided to contribute. The translation was done in the summer of 2005 and in the autumn I began searching for a way to publish it. Initially, I considered SourceForge, but finally decided to become part of Mozilla-BG project after I was invited by Ognyan Kulev.
The localization was moved to BitBucket in 2012 after OpenFMI was decommissioned, and the repository was converted from Subversion (SVN) to Mercurial (Hg). It was then that some problems with the SVN repository became evident, but I found no time to fix them until now. Well, it was about time to migrate the project to Git and publish it on GitHub, and I accomplished that last week. The Bugzilla-BG project is now on GitHub with all its history since December 2005. And as always, you can try the localization at Sotirov-BG.Net Bugzilla.
The localization went a long way: from CVS on my home server, initially to SVN, then to Hg, and it finally ended up in Git. It's my only project that went through this whole path :-)

Have fun!
Merry Christmas!

2019-12-05

Picture-in-picture mode for videos in Firefox 71

Firefox 71 arrived two days ago (see release notes) and I already absolutely love the new picture-in-picture mode for playing videos while you're browsing. It's a real deal for me, because this is the usual way I'm browsing today - having a video playing in the background and listening to it while surfing websites. The only drawback was that I had to switch to the video tab from time to time, when I'd like to actually see what's playing. This sometimes involved searching for the tab, because I usually have many tabs open in different groups (organized with Panorama View extension).
Now my online experience is much more pleasant, because I could just start a video, pop it up into a separate small window wherever on the screen I like and then continue surfing or do something else. Like this I could still keep an eye on the video even when I go out of Firefox. It's super useful even when I'm working on something, because I normally use many online resources. The feature is only available on Windows for now, but Mozilla promises to make it available for MacOS and Linux with the next release due in a month or so.
Thank you Mozilla for constantly making my online experience better!

2019-10-18

New features in MySQL 8.0.18

On Monday, Oracle released MySQL 8.0.18 (see also the release notes) and unlike with previous releases there was a lot of information about what would be included in this one. I personally attended several presentations and read some posts on social media. As I already wrote, the major new features in this release are hash joins and EXPLAIN ANALYZE, which aim respectively to improve the performance of some queries and to give better insight into how the optimizer's plan compares to the actual execution.

Hash join optimization

Before 8.0.18 the only type of join performed by MySQL was the nested loop (a loop within a loop: an inner loop within an outer one). There were two algorithms implementing it:
  • Nested-Loop Join - the simplest one, where each row from the outer loop is passed to the inner loop for processing. The obvious drawback is that the inner table needs to be read many times.
  • Block Nested Loop (BNL) - uses buffering of rows from the outer table to reduce the number of times the inner table is read. Instead of passing just one row to the inner loop, multiple rows from the buffer can be passed at once, which greatly reduces the number of times the inner table needs to be read. Of course, this requires more memory (see join_buffer_size). This algorithm is used for range, index and ALL join types.
There is also Batched Key Access (BKA) algorithm, which uses buffering like BNL, but puts keys for the rows in the buffer and passes them in a batch to the database engine for index lookups. The keys are then used to fetch the rows from the inner table. This algorithm can be used when there is index access to the inner table.

Here come hash joins. The idea is to build a hash table of the values from the outer table, which is normally the smaller one. Then the inner table is read and matches are looked up in the hash table. This way, at best, both tables are read just once. Hash joins work best for large result sets where indexes cannot be used. Due to their nature they are useful only for equality joins. Since 8.0.18, MySQL chooses a hash join over BNL for any query that uses an equality join condition (e.g. ref or eq_ref join types) and uses no indexes. Let's try it out.

For the example, I added a new table called job_sal with salary ranges per job to the dept_emp schema and I also generated 1 million more employees. Now let's say you want to find all the employees whose salary is outside the range. I would come up with a query like this:

SELECT E.ename, E.sal, JS.sal_min, JS.sal_max
  FROM emp     E,
       job_sal JS
 WHERE E.job = JS.job
   AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;


Given the absence of indexes, this query would be executed using the BNL algorithm in MySQL prior to 8.0.18, but in the new release the same query benefits from the hash join optimization, as visualized by the following execution plan in tree format:

+----------------------------------------------------------------------------------------+
| EXPLAIN                                                                                |
+----------------------------------------------------------------------------------------+
| -> Filter: (E.sal not between JS.sal_min and JS.sal_max)  (cost=499211.71 rows=442901)
    -> Inner hash join (E.job = JS.job)  (cost=499211.71 rows=442901)
        -> Table scan on E  (cost=1962.39 rows=996514)
        -> Hash
            -> Table scan on JS  (cost=0.75 rows=5)
 |
+----------------------------------------------------------------------------------------+
1 row in set (0.0023 sec)


Visual Explain Plan
The plan reveals that the smaller table JS would be hashed and the join would be done using the hash table. Unfortunately, the hash join optimization is not visible in the traditional explain plan or the JSON format, and thus not in MySQL Workbench. The visual explain plan (pictured right) is thus misleading, showing BNL as the join algorithm. This is understandable, because the hash join optimization is possible only with the new iterator executor, which actually generates the explain plan in TREE format. This executor is not yet able to explain some queries, so you may see only the message "not executable by iterator executor". However, I really hope this is improved in future releases, because the explain plan should be consistent between formats. I reported a feature request as bug 97280.

I tested the performance of the query with 1M employees and its execution time was 0.89 sec. Disabling the hash join optimization with the NO_HASH_JOIN hint increased the execution time to 1.26 sec. Hash join would surely be even more beneficial when the outer table has more rows.
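For reference, the hint goes directly into the SELECT, a sketch based on the query above:

```sql
SELECT /*+ NO_HASH_JOIN(E, JS) */
       E.ename, E.sal, JS.sal_min, JS.sal_max
  FROM emp     E,
       job_sal JS
 WHERE E.job = JS.job
   AND E.sal NOT BETWEEN JS.sal_min AND JS.sal_max;
```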

EXPLAIN ANALYZE

This new feature also comes on top of TREE explain plan and it represents kind of a profiling tool, because apart from information about how optimizer plans to execute the query (see above) there's also information from its actual execution. This information includes:
  • the time for returning the first row (in ms);
  • the time for returning all rows (in ms);
  • the number of rows read;
  • the number of loops.
This makes it possible to see how the optimizer's estimates compare to the real execution of the query. Let's try it with the previous query. I have intentionally shortened the output.

+-------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                               |
+-------------------------------------------------------------------------------------------------------+
| -> Filter: (E.sal not between JS.sal_min and JS.sal_max)
       (cost=502072.82 rows=442901) (actual time=0.372..747.742 rows=915768 loops=1)
    -> Inner hash join (E.job = JS.job)
         (cost=502072.82 rows=442901) (actual time=0.355..575.011 rows=1000014 loops=1)
        -> Table scan on E  (cost=2534.62 rows=996514) (actual time=0.185..353.877 rows=1000014 loops=1)
        -> Hash
            -> Table scan on JS  (cost=0.75 rows=5) (actual time=0.133..0.144 rows=5 loops=1)
 |
+-------------------------------------------------------------------------------------------------------+
1 row in set (0.7754 sec)


If it looks familiar to you, check PostgreSQL's EXPLAIN ANALYZE. Please note that the times are in milliseconds, not seconds! I think it's easy to notice that the number of rows estimated by the optimizer for the scan of table E differs, because the table's statistics are not accurate. I had added 1M rows to table emp, so it was necessary to increase the number of sampling pages (see innodb_stats_persistent_sample_pages) and run ANALYZE TABLE again. After doing so, the optimizer's estimate and the actual number of rows match. However, the optimizer is also wrong about the rows involved in the upper-level operations (the hash join and filter), but this cannot be fixed with index statistics. According to Norvald H. Ryeng (see his article MySQL EXPLAIN ANALYZE) both "estimated and actual numbers" of rows "are averages over all loop iterations", but I have just one loop for all the operations in the plan.
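The statistics fix I applied looks roughly like this (the page count is just an illustration; STATS_SAMPLE_PAGES is the per-table alternative to the global variable):

```sql
-- Globally:
SET GLOBAL innodb_stats_persistent_sample_pages = 200;
ANALYZE TABLE emp;

-- Or per table:
ALTER TABLE emp STATS_SAMPLE_PAGES = 200;
ANALYZE TABLE emp;
```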

In any case, EXPLAIN ANALYZE is a nice addition to the instrumentation of the optimizer. Unfortunately, neither the TREE format nor EXPLAIN ANALYZE is currently available even in the latest MySQL Workbench, so another feature request from me as bug 97282.

MySQL is OpenSSL only

With this version the support for the YaSSL and WolfSSL libraries is removed, so MySQL can be compiled only with OpenSSL. I have personally always built MySQL with OpenSSL (i.e. using the option -DWITH_SSL=system), because this is what comes with Slackware, but of course more important is that "the MySQL/OpenSSL combination is very well tested and production proven", as Georgi Kodinov explained in his post MySQL is OpenSSL-only now !.

Spatial

The ST_Distance function now accepts SRS arguments of all geometry types, not only arguments of types Point and Point, or Point and MultiPoint as before.

Happy using MySQL!

Update 2019-11-09: That timings are in milliseconds wasn't specified in the manual, so I filed bug 97492 as I also think that it would be good to print the unit next to timings.

2019-09-29

New usages of TREE explain plan in MySQL

As I wrote back in June (see TREE explain format in MySQL 8.0.16), a new execution plan format was introduced - TREE. It represents the operations taken by the optimizer to satisfy a query in a nested, tree-like form. The new format was labeled experimental, so I was waiting to see how it would evolve in the next versions. Well, last Friday as I was browsing the manual, I noticed it's already updated for two new features coming in MySQL 8.0.18:
  • HASH joins would be visible only in TREE format. I find this rather strange, because you would expect to see this operation no matter the format of the plan, so I hope it's extended in future.
  • EXPLAIN ANALYZE would actually run the query and provide timing information about its execution. It would also include additional iterator-based information about how the optimizer's estimations compare to the actual execution.
I'm already eager to try the new version, which I presume should be released somewhere mid-October. Stay tuned as I'll write about these new features when they arrive and I'm able to play with them.

2019-07-31

New features in MySQL 8.0.17

Last week, Oracle released MySQL 8.0.17, following the three-month release cycle introduced last year with the 8.0 series. This is the next "maintenance" release, introducing a few new features as well as deprecating some non-standard ones, so here are my picks, starting with the ones I consider important for developers.

Multi-valued indexes

With the help of generated columns (introduced in MySQL 5.7.6) and functional indexes (introduced in MySQL 8.0.13, about which I wrote before), it became possible to index data in complex column values like JSON. But JSON may contain scalars as well as arrays, and searching in arrays with the help of an index wasn't possible. Multi-valued indexes solve this by allowing multiple records in the index to point to the same data record. Such indexes are created like any other functional index and are used automatically by the optimizer when possible. Let's see an example - a register of translators with their spoken languages.

CREATE TABLE translators (
  id INT AUTO_INCREMENT,
  jdata JSON,
  PRIMARY KEY(id)
);

  • Case 1 - array of strings
Let's create some data:

INSERT INTO translators (jdata)
VALUES ('{"name": "T1", "langs": ["English", "French", "Spanish"]}'),
       ('{"name": "T2", "langs": ["English", "Spanish"]}'),
       ('{"name": "T3", "langs": ["French", "Spanish"]}');


Then, let's query the translators that speak English using the new MEMBER OF operator:

SELECT id, jdata->>'$.name', jdata->'$.langs'
  FROM translators
 WHERE 'English' MEMBER OF (jdata->'$.langs');


Or the new function JSON_OVERLAPS like this:

SELECT id, jdata->>'$.name', jdata->'$.langs'
  FROM translators
 WHERE JSON_OVERLAPS(jdata->'$.langs', '["English"]');


Both queries are leading to the same result:

+----+------------------+----------------------------------+
| id | jdata->>'$.name' | jdata->'$.langs'                 |
+----+------------------+----------------------------------+
|  1 | T1               | ["English", "French", "Spanish"] |
|  2 | T2               | ["English", "Spanish"]           |
+----+------------------+----------------------------------+
2 rows in set (0.00 sec)


As expected given the data, the queries return T1 and T2, but not T3. However, these queries perform a full table scan, so their performance would degrade as data accumulates in the table.

Execution plan without index

Fortunately, it's now possible to add a multi-valued index like this:

ALTER TABLE translators
  ADD INDEX idx_langs_arr ((CAST(jdata->'$.langs' AS CHAR(8) ARRAY)));

It's a functional index in which it's necessary to use the CAST function with the new ARRAY keyword. With the index, the execution plans of the SELECT queries above become respectively:
Execution plan of MEMBER OF with index
Execution plan of JSON_OVERLAPS with index

  • Case 2 - array of objects
It's slightly different for arrays of objects, but only in the JSONPath expression. Let's create some data (after cleaning up Case 1):

INSERT INTO translators (jdata)
VALUES ('{"name": "T1", "langs": [{"lang": "English"}, {"lang": "French"}, {"lang": "Spanish"}]}'),
       ('{"name": "T2", "langs": [{"lang": "English"}, {"lang": "Spanish"}]}'),
       ('{"name": "T3", "langs": [{"lang": "French"}, {"lang": "Spanish"}]}');


Then, let's query the translators that speak English the same two ways:

SELECT id, jdata->>'$.name', jdata->'$.langs[*].lang'
  FROM translators
 WHERE 'English' MEMBER OF (jdata->'$.langs[*].lang');


SELECT id, jdata->>'$.name', jdata->'$.langs[*].lang'
  FROM translators
 WHERE JSON_OVERLAPS(jdata->'$.langs[*].lang', '["English"]');


Just like in the first case, the queries use a full table scan, which can now easily be changed by adding a multi-valued index like this:

ALTER TABLE translators
  ADD INDEX idx_langs_obj ((CAST(jdata->'$.langs[*].lang' AS CHAR(8) ARRAY)));


Notice the slightly different syntax of the JSONPath expression. For the index to work it's of course necessary to use the same expression in the WHERE clause as in the index definition. Developers who prefer to keep data directly in JSON columns should be happy with this new feature, as it makes it possible to index not only scalar values, but also arrays.

JSON

Apart from the already mentioned new standard operator MEMBER OF for searching values in JSON arrays there are three new functions JSON_OVERLAPS, JSON_SCHEMA_VALID and JSON_SCHEMA_VALIDATION_REPORT.
  • Function JSON_OVERLAPS compares two JSON documents and returns true "if the two documents have any key-value pairs or array elements in common". Like MEMBER OF and JSON_CONTAINS, the JSON_OVERLAPS function may benefit from multi-valued indexes.
  • Functions JSON_SCHEMA_VALID and JSON_SCHEMA_VALIDATION_REPORT are added as part of JSON Schema support. The first one validates a JSON document against a JSON schema and returns true if valid, otherwise false, so it's usable as a CHECK constraint. The second one provides insights into validation errors in the form of a JSON document.

Cast to FLOAT, DOUBLE and REAL

Functions CAST and CONVERT can now cast to floating point data types such as FLOAT, DOUBLE and REAL. Let's try it with an incorrect value:

SELECT CAST('1.23.34' AS FLOAT) cast_res, CONVERT('1.23.34', FLOAT) conv_res;

+--------------------+--------------------+
| cast_res           | conv_res           |
+--------------------+--------------------+
| 1.2300000190734863 | 1.2300000190734863 |
+--------------------+--------------------+
1 row in set, 2 warnings (0.00 sec)

There are two warnings, so let's see them with SHOW WARNINGS:

+---------+------+---------------------------------------------+
| Level   | Code | Message                                     |
+---------+------+---------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: '1.23.34' |
| Warning | 1292 | Truncated incorrect DOUBLE value: '1.23.34' |
+---------+------+---------------------------------------------+
2 rows in set (0.00 sec)

Casting to DOUBLE and REAL produced a different result with the same warnings:

SELECT CAST('1.23.34' AS DOUBLE) cast_res, CONVERT('1.23.34', DOUBLE) conv_res;
SELECT CAST('1.23.34' AS REAL) cast_res, CONVERT('1.23.34', REAL) conv_res;


+----------+----------+
| cast_res | conv_res |
+----------+----------+
|     1.23 |     1.23 |
+----------+----------+
1 row in set, 2 warnings (0.00 sec)

CLONE command

For many years MySQL administrators had to dump masters, transfer the dump over the network and load it into replicas in order to initialize their state (see Copy a Database from one Server to Another). I'm not that much into database administration, but it was a cumbersome and annoying procedure, especially in case of unrecoverable replication errors where I had to re-initialize the state of replicas, so I considered it a burden. Why couldn't I just "clone" the master after wiping the replica? Well, the new version makes it possible to easily create new instances or re-initialize existing ones by implementing native provisioning in the server with the CLONE command. It's made possible by the new MySQL Clone Plugin. You could find more about this by reading the following articles:
It's the focus of this release and I'm sure it's going to be a game changer for organizations using MySQL replication extensively.
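For reference, a remote clone boils down to just a few statements. The host, port and credentials below are assumptions for the sketch; the donor must have the clone plugin installed too, and the users involved need the appropriate privileges (BACKUP_ADMIN on the donor side):

```sql
-- On the recipient: install the clone plugin and whitelist the donor
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
SET GLOBAL clone_valid_donor_list = 'donor.example.com:3306';

-- Pull the donor's data; the recipient restarts with the cloned state
CLONE INSTANCE FROM 'clone_user'@'donor.example.com':3306
IDENTIFIED BY 'secret';
```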

Deprecations

This is a list of features that are deprecated with this release and will be removed in future releases:
  • Function FOUND_ROWS and the query modifier SQL_CALC_FOUND_ROWS. The documentation suggests using COUNT(*) to find the number of rows.
  • Numeric data type attributes:
    • Integer display width. There is now a warning "1681 Integer display width is deprecated and will be removed in a future release." if you try to create tables with INT(11), for example. I personally have such definitions as provided by mysqldump and MySQL Workbench, so I'll have to remove them all before the statements are rejected with a syntax error in the future.
    • The ZEROFILL attribute. I never used it anyway.
    • The UNSIGNED attribute for FLOAT, DOUBLE, and DECIMAL data types. I have never used it either.
    • AUTO_INCREMENT support for FLOAT and DOUBLE data types. Anyone?
  • The FLOAT(M,D) and DOUBLE(M,D) syntax for specifying the number of digits for floating point types.
  • Logical operators && (double ampersand), which is a synonym for AND, || (double pipe), which is a synonym for OR, and ! (exclamation mark), which is a synonym for NOT. I'm not sure I've ever used any of these even in daily queries, because I find AND, OR and NOT a lot more expressive.
  • The BINARY keyword for specifying _bin collations. I've never used this either.
Developers should consider getting rid of these nonstandard features to avoid unpleasant surprises in the future. I'll personally review my projects and correct them as soon as possible.
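As an illustration of the first deprecation (the table and column names below are made up for the sketch), the old pattern and its suggested replacement look like this:

```sql
-- Deprecated: SQL_CALC_FOUND_ROWS with a follow-up FOUND_ROWS() call
SELECT SQL_CALC_FOUND_ROWS * FROM t ORDER BY id LIMIT 10;
SELECT FOUND_ROWS();

-- Suggested replacement: a separate COUNT(*) query
SELECT * FROM t ORDER BY id LIMIT 10;
SELECT COUNT(*) FROM t;
```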

Bug fixes

Regarding CHECK constraints, I found two bugs in the previous release and one of them was fixed (see bug #95189 CHECK constraint comparing columns is not always enforced with UPDATE queries). For the full example, please see the bug_95189_test_case.sql script; here let's just verify the fix:

SET binlog_format = 'STATEMENT';
SET binlog_row_image = 'minimal';

UPDATE tst SET end_date = '2019-04-20' WHERE id = 1;
/* Error Code: 3819. Check constraint 'chk_dat' is violated. */

So it's fixed, because the UPDATE query fails as expected even with these specific binlog settings. The other one (see bug #95192 CHECK constraint comparing column with default value is not enforced) will have to wait.

That wraps up my review of the new features in the MySQL 8.0.17 maintenance release. I hope future releases bring more features for developers.

2019-06-05

TREE explain format in MySQL 8.0.16

I omitted an interesting new feature in the latest MySQL release 8.0.16, which I covered last month. It is mentioned in the release notes (see Functionality Added or Changed and scroll down or search for TREE) and also in the documentation (see Obtaining Execution Plan Information). As it turns out, a new TREE format for the explain plan was added.

Until now there were two format options for explain plans generated with the EXPLAIN statement:
  • TRADITIONAL - the old tabular explain plan suitable for console;
  • JSON - the explain plan in JSON format used by MySQL Workbench to display nice graphics in Visual Explain, which I use in presentations.
Anyone familiar with the traditional tabular output knows that it's not indented, and sometimes it's hard to understand the order of operations. Now there is the option to use the TREE format, which prints the "generated iterator tree" with the operations indented to the right, similarly to Oracle execution plans. This should help users better understand how query execution is actually set up, but it's still experimental, so both the syntax and the output may change in the future.
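The new format is requested the same way as the other two, for example (the join query here is just an illustration on the same EMP and DEPT schema):

```sql
EXPLAIN FORMAT=TREE
SELECT D.dname, E.ename
  FROM dept D
       JOIN emp E ON E.deptno = D.deptno;
```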

Let's take the following two queries that generate the same result - the minimum, average and maximum salaries for the departments in the organization, based on the DEPT and EMP schema (see my adaptation for MySQL and the EMP and DEPT script). The second query is possible only with MySQL 8.0.14 and later (see New development features in MySQL 8.0.14).

Query 1: Using LEFT OUTER JOIN and derived table

SELECT D.dname, DT.min_sal, DT.avg_sal, DT.max_sal
  FROM dept D
       LEFT OUTER JOIN
       (SELECT E.deptno,
               MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
          FROM emp E
         GROUP BY E.deptno
       ) AS DT
       ON DT.deptno = D.deptno;

Query 2: Using LATERAL derived table

SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
  FROM dept D,
       LATERAL
       (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
          FROM emp E
         WHERE E.deptno = D.deptno
       ) AS LDT;


The tabular execution plans of the two queries (with the columns partitions, possible_keys and filtered removed to shorten the output) and the extended output are as follows:

Query 1

+----+-------------+------------+-------+-------------+---------+-------------------+------+-------+
| id | select_type | table      | type  | key         | key_len | ref               | rows | Extra |
+----+-------------+------------+-------+-------------+---------+-------------------+------+-------+
|  1 | PRIMARY     | D          | ALL   | NULL        | NULL    | NULL              |    4 | NULL  |
|  1 | PRIMARY     | <derived2> | ref   | <auto_key0> | 5       | dept_emp.D.deptno |    2 | NULL  |
|  2 | DERIVED     | E          | index | fk_deptno   | 5       | NULL              |   14 | NULL  |
+----+-------------+------------+-------+-------------+---------+-------------------+------+-------+
3 rows in set, 1 warning (0.01 sec)


+-------+------+-----------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                             |
+-------+------+-----------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `dept_emp`.`D`.`dname` AS `dname`... left join (
                 /* select#2 */ select `dept_emp`.`E`.`deptno` AS `deptno`,min(`dept_emp`.`E`.`sal`) AS `min_sal`... |
+-------+------+-----------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

Query 2

+----+-------------------+------------+------+-----------+---------+-------------------+------+----------------------------+
| id | select_type       | table      | type | key       | key_len | ref               | rows | Extra                      |
+----+-------------------+------------+------+-----------+---------+-------------------+------+----------------------------+
|  1 | PRIMARY           | D          | ALL  | NULL      | NULL    | NULL              |    4 | Rematerialize (<derived2>) |
|  1 | PRIMARY           | <derived2> | ALL  | NULL      | NULL    | NULL              |    2 | NULL                       |
|  2 | DEPENDENT DERIVED | E          | ref  | fk_deptno | 5       | dept_emp.D.deptno |    1 | NULL                       |
+----+-------------------+------------+------+-----------+---------+-------------------+------+----------------------------+
3 rows in set, 2 warnings (0.0006 sec)

+-------+------+-----------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                             |
+-------+------+-----------------------------------------------------------------------------------------------------+
| Note  | 1276 | Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1                       |
| Note  | 1003 | /* select#1 */ select `dept_emp`.`D`.`dname` AS `dname`, ... join lateral (
                 /* select#2 */ select min(`dept_emp`.`E`.`sal`) AS `min_sal`...                                     |
+-------+------+-----------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)


That's not nice, right? The additional information printed with SHOW WARNINGS only marks the involved queries, so you could more easily identify them in the plan (corresponding to the first column, id). For the second query there is also a note about how the reference from the second query was resolved in the first query.

Now, here is how the new execution plan output looks:

Query 1

+-----------------------------------------------------------------+
| EXPLAIN                                                         |
+-----------------------------------------------------------------+
| -> Nested loop left join
    -> Table scan on D
    -> Index lookup on DT using <auto_key0> (deptno=d.deptno)
        -> Materialize
            -> Group aggregate: min(e.sal), avg(e.sal), max(e.sal)
                -> Index scan on E using fk_deptno                |
+-----------------------------------------------------------------+
1 row in set (0.0025 sec)

Query 2

+----------------------------------------------------------------------+
| EXPLAIN                                                              |
+----------------------------------------------------------------------+
| -> Nested loop inner join
    -> Invalidate materialized tables (row from D)
        -> Table scan on D
    -> Table scan on LDT
        -> Materialize (invalidate on row from D)
            -> Aggregate: min(e.sal), avg(e.sal), max(e.sal)
                -> Index lookup on E using fk_deptno (deptno=d.deptno) |
+----------------------------------------------------------------------+
1 row in set, 1 warning (0.0007 sec)
Note (code 1276): Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1


That's nice, right? You see all the operations that would be executed, and the indentation helps you understand how they're nested, or in other words, which is executed before the others. I will have to see how this new format evolves, but even now I think it will be quite useful for complex queries, so it's very good they added it.

Thank you MySQL developers!

2019-05-24

Translations of Bugzilla 5.0.5 and 5.0.6

A few days ago I accidentally noticed that there are new versions of Bugzilla, namely 5.0.5 (from 2019-01-30) and 5.0.6 (from 2019-02-09), which apparently appeared silently at the beginning of the year. Silently, because at least I haven't received any notification, and apparently I hadn't logged into the web interface of my installation in quite a while to notice the notification on the main page. I'll have to check with the developers about the e-mail notifications, because they may have figured out something new. You could download the latest version from the official page, but I highly recommend upgrading with Git - it's a lot easier and faster, especially if you have additional extensions, skins and other changes (see Upgrade below).

About the translation

Anyway, today I managed to update the translation. The changes were not many, because these are pure maintenance releases without significant new functionality that would come with new templates. Everything is already pushed to the repository, which I want to reorganize and eventually migrate to Git, but I still haven't found the time. Generally, the project does not seem to have developed well in the past 2-3 years, so I do not know whether it's worth it. I'm leaving it as a task, possibly for the summer.

Upgrade

This morning, after I prepared the translations, I managed to upgrade my installation, which went absolutely flawlessly following the instructions for direct upgrade from Git. With 5.0.5 the table bugs_fulltext was converted from MyISAM to InnoDB, and with 5.0.6 the type of the column type_id in the tables flaginclusions, flagexclusions and flags was changed from smallint to mediumint (see Integer types), which should allow for many more flag types. I was delayed a little only because I had to clean up some local changes and hide (with .gitignore) some files (logs, patches, archives with translations, etc.). That reminded me to issue a pull request for one of my local changes - the addition of a viewport meta tag, which is important nowadays for supporting mobile devices.

Download the translations

You could download the translations from the usual places over FTP or HTTPS, or download version 5.0.6 directly and upgrade the translation by simply extracting the archive in the base directory.

Happy using! And of course, happy Bulgarian Education and Culture and Slavonic Literature Day (a.k.a. Saints Cyril and Methodius' Day)!