Georgi Sotirov's Blog

My personal and professional life

2019-04-30

New features in MySQL 8.0.16 ecosystem

Last Thursday (2019-04-25), just before the Orthodox Easter holidays, Oracle released version 8.0.16 of the products in the MySQL ecosystem (including the server, router, shell, connectors and Workbench GUI). The new version of the server continues the tradition established with MySQL 8 of introducing new features even though it's still labeled a "Maintenance Release". At the last Pre-FOSDEM MySQL Day Giuseppe Maxia suggested in a tweet that semantic versioning should be used, and I stand behind this as I understand and support semantic versioning.

This would be really helpful, because version 8.0.15 of the server was a true maintenance release, fixing only an important group replication issue. Following the new numbering scheme (see MySQL 8.0: It Goes to 11!), all other products in the ecosystem were bumped to 8.0.15 even without any changes whatsoever. I'm really not sure this is necessary, because as long as you're using the same major (and minor) versions there shouldn't be a compatibility problem. And the same version numbers don't really ensure full compatibility between the products, as my bugs on MySQL Workbench (e.g. 90620, 92900, 92908 and 94012) show.

MySQL Server

So, what's new in this release of the server? Here are my picks.

CHECK constraints

Personally, I have been waiting for this feature since MySQL 4, when major new features started being added to the server and it began to resemble other, more advanced databases. Bug #3465 was opened in April 2004 (and a fix was promised there for MySQL 5.1), so this SQL feature is finally implemented after more than 15 years.

Until now (as you may have noticed) the CHECK keyword in CREATE TABLE was parsed, but silently ignored. We used workarounds in the past. For example, one could ensure only nonnegative values in a column with a numeric type (integers and fixed/floating point types) by using the nonstandard attribute UNSIGNED (see Numeric Type Attributes). More complex constraints could be implemented with triggers and raised conditions (see SIGNAL syntax), as I showed in my presentation MySQL 8 for developers at the BGOUG Spring 2018 Conference, and these would still be required for anything beyond what CHECK allows.

MySQL 8.0.16 now supports both table constraints and column constraints (see CHECK constraints) for all storage engines. Table constraints appear anywhere in CREATE TABLE outside column definitions and can refer to one or more columns, even with forward references (i.e. to columns defined later in the statement). Column constraints appear within the definition of a column and can refer only to that column. Unfortunately, the expression that specifies the check condition may use only literals, deterministic built-in functions and operators (i.e. stored routines, UDFs, variables and subqueries are not permitted). It is also not possible to use CHECK constraints on columns used in foreign key referential actions (i.e. ON UPDATE, ON DELETE).
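
To illustrate the difference, here's a minimal sketch (the orders table and constraint names are made up for illustration, not taken from the manual):

CREATE TABLE orders (
  orderno    INT PRIMARY KEY,
  -- column constraint: may refer only to the column it is defined on
  qty        INT CHECK (qty > 0),
  -- table constraint with a forward reference to columns defined below
  CONSTRAINT order_dates_chk CHECK (ship_date IS NULL OR ship_date >= order_date),
  order_date DATE NOT NULL,
  ship_date  DATE
);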

The thing to consider with CHECK constraints is that an error is raised only if the condition evaluates to FALSE. If it evaluates to UNKNOWN due to NULL values, no error is raised. This is covered very well by Markus Winand in his article The Three-Valued Logic of SQL, so I highly recommend you read the whole article.
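
Here's a minimal sketch of this pitfall (the products table is hypothetical):

CREATE TABLE products (
  name  VARCHAR(30),
  price DECIMAL(10,2) CHECK (price > 0)
);

-- accepted: price is NULL, so the condition evaluates to UNKNOWN, not FALSE
INSERT INTO products VALUES ('widget', NULL);

-- rejected with error 3819: the condition evaluates to FALSE
INSERT INTO products VALUES ('widget', -1);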

CHECK constraints have a name of up to 64 characters, and if none is specified the server generates one like [table name]_chk_[ordinal number], because the names must be unique per schema. A useful feature is the possibility to create a constraint without enforcing it (i.e. with the NOT ENFORCED clause), which is similar to enabling/disabling triggers, a feature I'd like to see implemented in future releases.
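
A short sketch of a named, but not enforced constraint (the invoices table is hypothetical; the ALTER TABLE ... ALTER CHECK syntax is as documented for 8.0.16):

CREATE TABLE invoices (
  amount DECIMAL(10,2),
  CONSTRAINT amount_chk CHECK (amount >= 0) NOT ENFORCED
);

-- accepted, because the constraint exists but is not enforced
INSERT INTO invoices VALUES (-5);

-- enforcing it later would fail with error 3819 while the row above exists
ALTER TABLE invoices ALTER CHECK amount_chk ENFORCED;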

For basic usage and simple examples of CHECK constraints in MySQL, please check the manual and the MySQL 8.0.16 Introducing CHECK constraint blog post. There's a more interesting example of a CHECK constraint on JSON data in the article MySQL 8.0.16: how to validate JSON values in NoSQL with check constraint.

Here's my first example. I think it's quite common to have two dates in a table (e.g. start and end date) that should represent the beginning and end of something, so they should be in chronological order. Let's try to do this on the emp table from the example DEPT and EMP schema of Oracle, which I adapted for MySQL.

ALTER TABLE emp
  ADD COLUMN retdate DATE AFTER hiredate,
  ADD CONSTRAINT ret_after_hire CHECK (retdate > hiredate);

However, when I tried to update an employee row that already has a value for hire date with the following query (the retirement date presumably coming from wrong user input and/or a bad application):

UPDATE emp
   SET retdate = STR_TO_DATE('1019-04-25', '%Y-%m-%d')
 WHERE empno = 7369;

1 row(s) affected Rows matched: 1  Changed: 1  Warnings: 0

the query unexpectedly succeeded and there was no "constraint violated" error. After discussing with Frédéric Descamps on Slack I filed bug #95189. It seems to be related to replication, because on my master MySQL 8.0.16 server on Windows the problem isn't reproducible and the query works as expected, but the test I did was on my MySQL 8.0.16 server running on Slackware64 -current, which is replicating from my main MySQL 5.7.26 server.

Here's another example: a table constraint that ensures the hire date is today or in the future and the salary is a positive number.

ALTER TABLE emp
  ADD CONSTRAINT emp_chks CHECK (hiredate >= CURDATE() AND sal > 0);

Such a constraint may seem reasonable (if you consider just inserting data), but of course results in:

Error Code: 3814. An expression of a check constraint 'emp_chks' contains disallowed function: curdate.

because CURDATE is a non-deterministic function. Such a constraint is possible in PostgreSQL (which also supports the use of stored routines and UDFs), but not in Oracle (which supports the use of stored routines) or MariaDB (which has pretty much the same restrictions as MySQL, although they are not clearly listed). The problems are how to validate the constraint for existing rows and how to update rows, because the value of CURDATE changes every day. The solution is to create an additional column preserving the current date when the row was created and then use it to validate the constraint.

If you try to directly create the column and enforce the constraint (i.e. with an ALTER TABLE) you'll of course get:

Error Code: 3819. Check constraint 'emp_chks' is violated.

because check constraints, like other constraints (e.g. primary and foreign keys, unique indexes, NOT NULL), are validated for all the rows when created and should always remain valid. So, let's try another way.

First, initialize the new column from existing data (e.g. based on the hiredate column, as expressions are allowed as default values since 8.0.13):

ALTER TABLE emp
  ADD COLUMN created DATE DEFAULT (hiredate);

Then, modify the column and add the check constraint:

ALTER TABLE emp
  MODIFY COLUMN created DATE NOT NULL DEFAULT (CURDATE()),
  ADD CONSTRAINT emp_chks CHECK (hiredate >= created AND sal > 0);

Now, let's try some queries:

UPDATE emp SET hiredate = '1979-04-25' WHERE empno = 7369;

INSERT INTO emp
  (empno, ename, job, mgr, hiredate, sal)
VALUES
  (9999, 'MULDER', 'INVESTIG.', 7839, '2019-04-25', 4242);

As expected, the UPDATE query produces an error:

Error Code: 3819. Check constraint 'emp_chks' is violated.

but the INSERT query succeeds, which is unexpected. Apparently, the value of created is not yet initialized with the default value when the CHECK constraint is validated. I tried the same (although with slightly different syntax) on Oracle XE 18 and it worked as expected - both the UPDATE and INSERT queries violated the CHECK constraint.

There is a new table CHECK_CONSTRAINTS in INFORMATION_SCHEMA that provides information on the check constraints defined in all schemas. Additional information about the table name and whether a CHECK constraint is enforced or not can be retrieved from the TABLE_CONSTRAINTS table by filtering on the new value CHECK of the CONSTRAINT_TYPE column.
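
For example, a query like the following sketch lists the check constraints on the emp table from the examples above together with their enforcement status:

SELECT CC.CONSTRAINT_NAME, CC.CHECK_CLAUSE, TC.ENFORCED
  FROM INFORMATION_SCHEMA.CHECK_CONSTRAINTS CC
  JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS TC
    ON TC.CONSTRAINT_SCHEMA = CC.CONSTRAINT_SCHEMA
   AND TC.CONSTRAINT_NAME   = CC.CONSTRAINT_NAME
 WHERE TC.CONSTRAINT_TYPE = 'CHECK'
   AND TC.TABLE_NAME      = 'emp';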

Spatial data

After ST_Distance got it in 8.0.14, the ST_Length function now also supports an optional second argument, unit, so it's possible to calculate lengths in the different supported units as defined in the INFORMATION_SCHEMA.ST_UNITS_OF_MEASURE table.
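
A quick sketch (the line string coordinates are approximate and just for illustration; the unit name comes from the ST_UNITS_OF_MEASURE table):

SELECT ST_Length(ST_GeomFromText('LINESTRING(42.69751 23.32415, 43.21405 27.91462)', 4326)) AS len_m,
       ST_Length(ST_GeomFromText('LINESTRING(42.69751 23.32415, 43.21405 27.91462)', 4326),
                 'nautical mile') AS len_nm;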

Server does all upgrades

The server is now capable of upgrading everything - the mysql schema, data dictionary and system tables as well as the PERFORMANCE_SCHEMA, INFORMATION_SCHEMA, sys and user schemas if necessary (see What the MySQL Upgrade Process Upgrades), so the mysql_upgrade command is going into retirement. It's an important administrative feature, because it makes upgrades easier and more convenient. In the past I regularly forgot to execute the command, which led to strange problems later.

In regard to upgrades I had a strange problem: the server wouldn't start, with the error message [ERROR] [MY-013384] [Server] Could not create server upgrade info file at '/var/lib/mysql/data/' although the permissions were OK. I found the gist mysql will not start after 8.0.15 to 8.0.16 update on Ubuntu 16.04 LTS with Google, and after creating the file mysql_upgrade_info in /var/lib/mysql/data/ and changing its ownership to mysql:mysql the server started successfully. I was in fact upgrading from 8.0.14, but I think it may be a bug. Perhaps the server expects the file to have been created already.

Another interesting new feature is the option --validate-config for validating the server configuration, much like system administrators are used to doing with other servers (e.g. Apache). It's really important, especially for production environments where unexpected downtime could be extremely unpleasant. Read more in Server Configuration Validation.

System users

MySQL accounts are now categorized and thus distinguished as system users (who possess the SYSTEM_USER privilege) and regular users (who do not). This allows for better separation of roles, as only system users can perform certain administrative operations on system accounts; previously any user having the appropriate privileges could, for example, delete any account or kill connections belonging to any user. It's also possible to revoke global privileges partially for certain schemas (i.e. as exceptions) through the new system variable partial_revokes, which previously required granting privileges separately for each existing schema and adding grants for each new schema.
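
Here's a minimal sketch of how a partial revoke might look (the account name and password are hypothetical):

SET PERSIST partial_revokes = ON;

CREATE USER 'report'@'localhost' IDENTIFIED BY 'change_me';
GRANT SELECT ON *.* TO 'report'@'localhost';
-- carve out an exception from the global privilege for one schema
REVOKE SELECT ON mysql.* FROM 'report'@'localhost';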

Security

The new things are support for TLS 1.3, the possibility to update SSL certificates without restarting the running server (see ALTER INSTANCE RELOAD TLS) and information about the certificates in PERFORMANCE_SCHEMA's keyring_keys table.
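
So after replacing the certificate and key files on disk, reloading them without a restart should be as simple as the following sketch (per the documented syntax):

-- reload the SSL certificate and key files for new connections
ALTER INSTANCE RELOAD TLS;

-- check which TLS versions the server currently offers
SHOW GLOBAL VARIABLES LIKE 'tls_version';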

MySQL Router

The router now has an HTTP component that allows it to expose a simple web interface and REST APIs, the purpose being better observability through integration with external tools for monitoring and management. Other notable changes include dynamic switching between single and multi-master modes and improved logging. There is now a WITH_ROUTER CMake option to build the router together with the MySQL server, which defaults to ON and which I decided to switch OFF, because I plan to continue building the router as a separate package.

MySQL Shell

The shell comes with the new Shell Reporting Framework (see Reporting with MySQL Shell), which allows registering, displaying and monitoring custom reports. I'm eager to try it and I'll write a separate post on this later. See the article MySQL Shell 8.0.16: User Defined Reports by Jesper Wisborg Krogh. It's now also possible to execute SQL without changing the mode, as the \sql command no longer changes the mode if provided with an SQL statement, which I think is pretty useful.

MySQL Workbench

MySQL Workbench now supports window functions in SELECT (see bug #90620), expressions in DEFAULT (see bug #92900) and as key parts (see bug #92908) as well as the LATERAL keyword (see bug #94012) - all of these implemented in previous 8.0.x versions of the server. Unfortunately, I do not see any support for CHECK constraints (see bug #95143, which was already verified), so again the same version numbers mean nothing.

2019-03-30

Finishing with Dir.bg's mail

Since the beginning of last year I have had problems with the mail of Dir.bg, which I've been using for 20 or so years. The service was unavailable over IMAP and I was losing messages. After I didn't manage to get in touch with support, I wrote in Linux-bg.org's forums in the hope that somebody would know something about it. Leaving aside the needless and pompous comments, one of the users responded and my problem was solved (I suppose he was an administrator). Since then there have been several short-term outages of the service, but last week I noticed that the mail is again unavailable most of the time. Since yesterday, unfortunately, I once again have no access either through IMAP or through the web interface.
Now this forces me to think seriously about migrating completely to Gmail. This includes:
  • copying about 1 GB of mails, which shouldn't be a problem considering the capacity of today's connections, as long as the service becomes available again, which I strongly hope for, because I have some important messages.
  • redirecting who knows how many services to my Gmail address. This would be a tough task, because I'm not even sure I'll be able to enumerate them all, although I keep the mails with registration confirmations; for now I'm considering keeping the address in Dir.bg until I transfer everything. I have already started, because I wasn't receiving the mails from BGOUG.
  • updating web pages on the Internet, source code and other places.
I'm really not sure why it turned out like this and I'm sorry, because the service was useful and convenient. I used it without problems for quite some time (even since before Gmail existed, when there was Yahoo). Apparently, however, the service hasn't been maintained well recently and I think they have problems with spam and other attacks.

That's it. I may post something short when I'm done with the migration.

2019-02-10

First migrations from CVS to Git

Last May I had the good intention to start gradually migrating my old CVS repositories to Git, but I only managed to begin the process this weekend. Before plunging deeper I decided to migrate some smaller repositories with linear history, so I could exercise, verify and master the process. As a result, some small projects have already landed on my GitHub profile (see github.com/gdsotirov). I have read a lot about migrating from CVS and SVN to Mercurial and Git in relation to an internal project in the company I'm working for. I have also experimented with migrating from CVS to SVN in the past and from CVS to Hg/Git more recently, so I considered myself prepared, but there are still some things to consider when doing such a migration.

Repository preparation

Revision log messages. You may have a convention for writing revision log messages in CVS, but I did not have one for my personal projects, so I had written comments both on a single line and on multiple lines as an unnumbered list like this:

* Add this
* Modify that

I had to modify these comments in CVS before the migration, because they simply do not look good in TortoiseGit and GitHub. I also aligned some similar comments on files committed within a few minutes of each other, so that the migration could group them into a single revision. Depending on whether you had a convention for the format of the revision log messages in CVS, you may need to make other adjustments. The easiest way in such a case may be to modify the revision control files directly with perl, sed, awk, etc., while of course taking care not to ruin your history (i.e. such operations should not be performed on a "live" production repository).

Authors. You should be able to identify all your authors, because in CVS repositories they are normally UNIX login names and the users may no longer exist in the system. I've even seen cases where the same username was reused by different developers having the same given and family names. It's hard to identify and fix such revisions unless you know the developers' periods of employment in the company, and what if their time in the company overlaps?
Fortunately, I do not have this problem with my personal repositories, but I still had to rewrite my username as a full name and email address (see more about this in the chapter Step One: The Author Map of Eric Raymond's DVCS migration HOWTO). It's very important to rewrite the authors, because otherwise the commits won't be properly attributed in GitHub (see Contributions that are counted). If you need to fix usernames, it's preferable to do it in the CVS repository before the conversion. And the easiest way to map usernames to authors is to rewrite the converted Git repository, as I describe below.

Tools

In such a process one of the most important things is to choose the right tools, because you simply cannot go without them. Last year at work we evaluated several options and finally chose cvs2git, which is part of the cvs2svn project. The command is written in Python and supports various options. For me it is important to make the best possible migration, so I had to make the following modifications in the example options file:

KeywordHandlingPropertySetter('expanded'),

from cvs2svn_lib.keyword_expander import _KeywordExpander
# Ensure dates are expanded as YYYY/MM/DD as in CVS 1.11
_KeywordExpander.use_old_date_format()


The first line ensures that CVS keywords (e.g. $Id$, $Date$, etc.) are expanded in text files (see the topic cvs2svn changes 'date' string in source codes). The following three lines ensure that dates in keywords are expanded as in CVS 1.11, i.e. in YYYY/MM/DD format, instead of the default YYYY-MM-DD format used by CVS 1.12 and SVN.

Migration

The way to convert a CVS repository to Git is described in cvs2git's documentation (see the chapter Usage), so just read it. I wrote myself a shell script to execute the different steps defined there. Initially I used cvs2git only with command line options, but because of my requirements for keyword expansion (see above), I had to improve the script to generate an options file and use it instead. This way I retained the way I call the script, by just providing a path to the CVS repository.

So far I haven't had problems with the migration - the history was represented properly in Git. However, with many tags and branches over different files things could get really messy, as I found out with the internal project at work. After the migration I check the result and if necessary fix more revision log messages. Then I just rewrite the authors with the script given in Changing author info and finally import the repository to GitHub (e.g. I use git daemon --verbose --export-all --base-path="/path/to/repo" to serve the converted repository). After the import to GitHub I have to add LICENSE and README.md files, but this is quite trivial to do directly through the web interface.

And that's all. I have more repositories to migrate and I really hope I'll be able to finish the task by the end of the year, but it all depends on how quickly I can prepare the repositories, because after that it's the tools' job.

2019-01-31

My MySQL Bugs

Since I started using MySQL 3.23.x during my university years I have tried to solve all my problems myself, as with all the other free and open source software I was using. At first it was more about learning (or lack of knowledge if you want), but then the problems became more complex and harder to solve. Well, I was successful most of the time, so I didn't consider reporting bugs or asking for support. However, the MySQL ecosystem has grown a lot since then, the software has become much more complex and the source code base has increased significantly, so nowadays it's naive to think one could manage to fix all the problems he encounters without going deep into the product's internals or relying on professional support. The first is possible only for MySQL developers well familiar with the code base and large organizations like Facebook, Booking, etc., which have dedicated teams and regularly contribute patches. The second depends on you and of course the company providing the support.

I haven't reported many bugs for MySQL - just 21, but now I make a point of reporting everything I find that seems like a bug, because this is the right thing to do. I may as well solve the problem myself, but someone else could be hit by the same issue and should at least be aware that it's a known (and eventually already resolved) problem. With each one of these I learned a little something, so below I review my bugs so far, describing their fate in chronological order.
  1. Bug 20098 (my first one) was reported on 2006-05-26 and is with status "No Feedback" as of 2006-07-01. It was about Query Browser searching for a file that is not present (preferences.glade). I had forgotten about this one and found out later that it was closed because no feedback was provided, which is funny as I had provided the requested information...
  2. Bug 69459 was reported on 2013-06-13 and is in status "Can't repeat" as of 2013-12-24. It was about MySQL Workbench crashing on entering a dot in an SQL string when automatic code completion is enabled. I reported the bug for version 5.2.47 CE, but was advised to try 6.0.7, which worked, so the bug didn't receive more attention since the 5.2 series didn't receive more updates.
  3. Bug 73076 (my first confirmed and fixed one) was reported on 2014-06-22 and is with status "Closed" as of 2014-08-26. It was about MySQL Workbench actually committing modifications in a record set when you only apply them. It was a nasty one, making my daily work with Workbench difficult as I had to be careful not to apply (pardon, commit) too early. The problem appeared in 6.0.x and 6.1.x and was finally fixed with the release of 6.2.2 on 2014-09-05 (see release notes) after being verified by support.
  4. Bug 73079 was reported on 2014-06-23 and is with status "Closed" as of 2014-09-15. It was about MySQL Workbench not refreshing the cell after setting a value to NULL. It affected 6.1.x and early 6.2.x versions. Although it wasn't officially confirmed, for me it was a regression, but more importantly it was fixed with the release of 6.2.3 on 2014-09-23 (see release notes).
  5. Bug 73708 was reported on 2014-08-25 and is with status "Closed" as of 2014-12-01. It was about MySQL Workbench models being completely mangled. I forgot about this one for a few months and when I provided the requested feedback later I realized the problem had already been fixed with the release of 6.2.4 on 2014-11-20, although nothing was mentioned in the release notes.
  6. Bug 73770 (my oldest one still open) was reported on 2014-08-29 and is still with status "Verified" as of 2018-02-05. It's a feature request for the MySQL synchronizer between model and database to display the actual differences. Whoever has used this functionality knows that applying changes from the model to the database can be quite annoying, because there are always some changes you do not expect and furthermore do not understand. You only see the generated update script, but it would be nice to see the actual difference, which in some cases may not even be a difference (see 90772 below).
  7. Bug 82202 was reported on 2016-07-12 and is with status "Closed" as of 2018-01-31. It was about impossibility to link Connector/ODBC 5.3.6 to the shared libmysqlclient.so due to undefined symbols (e.g. my_malloc, my_free, etc.). The problem was quickly explained, but then closed as "Not A Bug". It was reopened after a Fedora maintainer reported the same problem and finally fixed with release of Connector/ODBC 5.3.10 on 2018-01-30 (see release notes).
  8. Bug 84951 (my first suggested patch) was reported on 2017-02-10 and is with status "Duplicate" as of 2017-02-12. It was about problem building MySQL Workbench 6.3.9, because of compile errors in jsonparser.cpp and jsonview.cpp for which I suggested a patch. The bug was made duplicate of 84886, which also contained a patch provided two days earlier, but it was closed on 2018-05-14 as "Won't fix", because they "no longer support 32bit systems" (!?).
  9. Bug 89608 was reported on 2018-02-09 and is still with status "Verified" as of 2018-02-13. It is a feature request about having password requirements messages in MySQL Workbench match the server configuration, so they're more useful. I provided sample code about how it could be done and even wrote a SQL function myself to test the idea.
  10. Bug 89615 was reported on 2018-02-10 and is with status "Unsupported" as of 2018-08-23. It was about a failure to build MySQL Shell 1.0.10, because compilation fails on undefined vio* functions. It was related to 82202 and although confirmed it was later closed with the explanation that "MySQL Shell 1.0 is no longer supported and version 8.0 should be used instead" (!?), to which I reacted a bit harshly, but I really do not understand such answers.
  11. Bug 90619 was reported on 2018-04-25 and is with status "Duplicate" of 79315 as of 2018-04-25. It was about MySQL Installer not offering an upgrade from MySQL 5.7 to 8.0, which was explained as "not a bug" because it was "working as expected", so apparently MySQL Installer doesn't support upgrades between major releases. I hope this changes in the future.
  12. Bug 90620 (my first of a series) was reported on 2018-04-25 and is still with status "Verified" as of 2018-04-26. It is about MySQL Workbench 8.0.11's SQL editor displaying an error on a SELECT query with window functions. You wouldn't expect MySQL Workbench, having the same version as the server, to lack support for new server functionality, but more about this later.
  13. Bug 90727 was reported on 2018-05-03 and is with status "Closed" as of 2018-06-01. It was about impossibility to link Connector/C++ 1.1.11 due to missing mysql_sys and mysql_strings libraries. It turned out to be a regression from some Solaris specific modifications not made Solaris specific. I got in touch with the developer and it was declared fixed in 1.1.12, which got released on 2019-01-28 (see release notes), so for 1.1.11 I had to patch it myself.
  14. Bug 90772 was reported on 2018-05-06 (St George's Day) and is still with status "Verified" as of 2018-05-09. It is about the MySQL Workbench synchronizer seeing a difference between single quote escapes (e.g. the statement COMMENT 'Currency\' symbol' is different from the statement COMMENT 'Currency''s symbol'). This is because MySQL Workbench allows the backslash escape and the server accepts it on execution, but then rewrites it with the single quote escape. Thus, on the next synchronization the difference remains (i.e. as if you hadn't synchronized at all).
  15. Bug 90876 was reported on 2018-05-15 and is with status "Closed" as of 2018-06-21. It was about MySQL Shell 8.0.11 giving error 5115 on adding documents to a collection in Server 5.7, but what it was actually about was automatically generated identifiers, as I explained in detail in my article MySQL Shell 8 and automatically generated document IDs. This one resulted in a documentation update and since nobody else experienced this problem I guess it's fine, but for me such problems break backwards compatibility.
  16. Bug 91841 was reported on 2018-07-31 and is with status "Closed" as of 2019-01-25. It was about the impossibility to build MySQL Connector/ODBC 5.3.11, because of compilation errors. I was able to fix the compilation errors myself (see patch) while waiting for help and an explanation. And the explanation was that it was a duplicate of an internal bug, "which explains the reasons for the failure", but which I'm not able to read. The funny thing about this one was that more than 400 lines were wiped out from the beginning of a header, most probably related to copyright text updates. It was fixed with the release of 5.3.12 on 2019-01-28 (see release notes).
  17. Bug 92898 was reported on 2018-10-23 and is with status "Not a Bug" as of 2018-10-23. I consider this my one and only "support request" and I was pleased that it was explained the same day. It is about hitting an incompatible change during the upgrade from 8.0.12 to 8.0.13, because I still had some old views using ASC or DESC qualifiers in the GROUP BY clause. In this regard I'd like to thank MySQL for the detailed release notes they publish, although as in this case even if you read them carefully you may overlook something, considering it doesn't concern you. For this reason it's highly recommended to use MySQL Shell's Upgrade Checker Utility, which can tell you about compatibility errors and issues in advance.
  18. Bug 92900 was reported on 2018-10-23 and is with status "Verified" as of 2018-10-23. It is about MySQL Workbench 8.0.13 lacking support for expressions in DEFAULT although the feature is supported by the same version of the server.
  19. Bug 92908 was reported on 2018-10-23 and is with status "Verified" as of 2018-10-23. It is about MySQL Workbench 8.0.13 lacking support for expressions as key parts although the feature is supported by the same version of the server.
  20. Bug 93835 was reported on 2019-01-07 and is with status "Verified" as of 2019-01-10. It's a feature request for having MySQL Workbench display a warning on using keyword as identifier (e.g. table and column names). I opened it after Frédéric Descamps suggested so in a comment on LinkedIn. Fingers crossed it's implemented.
  21. Bug 94012 was reported on 2019-01-23 and is with status "Verified" as of 2019-01-23. It is about MySQL Workbench 8.0.14 not recognizing the new LATERAL keyword and giving error after it in SQL editor. It's by now the last one of the series on MySQL Workbench not supporting the features of the server even though they're the same version. The bug was nominated as MySQL Bug of the Day by Valerii Kravchuk on the same day.
I hope I haven't bored you with all the information above. It's not always easy to properly explain a problem, but I have quite some experience being on the "other side", so I think I know what information is required for a problem to be investigated thoroughly. Still, reporting bugs properly is probably an art, and I'm looking forward to Valerii Kravchuk's presentation How to create a useful MySQL bug report at FOSDEM on Saturday.

Some of my bugs are purely technical ones (e.g. 82202, 84951, 89615, 90727 and 91841), for which I reported all the details, and still the explanations on some of them seem pretty strange to me (e.g. no longer support 32bit systems, MySQL Shell 1.0 is no longer supported and version 8.0 should be used instead) even though the problem and what should be fixed were quite clear. I do not understand how you could not release a new maintenance version when there are open bugs for the last such release in the series. What vendors should understand is that for me as a package maintainer it's more complex to maintain patches for problems not fixed upstream.

I've filed a series of bugs about unsupported server features in MySQL Workbench (e.g. 90620, 92900, 92908 and 94012), which made me think about the meaning of the version number alignment that was done with the release of MySQL 8.0.11 as a General Availability release (see MySQL 8.0: It Goes to 11!). As I understand the post, it was done so that we use the "correct versions" of all the products. It's generally a good idea, but shouldn't the "correct version" support the same feature set as the server? Users would presumably be less confused if so, but so far this is clearly not the case with Workbench. I hope the situation improves in the future.

I'd like to complete this post with some final thoughts. As a developer I'm aware that there could never be software completely free of bugs. However, as a user I expect that when a bug is properly reported and the problem is clear, the solution should not take that much time. But of course what really matters are priorities, and they are normally higher the larger the number of affected users.

2019-01-23

New development features in MySQL 8.0.14

With the release of MySQL 8.0.14 Oracle keeps up the already established practice of introducing new development features even in maintenance releases, which usually include only small improvements and mostly bug fixes. I took a look at 8.0.14's release notes, the post The MySQL 8.0.14 Maintenance Release is Generally Available and of course the manual, experimented a bit, and below are my picks related to development.

Lateral derived tables

Before MySQL 8.0.14 it wasn't possible to refer to (depend on) columns of preceding tables in the same FROM clause. Now this restriction is removed with the addition of the LATERAL keyword (see Lateral derived tables). The LATERAL keyword means that the derived table depends on the previous table on its left side. You can have more than one LATERAL derived table in a query and each one depends only on the previous table(s) or derived table(s). Lateral derived tables are the so-called "for each" loop of SQL, and this makes possible some operations that were otherwise impossible or less efficient.
Here's an example. Let's say you want to calculate the minimum, average and maximum salaries for each department in the organization. Previously you would have to write it like this:
[Image: Explain plan for derived table query]

SELECT D.dname, DT.min_sal, DT.avg_sal, DT.max_sal
  FROM dept D,
       (SELECT E.deptno, MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
          FROM emp E
         GROUP BY E.deptno
       ) AS DT
 WHERE DT.deptno = D.deptno;


So we use a derived table DT to calculate the min/avg/max salary for all departments from the emp table and then join it with the dept table to produce the following result:

+------------+---------+-------------+---------+
| dname      | min_sal | avg_sal     | max_sal |
+------------+---------+-------------+---------+
| ACCOUNTING | 1300.00 | 2916.666667 | 5000.00 |
| RESEARCH   |  800.00 | 2175.000000 | 3000.00 |
| SALES      |  950.00 | 1566.666667 | 2850.00 |
+------------+---------+-------------+---------+
3 rows in set (0.0014 sec)


The derived table is completely independent of the other joined table, as it could generate a result by itself (i.e. it doesn't depend on the column values of the other table). The explain plan for this query (see the image above) confirms that the result set of the derived table is first materialized, so it can be joined to the other table.
Another approach would be to use subqueries in the SELECT clause like this:

SELECT D.dname,
       (SELECT MIN(E.sal) FROM emp E WHERE E.deptno = D.deptno) AS min_sal,
       (SELECT AVG(E.sal) FROM emp E WHERE E.deptno = D.deptno) AS avg_sal,
       (SELECT MAX(E.sal) FROM emp E WHERE E.deptno = D.deptno) AS max_sal
  FROM dept D;


which won't be efficient (imagine a sales table and thousands of salesmen whose salaries you want to evaluate), as three queries would have to do the job of one, and for each row of the main table. Having to select just one column in SELECT subqueries is also a drawback, and the MySQL server generates Error Code: 1241. Operand should contain 1 column(s) if you try to select more.
If you try to connect the derived table to the other table with a query like the following one:

SELECT D.dname, DT.min_sal, DT.avg_sal, DT.max_sal
  FROM dept D,
       (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
          FROM emp E
         WHERE E.deptno = D.deptno
       ) AS DT;


you'll get Error Code: 1054. Unknown column 'D.deptno' in 'where clause', because table D is not known to the derived table. The query is illegal in SQL-92, but in SQL:1999 it becomes legal if the derived table is preceded by the LATERAL keyword:

SELECT D.dname, LDT.min_sal, LDT.avg_sal, LDT.max_sal
  FROM dept D,
       LATERAL
       (SELECT MIN(E.sal) min_sal, AVG(E.sal) avg_sal, MAX(E.sal) max_sal
          FROM emp E
         WHERE E.deptno = D.deptno
       ) AS LDT;

[Image: Explain plan for lateral derived table query]

and produces the following result:

+------------+---------+-------------+---------+
| dname      | min_sal | avg_sal     | max_sal |
+------------+---------+-------------+---------+
| ACCOUNTING | 1300.00 | 2916.666667 | 5000.00 |
| RESEARCH   |  800.00 | 2175.000000 | 3000.00 |
| SALES      |  950.00 | 1566.666667 | 2850.00 |
| OPERATIONS |    NULL |        NULL |    NULL |
+------------+---------+-------------+---------+
5 rows in set (0.1182 sec)


As is clear from the explain plan graph (see the image above), in this case there's no grouping, but MySQL gives a higher query cost, because access to the derived table is by full table scan. The more interesting information is, however, in the tabular explain plan (the partitions column is intentionally hidden):

+----+-------------------+------------+------+---------------+-----------+---------+-------------------+------+----------+----------------------------+
| id | select_type       | table      | type | possible_keys | key       | key_len | ref               | rows | filtered | Extra                      |
+----+-------------------+------------+------+---------------+-----------+---------+-------------------+------+----------+----------------------------+
|  1 | PRIMARY           | D          | ALL  | NULL          | NULL      | NULL    | NULL              |    4 |      100 | Rematerialize (<derived2>) |
|  1 | PRIMARY           | <derived2> | ALL  | NULL          | NULL      | NULL    | NULL              |    2 |      100 | NULL                       |
|  2 | DEPENDENT DERIVED | E          | ref  | fk_deptno     | fk_deptno | 5       | dept_emp.D.deptno |    4 |      100 | NULL                       |
+----+-------------------+------------+------+---------------+-----------+---------+-------------------+------+----------+----------------------------+
3 rows in set, 2 warnings (0.0010 sec)
Note (code 1276): Field or reference 'dept_emp.D.deptno' of SELECT #2 was resolved in SELECT #1
Note (code 1003): /* select#1 */ select `dept_emp`.`d`.`dname` AS `dname`,`ldt`.`min_sal` AS `min_sal`,`ldt`.`avg_sal` AS `avg_sal`,`ldt`.`max_sal` AS `max_sal` from `dept_emp`.`dept` `d` join lateral (/* select#2 */ select min(`dept_emp`.`e`.`sal`) AS `min_sal`,avg(`dept_emp`.`e`.`sal`) AS `avg_sal`,max(`dept_emp`.`e`.`sal`) AS `max_sal` from `dept_emp`.`emp` `e` where (`dept_emp`.`e`.`deptno` = `dept_emp`.`d`.`deptno`)) `ldt`

There are two new pieces of information and an additional note. The plan clearly shows that the derived table E (derived2) is DEPENDENT on the other table and that it's rematerialized for each row of D (see EXPLAIN extra information). This is the reason lateral derived tables are also known as the "for each" loop of SQL. The note informs how the external reference in the derived table was resolved. If you compare the results you should also notice two additional rows (i.e. departments without employees), which means the query with LATERAL effectively behaves like a left join.

Of course MySQL Workbench (even upgraded to 8.0.14 as well) is again not familiar with the new syntax (see my previous post New development features in MySQL 8.0.13), because it doesn't properly color the new keyword and shows an error in the SQL editor just after it. I reported this as bug 94012, but I don't have much hope, as 90620, 92900 and 92908 were verified yet are still open. Version numbers don't mean much nowadays :-)

Please note that it's even possible to refer to the outer table if the derived table is in a subquery (see an example in WL#461).
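
A sketch of what such a query might look like on the same schema (my own example, assuming the lifted restriction works as described in the worklog):

SELECT D.dname,
       (SELECT MAX(T.sal)
          FROM (SELECT E.sal
                  FROM emp E
                 WHERE E.deptno = D.deptno  -- outer reference to D from within the derived table
               ) AS T
       ) AS max_sal
  FROM dept D;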

JSON aggregation functions can now be used as window functions

It is now possible to use the JSON_ARRAYAGG and JSON_OBJECTAGG aggregate functions as window functions by using the OVER clause (see Window Function Concepts and Syntax). This makes all aggregate functions (except COUNT(DISTINCT) and GROUP_CONCAT) usable as window functions, after the bitwise AND/OR/XOR functions were made so in MySQL 8.0.12. Here's an example:

SELECT E.ename, E.sal,
       AVG(E.sal) OVER dw AS avg_sal,
       JSON_OBJECTAGG(D.dname, E.sal) OVER dw AS dept_sal
  FROM emp  E,
       dept D
 WHERE E.deptno = D.deptno
WINDOW dw AS (PARTITION BY D.deptno);


+--------+---------+-------------+------------------------+
| ename  | sal     | avg_sal     | dept_sal               |
+--------+---------+-------------+------------------------+
| CLARK  | 2450.00 | 2916.666667 | {"ACCOUNTING": 1300.0} |
| KING   | 5000.00 | 2916.666667 | {"ACCOUNTING": 1300.0} |
| MILLER | 1300.00 | 2916.666667 | {"ACCOUNTING": 1300.0} |
| SMITH  |  800.00 | 2175.000000 | {"RESEARCH": 3000.0}   |
| JONES  | 2975.00 | 2175.000000 | {"RESEARCH": 3000.0}   |
| SCOTT  | 3000.00 | 2175.000000 | {"RESEARCH": 3000.0}   |
| ADAMS  | 1100.00 | 2175.000000 | {"RESEARCH": 3000.0}   |
| FORD   | 3000.00 | 2175.000000 | {"RESEARCH": 3000.0}   |
| ALLEN  | 1600.00 | 1566.666667 | {"SALES": 950.0}       |
| WARD   | 1250.00 | 1566.666667 | {"SALES": 950.0}       |
| MARTIN | 1250.00 | 1566.666667 | {"SALES": 950.0}       |
| BLAKE  | 2850.00 | 1566.666667 | {"SALES": 950.0}       |
| TURNER | 1500.00 | 1566.666667 | {"SALES": 950.0}       |
| JAMES  |  950.00 | 1566.666667 | {"SALES": 950.0}       |
+--------+---------+-------------+------------------------+
14 rows in set (0.0021 sec)


It's important to note that MySQL does not permit duplicate keys in the JSON data type, so in a window without ordering JSON_OBJECTAGG returns the last value for the key, which may not be deterministic.
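
One way around this (a sketch on the same emp and dept data, assuming JSON_OBJECTAGG honors the frame ordering like other aggregate window functions) is to order the window frame, so the "last" value per key becomes well defined - here the highest salary in the department:

SELECT E.ename, E.sal,
       JSON_OBJECTAGG(D.dname, E.sal)
         OVER (PARTITION BY D.deptno ORDER BY E.sal
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS dept_top_sal
  FROM emp  E
  JOIN dept D ON D.deptno = E.deptno;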

X Protocol improvements

According to the release notes, data is now always converted to the utf8mb4 character set (using the utf8mb4_general_ci collation). The other notable improvement is the support for "prepare functionality". The release notes provide no reference about this new functionality, but WL#9270 was mentioned in a related article, so I believe this is about preparing CRUD operations (see Preparing CRUD Statements). A simple example in JavaScript would be the following:

MySQL Shell 8.0.14
Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.
 MySQL  JS > \connect user@localhost
Creating a session to 'user@localhost'
Your MySQL connection id is 22 (X protocol)
Server version: 8.0.14 MySQL Community Server - GPL
 MySQL  localhost:33060+ ssl  JS > \use test
Default schema `test` accessible through db.
 MySQL  localhost:33060+ ssl  test  JS > var usr = db.createCollection('users')
 MySQL  localhost:33060+ ssl  test  JS > usr.add({name:"User1",age:15})
Query OK, 1 item affected (0.0108 sec)
 MySQL  localhost:33060+ ssl  test  JS > usr.add({name:"User2",age:17})
Query OK, 1 item affected (0.0138 sec)
 MySQL  localhost:33060+ ssl  test  JS > usr.add({name:"User3",age:20})
Query OK, 1 item affected (0.0105 sec)
 MySQL  localhost:33060+ ssl  test  JS > usr.add({name:"User4",age:19})
Query OK, 1 item affected (0.0137 sec)
 MySQL  localhost:33060+ ssl  test  JS > usr.add({name:"User5",age:16})
Query OK, 1 item affected (0.0118 sec)
 MySQL  localhost:33060+ ssl  test  JS > usr.find()
[
    {"_id": "00005c46e7e5000000000000000a","age": 15,"name": "User1"},
    {"_id": "00005c46e7e5000000000000000e","age": 17,"name": "User2"},
    {"_id": "00005c46e7e5000000000000000f","age": 20,"name": "User3"},
    {"_id": "00005c46e7e50000000000000010","age": 19,"name": "User4"},
    {"_id": "00005c46e7e50000000000000011","age": 16,"name": "User5"}
]
5 documents in set (0.0006 sec)
 MySQL  localhost:33060+ ssl  test  JS > var fcmd = usr.find('age >= :page')
 MySQL  localhost:33060+ ssl  test  JS > fcmd.bind('page', 18)
[
    {"_id": "00005c46e7e5000000000000000f","age": 20,"name": "User3"},
    {"_id": "00005c46e7e50000000000000010","age": 19,"name": "User4"}
]
2 documents in set (0.0006 sec)
 MySQL  localhost:33060+ ssl  test  JS > fcmd.bind('page', 16)
[
    {"_id": "00005c46e7e5000000000000000e","age": 17,"name": "User2"},
    {"_id": "00005c46e7e5000000000000000f","age": 20,"name": "User3"},
    {"_id": "00005c46e7e50000000000000010","age": 19,"name": "User4"},
    {"_id": "00005c46e7e50000000000000011","age": 16,"name": "User5"}
]
4 documents in set (0.0003 sec)

 MySQL  localhost:33060+ ssl  test  JS > fcmd.bind('page', 17)
[
    {"_id": "00005c46e7e5000000000000000e","age": 17,"name": "User2"},
    {"_id": "00005c46e7e5000000000000000f","age": 20,"name": "User3"},
    {"_id": "00005c46e7e50000000000000010","age": 19,"name": "User4"}
]
3 documents in set (0.0004 sec)


The example creates a collection of users with their names and ages, then prints the whole collection. The interesting part starts with the line that creates fcmd (i.e. usr.find('age >= :page')). It prepares a statement using a named parameter (anonymous parameters with ? are not supported by the X Protocol), but does not execute it. The execution happens after a value is bound to the parameter, and this can be done many times, producing different results. It's interesting that in the general log the first bind actually executes a query, then there's a prepare, and then the executes follow:

Query    SELECT doc FROM `test`.`users` -> usr.find()
Query    SELECT doc FROM `test`.`users` WHERE (JSON_EXTRACT(doc,'$.age') >= 18) -> fcmd.bind('page', 18)
Prepare    SELECT doc FROM `test`.`users` WHERE (JSON_EXTRACT(doc,'$.age') >= ?)
Execute    SELECT doc FROM `test`.`users` WHERE (JSON_EXTRACT(doc,'$.age') >= 16) -> fcmd.bind('page', 16)
Execute    SELECT doc FROM `test`.`users` WHERE (JSON_EXTRACT(doc,'$.age') >= 17) -> fcmd.bind('page', 17)


Using prepared statements for repeatedly executed statements could lead to performance improvements because of the saved parse time, so this is something you should consider if you need to improve the performance of your applications and scripts.

Spatial improvements

The function ST_Distance now accepts an optional third parameter specifying the unit for the returned value. The possible values are defined in the INFORMATION_SCHEMA.ST_UNITS_OF_MEASURE table along with the conversion factor relative to the base unit metre, which is the default. Here is an example calculating the distance between Sofia and Sydney in kilometers and nautical miles in SRID 4326:

SELECT ST_Distance(ST_PointFromText('POINT( 42.69751 23.32415)', 4326),
                   ST_PointFromText('POINT(-33.86667 151.20000)', 4326)) / 1000 dist_km;
+--------------------+
| dist_km            |
+--------------------+
| 15431.933058990671 |
+--------------------+
1 row in set (0.0023 sec)


SELECT ST_Distance(ST_PointFromText('POINT( 42.69751 23.32415)', 4326),
                   ST_PointFromText('POINT(-33.86667 151.20000)', 4326), 'nautical mile') dist_nm;


+------------------+
| dist_nm          |
+------------------+
| 8332.57724567531 |
+------------------+
1 row in set (0.0008 sec)

This wraps up my review. There is of course a lot more in MySQL 8.0.14, and not only for developers, so I encourage you to explore and find out more. The references at the beginning of this article are a good starting point.

2019-01-10

The things I learned about myself in 2018

For me 2018 was a year of hopes, some of which didn't materialize. I was hoping for some changes, but in the end they didn't happen. However, there's nothing tragic in this, as I only learned more about myself. In this article I take a look back at what I learned about myself in the past year.

I do not have enough experience with open source

I was told this somewhere at the beginning of March and I do not disagree at all. I have contributed small patches, translations and bug reports to various open source projects over the past 20 or so years, and I have published some of my personal projects under free licenses, but I wish I could have done more. It's true I could have contributed more, but unfortunately I simply haven't found the time for it. I'm still a strong believer in free and open source software (FOSS), which I have used since my university years, still use and will continue using for the time being. However, nowadays I'm not that picky that FOSS should also be free as in "free beer", which brings me closer to GNU's philosophy.

I do not read enough/your books

I was advised by some to read more books. Yes, perhaps I could read more, but I'm still reading. And I may not be reading the same things as you, and you're not the one to tell me what to read, just as I'm not telling you. Apart from books I'm also reading some magazines, as I have since my teenage years. And some online resources as well. With this post I'd like to thank my family and friends who gave me books as gifts last year, so there are currently about 10 books on my reading list for 2019. Sorry, the list is private ;-)

I do not have enough experience with this or that technology

I learned this in the summer. I still consider myself curious about technology, but I simply cannot have experience with everything out there and I believe nobody could. For me it's more interesting how a person perceives technology and how they deal with changes. Two decades ago software was quite different, as were the tools and technologies used to implement it. The pace of change has increased, as has the number of people involved in developing software. There are many people who started software development one, two, three or more decades ago and easily adapt to new tools and technologies, because they have the basic understanding of how software is built and how it works. And more importantly, they know they have to learn constantly and be involved. This means reading the manuals, following development progress, discussing, reporting bugs, making feature requests, contributing as much as possible and generally getting involved with the community.

I'm not agile

This was something I was told before the Christmas holidays. Yes, perhaps I'm not that agile after all. I know and understand agile values and principles, but at the same time I do not share them completely.
  • I agree that individuals and interactions are more important than processes and tools, but only the latter can make the former simple and efficient. Without clear processes and proper tools even the best individuals would fail to interact in a way that benefits software development. I believe that such individuals would first create the necessary processes and tools.
  • I agree that working software is more important than comprehensive documentation, but I've seen software written with scarce or no documentation at all. After some time (e.g. 5, 10, 15 years), when such software stops working, has to be upgraded or has to be explained to a client, it first has to be reverse engineered, as the original developers have most probably long left the company. So in the short term and for small projects I completely support this, but in the long term and for large projects it won't help anyone. The problem is that if not written on time, documentation would most probably never be written.
  • I agree that customer collaboration is more important than contract negotiation, but not all customers can collaborate effectively. Some customers do not even know what exactly they want, they do not read documents (some even refuse to do so when asked), hence they do not know the software they're using, and they do not properly transfer knowledge over time. These are only my humble observations, but I believe that customer collaboration should be open in the sense that it should be accessible to all the involved parties now and in the future. I've seen cases where software was implemented in "close collaboration" between a developer and a person at the customer through personal messages, which afterwards nobody knows anything about and doesn't want to.
  • I agree that responding to change is more important than following a plan, but responding to constant changes means no project plan and no scope at all. Projects that start as a bubble and constantly change their form easily become malformed and spiral out of control. Such projects usually fail. I've seen some never-ending projects in my practice and it's one of the things I truly dislike.
These and perhaps other reasons make organizations choose hybrid approaches to software development, as they find agile too extreme and inefficient in large organizations. I have worked in such organizations for more than 15 years, so perhaps my views are twisted by this perspective.

I'm an 'advanced user'

This was written to me by a support representative. It was after asking (and insisting) for some simple (in my view) improvements in the software provided and supported by a big vendor for whom this person was working. He wrote it in the sense that it's only me that is asking for these improvements, which made it automatically clear to me that they won't be implemented. Well, if this defines me as "advanced user", so be it, but I'm quite disappointed, because I was expecting more from a SOHO device using free and open source software. I may write a separate article on this later.

I'm myself

Last, but not least I'm still myself with all the pros and cons as a person and professional. I do not pretend to be everything to everyone, I do not pretend to know everything and I do not pretend to be the most experienced out there. Everything I learned about myself in 2018 was based on some experience and I'm sure it would help me become a better person and professional in future.

2018-12-29

Functional dependencies recognition in grouping by MySQL

If you've used MySQL in the past you should remember that by default it was possible to mix aggregate functions and non-aggregated columns in queries without a GROUP BY clause. So you could write a query like this one

SELECT D.dname, COUNT(E.empno) cnt
  FROM dept D,
       emp  E
 WHERE D.deptno = E.deptno;

most probably with the idea to count the employees in each department. However, the result you'll get (without error) would be:

+------------+-----+
| dname      | cnt |
+------------+-----+
| ACCOUNTING |  14 |
+------------+-----+

which is simply incorrect and most probably not what you actually want. Such query would result in an error in other databases (e.g. Oracle would give ORA-00937: not a single-group group function, PostgreSQL would say SQL state: 42803 ERROR: column "d.dname" must appear in the GROUP BY clause or be used in an aggregate function, etc.).

The reason for this behavior was the default SQL mode of the server, and more precisely the absence of ONLY_FULL_GROUP_BY from the list of modes. There are some applications (like Bugzilla and Cacti) that did not allow me to enable this mode on my server, although it was long suggested by some (e.g. I remember the presentations of Ronald Bradford from BGOUG Autumn 2010). I remember having problems with these applications, so I put a comment in the defaults file reminding me not to enable ONLY_FULL_GROUP_BY. I must admit I haven't checked recently, so the situation could have improved, but given Bugzilla's current state it's doubtful.
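
If you want to check whether your server runs with it and enable it for your session, a quick sketch (assuming ONLY_FULL_GROUP_BY is not already in the list):

-- see which SQL modes are currently in effect
SELECT @@GLOBAL.sql_mode, @@SESSION.sql_mode;

-- enable ONLY_FULL_GROUP_BY for the current session, keeping the other modes
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');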

The default SQL modes changed a lot with the latest MySQL releases. The only difference between the 5.7 and 8.0 defaults is in fact the absence of NO_AUTO_CREATE_USER, because it was removed after being deprecated in previous releases. It's highly recommended to stick to the defaults, especially when starting a new application, so you do not have to fix queries and data later.

When ONLY_FULL_GROUP_BY was enabled by default with MySQL 5.7.5 (released on 2014-09-25), Oracle took care to implement sophisticated functional dependency checking, as Guilhem Bichot wrote in December 2014. The effort was praised by Markus Winand in his great article One Giant Leap For SQL: MySQL 8.0 Released from April 2018, who wrote that MySQL now has "the most complete functional dependencies checking among the major SQL databases". Let's explore this, but first you may refer to Wikipedia for a quick review of the definition of functional dependency. To put it simply, Y is functionally dependent on X if the values of X uniquely determine the values of Y. For example, the column values in a table are functionally dependent on the primary key, because its value uniquely determines all other values in a row.
Take a look at the following example with the primary key. It's the query from above, but with a GROUP BY clause:

SELECT D.dname, COUNT(E.empno) cnt
  FROM dept D,
       emp  E
 WHERE D.deptno = E.deptno
 GROUP BY D.deptno;

Is this query valid? The grouping is by D.deptno, but the column list in SELECT consists only of D.dname and an aggregation on E.empno, so it's wrong, right? Not in MySQL 5.7 and above with ONLY_FULL_GROUP_BY enabled, so the old rule that every column in SELECT must either appear in GROUP BY or be aggregated no longer applies (i.e. no more SQL-92 restrictions). The result is:

+------------+-----+
| dname      | cnt |
+------------+-----+
| ACCOUNTING |   3 |
| RESEARCH   |   5 |
| SALES      |   6 |
+------------+-----+

Why so? MySQL properly determines that D.dname is functionally dependent on D.deptno, because the latter is the primary key of the dept table. The same query also works properly in PostgreSQL, but not in Oracle and other major SQL databases. MySQL can also properly determine functional dependencies based on unique keys, on equalities in the WHERE clause or in [LEFT] JOIN ON clauses, and on selected expressions in a view's or derived table's body. It's quite exciting, right? Be sure to use it next time you consider writing a GROUP BY query.
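
For instance, here's a sketch of a dependency derived from an equality in the WHERE clause: D.dname is constant for all grouped rows, so it may appear in SELECT even though the grouping is by E.job only.

SELECT E.job, D.dname, COUNT(*) AS cnt
  FROM dept D
  JOIN emp  E ON E.deptno = D.deptno
 WHERE D.dname = 'SALES'
 GROUP BY E.job;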

For more details, please refer to chapter Detection of Functional Dependence in the manual.