My personal and professional life

2018-12-29

Functional dependencies recognition in groupping by MySQL

If you've used MySQL in the past you should remember it was possible by default to aggregate in queries without a GROUP BY clause. So you could write a query like this one

SELECT D.dname, COUNT(E.empno) cnt
  FROM dept D,
       emp  E
 WHERE D.deptno = E.deptno;

most probably with the idea to count the employees in each department. However, the result you'll get (without error) would be:

+------------+-----+
| dname      | cnt |
+------------+-----+
| ACCOUNTING |  14 |
+------------+-----+

which is simply incorrect and most probably not what you actually want. Such query would result in an error in other databases (e.g. Oracle would give ORA-00937: not a single-group group function, PostgreSQL would say SQL state: 42803 ERROR: column "d.dname" must appear in the GROUP BY clause or be used in an aggregate function, etc.).

The reason for this behavior was the default SQL mode of the server and more precisely the absence of ONLY_FULL_GROUP_BY in the list with modes. There are some applications (like Bugzilla and Cacti) that did not allow me to enable this mode on my server although it was long suggested by some (e.g. I remember the presentations of Ronald Bradford from BGOUG Autumn 2010). I remember having problems with these applications, so I put a comment in the defaults file to not enable ONLY_FULL_GROUP_BY. I must admit I haven't checked recently, so the situation could have improved, but with Bugzilla's current state it's doubtful.

The default SQL modes changed much with the latest MySQL releases:
The only difference in 8.0 is in fact the absence of  NO_AUTO_CREATE_USER, because it was removed after being deprecated in previous releases. It's highly recommended to stick to the defaults especially when starting new application, so you do not have to fix queries and data later.

When ONLY_FULL_GROUP_BY was enabled with MySQL 5.7.5 (released on 2014-09-25) Oracle took care to implement a sophisticated functional dependencies as Guilhem Bichot wrote in December 2014. The effort was praised by Markus Winand in his great article One Giant Leap For SQL: MySQL 8.0 Released from April 2018, who wrote that MySQL now has "the most complete functional dependencies checking among the major SQL databases". Let's explore this, but before it you may refer to Wikipedia for a quick review of the definition of functional dependency. To put it simply Y is functionally dependent on X if the values of X unique determine the values of Y. For example the column values in a table are functionally dependent of the primary key, because its value uniquely determines all other values in a row.
Take a look at the following example with the primary key. It's the query from above, but with a GROUP BY clause:

SELECT D.dname, COUNT(E.empno) cnt
  FROM dept D,
       emp  E
 WHERE D.deptno = E.deptno
 GROUP BY D.deptno;

Is this query valid? The grouping is by D.deptno, but the column list in SELECT consist only of D.dname and aggregation on E.empno, so it's wrong, right? Not in MySQL 5.7 and above with enabled ONLY_FULL_GROUP_BY, so the old rule to put everything from GROUP BY into SELECT or use aggregation doesn't apply (i.e. no more SQL-92 restrictions). The result is:

+------------+-----+
| dname      | cnt |
+------------+-----+
| ACCOUNTING |   3 |
| RESEARCH   |   5 |
| SALES      |   6 |
+------------+-----+

Why so? MySQL properly determines that D.dname is functionally dependent on D.deptno, because this is the primary key for dept table. Same query works properly also in PostgreSQL, but not in Oracle and other major SQL databases. MySQL could properly determine functional dependencies also on unique keys, equalities in the WHERE clause or [LEFT] JOIN ON clauses and from selected expressions in a view’s or derived table’s body. It's quite exciting, right? Be sure to use it next time when you consider writing a GROUP BY query.

For more details, please refer to chapter Detection of Functional Dependence in the manual.