My personal and professional life

2018-11-28

Execution plan change in MySQL 8.0

About 6 weeks ago at work I had to write a new report query, because an application page was loading slowly (about 25 seconds). It turned out there is a problem in the application, because it was first executing a relatively slow query (taking about 6 seconds) and then a relatively fast query for each returned row from the first query. That second query was taking between 0.3 and 0.5 seconds per execution, but it was executed about 60 times, so the delay accumulated and explained the slow page load.
I decided to provide a single query that produces the whole result set, so I had to join 6 tables. One table (msg_progs) had just 13 rows, another two had just 64 rows (bldenv and prjprocstat) and another one just about 100 rows (p) and these four had no tendency of growing much more. There was one table with several tens of thousands of rows (m) and another with several hundreds thousands of rows (mtp), which for sure were going to grow on a daily basis. To improve the selectivity in the second largest table (m) I created a multi-column index on three columns involved in the query, but the problem was that sill much of its data was selected. Anyway, I ended up with a query that was executing for 6-7 seconds, which was much more acceptable from the initial 25 seconds. I'm not able to provide all the details, but I have captured the execution plan and I consider it safe to publish.
Query explain plan in MySQL 5.7.20
Remark: MySQL Workbench nicely visualizes the execution plans graphically (notice the lines connecting the different operations - the higher the number of rows the thicker the line). Textual execution plans are thing of the past, right?
That was until 3 weeks ago when we upgraded to MySQL 8.0.13 (from 5.7.20). The same query and in the same database started running for 24 seconds. When I checked the execution plan it has changed (see below).
Original explain plan in MySQL 8.0
I think you could easily notice that the join order has changed. Previously the second largest table was considered in the first nested loop, then the largest table and finally the smaller tables. Now it's the opposite as the optimizer first considers the smaller tables and only then the largest ones. I could have used a hint to fix the join order (e.g. JOIN_ORDER), but I noticed that if I re-create the multi-column index on just two of the columns (that are more selective in fact) the optimizer chooses a different plan, but the query execution time is still about 14 seconds. I finally decided to remove one of the tables in the joins (the smallest msg_progs with just 13 rows), which made the optimizer choose a different execution plan (here below) and the query now runs for about 4 seconds.
Finaly explain plan in MySQL 8.0
I'm not sure where this difference come from apart from the server, but I cannot blame without a proof (e.g. a reference in the manual that explains the difference), which unfortunately I haven't had the time to search. If you know the reason for the change of execution plans in MySQL 8.0 please, let me know.
The situation reminded me of Oracle database where after each major release we had queries with modified execution plans and had to use query hints or other tricks to make them fast again. It appears to be no different with MySQL nowadays.
Update 2018-12-17: I made a premature conclusion above, because the change of the plan happened (or was at least noticed) after we upgraded the server. However, as it turned out the real reason was the number of rows into tables m and mtp for which the server was choosing a different execution plan. We found it out experimenting with different in volume data sets.

No comments: