My personal and professional life

2018-07-11

Why loading data with MySQL Workbench is so slow?

I recently started a small personal project for statistics on 24 Hours of Le Mans endurance race of which I'm a fan. I started by gathering data and for loading them started using data import functionality in MySQL Workbench. The functionality is accessible from the context menu of a table - you click with the right button on a table and then on the item "Table Data Import Wizard". It's a pretty neat feature, because it allows you:
  • first, to select a local file;
  • then, to chose destination - either an existing table (and truncate it if necessary) or a new one;
  • then, to configure settings (e.g. CSV options like line and column separators and enclosure character), file encoding and column mappings;
  • finally, to start the import and watch for errors (a message is printed for each line from the input file).
However, with the growing number of lines in the CSV file, I noticed that the data import becomes slower and slower (I didn't pay much attention initially), so for about 4500 lines it now takes about 5 minutes (!) in a local network and on a server that is not that much loaded with work. I cannot imagine if I have to load hundreds of thousands or even millions of lines like this.
I tried the same with LOAD DATA INFILE and the same data was loaded in about 1 second, even though I have some triggers on the table checking newly inserted data. The difference is quite obvious. So why is loading data in MySQL Workbench so slow compared to LOAD DATA INFILE statement?
By checking the general log of the server the difference becomes immediately apparent - MySQL Workbench generates an INSERT statement for each line of the input file an executes these statements consecutively one by one. That's quite expensive and time consuming especially on the network (see Optimizing INSERT Statements).
But why it is so? Two possible explanations:
  1. LOAD DATA INFILE statement requires the file to be available to the server (i.e. on a local file system accessible to the server). However, with LOCAL keyword the client could transfer the file to the server, so that the server could read it. The overhead in this case would be the transfer of the file over the network. This however requires that both server and client have local data loading capabilities enabled (for server see local_infile system variable and for client it's a compile time ENABLED_LOCAL_INFILE CMake option). So although I have this capability enabled on the server apparently it's disabled in the MySQL client used by Workbench, because trying to execute LOAD DATA INFILE with the LOCAL keyword gives Error Code: 1148. The used command is not allowed with this MySQL version (see more in Security Issues with LOAD DATA LOCAL).
  2. Using INSERT statement with multiple VALUES lists to insert many rows at once would not provide Workbench with the possibility to give information for each processed line from the input file.
MySQL Workbench Data ImportIf I'm right for the above, then I would expect the data import log in MySQL Workbench to be more advanced. First of all, no line information is printed, so the messages "Data import" (successful?) and various possible errors are useless in determining the problematic line. I had to copy the log and index the messages in an external text editor in order to identify the problem in the data file. Then, I'd like to have information only for errors (e.g. some kind of filtering). There is a progress bar, but without any numbers - I think a message like "Processing line X of NB_LNS_IN_FILE" would be quite useful, but this would require counting the lines in advance. There is also no option to save the import log and copying it requires scrolling (i.e. no Ctrol+A or Select everything exists as an option).
In any case, I find data import functionality in MySQL Workbench to be quite useful, but be ware if you need to load large amounts of data (i.e. several thousand lines or more). In case you need to load such amounts of data it's highly preferable to transfer the data file locally and use LOAD DATA INFILE, which according to the manual "reads rows from a text file into a table at a very high speed".

No comments: