An entry about the MySQL Shell 8.0.21 Dump & Load utilities has been posted on the MySQL Server Blog. It shows, among other things, that parallelization makes dumps and loads extremely fast. It comes across as a very powerful tool.
MySQL Shell Dump & Load part 1: Demo! | MySQL Server Blog
Trying it out
You can see the documentation with \help util.dumpInstance. The threads and maxRate options look like they can be used to tune performance.
$ mysqlsh root@localhost
mysqlsh > \help util.dumpInstance
NAME
      dumpInstance - Dumps the whole database to files in the output directory.

SYNTAX
      util.dumpInstance(outputUrl[, options])

WHERE
      outputUrl: Target directory to store the dump files.
      options: Dictionary with the dump options.

DESCRIPTION
      The outputUrl specifies where the dump is going to be stored.

      By default, a local directory is used, and in this case outputUrl can be
      prefixed with file:// scheme. If a relative path is given, the absolute
      path is computed as relative to the current working directory. If the
      output directory does not exist but its parent does, it is created. If
      the output directory exists, it must be empty.

      All directories created during the dump will have the following access
      rights (on operating systems which support them): rwxr-x---. All files
      created during the dump will have the following access rights (on
      operating systems which support them): rw-r-----.

      The following options are supported:

      - excludeSchemas: list of strings (default: empty) - List of schemas to be excluded from the dump.
      - excludeTables: list of strings (default: empty) - List of tables to be excluded from the dump in the format of schema.table.
      - users: bool (default: true) - Include users, roles and grants in the dump file.
      - events: bool (default: true) - Include events from each dumped schema.
      - routines: bool (default: true) - Include functions and stored procedures for each dumped schema.
      - triggers: bool (default: true) - Include triggers for each dumped table.
      - tzUtc: bool (default: true) - Convert TIMESTAMP data to UTC.
      - consistent: bool (default: true) - Enable or disable consistent data dumps.
      - ddlOnly: bool (default: false) - Only dump Data Definition Language (DDL) from the database.
      - dataOnly: bool (default: false) - Only dump data from the database.
      - dryRun: bool (default: false) - Print information about what would be dumped, but do not dump anything.
      - ocimds: bool (default: false) - Enable checks for compatibility with MySQL Database Service (MDS)
      - compatibility: list of strings (default: empty) - Apply MySQL Database Service compatibility modifications when writing dump files. Supported values: "force_innodb", "strip_definers", "strip_restricted_grants", "strip_role_admin", "strip_tablespaces".
      - chunking: bool (default: true) - Enable chunking of the tables.
      - bytesPerChunk: string (default: "32M") - Sets average estimated number of bytes to be written to each chunk file, enables chunking.
      - threads: int (default: 4) - Use N threads to dump data chunks from the server.
      - maxRate: string (default: "0") - Limit data read throughput to maximum rate, measured in bytes per second per thread. Use maxRate="0" to set no limit.
      - showProgress: bool (default: true if stdout is a TTY device, false otherwise) - Enable or disable dump progress information.
      - compression: string (default: "zstd") - Compression used when writing the data dump files, one of: "none", "gzip", "zstd".
      - defaultCharacterSet: string (default: "utf8mb4") - Character set used for the dump.
      - osBucketName: string (default: not set) - Use specified OCI bucket for the location of the dump.
      - osNamespace: string (default: not set) - Specify the OCI namespace (tenancy name) where the OCI bucket is located.
      - ociConfigFile: string (default: not set) - Use the specified OCI configuration file instead of the one in the default location.
      - ociProfile: string (default: not set) - Use the specified OCI profile instead of the default one.
<snip>
Run it with the output directory as its argument.
$ mysqlsh root@localhost
mysqlsh> util.dumpInstance("/tmp/dumpdir")
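The options from the help output above can be passed as a second argument. A hypothetical invocation that tunes the performance-related options might look like the following (the values are arbitrary examples, not recommendations):

mysqlsh> util.dumpInstance("/tmp/dumpdir", {
             threads: 8,           // dump with 8 parallel threads (default: 4)
             maxRate: "100M",      // throttle reads per thread ("0" = no limit; "M" suffix assumed to mean megabytes)
             bytesPerChunk: "64M", // aim for ~64 MB per chunk file (default: "32M")
             compression: "zstd"   // the default, shown here for completeness
         })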
What SQL runs during a dump?
I enabled the general query log and observed the SQL statements issued while the dump was running. A single small table (lockt) was prepared for the test.
mysql> SHOW TABLES FROM test;
+----------------+
| Tables_in_test |
+----------------+
| lockt          |
+----------------+
1 row in set (0.01 sec)
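For the record, the general query log was turned on roughly like this (a minimal sketch; the file path is just an example and any writable location works):

-- enable the general query log at runtime
SET GLOBAL general_log_file = '/var/lib/mysql/general.log';  -- example path
SET GLOBAL general_log = 'ON';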
Looking at the Id column of the general query log, connection IDs 17 through 22 appear: aside from the interactive mysqlsh session (ID 17), the backup work is carried out in parallel across five connections (IDs 18-22).
$ tail -n +3 general.log | awk '{ print $2 }' | sort | uniq
17
18
19
20
21
22
Id
Breaking the log down by connection looks like this.
- Connection ID 17: checks the port number
$ grep '17 ' general.log
2020-08-15T04:03:37.955055Z 17 Query select schema()
2020-08-15T04:04:03.138587Z 17 Query SELECT @@GLOBAL.port
2020-08-15T04:04:04.708654Z 17 Quit
- Connection ID 18: takes locks with FLUSH TABLES WITH READ LOCK and LOCK INSTANCE FOR BACKUP
$ grep '18 ' general.log
2020-08-15T04:04:03.156406Z 18 Connect root@localhost on using SSL/TLS
2020-08-15T04:04:03.156714Z 18 Query SET SQL_MODE = ''
2020-08-15T04:04:03.156883Z 18 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.157033Z 18 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.157150Z 18 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.157338Z 18 Query SELECT SCHEMA_NAME FROM information_schema.schemata WHERE SCHEMA_NAME NOT IN ('information_schema', 'mysql', 'ndbinfo', 'performance_schema', 'sys')
2020-08-15T04:04:03.159080Z 18 Query SELECT TABLE_NAME FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_TYPE = 'BASE TABLE'
2020-08-15T04:04:03.160512Z 18 Query SELECT TABLE_NAME FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_TYPE = 'VIEW'
2020-08-15T04:04:03.161561Z 18 Query SELECT TABLE_ROWS FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'lockt'
2020-08-15T04:04:03.162400Z 18 Query SELECT @@GLOBAL.HOSTNAME
2020-08-15T04:04:03.162565Z 18 Query SELECT @@GLOBAL.VERSION
2020-08-15T04:04:03.162733Z 18 Query SELECT @@GLOBAL.GTID_EXECUTED
2020-08-15T04:04:03.163244Z 18 Query select * from (select distinct grantee, substr(grantee, 2, length(grantee)-locate('@', reverse(grantee))-2) as User, substr(grantee, length(grantee)-locate('@', reverse(grantee))+3, locate('@', reverse(grantee))-3) as host from information_schema.user_privileges order by user, host) as Users where Users.User not like 'mysql.infoschema' AND Users.User not like 'mysql.session' AND Users.User not like 'mysql.sys'
2020-08-15T04:04:03.164371Z 18 Query USE `test`
2020-08-15T04:04:03.164583Z 18 Query show events
2020-08-15T04:04:03.165572Z 18 Query USE `test`
2020-08-15T04:04:03.165755Z 18 Query SHOW FUNCTION STATUS WHERE Db = 'test'
2020-08-15T04:04:03.166877Z 18 Query USE `test`
2020-08-15T04:04:03.167059Z 18 Query SHOW PROCEDURE STATUS WHERE Db = 'test'
2020-08-15T04:04:03.168525Z 18 Query FLUSH TABLES WITH READ LOCK
2020-08-15T04:04:03.209360Z 18 Query /*!80000 LOCK INSTANCE FOR BACKUP */
2020-08-15T04:04:03.209530Z 18 Query UNLOCK TABLES
2020-08-15T04:04:03.210561Z 18 Query select * from (select distinct grantee, substr(grantee, 2, length(grantee)-locate('@', reverse(grantee))-2) as User, substr(grantee, length(grantee)-locate('@', reverse(grantee))+3, locate('@', reverse(grantee))-3) as host from information_schema.user_privileges order by user, host) as Users where Users.User not like 'mysql.infoschema' AND Users.User not like 'mysql.session' AND Users.User not like 'mysql.sys'
2020-08-15T04:04:03.211197Z 18 Query SHOW CREATE USER 'root'@'localhost'
2020-08-15T04:04:03.211977Z 18 Query SHOW GRANTS FOR 'root'@'localhost'
2020-08-15T04:04:03.212358Z 18 Query SELECT INDEX_NAME, COLUMN_NAME FROM information_schema.statistics WHERE NON_UNIQUE = 0 AND SEQ_IN_INDEX = 1 AND TABLE_SCHEMA = 'test' AND TABLE_NAME = 'lockt'
2020-08-15T04:04:03.240608Z 18 Quit
- Connection ID 19: extracts the table definition (SHOW CREATE TABLE and friends)
$ grep '19 ' general.log
2020-08-15T04:04:03.206269Z 19 Connect root@localhost on using SSL/TLS
2020-08-15T04:04:03.207182Z 19 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-08-15T04:04:03.207402Z 19 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.207531Z 19 Query SET SQL_MODE = ''
2020-08-15T04:04:03.207673Z 19 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.207802Z 19 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.208020Z 19 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.215139Z 19 Query SHOW VARIABLES LIKE 'ndbinfo\_version'
2020-08-15T04:04:03.233438Z 19 Query USE `test`
2020-08-15T04:04:03.233704Z 19 Query show table status like 'lockt'
2020-08-15T04:04:03.235205Z 19 Query SET SQL_QUOTE_SHOW_CREATE=1
2020-08-15T04:04:03.235319Z 19 Query SET SESSION character_set_results = 'binary'
2020-08-15T04:04:03.235398Z 19 Query show create table `lockt`
2020-08-15T04:04:03.235584Z 19 Query SET SESSION character_set_results = 'utf8mb4'
2020-08-15T04:04:03.235685Z 19 Query show fields from `lockt`
2020-08-15T04:04:03.236751Z 19 Query select count(TRIGGER_NAME) from information_schema.triggers where TRIGGER_SCHEMA = 'test' and EVENT_OBJECT_TABLE = 'lockt'
2020-08-15T04:04:03.240935Z 19 Quit
- Connection ID 20: extracts the data with SELECT
The statement

SELECT SQL_NO_CACHE `pk`,`name`,`favorite`,`addr` FROM `test`.`lockt` WHERE `pk` BETWEEN 1 AND 4 OR `pk` IS NULL ORDER BY `pk` /* mysqlsh dumpInstance, dumping table `test`.`lockt`, chunk ID: 0 */

with its BETWEEN predicate shows that the data is split into chunks and backed up in parallel (here the data is so small that the data dump is not actually parallelized and finishes with this single SELECT; a sketch of what a larger table might look like follows the connection 20 log below).
$ grep '20 ' general.log
2020-08-15T04:04:03.205716Z 20 Connect root@localhost on using SSL/TLS
2020-08-15T04:04:03.206984Z 20 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-08-15T04:04:03.207611Z 20 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.207739Z 20 Query SET SQL_MODE = ''
2020-08-15T04:04:03.207950Z 20 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.208088Z 20 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.208156Z 20 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.219989Z 20 Query SELECT COLUMN_NAME, DATA_TYPE FROM information_schema.columns WHERE EXTRA <> 'VIRTUAL GENERATED' AND EXTRA <> 'STORED GENERATED' AND TABLE_SCHEMA = 'test' AND TABLE_NAME = 'lockt' ORDER BY ORDINAL_POSITION
2020-08-15T04:04:03.221376Z 20 Query USE `test`
2020-08-15T04:04:03.222424Z 20 Query SHOW TRIGGERS LIKE 'lockt'
2020-08-15T04:04:03.224205Z 20 Query SET SESSION character_set_results = 'binary'
2020-08-15T04:04:03.224505Z 20 Query SELECT COLUMN_NAME, JSON_EXTRACT(HISTOGRAM, '$."number-of-buckets-specified"') FROM information_schema.COLUMN_STATISTICS WHERE SCHEMA_NAME = 'test' AND TABLE_NAME = 'lockt'
2020-08-15T04:04:03.224985Z 20 Query SET SESSION character_set_results = 'utf8mb4'
2020-08-15T04:04:03.225329Z 20 Query SELECT SQL_NO_CACHE MIN(`pk`), MAX(`pk`) FROM `test`.`lockt`
2020-08-15T04:04:03.227138Z 20 Query SELECT AVG_ROW_LENGTH FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'lockt'
2020-08-15T04:04:03.230184Z 20 Query SELECT SQL_NO_CACHE `pk`,`name`,`favorite`,`addr` FROM `test`.`lockt` WHERE `pk` BETWEEN 1 AND 4 OR `pk` IS NULL ORDER BY `pk` /* mysqlsh dumpInstance, dumping table `test`.`lockt`, chunk ID: 0 */
2020-08-15T04:04:03.231534Z 20 Quit
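With a larger table, the range between MIN(`pk`) and MAX(`pk`) would presumably be split up and several connections would issue chunk SELECTs in parallel, along the lines of the following (an illustration extrapolated from the pattern above; the ranges and chunk IDs are made up):

SELECT SQL_NO_CACHE `pk`,`name`,`favorite`,`addr` FROM `test`.`lockt`
  WHERE `pk` BETWEEN 1 AND 100000 OR `pk` IS NULL ORDER BY `pk`   /* chunk ID: 0 */;
SELECT SQL_NO_CACHE `pk`,`name`,`favorite`,`addr` FROM `test`.`lockt`
  WHERE `pk` BETWEEN 100001 AND 200000 ORDER BY `pk`              /* chunk ID: 1 */;
-- ...and so on, roughly one chunk per bytesPerChunk worth of data, spread over `threads` connections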
- Connection ID 21: extracts functions and stored procedures
$ grep '21 ' general.log
/usr/sbin/mysqld, Version: 8.0.21 (MySQL Community Server - GPL). started with:
2020-08-15T04:04:03.205102Z 21 Connect root@localhost on using SSL/TLS
2020-08-15T04:04:03.206485Z 21 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-08-15T04:04:03.206782Z 21 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.207065Z 21 Query SET SQL_MODE = ''
2020-08-15T04:04:03.207263Z 21 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.207343Z 21 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.207467Z 21 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.214462Z 21 Query SHOW CREATE DATABASE IF NOT EXISTS test
2020-08-15T04:04:03.221139Z 21 Query USE `test`
2020-08-15T04:04:03.221476Z 21 Query show events
2020-08-15T04:04:03.223192Z 21 Query LOCK TABLES mysql.proc READ
2020-08-15T04:04:03.224379Z 21 Query use `test`
2020-08-15T04:04:03.224863Z 21 Query select @@collation_database
2020-08-15T04:04:03.225075Z 21 Query SET SESSION character_set_results = 'binary'
2020-08-15T04:04:03.225726Z 21 Query USE `test`
2020-08-15T04:04:03.225833Z 21 Query SHOW FUNCTION STATUS WHERE Db = 'test'
2020-08-15T04:04:03.229322Z 21 Query USE `test`
2020-08-15T04:04:03.229487Z 21 Query SHOW PROCEDURE STATUS WHERE Db = 'test'
2020-08-15T04:04:03.231929Z 21 Query SET SESSION character_set_results = 'utf8mb4'
2020-08-15T04:04:03.232081Z 21 Query UNLOCK TABLES
2020-08-15T04:04:03.233234Z 21 Quit
- Connection ID 22: does nothing (probably because there was so little data)
$ grep '22 ' general.log
2020-08-15T04:04:03.208791Z 22 Connect root@localhost on using SSL/TLS
2020-08-15T04:04:03.208917Z 22 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-08-15T04:04:03.208987Z 22 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.209066Z 22 Query SET SQL_MODE = ''
2020-08-15T04:04:03.209126Z 22 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.209195Z 22 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.209252Z 22 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.229120Z 22 Quit
How is a consistent backup achieved?
The description shown by \help util.dumpInstance includes a consistent option (default: true), so dumpInstance appears to be able to take a consistent backup*1.
I checked how the multiple connections (threads) coordinate with each other to take that consistent backup.
$ egrep 'Time|FLUSH|UNLOCK|LOCK|SELECT .* FROM `test`.`lockt`|START' general.log | grep -v '21 '
Time Id Command Argument
2020-08-15T04:04:03.168525Z 18 Query FLUSH TABLES WITH READ LOCK
2020-08-15T04:04:03.207402Z 19 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.207611Z 20 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.208987Z 22 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.209360Z 18 Query /*!80000 LOCK INSTANCE FOR BACKUP */
2020-08-15T04:04:03.209530Z 18 Query UNLOCK TABLES
2020-08-15T04:04:03.225329Z 20 Query SELECT SQL_NO_CACHE MIN(`pk`), MAX(`pk`) FROM `test`.`lockt`
2020-08-15T04:04:03.230184Z 20 Query SELECT SQL_NO_CACHE `pk`,`name`,`favorite`,`addr` FROM `test`.`lockt` WHERE `pk` BETWEEN 1 AND 4 OR `pk` IS NULL ORDER BY `pk` /* mysqlsh dumpInstance, dumping table `test`.`lockt`, chunk ID: 0 */
First, FLUSH TABLES WITH READ LOCK (FTWRL) is used to establish a quiesce point for the entire database (connection ID 18).

Each dump thread (connection IDs 19-22) then runs START TRANSACTION WITH CONSISTENT SNAPSHOT, starting its transaction at that quiesce point.

Once every dump thread has opened its transaction and is ready to run SELECTs against that snapshot, FTWRL has served its purpose and the global lock is released with UNLOCK TABLES.

Between FTWRL and UNLOCK TABLES the database cannot be written to, but this window is normally over in an instant. It is a nice design that keeps the database readable and writable for the rest of the backup.
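Put together, the coordination can be sketched by hand across a few sessions. This is a simplified illustration of the sequence seen in the log, not the exact code mysqlsh runs:

-- session A (coordinator, like connection 18)
FLUSH TABLES WITH READ LOCK;                 -- global quiesce point; new writes block here

-- sessions B..E (dump workers, like connections 19-22), each one:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;  -- pin a snapshot taken at the quiesce point

-- session A, once every worker holds its snapshot
LOCK INSTANCE FOR BACKUP;                    -- keep DDL out for the rest of the dump (MySQL 8.0)
UNLOCK TABLES;                               -- drop the global read lock; writes resume

-- the workers then read their chunks from the snapshot, e.g.
SELECT SQL_NO_CACHE `pk`,`name`,`favorite`,`addr` FROM `test`.`lockt`
  WHERE `pk` BETWEEN 1 AND 4 OR `pk` IS NULL ORDER BY `pk`;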
LOCK INSTANCE FOR BACKUP
An unfamiliar statement, LOCK INSTANCE FOR BACKUP, showed up in the log.

This is a feature introduced in MySQL 8.0. If a table definition changes in the middle of a backup the backup can fail, so this statement blocks DDL from executing.
Running DDL against an instance that is held by LOCK INSTANCE FOR BACKUP makes it wait in the Waiting for backup lock state.
mysql> SHOW PROCESSLIST;
+----+-----------------+<snip>-------------------------+------------------+
| Id | User            |<snip> State                   | Info             |
+----+-----------------+<snip>-------------------------+------------------+
|  5 | event_scheduler |<snip> Waiting on empty queue  | NULL             |
| 36 | root            |<snip> starting                | SHOW PROCESSLIST |
| 39 | root            |<snip>                         | NULL             |
| 40 | root            |<snip> Waiting for backup lock | truncate test.t2 |
+----+-----------------+<snip>-------------------------+------------------+
4 rows in set (0.00 sec)
Even while the DDL was waiting, it was still possible to run DML against the target table.
mysql> INSERT INTO test.t2 VALUES();
Query OK, 1 row affected (0.01 sec)
Unlike with a metadata lock, queries issued after the waiting DDL don't seem to get stuck behind it (good to see!).
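The same behavior can be reproduced without running a dump by taking the backup lock by hand (a minimal sketch; LOCK INSTANCE FOR BACKUP requires the BACKUP_ADMIN privilege and is released with UNLOCK INSTANCE):

-- session 1: hold the backup lock
LOCK INSTANCE FOR BACKUP;

-- session 2: DDL waits in "Waiting for backup lock"
TRUNCATE TABLE test.t2;

-- session 3: DML still goes through
INSERT INTO test.t2 VALUES ();

-- session 1: release the lock; the TRUNCATE in session 2 then proceeds
UNLOCK INSTANCE;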
Summary
- MySQL Shell's dump / load utilities look like powerful tools, with parallel backup and more
- FLUSH TABLES WITH READ LOCK and LOCK INSTANCE FOR BACKUP are used so that a consistent backup can be taken reliably
- Be aware that DDL is made to wait while a dump is running
*1: By "consistent backup" I mean a backup in which every table is dumped as of the same point in time.