mita2 database life

主にMySQLに関するメモです

MySQL Shell dumpInstance でバックアップ中はDDL が待機させられる

MySQL Sever Blog に MySQL Shell 8.0.21 の Dump / Import 機能についてエントリーが投稿されています。 並列化により非常に高速に Dump / Import 出来ることなどが示されてます。非常に強力なツールであることが感じられます。

https://mysqlserverteam.com/wp-content/uploads/2020/07/dump-1.png

MySQL Shell Dump & Load part 1: Demo! | MySQL Server Blog

とりあえず動かしてみる

\help util.dumpInstance で説明が見れます。 threadsmaxRate オプションでパフォーマンスの調整ができそうです。

$ mysqlsh root@localhost
mysqlsh > \help util.dumpInstance
NAME
      dumpInstance - Dumps the whole database to files in the output directory.

SYNTAX
      util.dumpInstance(outputUrl[, options])

WHERE
      outputUrl: Target directory to store the dump files.
      options: Dictionary with the dump options.

DESCRIPTION
      The outputUrl specifies where the dump is going to be stored.

      By default, a local directory is used, and in this case outputUrl can be
      prefixed with file:// scheme. If a relative path is given, the absolute
      path is computed as relative to the current working directory. If the
      output directory does not exist but its parent does, it is created. If
      the output directory exists, it must be empty. All directories created
      during the dump will have the following access rights (on operating
      systems which support them): rwxr-x---. All files created during the dump
      will have the following access rights (on operating systems which support
      them): rw-r-----.

      The following options are supported:
      - excludeSchemas: list of strings (default: empty) - list of schemas to
        be excluded from the dump.
      - excludeTables: list of strings (default: empty) - List of tables to be
        excluded from the dump in the format of schema.table.
      - users: bool (default: true) - Include users, roles and grants in the
        dump file.
      - events: bool (default: true) - Include events from each dumped schema.
      - routines: bool (default: true) - Include functions and stored
        procedures for each dumped schema.
      - triggers: bool (default: true) - Include triggers for each dumped
        table.
      - tzUtc: bool (default: true) - Convert TMESTAMP data to UTC.
      - consistent: bool (default: true) - Enable or disable consistent data
        dumps.
      - ddlOnly: bool (default: false) - Only dump Data Definition Language
        (DDL) from the database.
      - dataOnly: bool (default: false) - Only dump data from the database.
      - dryRun: bool (default: false) - Print information about what would be
        dumped, but do not dump anything.
      - ocimds: bool (default: false) - Enable checks for compatibility with
        MySQL Database Service (MDS)
      - compatibility: list of strings (default: empty) - Apply MySQL Database
        Service compatibility modifications when writing dump files. Supported
        values: "force_innodb", "strip_definers", "strip_restricted_grants",
        "strip_role_admin", "strip_tablespaces".
      - chunking: bool (default: true) - Enable chunking of the tables.
      - bytesPerChunk: string (default: "32M") - Sets average estimated number
        of bytes to be written to each chunk file, enables chunking.
      - threads: int (default: 4) - Use N threads to dump data chunks from the
        server.
      - maxRate: string (default: "0") - Limit data read throughput to maximum
            - maxRate: string (default: "0") - Limit data read throughput to maximum
        rate, measured in bytes per second per thread. Use maxRate="0" to set
        no limit.
      - showProgress: bool (default: true if stdout is a TTY device, false
        otherwise) - Enable or disable dump progress information.
      - compression: string (default: "zstd") - Compression used when writing
        the data dump files, one of: "none", "gzip", "zstd".
      - defaultCharacterSet: string (default: "utf8mb4") - Character set used
        for the dump.
      - osBucketName: string (default: not set) - Use specified OCI bucket for
        the location of the dump.
      - osNamespace: string (default: not set) - Specify the OCI namespace
        (tenancy name) where the OCI bucket is located.
      - ociConfigFile: string (default: not set) - Use the specified OCI
        configuration file instead of the one in the default location.
      - ociProfile: string (default: not set) - Use the specified OCI profile
        instead of the default one.
<snip>

引数に出力先のディレクトリを指定して実行します。

$ mysqlsh root@localhost
mysqlsh> util.dumpInstance("/tmp/dumpdir")

dump時 はどんなSQLが流れるのか

一般クエリログを有効にし、ダンプ処理中に流れるSQLを観測してみました。テスト用に小さなテーブル(lockt)を1つ用意してます。

mysql> SHOW TABLES FROM test;
+----------------+
| Tables_in_test |
+----------------+
| lockt          |
+----------------+
1 row in set (0.01 sec)

一般クエリログに出力されたログの Id 列に注目すると、コネクションIDが17〜22の5本のコネクションにより並列にバックアップ処理が行われていることがわかります。

$ tail -n +3 general.log  | awk '{ print $2 }' | sort | uniq
17
18
19
20
21
22
Id

コネクションごとにログを分解するとこんな感じ。

  • コネクションID 17: ポート番号確認
$ grep '17 ' general.log
2020-08-15T04:03:37.955055Z    17 Query select schema()
2020-08-15T04:04:03.138587Z    17 Query SELECT @@GLOBAL.port
2020-08-15T04:04:04.708654Z    17 Quit
  • コネクションID 18: FLUSH TABLE WITH READ LOCKLOCK INSTANCE FOR BACKUP でロックする
$ grep '18 ' general.log
2020-08-15T04:04:03.156406Z    18 Connect   root@localhost on  using SSL/TLS
2020-08-15T04:04:03.156714Z    18 Query SET SQL_MODE = ''
2020-08-15T04:04:03.156883Z    18 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.157033Z    18 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.157150Z    18 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.157338Z    18 Query SELECT SCHEMA_NAME FROM information_schema.schemata WHERE SCHEMA_NAME NOT IN ('information_schema', 'mysql', 'ndbinfo', 'performance_schema', 'sys')
2020-08-15T04:04:03.159080Z    18 Query SELECT TABLE_NAME FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_TYPE = 'BASE TABLE'
2020-08-15T04:04:03.160512Z    18 Query SELECT TABLE_NAME FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_TYPE = 'VIEW'
2020-08-15T04:04:03.161561Z    18 Query SELECT TABLE_ROWS FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'lockt'
2020-08-15T04:04:03.162400Z    18 Query SELECT @@GLOBAL.HOSTNAME
2020-08-15T04:04:03.162565Z    18 Query SELECT @@GLOBAL.VERSION
2020-08-15T04:04:03.162733Z    18 Query SELECT @@GLOBAL.GTID_EXECUTED
2020-08-15T04:04:03.163244Z    18 Query select * from (select distinct grantee, substr(grantee, 2, length(grantee)-locate('@', reverse(grantee))-2) as User, substr(grantee, length(grantee)-locate('@', reverse(grantee))+3, locate('@', reverse(grantee))-3) as host from information_schema.user_privileges order by user, host) as Users where Users.User not like 'mysql.infoschema' AND Users.User not like 'mysql.session' AND Users.User not like 'mysql.sys'
2020-08-15T04:04:03.164371Z    18 Query USE `test`
2020-08-15T04:04:03.164583Z    18 Query show events
2020-08-15T04:04:03.165572Z    18 Query USE `test`
2020-08-15T04:04:03.165755Z    18 Query SHOW FUNCTION STATUS WHERE Db = 'test'
2020-08-15T04:04:03.166877Z    18 Query USE `test`
2020-08-15T04:04:03.167059Z    18 Query SHOW PROCEDURE STATUS WHERE Db = 'test'
2020-08-15T04:04:03.168525Z    18 Query FLUSH TABLES WITH READ LOCK
2020-08-15T04:04:03.209360Z    18 Query /*!80000 LOCK INSTANCE FOR BACKUP */
2020-08-15T04:04:03.209530Z    18 Query UNLOCK TABLES
2020-08-15T04:04:03.210561Z    18 Query select * from (select distinct grantee, substr(grantee, 2, length(grantee)-locate('@', reverse(grantee))-2) as User, substr(grantee, length(grantee)-locate('@', reverse(grantee))+3, locate('@', reverse(grantee))-3) as host from information_schema.user_privileges order by user, host) as Users where Users.User not like 'mysql.infoschema' AND Users.User not like 'mysql.session' AND Users.User not like 'mysql.sys'
2020-08-15T04:04:03.211197Z    18 Query SHOW CREATE USER 'root'@'localhost'
2020-08-15T04:04:03.211977Z    18 Query SHOW GRANTS FOR 'root'@'localhost'
2020-08-15T04:04:03.212358Z    18 Query SELECT INDEX_NAME, COLUMN_NAME FROM information_schema.statistics WHERE NON_UNIQUE = 0 AND SEQ_IN_INDEX = 1 AND TABLE_SCHEMA = 'test' AND TABLE_NAME = 'lockt'
2020-08-15T04:04:03.240608Z    18 Quit
$ grep '19 ' general.log
2020-08-15T04:04:03.206269Z    19 Connect   root@localhost on  using SSL/TLS
2020-08-15T04:04:03.207182Z    19 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-08-15T04:04:03.207402Z    19 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.207531Z    19 Query SET SQL_MODE = ''
2020-08-15T04:04:03.207673Z    19 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.207802Z    19 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.208020Z    19 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.215139Z    19 Query SHOW VARIABLES LIKE 'ndbinfo\_version'
2020-08-15T04:04:03.233438Z    19 Query USE `test`
2020-08-15T04:04:03.233704Z    19 Query show table status like 'lockt'
2020-08-15T04:04:03.235205Z    19 Query SET SQL_QUOTE_SHOW_CREATE=1
2020-08-15T04:04:03.235319Z    19 Query SET SESSION character_set_results = 'binary'
2020-08-15T04:04:03.235398Z    19 Query show create table `lockt`
2020-08-15T04:04:03.235584Z    19 Query SET SESSION character_set_results = 'utf8mb4'
2020-08-15T04:04:03.235685Z    19 Query show fields from `lockt`
2020-08-15T04:04:03.236751Z    19 Query select count(TRIGGER_NAME) from information_schema.triggers where TRIGGER_SCHEMA = 'test' and EVENT_OBJECT_TABLE = 'lockt'
2020-08-15T04:04:03.240935Z    19 Quit
  • コネクションID 20: SELECTでデータを抜き出す
SELECT SQL_NO_CACHE `pk`,`name`,`favorite`,`addr` FROM `test`.`lockt` WHERE `pk` BETWEEN 1 AND 4 OR `pk` IS NULL ORDER BY `pk` /* mysqlsh dumpInstance, dumping table `test`.`lockt`, chunk ID: 0 */

BETWEEN を使って、データを分割して並列でバックアップする作りになっていることが読み取れます(今回はデータが小さすぎるので、データのダンプは並列されずこの1つのSELECTのみで終わってます)。

$ grep '20 ' general.log
2020-08-15T04:04:03.205716Z    20 Connect   root@localhost on  using SSL/TLS
2020-08-15T04:04:03.206984Z    20 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-08-15T04:04:03.207611Z    20 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.207739Z    20 Query SET SQL_MODE = ''
2020-08-15T04:04:03.207950Z    20 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.208088Z    20 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.208156Z    20 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.219989Z    20 Query SELECT COLUMN_NAME, DATA_TYPE FROM information_schema.columns WHERE EXTRA <> 'VIRTUAL GENERATED' AND EXTRA <> 'STORED GENERATED' AND TABLE_SCHEMA = 'test' AND TABLE_NAME = 'lockt' ORDER BY ORDINAL_POSITION
2020-08-15T04:04:03.221376Z    20 Query USE `test`
2020-08-15T04:04:03.222424Z    20 Query SHOW TRIGGERS LIKE 'lockt'
2020-08-15T04:04:03.224205Z    20 Query SET SESSION character_set_results = 'binary'
2020-08-15T04:04:03.224505Z    20 Query SELECT COLUMN_NAME,                 JSON_EXTRACT(HISTOGRAM, '$."number-of-buckets-specified"')               FROM information_schema.COLUMN_STATISTICS               WHERE SCHEMA_NAME = 'test' AND TABLE_NAME = 'lockt'
2020-08-15T04:04:03.224985Z    20 Query SET SESSION character_set_results = 'utf8mb4'
2020-08-15T04:04:03.225329Z    20 Query SELECT SQL_NO_CACHE MIN(`pk`), MAX(`pk`) FROM `test`.`lockt`
2020-08-15T04:04:03.227138Z    20 Query SELECT AVG_ROW_LENGTH FROM information_schema.tables WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'lockt'
2020-08-15T04:04:03.230184Z    20 Query SELECT SQL_NO_CACHE `pk`,`name`,`favorite`,`addr` FROM `test`.`lockt` WHERE `pk` BETWEEN 1 AND 4 OR `pk` IS NULL ORDER BY `pk` /* mysqlsh dumpInstance, dumping table `test`.`lockt`, chunk ID: 0 */
2020-08-15T04:04:03.231534Z    20 Quit
  • コネクションID 21: ファンクションやプロシージャを抜き出す
$ grep '21 ' general.log
/usr/sbin/mysqld, Version: 8.0.21 (MySQL Community Server - GPL). started with:
2020-08-15T04:04:03.205102Z    21 Connect   root@localhost on  using SSL/TLS
2020-08-15T04:04:03.206485Z    21 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-08-15T04:04:03.206782Z    21 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.207065Z    21 Query SET SQL_MODE = ''
2020-08-15T04:04:03.207263Z    21 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.207343Z    21 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.207467Z    21 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.214462Z    21 Query SHOW CREATE DATABASE IF NOT EXISTS test
2020-08-15T04:04:03.221139Z    21 Query USE `test`
2020-08-15T04:04:03.221476Z    21 Query show events
2020-08-15T04:04:03.223192Z    21 Query LOCK TABLES mysql.proc READ
2020-08-15T04:04:03.224379Z    21 Query use `test`
2020-08-15T04:04:03.224863Z    21 Query select @@collation_database
2020-08-15T04:04:03.225075Z    21 Query SET SESSION character_set_results = 'binary'
2020-08-15T04:04:03.225726Z    21 Query USE `test`
2020-08-15T04:04:03.225833Z    21 Query SHOW FUNCTION STATUS WHERE Db = 'test'
2020-08-15T04:04:03.229322Z    21 Query USE `test`
2020-08-15T04:04:03.229487Z    21 Query SHOW PROCEDURE STATUS WHERE Db = 'test'
2020-08-15T04:04:03.231929Z    21 Query SET SESSION character_set_results = 'utf8mb4'
2020-08-15T04:04:03.232081Z    21 Query UNLOCK TABLES
2020-08-15T04:04:03.233234Z    21 Quit
  • コネクションID 22: 何もしてない(おそらくデータ量が少なかったため)
$ grep '22 ' general.log
2020-08-15T04:04:03.208791Z    22 Connect   root@localhost on  using SSL/TLS
2020-08-15T04:04:03.208917Z    22 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2020-08-15T04:04:03.208987Z    22 Query START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.209066Z    22 Query SET SQL_MODE = ''
2020-08-15T04:04:03.209126Z    22 Query SET NAMES 'utf8mb4'
2020-08-15T04:04:03.209195Z    22 Query SET SESSION net_write_timeout = 300
2020-08-15T04:04:03.209252Z    22 Query SET TIME_ZONE = '+00:00'
2020-08-15T04:04:03.229120Z    22 Quit

どうやって一貫性を保ったバックアップを実現しているのか

\help util.dumpInstance で表示される説明には consistent オプションの記載があります(デフォルトは true)。dumpInstance では一貫性を持ったバックアップ*1が取れるようです。 複数のコネクション(スレッド)が一貫性を持ったバックアップを取るためにどのように連携しているのか確認してみました。

$ egrep 'Time|FLUSH|UNLOCK|LOCK|SELECT .* FROM `test`.`lockt`|START' general.log  | grep -v '21 '
Time                 Id Command    Argument
2020-08-15T04:04:03.168525Z        18 Query     FLUSH TABLES WITH READ LOCK

2020-08-15T04:04:03.207402Z        19 Query     START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.207611Z        20 Query     START TRANSACTION WITH CONSISTENT SNAPSHOT
2020-08-15T04:04:03.208987Z        22 Query     START TRANSACTION WITH CONSISTENT SNAPSHOT

2020-08-15T04:04:03.209360Z        18 Query     /*!80000 LOCK INSTANCE FOR BACKUP */
2020-08-15T04:04:03.209530Z        18 Query     UNLOCK TABLES

2020-08-15T04:04:03.225329Z        20 Query     SELECT SQL_NO_CACHE MIN(`pk`), MAX(`pk`) FROM `test`.`lockt`
2020-08-15T04:04:03.230184Z        20 Query     SELECT SQL_NO_CACHE `pk`,`name`,`favorite`,`addr` FROM `test`.`lockt` WHERE `pk` BETWEEN 1 AND 4 OR `pk` IS NULL ORDER BY `pk` /* mysqlsh dumpInstance, dumping table `test`.`lockt`, chunk ID: 0 */

まず、FLUSH TABLES WITH READ LOCK (FTWRL) を使って、データベース全体の静止点を設けています(コネクションID: 18)。 各ダンプスレッド(コネクションID: 19-22)が、START TRANSACTION WITH CONSISTENT SNAPSHOT を実行し、静止点からトランザクションを開始します。 各ダンプスレッドがトランザクションを開始し、静止点からSELECTを実行する準備ができたら、FTWRLの役割は終わりです。UNLOCK TABLESで、全体ロックを開放します。

FTWRL から UNLOCK TABLES するまでの間、データベースに書き込むことはできませんが、通常は一瞬で終わります。 バックアップ中にデータベースの読み書きができるように工夫されてますね。

LOCK INSTANCE FOR BACKUP

見慣れない LOCK INSTANCE FOR BACKUP が出てきました。 これは、MySQL 8.0 から登場した機能です。バックアップ中にテーブル定義が変更されてしまうと、バックアップが失敗してしまうため、DDLの実行をブロックしてくれます。 LOCK INSTANCE FOR BACKUP されているデータベースに対して、DDLを実行すると Waiting for backup lock で待たされます。

mysql> SHOW PROCESSLIST;
+----+-----------------+<snip>-------------------------+------------------+
| Id | User            |<snip> State                   | Info             |
+----+-----------------+<snip>-------------------------+------------------+
|  5 | event_scheduler |<snip> Waiting on empty queue  | NULL             |
| 36 | root            |<snip> starting                | SHOW PROCESSLIST |
| 39 | root            |<snip>                         | NULL             |
| 40 | root            |<snip> Waiting for backup lock | truncate test.t2 |
+----+-----------------+<snip>-------------------------+------------------+
4 rows in set (0.00 sec)

DDLが待機している状態で、対象テーブルにDMLを実行することは可能でした。

mysql> INSERT INTO test.t2 VALUES();
Query OK, 1 row affected (0.01 sec)

metadata lock のようにDDLに続く後続のクエリが待機してしまう事はなさそうです(よかった!)。

まとめ

  • MySQL Shell の dump / load Utility は並列バックアップなど強力なツールっぽい
  • FLUSH TABLES WITH READ LOCKLOCK INSTANCE FOR BACKUP を使って、一貫性の持ったバックアップを確実に取れるように工夫されてる
  • dump 中は DDLが待機させられるので注意

*1:すべてのテーブルで同じ時点のデータがダンプされているバックアップを「一貫性を持ったバックアップ」と呼んでいます