1. Introduction to Multi-Path Storage Policies
Before version 19.15, ClickHouse's MergeTree engine supported only single-path storage: all data was written under the path specified by the path tag in config.xml, so even with multiple disks mounted, the extra capacity could not be used effectively.
To address this pain point, version 19.15 introduced custom storage policies, which treat the data partition as the smallest unit of movement and allow partition directories to be written across multiple disk paths.
2. Configuration File
Storage configuration must be defined in advance in the config.xml file, under the storage_configuration tag. Inside storage_configuration there are two groups of tags, disks and policies, which define the available disks and the storage policies respectively.
```xml
<storage_configuration>
    <disks>
        <disk_name_a>
            <path>/clickhouse/data</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk_name_a>
        <disk_name_b>
            <path>/clickhouse2/data</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk_name_b>
    </disks>
    ...
```
```xml
    <policies>
        <policie_name_a>
            <volumes>
                <volume_name_a>
                    <disk>disk_name_a</disk>
                    <disk>disk_name_b</disk>
                    <max_data_part_size_bytes>1048576</max_data_part_size_bytes>
                </volume_name_a>
                <volume_name_b>
                    ...
                </volume_name_b>
            </volumes>
            <move_factor>0.2</move_factor>
        </policie_name_a>
        <policie_name_b>
            ...
        </policie_name_b>
    </policies>
</storage_configuration>
```
Among these settings, keep_free_space_bytes reserves the given amount of free space on a disk, max_data_part_size_bytes caps the size of a single data part that a volume will accept, and move_factor (a value between 0 and 1) starts moving data to the next volume when the ratio of free space on the current volume falls below it.
3. Policy Types
Depending on how they are configured, there are currently three kinds of storage policies:
Default storage policy
JBOD storage policy
HOT/COLD storage policy
4. Default Storage Policy
This is MergeTree's original storage policy. It requires no configuration: all partitions are automatically saved under the path specified by the path tag in config.xml.
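As a quick check, a table created without an explicit storage_policy setting can be inspected through system.parts; every part should report the built-in default disk. A minimal sketch (the table name t_default is made up for illustration):

```sql
-- No storage_policy setting: the table uses the 'default' policy
CREATE TABLE t_default
(
    `id` UInt64
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO t_default SELECT rand() FROM numbers(10);

-- Each part should show disk_name = 'default'
SELECT name, disk_name
FROM system.parts
WHERE table = 't_default';
```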
5. JBOD Storage Policy
5.1 Introduction
This policy suits servers that have multiple disks mounted but no RAID configured.
JBOD stands for Just a Bunch of Disks. It is a round-robin policy: each insert or merge produces a new data part, and successive new parts are written to the disks in turn.
JBOD is similar to RAID 0: it can reduce the load on any single disk and, under the right conditions, can improve parallel read/write performance.
However, if a single disk fails, the data written to it under the JBOD policy is lost.
5.2 Configuration Example
```bash
# Switch to the configuration directory
[root@node3 ~]# cd /etc/clickhouse-server
# Edit the config.xml configuration file
[root@node3 clickhouse-server]# vim config.xml
```
```xml
<storage_configuration>
    <disks>
        <disk1>
            <path>/clickhouse/data1/</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk1>
        <disk2>
            <path>/clickhouse/data2/</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk2>
    </disks>
    <policies>
        <policie_name_a>
            <volumes>
                <volume_name_a>
                    <disk>disk1</disk>
                    <disk>disk2</disk>
                </volume_name_a>
            </volumes>
            <move_factor>0.2</move_factor>
        </policie_name_a>
    </policies>
</storage_configuration>
```
Grant the clickhouse user ownership of the new data directories so the server can write to them:

```bash
sudo chown clickhouse:clickhouse -R /clickhouse/data1 /clickhouse/data2
```
```bash
# Stop the service
systemctl stop clickhouse-server.service
# Start the service
systemctl start clickhouse-server.service
# Verify that the service started successfully
[root@node3 clickhouse]# netstat -nltp | grep clickhouse
tcp6    0    0 :::9977    :::*    LISTEN    19971/clickhouse-se
tcp6    0    0 :::8123    :::*    LISTEN    19971/clickhouse-se
tcp6    0    0 :::9004    :::*    LISTEN    19971/clickhouse-se
tcp6    0    0 :::9009    :::*    LISTEN    19971/clickhouse-se
[root@node3 clickhouse]# ps -aux | grep clickhouse
clickho+ 19970  0.0  0.0  477408  22868 ?      Ss   11:35   0:00 clickhouse-watchdog --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
clickho+ 19971  1.2  0.8 1395324 270444 ?      SLl  11:35   0:03 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
root     22516  0.0  0.0  112728   1000 pts/0  S+   11:40   0:00 grep --color=auto clickhouse
```
Query system.disks to confirm that the new disks have been registered:

```sql
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    formatReadableSize(keep_free_space) AS keep_free
FROM system.disks

┌─name────┬─path─────────────────┬─free─────┬─total────┬─keep_free─┐
│ default │ /var/lib/clickhouse/ │ 1.61 TiB │ 1.79 TiB │ 0.00 B    │
│ disk1   │ /clickhouse/data1/   │ 1.61 TiB │ 1.79 TiB │ 1.00 GiB  │
│ disk2   │ /clickhouse/data2/   │ 1.61 TiB │ 1.79 TiB │ 1.00 GiB  │
└─────────┴──────────────────────┴──────────┴──────────┴───────────┘
```
Query system.storage_policies to confirm that the new policy has been loaded:

```sql
SELECT * FROM system.storage_policies

┌─policy_name────┬─volume_name───┬─volume_priority─┬─disks─────────────┬─volume_type─┬─max_data_part_size─┬─move_factor─┬─prefer_not_to_merge─┐
│ default        │ default       │               1 │ ['default']       │ JBOD        │                  0 │           0 │                   0 │
│ policie_name_a │ volume_name_a │               1 │ ['disk1','disk2'] │ JBOD        │                  0 │         0.2 │                   0 │
└────────────────┴───────────────┴─────────────────┴───────────────────┴─────────────┴────────────────────┴─────────────┴─────────────────────┘
```
5.3 Usage Test

```sql
-- Create a test table bound to the JBOD policy
CREATE TABLE t_jbod
(
    `id` UInt64
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 'policie_name_a';

-- The first insert creates a part on disk1
INSERT INTO t_jbod SELECT rand() FROM numbers(10);

SELECT name, disk_name
FROM system.parts
WHERE table = 't_jbod'

Query id: a76ae2eb-abf9-4809-bc8e-a55f2ccb02f1

┌─name──────┬─disk_name─┐
│ all_1_1_0 │ disk1     │
└───────────┴───────────┘

-- The second insert rotates to disk2
INSERT INTO t_jbod SELECT rand() FROM numbers(10);
SELECT name, disk_name FROM system.parts WHERE table = 't_jbod';
┌─name──────┬─disk_name─┐
│ all_1_1_0 │ disk1     │
│ all_2_2_0 │ disk2     │
└───────────┴───────────┘

-- A merge also produces a new part, written under the same policy
OPTIMIZE TABLE t_jbod;
SELECT name, disk_name FROM system.parts WHERE table = 't_jbod';
┌─name──────┬─disk_name─┐
│ all_1_1_0 │ disk1     │
│ all_1_2_1 │ disk1     │
│ all_2_2_0 │ disk2     │
└───────────┴───────────┘
```
Under the JBOD policy, multiple disks form a disk group, i.e. a volume. Each time a new partition is generated, its directory is written to the volume's disks in round-robin order, following the order in which the disks are declared.
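To see how parts spread across the volume at a glance, aggregating system.parts per disk is a handy check. A minimal sketch (t_jbod is the test table from above; any MergeTree table name can be substituted):

```sql
-- Count active parts and their total size per disk
SELECT
    disk_name,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE (table = 't_jbod') AND active
GROUP BY disk_name;
```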
5.4 Testing an Added Disk Path

```bash
# Create the new data directory and grant ownership to the clickhouse user
mkdir /clickhouse/data3
sudo chown clickhouse:clickhouse -R /clickhouse/data3
```
```xml
<storage_configuration>
    <disks>
        <disk1>
            <path>/clickhouse/data1/</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk1>
        <disk2>
            <path>/clickhouse/data2/</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk2>
        <disk3>
            <path>/clickhouse/data3/</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk3>
    </disks>
    <policies>
        <policie_name_a>
            <volumes>
                <volume_name_a>
                    <disk>disk1</disk>
                    <disk>disk2</disk>
                    <disk>disk3</disk>
                    <max_data_part_size_bytes>1073741824</max_data_part_size_bytes>
                </volume_name_a>
            </volumes>
            <move_factor>0.2</move_factor>
        </policie_name_a>
    </policies>
</storage_configuration>
```
```bash
# Stop the service
systemctl stop clickhouse-server.service
# Start the service
systemctl start clickhouse-server.service
# Verify that the service started successfully
[root@node3 clickhouse]# netstat -nltp | grep clickhouse
tcp6    0    0 :::9977    :::*    LISTEN    19971/clickhouse-se
tcp6    0    0 :::8123    :::*    LISTEN    19971/clickhouse-se
tcp6    0    0 :::9004    :::*    LISTEN    19971/clickhouse-se
tcp6    0    0 :::9009    :::*    LISTEN    19971/clickhouse-se
[root@node3 clickhouse]# ps -aux | grep clickhouse
clickho+ 19970  0.0  0.0  477408  22868 ?      Ss   11:35   0:00 clickhouse-watchdog --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
clickho+ 19971  1.2  0.8 1395324 270444 ?      SLl  11:35   0:03 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
root     22516  0.0  0.0  112728   1000 pts/0  S+   11:40   0:00 grep --color=auto clickhouse
```
```sql
SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    formatReadableSize(keep_free_space) AS keep_free
FROM system.disks

┌─name────┬─path─────────────────┬─free─────┬─total────┬─keep_free─┐
│ default │ /var/lib/clickhouse/ │ 1.61 TiB │ 1.79 TiB │ 0.00 B    │
│ disk1   │ /clickhouse/data1/   │ 1.61 TiB │ 1.79 TiB │ 1.00 GiB  │
│ disk2   │ /clickhouse/data2/   │ 1.61 TiB │ 1.79 TiB │ 1.00 GiB  │
│ disk3   │ /clickhouse/data3/   │ 1.61 TiB │ 1.79 TiB │ 1.00 GiB  │
└─────────┴──────────────────────┴──────────┴──────────┴───────────┘

SELECT * FROM system.storage_policies;
┌─policy_name────┬─volume_name───┬─volume_priority─┬─disks─────────────────────┬─volume_type─┬─max_data_part_size─┬─move_factor─┬─prefer_not_to_merge─┐
│ default        │ default       │               1 │ ['default']               │ JBOD        │                  0 │           0 │                   0 │
│ policie_name_a │ volume_name_a │               1 │ ['disk1','disk2','disk3'] │ JBOD        │                  0 │         0.2 │                   0 │
└────────────────┴───────────────┴─────────────────┴───────────────────────────┴─────────────┴────────────────────┴─────────────┴─────────────────────┘

-- The existing merged part stays where it was (disk2)
SELECT name, disk_name FROM system.parts WHERE table = 't_jbod';
┌─name──────┬─disk_name─┐
│ all_1_7_2 │ disk2     │
└───────────┴───────────┘

-- New inserts resume the rotation, now across all three disks
INSERT INTO t_jbod SELECT rand() FROM numbers(10);
SELECT name, disk_name FROM system.parts WHERE table = 't_jbod';
┌─name──────┬─disk_name─┐
│ all_1_7_2 │ disk2     │
│ all_8_8_0 │ disk1     │
└───────────┴───────────┘

INSERT INTO t_jbod SELECT rand() FROM numbers(10);
SELECT name, disk_name FROM system.parts WHERE table = 't_jbod';
┌─name──────┬─disk_name─┐
│ all_1_7_2 │ disk2     │
│ all_8_8_0 │ disk1     │
│ all_9_9_0 │ disk2     │
└───────────┴───────────┘

INSERT INTO t_jbod SELECT rand() FROM numbers(10);
SELECT name, disk_name FROM system.parts WHERE table = 't_jbod';
┌─name────────┬─disk_name─┐
│ all_1_7_2   │ disk2     │
│ all_8_8_0   │ disk1     │
│ all_9_9_0   │ disk2     │
│ all_10_10_0 │ disk3     │
└─────────────┴───────────┘
```

The newly added disk3 simply joins the rotation; existing parts are not rebalanced, only new parts are written to it.
6. HOT/COLD Storage Policy
6.1 Introduction
The HOT/COLD storage policy suits servers with different types of disks mounted.
The disks are divided into two categories: hot and cold.
The hot category maps to high-performance storage media such as SSDs, prioritizing access performance.
The cold category maps to high-capacity storage media such as HDDs, prioritizing cost.
When data is first written to a MergeTree table, partition directories are created in the hot volume; once a part's size reaches the configured threshold (max_data_part_size_bytes, for example after merges accumulate data), the data goes to the cold volume instead.
Each volume can in turn contain multiple disks, so writes within a single tier can themselves follow the JBOD policy.
6.2 Configuration Example
Simply add the new policy on top of the previous configuration file.
```bash
# Switch to the configuration directory
[root@node3 ~]# cd /etc/clickhouse-server
# Edit the config.xml configuration file
[root@node3 clickhouse-server]# vim config.xml
```
```xml
<storage_configuration>
    <disks>
        <disk1>
            <path>/clickhouse/data1/</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk1>
        <disk2>
            <path>/clickhouse/data2/</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk2>
        <disk3>
            <path>/clickhouse/data3/</path>
            <keep_free_space_bytes>1073741824</keep_free_space_bytes>
        </disk3>
    </disks>
    <policies>
        <policie_name_a>
            <volumes>
                <volume_name_a>
                    <disk>disk1</disk>
                    <disk>disk2</disk>
                    <disk>disk3</disk>
                </volume_name_a>
            </volumes>
            <move_factor>0.2</move_factor>
        </policie_name_a>
        <hot_to_cold>
            <volumes>
                <hot>
                    <disk>disk1</disk>
                    <max_data_part_size_bytes>1048576</max_data_part_size_bytes>
                </hot>
                <cold>
                    <disk>disk2</disk>
                </cold>
            </volumes>
            <move_factor>0.2</move_factor>
        </hot_to_cold>
    </policies>
</storage_configuration>
```
```bash
# Stop the service
systemctl stop clickhouse-server.service
# Start the service
systemctl start clickhouse-server.service
# Verify that the service started successfully
[root@node3 clickhouse]# netstat -nltp | grep clickhouse
tcp6    0    0 :::9977    :::*    LISTEN    19971/clickhouse-se
tcp6    0    0 :::8123    :::*    LISTEN    19971/clickhouse-se
tcp6    0    0 :::9004    :::*    LISTEN    19971/clickhouse-se
tcp6    0    0 :::9009    :::*    LISTEN    19971/clickhouse-se
[root@node3 clickhouse]# ps -aux | grep clickhouse
clickho+ 19970  0.0  0.0  477408  22868 ?      Ss   11:35   0:00 clickhouse-watchdog --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
clickho+ 19971  1.2  0.8 1395324 270444 ?      SLl  11:35   0:03 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
root     22516  0.0  0.0  112728   1000 pts/0  S+   11:40   0:00 grep --color=auto clickhouse
```
```sql
SELECT
    policy_name,
    volume_name,
    disks,
    volume_type,
    formatReadableSize(max_data_part_size) AS max_data_part_size
FROM system.storage_policies

Query id: 0dee9fb6-0fa7-4f46-b282-5b769f1342ef

┌─policy_name────┬─volume_name───┬─disks─────────────────────┬─volume_type─┬─max_data_part_size─┐
│ default        │ default       │ ['default']               │ JBOD        │ 0.00 B             │
│ hot_to_cold    │ hot           │ ['disk1']                 │ JBOD        │ 1.00 MiB           │
│ hot_to_cold    │ cold          │ ['disk2']                 │ JBOD        │ 0.00 B             │
│ policie_name_a │ volume_name_a │ ['disk1','disk2','disk3'] │ JBOD        │ 0.00 B             │
└────────────────┴───────────────┴───────────────────────────┴─────────────┴────────────────────┘
```
6.3 Usage Test

```sql
-- Create a test table bound to the HOT/COLD policy
CREATE TABLE t_hot_to_cold
(
    `id` UInt64
)
ENGINE = MergeTree
ORDER BY id
SETTINGS storage_policy = 'hot_to_cold';

-- A 100,000-row part (~540 KiB) fits under the 1 MiB limit, so it lands on the hot volume (disk1)
INSERT INTO t_hot_to_cold SELECT rand() FROM numbers(100000);

SELECT name, disk_name
FROM system.parts
WHERE table = 't_hot_to_cold'

Query id: fdc909bc-508f-40e6-8eb2-43dd89c78373

┌─name──────┬─disk_name─┐
│ all_1_1_0 │ disk1     │
└───────────┴───────────┘

INSERT INTO t_hot_to_cold SELECT rand() FROM numbers(100000);
SELECT name, disk_name FROM system.parts WHERE table = 't_hot_to_cold';
┌─name──────┬─disk_name─┐
│ all_1_1_0 │ disk1     │
│ all_2_2_0 │ disk1     │
└───────────┴───────────┘

-- Merging the two parts produces a part larger than 1 MiB, which goes to the cold volume (disk2)
OPTIMIZE TABLE t_hot_to_cold;
SELECT name, disk_name FROM system.parts WHERE table = 't_hot_to_cold';
┌─name──────┬─disk_name─┐
│ all_1_1_0 │ disk1     │
│ all_1_2_1 │ disk2     │
│ all_2_2_0 │ disk1     │
└───────────┴───────────┘

-- The part sizes confirm why: the merged (active) part is 1.01 MiB
SELECT
    name,
    disk_name,
    formatReadableSize(bytes_on_disk) AS bytes_on_disk,
    active
FROM system.parts
WHERE table = 't_hot_to_cold'

Query id: 2472a34a-a266-444f-83e5-fee0459db187

┌─name──────┬─disk_name─┬─bytes_on_disk─┬─active─┐
│ all_1_1_0 │ disk1     │ 538.88 KiB    │      0 │
│ all_1_2_1 │ disk2     │ 1.01 MiB      │      1 │
│ all_2_2_0 │ disk1     │ 538.69 KiB    │      0 │
└───────────┴───────────┴───────────────┴────────┘

-- A 300,000-row insert exceeds 1 MiB on its own, so the new part is written directly to the cold volume
INSERT INTO t_hot_to_cold SELECT rand() FROM numbers(300000);
SELECT name, disk_name FROM system.parts WHERE table = 't_hot_to_cold';
┌─name──────┬─disk_name─┐
│ all_1_2_1 │ disk2     │
│ all_3_3_0 │ disk2     │
└───────────┴───────────┘
```
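Size is not the only trigger for movement between tiers. ClickHouse also supports time-based movement through table TTL clauses, which relocate data to a named disk or volume as it ages. A minimal sketch under the assumption that the hot_to_cold policy above is in place (the table t_ttl_move and its event_time column are made up for illustration):

```sql
-- Parts whose rows are all older than 30 days are moved to the 'cold' volume
CREATE TABLE t_ttl_move
(
    `id` UInt64,
    `event_time` DateTime
)
ENGINE = MergeTree
ORDER BY id
TTL event_time + INTERVAL 30 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'hot_to_cold';
```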
7. Manually Moving Partitions
A MergeTree table's storage policy cannot be changed after creation, but the disk a partition resides on can be changed manually, as the examples below show.
```sql
SELECT name, disk_name FROM system.parts WHERE table = 't_jbod';
┌─name───────┬─disk_name─┐
│ all_1_10_3 │ disk1     │
└────────────┴───────────┘

-- Move a part to a specific disk
ALTER TABLE t_jbod MOVE PART 'all_1_10_3' TO DISK 'disk2';
SELECT name, disk_name FROM system.parts WHERE table = 't_jbod';
┌─name───────┬─disk_name─┐
│ all_1_10_3 │ disk2     │
└────────────┴───────────┘

INSERT INTO t_hot_to_cold SELECT rand() FROM numbers(100000);
SELECT name, disk_name FROM system.parts WHERE table = 't_hot_to_cold';
┌─name──────┬─disk_name─┐
│ all_1_2_1 │ disk2     │
│ all_3_3_0 │ disk2     │
│ all_4_4_0 │ disk1     │
└───────────┴───────────┘

-- Move a part to a named volume
ALTER TABLE t_hot_to_cold MOVE PART 'all_4_4_0' TO VOLUME 'cold';
SELECT name, disk_name FROM system.parts WHERE table = 't_hot_to_cold';
┌─name──────┬─disk_name─┐
│ all_1_2_1 │ disk2     │
│ all_3_3_0 │ disk2     │
│ all_4_4_0 │ disk2     │
└───────────┴───────────┘

-- Moving back to 'hot' is rejected: the 1.01 MiB part cannot be placed on a volume
-- whose max_data_part_size_bytes is 1 MiB
ALTER TABLE t_hot_to_cold MOVE PART 'all_1_2_1' TO VOLUME 'hot';

Received exception from server (version 21.4.3):
Code: 243. DB::Exception: Received from localhost:9977. DB::Exception: Move is not possible. Not enough space on 'hot'.
```
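Besides single parts, whole partitions can be relocated with MOVE PARTITION. A minimal sketch, assuming the tables above: since t_jbod declares no PARTITION BY clause, all of its parts belong to the single partition with ID 'all' (visible in part names such as all_1_10_3):

```sql
-- Move every part of the partition to a specific disk
ALTER TABLE t_jbod MOVE PARTITION ID 'all' TO DISK 'disk1';

-- A volume can also be the target
ALTER TABLE t_hot_to_cold MOVE PARTITION ID 'all' TO VOLUME 'cold';
```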