ClickHouse表引擎 8.其他类型引擎

clickhouse

发布日期: 2021-07-02

文章字数: 3k

阅读时长: 13 分

阅读次数:

1. 日志类型表引擎

如果数据量比较小（100 万以下），面的的数据查询场景也相对简单，并且一次写入多次查询的模式，那么可以使用日志系列的表引擎。
日志系列的表引擎也有共同的特征，比如他们均不支持索引、分区等特性，不支持并发读写，针对一整日志表进行写入数据时，他的查询就会被阻塞，直至写入动作结束。
日志系列的表引擎也同时有用切实的物理存储，数据会被保存到本地文件中。
日志系列不同的表引擎也有自己特有的特征。

1.1 TinyLog

TinyLog 是日志系列表引擎里性能最低的，他的存储结构由数据文件和元数据两部分组成
元数据是按列独立存储的，每个列都有一个与之对应的.bin 文件，这种结构与 mergeTree 类似但是没有标记文件，同事也不支持分区。由于没有标记文件，所以无法支持.bin 文件的并行读取，它适合在非常简单的场景下使用。
使用示例

create table t_tinylog(
  id UInt32,
  price UInt32,
  time Datetime
)ENGINE = TinyLog


// 向 buffer 表写入数据
insert into t_tinylog values(1,100,'2021-06-26 11:40:41');
insert into t_tinylog values(2,102,'2021-06-26 11:40:43');
insert into t_tinylog values(3,103,'2021-06-26 11:40:44');
insert into t_tinylog values(4,104,'2021-06-26 11:40:45');
insert into t_tinylog values(5,105,'2021-06-26 11:40:46');

[root@node3 t_tinylog]# cd /var/lib/clickhouse/data/default/t_tinylog
[root@node3 t_tinylog]# ll
总用量 16
-rw-r----- 1 clickhouse clickhouse 150 8月   3 17:26 id.bin
-rw-r----- 1 clickhouse clickhouse 150 8月   3 17:26 price.bin
-rw-r----- 1 clickhouse clickhouse  95 8月   3 17:26 sizes.json
-rw-r----- 1 clickhouse clickhouse 150 8月   3 17:26 time.bin
# 在存储目录下面分别生成了，各个字段的.bin 文件

# 查看记录数据量大小的文件
[root@node3 t_tinylog]# cat sizes.json
{"yandex":{"id%2Ebin":{"size":"150"},"price%2Ebin":{"size":"150"},"time%2Ebin":{"size":"150"}}}

1.2 StripeLog

StripeLog 表引擎存储结构有固定的 3 个文件组成：

data.bin ：数据文件，所有的列字段使用同一个文件保存，
index.mrk：数据标记文件，保存了数据在 data.bin 中的位置，利用数据标记能够使用多个线程以并行的方式读取 data.bin 内的压缩块数据从而提升查询性能。
sizes.json：元数据文件，记录了 data.bin 和 index.mrk 大小的信息。

与 tinylog 相比 StripeLog 拥有更高的查询性能，同事使用量更小的文件描述符。

使用示例


create table t_stripelog(
  id UInt32,
  price UInt32,
  time Datetime
)ENGINE = StripeLog;


// 向 buffer 表写入数据
insert into t_stripelog values(1,100,'2021-06-26 11:40:41');
insert into t_stripelog values(2,102,'2021-06-26 11:40:43');
insert into t_stripelog values(3,103,'2021-06-26 11:40:44');
insert into t_stripelog values(4,104,'2021-06-26 11:40:45');
insert into t_stripelog values(5,105,'2021-06-26 11:40:46');

/var/lib/clickhouse/data/default/t_stripelog
[root@node3 t_stripelog]# ll
总用量 12
-rw-r----- 1 clickhouse clickhouse 785 8月   3 17:38 data.bin
-rw-r----- 1 clickhouse clickhouse 411 8月   3 17:38 index.mrk
-rw-r----- 1 clickhouse clickhouse  69 8月   3 17:38 sizes.json

cat sizes.json
{"yandex":{"data%2Ebin":{"size":"785"},"index%2Emrk":{"size":"411"}}}

1.3 Log

Log 表引擎结合了 TinyLog 和 StripeLog 的长处是日志系列表引擎中性能最高的表引擎，Log 表引擎的存储结构由 3 部分组成：

[column].bin ：数据文件，数据按列存储，每个字段拥有一个与之对应的 .bin 文件
index.mrk：数据标记文件，统一保存了数据在各个.bin 中的位置，利用数据标记能够使用多个线程以并行的方式读取 .bin 内的压缩块数据从而提升查询性能。
sizes.json：元数据文件，记录了 data.bin 和 index.mrk 大小的信息。

由于拥有数据标记，且各个列独立存储，所以 Log 既能够并行查询，又能够按列按需读取数据，而付出的代价仅仅是比StripeLog 更多的文件描述符。

使用示例

create table t_log(
  id UInt32,
  price UInt32,
  time Datetime
)ENGINE = Log;


// 写入数据
insert into t_log values(1,100,'2021-06-26 11:40:41');
insert into t_log values(2,102,'2021-06-26 11:40:43');
insert into t_log values(3,103,'2021-06-26 11:40:44');
insert into t_log values(4,104,'2021-06-26 11:40:45');
insert into t_log values(5,105,'2021-06-26 11:40:46');

/var/lib/clickhouse/data/default/t_log
[root@node3 t_log]#
[root@node3 t_log]# ll
总用量 20
-rw-r----- 1 clickhouse clickhouse 150 8月   3 17:45 id.bin
-rw-r----- 1 clickhouse clickhouse  48 8月   3 17:45 __marks.mrk
-rw-r----- 1 clickhouse clickhouse 150 8月   3 17:45 price.bin
-rw-r----- 1 clickhouse clickhouse 125 8月   3 17:45 sizes.json
-rw-r----- 1 clickhouse clickhouse 150 8月   3 17:45 time.bin

2. 接口类型表引擎

2.1 Merge

Merge 表引擎不存储任何数据，有人不支持数据写入，它的作用是负责合并多个查询结果集。
Merge 表引擎可以代理查询任意数量的数据表，这些查询会异步执行，并最终合并成一个结果集返回。
Merge 表引擎查询时，被代理的数据表被要求处于同一个数据库内，并且拥有相同的表结构，他们可以使用不同的表引擎与不同的分区定义方式
使用示例

// 创建三张测试数据表

create table t_test_merge1(
  id UInt32,
  price UInt32,
  time Datetime
)ENGINE = TinyLog;

create table t_test_merge2(
  id UInt32,
  price UInt32,
  time Datetime
)ENGINE = TinyLog;

create table t_test_merge3(
  id UInt32,
  price UInt32,
  time Datetime
)ENGINE = TinyLog;


// 向三张表写入数据
insert into t_test_merge1 values(1,100,'2021-06-26 11:40:41');
insert into t_test_merge1 values(2,102,'2021-06-26 11:40:43');
insert into t_test_merge2 values(3,103,'2021-06-26 11:40:44');
insert into t_test_merge2 values(4,104,'2021-06-26 11:40:45');
insert into t_test_merge3 values(5,105,'2021-06-26 11:40:46');
insert into t_test_merge3 values(6,105,'2021-06-26 11:40:47');

// 创建 Merge 表
create table t_merge(
  id UInt32,
  price UInt32,
  time Datetime
)ENGINE = Merge(default,'t_test_merge*');

// 查询数据
select * from t_merge;

┌─id─┬─price─┬────────────────time─┐
│  3 │   103 │ 2021-06-26 11:40:44 │
│  4 │   104 │ 2021-06-26 11:40:45 │
└────┴───────┴─────────────────────┘
┌─id─┬─price─┬────────────────time─┐
│  5 │   105 │ 2021-06-26 11:40:46 │
│  6 │   105 │ 2021-06-26 11:40:47 │
└────┴───────┴─────────────────────┘
┌─id─┬─price─┬────────────────time─┐
│  1 │   100 │ 2021-06-26 11:40:41 │
│  2 │   102 │ 2021-06-26 11:40:43 │
└────┴───────┴─────────────────────┘

// 查询指定表的数据

select * from t_merge where _table='t_test_merge1';
┌─id─┬─price─┬────────────────time─┐
│  1 │   100 │ 2021-06-26 11:40:41 │
│  2 │   102 │ 2021-06-26 11:40:43 │
└────┴───────┴─────────────────────┘

2.2 Dictionary

2.2.1 介绍

Dictionray 表引擎是数据字典的一层代理封装，它可以取代字典函数没让用户通过表查询字典。
Dictionray 表引擎的数据被加载后，会全部保存到内存中，所以使用Dictionray 表对字典性能不会有任何影响

2.2.2 使用介绍

增加数据字典配置

vim /etc/clickhouse-server/flat_dictionary.xml

<!-- 通过手动创建的flat字典配置 -->
<dictionaries>
    <dictionary>
        <name>test_flat_dict</name>
        <source>
            <!-- 准备好的测试数据 -->
            <file>
                <path>/etc/clickhouse-server/dict_data/organization.csv</path>
                <format>CSV</format>
            </file>
        </source>
        <layout>
              <flat/>
        </layout>
        <!-- 与测试数据的结构对应 -->
        <structure>

            <id>
                  <name>id</name>
            </id>
            <attribute>
                <name>code</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
            <attribute>
                <name>name</name>
                <type>String</type>
                <null_value></null_value>
            </attribute>
        </structure>
        <lifetime>
            <min>300</min>
            <max>360</max>
        </lifetime>
    </dictionary>
</dictionaries>

检查数据字典配置

在上述的配置中，source数据源是 CSV 格式的文件，structure 数据结构与其对应。将配置文件复制到 ClickHouse 服务节点的 /etc/clickhouseserver 目录后，即完成了对该字典的创建过程。查验 system.dictionaries 系统表后，能够看到flat字典已经创建成功。

:) SELECT name,type,key.names,key.types,attribute.names,attribute.types,origin FROM system.dictionaries;


┌─name───────────┬─type─┬─key.names─┬─key.types──┬─attribute.names─┬─attribute.types─────┬─origin─────────────────────────────────────┐
│ test_flat_dict │      │ ['id']    │ ['UInt64'] │ ['code','name'] │ ['String','String'] │ /etc/clickhouse-server/flat_dictionary.xml │
└────────────────┴──────┴───────────┴────────────┴─────────────────┴─────────────────────┴────────────────────────────────────────────┘

通过数据字典函数查询数据字典内容

-- 数据字典只能通过字典函数获取，以下为查询示例：
-- 使用 dictGet 函数查询数据
:) select dictGet('test_flat_dict','name',toUInt64(1)) as dept_name;

┌─dept_name─┐
│ 研发部    │
└───────────┘

创建数据字典表

-- 创建字典表
create table t_test_flat_dict(
  id UInt64,
  code String,
  name String
) ENGINE=Dictionary(test_flat_dict)

-- 查询数据
select * from t_test_flat_dict;

┌─id─┬─code──┬─name───┐
│  1 │ a0001 │ 研发部 │
│  2 │ a0002 │ 产品部 │
│  3 │ a0003 │ 数据部 │
│  4 │ a0004 │ 测试部 │
│  5 │ a0005 │ 运维部 │
│  6 │ a0006 │ 规划部 │
│  7 │ a0007 │ 市场部 │
└────┴───────┴────────┘

2.2.3 Dictionary 库引擎

如果数据字典很多，逐一创建各自的 Dictionary 表过于麻烦，这时，可以使用 Dictionary 类型的数据库引擎来解决这个问题。

// 创建数据字典类型的数据库
create database test_dictionaries ENGINE = Dictionary;

// 查询数据字典库中的数据表
select database,name,engine_full from system.tables where database = 'test_dictionaries';

┌─database──────────┬─name─────────────────────────────────┬─engine_full────────────────────────────────────────┐
│ test_dictionaries │ 577cf726-e8e5-4aae-8679-8fbfa4bf02ce │ Dictionary(`577cf726-e8e5-4aae-8679-8fbfa4bf02ce`) │
│ test_dictionaries │ test_cache_dict                      │ Dictionary(test_cache_dict)                        │
│ test_dictionaries │ test_complex_key_cache_dict          │ Dictionary(test_complex_key_cache_dict)            │
│ test_dictionaries │ test_complex_key_hashed_dict         │ Dictionary(test_complex_key_hashed_dict)           │
│ test_dictionaries │ test_flat_dict                       │ Dictionary(test_flat_dict)                         │
│ test_dictionaries │ test_hashed_dict                     │ Dictionary(test_hashed_dict)                       │
│ test_dictionaries │ test_ip_trie_dict                    │ Dictionary(test_ip_trie_dict)                      │
│ test_dictionaries │ test_range_hashed_dict               │ Dictionary(test_range_hashed_dict)                 │
└───────────────────┴──────────────────────────────────────┴────────────────────────────────────────────────────┘

// 之前测试的数据字典都已经被加载到数据字典库中

2.3 Distributed

在后面介绍副本与分片时介绍

3. 其他类型表引擎

3.1 Live View

Live View 是一种特殊的视图，他不属于表引擎，但是它与数据表相关密切
Live View 的作用类似与事件监听器，它能够将 SQL 查询结果作为监控目标，当目标数据增加时，Live View 可以及时发出响应。
使用示例

// 开启 live view 
set allow_experimental_live_view=1;

// 查看 Live View 是否开启
select name,value from system.settings where name like '%live_view%';

┌─name───────────────────────────────────────┬─value─┐
│ allow_experimental_live_view               │ 0     │
│ live_view_heartbeat_interval               │ 15    │
│ max_live_view_insert_blocks_before_refresh │ 64    │
│ temporary_live_view_timeout                │ 5     │
│ periodic_live_view_refresh                 │ 60    │
└────────────────────────────────────────────┴───────┘

// 创建一张数据表作为监听目标
create table t_live_view1(
  id UInt32,
  price UInt32,
  time Datetime
)ENGINE = TinyLog

// 创建 live view 建立监听

create live view lv_t_live_view1 as select count(1) count from t_live_view1;

// 写入数据
insert into t_live_view1 values(1,100,'2021-06-26 11:40:41');
insert into t_live_view1 values(2,102,'2021-06-26 11:40:43');
insert into t_live_view1 values(3,103,'2021-06-26 11:40:44');
insert into t_live_view1 values(4,104,'2021-06-26 11:40:44');

// 查看 live view
select * from lv_t_live_view1;

┌─count─┐
│     1 │
└───────┘

// 写入数据
insert into t_live_view1 values(2,102,'2021-06-26 11:40:43');

// 查看 live view
select * from lv_t_live_view1;

┌─count─┐
│     2 │
└───────┘

// 写入数据
insert into t_live_view1 values(3,103,'2021-06-26 11:40:44');

// 查看 live view
select * from lv_t_live_view1;

┌─count─┐
│     3 │
└───────┘


// 还可以使用 watch 查看表当前的状态

WATCH lv_t_live_view1

┌─count─┬─_version─┐
│     3 │        4 │
└───────┴──────────┘

3.2 Null

Null 表引擎的作用于 unix 系统的空设备 /dev/null 很相似，如果用户向 null 表写入数据，系统会正确返回，但是null 表会自动忽略数据，永远不会保存数据，所以向 null 表发起查询时，永远不会返回数据。

在使用物化视图的时候，如果不希望保留源表的数据，那么可以将源表设置成 null 表引擎。

使用示例：

create table t_null(
  id UInt8
) ENGINE = Null;

// 以 null 表为源表 建立物化视图
create materialized view view_t_null ENGINE = TinyLog as select * from t_null;

// 向 null 表写入数据
insert into t_null values(10);

// 查询数据

select * from v_t_null; // null 表查询不到数据

0 rows in set. Elapsed: 0.003 sec.

select * from view_t_null; // 物化视图能够查询到数据

┌─id─┐
│ 10 │
└────┘

3.3 URL

URL 表引擎的作用类似 HTTP 客户端，它可以通过 HTTP/HTTPS 协议，直接访问远端的 REST 服务。执行查询的时候，底层会转换为 GET 请求远程调用，而执行写入的时候，会将其转换为 POST 请求的远端调用。

语法

CREATE TABLE -- 创建表
 [IF NOT EXISTS] -- 如果该表名已经存在则不进行创建
 [db_name.] -- 指定数据库名，未指定则使用默认数据库：default
 table_name ( -- 表名 
   name1 -- 列名
     [type] -- 列数据类型
     [DEFAULT|MATERIALIZED|ALIASexpr],  -- 默认数据表达式
   name2[type][DEFAULT|MATERIALIZED|ALIASexpr],
   ...
 ) ENGINE = URL('url',format)

 // url 表示远端服务器地址，
 // format 表示 clickhouse 支持的数据格式，TSC,CSV,JSON 等

使用示例

// 建表
create table t_url(
  name String
)ENGINE=URL('http://node3:8080/users',JSONEachRow)

// 查询数据
select * from t_url;

// 写入数据
insert into t_url values("xiaoming");

hnbian

https://www.hnbian.cn/posts/515238b4.html

本博客所有文章除特別声明外，均采用 CC BY 4.0 许可协议。转载请注明来源 hnbian !

clickhouse olap

ClickHouse 10.数据查询

1. 介绍 clickhouse 对 SQL 语句的解析时大小写敏感的，这就意味这，select a 和 select A 表达的语义是不同的 clickhouse 支持的查询语法如下： [with expr | ( subquery

2021-07-10 clickhouse

clickhouse olap

ClickHouse表引擎 7.内存类型表引擎

1. 介绍内存类型表引擎有 Memory、Set、Join、Buffer 等。Memory 类型表引擎，直接将数据保存到内存当中，其余几种表引擎都会将数据写入磁盘，这样可以防止数据丢失，作为一种故障恢复手段，在数据表被加载时，它们会将数据全

2021-06-25 clickhouse

clickhouse olap

ClickHouse表引擎 8.其他类型引擎

1. 日志类型表引擎

1.1 TinyLog

1.2 StripeLog

1.3 Log

2. 接口类型表引擎

2.1 Merge

2.2 Dictionary

2.2.1 介绍

2.2.2 使用介绍

2.2.3 Dictionary 库引擎

2.3 Distributed

3. 其他类型表引擎

3.1 Live View

3.2 Null

3.3 URL

你的赏识是我前进的动力