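1. Problem description
I wanted to reinstall Kafka on the cluster. After deleting the service and starting the reinstall, the wizard reported an error at the host selection step and then stayed stuck there. The screenshot is shown below:
The log also contains exception information, as follows:
# View the log file contents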
less /var/log/ambari-server/ambari-server.log
...
2020-12-05 11:24:27,501 ERROR [ambari-client-thread-7299] HostImpl:1106 - Config inconsistency exists: unknown configType=kafka-broker
2020-12-05 11:24:27,503 ERROR [ambari-client-thread-7299] ReadHandler:102 - Caught a runtime exception executing a query
java.lang.NullPointerException
at org.apache.ambari.server.state.HostConfig.hashCode(HostConfig.java:57)
at java.util.TreeMap$Entry.hashCode(TreeMap.java:2112)
at java.util.AbstractMap.hashCode(AbstractMap.java:530)
at java.util.Collections$SynchronizedMap.hashCode(Collections.java:2634)
at java.util.TreeMap$Entry.hashCode(TreeMap.java:2112)
at java.util.AbstractMap.hashCode(AbstractMap.java:530)
at java.util.Collections$SynchronizedMap.hashCode(Collections.java:2634)
at org.apache.ambari.server.controller.internal.ResourceImpl.hashCode(ResourceImpl.java:163)
...
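When the log is large, the affected config types can be pulled out mechanically instead of paging through it with less. The following is a minimal Java sketch, assuming the message format matches the HostImpl line shown above exactly. The class name UnknownConfigTypes and its extract method are made up for illustration and are not part of Ambari.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Ambari.
class UnknownConfigTypes {
    // Matches the HostImpl message format seen in the log excerpt above.
    private static final Pattern MESSAGE =
        Pattern.compile("Config inconsistency exists: unknown configType=(\\S+)");

    // Returns the distinct config types reported as unknown, in first-seen order.
    static Set<String> extract(Iterable<String> lines) {
        Set<String> types = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = MESSAGE.matcher(line);
            if (m.find()) {
                types.add(m.group(1));
            }
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, scan that file; otherwise fall back to the sample log line.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : Arrays.asList("2020-12-05 11:24:27,501 ERROR [ambari-client-thread-7299] "
                + "HostImpl:1106 - Config inconsistency exists: unknown configType=kafka-broker");
        System.out.println(extract(lines));
    }
}
```

Passing /var/log/ambari-server/ambari-server.log as the only argument scans the real log; every config type it prints is a candidate for stale rows in the database.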
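2. Locating and fixing the problem
The key log line is HostImpl:1106 - Config inconsistency exists: unknown configType=kafka-broker. The Kafka service had already been deleted in ambari-server, so my suspicion is that ambari-server was shut down after the deletion before it had finished updating the database. If so, the database has to be corrected by hand.
2.1 Source code analysis
The error is logged in HostImpl, in the following code: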
@Override
public Map<String, HostConfig> getDesiredHostConfigs(Cluster cluster,
    Map<String, DesiredConfig> clusterDesiredConfigs) throws AmbariException {
  Map<String, HostConfig> hostConfigMap = new HashMap<String, HostConfig>();
  if( null == cluster ){
    clusterDesiredConfigs = new HashMap<String, DesiredConfig>();
  }
  // per method contract, fetch if not supplied
  if (null == clusterDesiredConfigs) {
    clusterDesiredConfigs = cluster.getDesiredConfigs();
  }
  if (clusterDesiredConfigs != null) {
    for (Map.Entry<String, DesiredConfig> desiredConfigEntry
        : clusterDesiredConfigs.entrySet()) {
      HostConfig hostConfig = new HostConfig();
      hostConfig.setDefaultVersionTag(desiredConfigEntry.getValue().getTag());
      hostConfigMap.put(desiredConfigEntry.getKey(), hostConfig);
    }
  }
  // Suspicion: cluster.getConfigGroupsByHostname(getHostName()) returns config groups that were never deleted
  Map<Long, ConfigGroup> configGroups = (cluster == null) ? new HashMap<Long, ConfigGroup>() : cluster.getConfigGroupsByHostname(getHostName());
  if (configGroups != null && !configGroups.isEmpty()) {
    for (ConfigGroup configGroup : configGroups.values()) {
      for (Map.Entry<String, Config> configEntry : configGroup
          .getConfigurations().entrySet()) {
        String configType = configEntry.getKey();
        // HostConfig config holds configType -> versionTag, per config group
        HostConfig hostConfig = hostConfigMap.get(configType);
        if (hostConfig == null) {
          hostConfig = new HostConfig();
          hostConfigMap.put(configType, hostConfig);
          if (cluster != null) {
            Config conf = cluster.getDesiredConfigByType(configType);
            if(conf == null) {
              // The error is logged here: the config has already been cleaned up, yet the loop still reaches this branch!
              LOG.error("Config inconsistency exists:"+
                  " unknown configType="+configType);
            } else {
              hostConfig.setDefaultVersionTag(conf.getTag());
            }
          }
        }
        Config config = configEntry.getValue();
        hostConfig.getConfigGroupOverrides().put(configGroup.getId(),
            config.getTag());
      }
    }
  }
  return hostConfigMap;
}
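The two log entries appear to be linked. On the conf == null branch the new HostConfig is put into the map without a default version tag, and the NullPointerException in the stack trace is thrown from HostConfig.hashCode. The sketch below reproduces that chain on plain maps. It rests on an assumption: that hashCode dereferences the tag without a null check, which the stack trace suggests but the listing above does not show. HostConfigSketch and StaleConfigGroupDemo are hypothetical stand-ins, not Ambari classes, and hdfs-site / version1 are made-up sample values.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Ambari's HostConfig; NOT the real class.
class HostConfigSketch {
    String defaultVersionTag; // stays null on the "unknown configType" branch
    final Map<Long, String> configGroupOverrides = new HashMap<>();

    @Override
    public int hashCode() {
        // Assumed shape: the tag is dereferenced without a null check.
        return 31 * defaultVersionTag.hashCode() + configGroupOverrides.hashCode();
    }
}

class StaleConfigGroupDemo {
    // Mirrors the two loops of getDesiredHostConfigs using plain maps:
    // desiredTags is configType -> tag, groupConfigs is groupId -> (configType -> tag).
    static Map<String, HostConfigSketch> build(Map<String, String> desiredTags,
                                               Map<Long, Map<String, String>> groupConfigs) {
        Map<String, HostConfigSketch> result = new HashMap<>();
        for (Map.Entry<String, String> desired : desiredTags.entrySet()) {
            HostConfigSketch hostConfig = new HostConfigSketch();
            hostConfig.defaultVersionTag = desired.getValue();
            result.put(desired.getKey(), hostConfig);
        }
        for (Map.Entry<Long, Map<String, String>> group : groupConfigs.entrySet()) {
            for (Map.Entry<String, String> config : group.getValue().entrySet()) {
                HostConfigSketch hostConfig = result.get(config.getKey());
                if (hostConfig == null) {
                    // Desired config no longer exists: the entry is created with a null tag.
                    hostConfig = new HostConfigSketch();
                    result.put(config.getKey(), hostConfig);
                }
                hostConfig.configGroupOverrides.put(group.getKey(), config.getValue());
            }
        }
        return result;
    }

    // Hashing the map hashes every value, which is where the null tag surfaces.
    static boolean hashingFails(Map<String, HostConfigSketch> configs) {
        try {
            configs.hashCode();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, String> desired = Collections.singletonMap("hdfs-site", "version1");
        Map<Long, Map<String, String>> staleGroups = Collections.singletonMap(
            53L, Collections.singletonMap("kafka-broker", "43acdb7f-168d-4fde-b7f9-f3c24421427f"));
        System.out.println("stale group present: fails = " + hashingFails(build(desired, staleGroups)));
        System.out.println("stale group removed: fails = "
            + hashingFails(build(desired, new HashMap<Long, Map<String, String>>())));
    }
}
```

If the assumption holds, the error log line is only a symptom. The request actually fails later, when the response map is hashed, and this is why removing the leftover config groups makes the wizard work again.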
Ambari talks to the database through JPA. The configGroupEntities field in ClusterEntity shows how ClusterEntity relates to ConfigGroupEntity:
@OneToMany(mappedBy = "clusterEntity", cascade = CascadeType.ALL)
private Collection<ConfigGroupEntity> configGroupEntities;
ConfigGroupEntity, in turn, depends on the following two mapping tables:
@OneToMany(mappedBy = "configGroupEntity", cascade = CascadeType.ALL)
private Collection<ConfigGroupHostMappingEntity> configGroupHostMappingEntities;
@OneToMany(mappedBy = "configGroupEntity", cascade = CascadeType.ALL)
private Collection<ConfigGroupConfigMappingEntity> configGroupConfigMappingEntities;
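The corresponding rows in the database pinpoint the problem. The Kafka config groups and their kafka-broker mappings are still there: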
mysql> select * from configgroup;
+----------+------------+------------+-------+--------------+------------------+--------------+
| group_id | cluster_id | group_name | tag | description | create_timestamp | service_name |
+----------+------------+------------+-------+--------------+------------------+--------------+
| 52 | 2 | ndoe1 | KAFKA | node1 配置 | 1595388553861 | KAFKA |
| 53 | 2 | node2 | KAFKA | node2 配置 | 1595388828405 | KAFKA |
| 54 | 2 | node3 | KAFKA | node3 配置 | 1595388901365 | KAFKA |
+----------+------------+------------+-------+--------------+------------------+--------------+
3 rows in set (0.00 sec)
mysql> select * from confgroupclusterconfigmapping;
+-----------------+------------+--------------+--------------------------------------+-----------+------------------+
| config_group_id | cluster_id | config_type | version_tag | user_name | create_timestamp |
+-----------------+------------+--------------+--------------------------------------+-----------+------------------+
| 53 | 2 | kafka-broker | 43acdb7f-168d-4fde-b7f9-f3c24421427f | _db | 1595388874577 |
| 54 | 2 | kafka-broker | 791da62e-0b20-43f1-a3a3-f6e3f19eb856 | _db | 1595389019850 |
+-----------------+------------+--------------+--------------------------------------+-----------+------------------+
2 rows in set (0.00 sec)
2.2 Fixing the problem
- Delete the offending data. config_group_id in confgroupclusterconfigmapping is a foreign key referencing group_id in configgroup, so the mapping rows are deleted first:
mysql> delete from confgroupclusterconfigmapping;
Query OK, 2 rows affected (0.05 sec)
mysql> select * from confgroupclusterconfigmapping;
Empty set (0.00 sec)
// The delete fails because of a foreign key constraint
mysql> delete from configgroup where group_id=52;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`ambari`.`configgrouphostmapping`, CONSTRAINT `FK_cghm_cgid` FOREIGN KEY (`config_group_id`) REFERENCES `configgroup` (`group_id`))
// Disable foreign key checks
mysql> SET FOREIGN_KEY_CHECKS=0;
Query OK, 0 rows affected (0.00 sec)
mysql> delete from configgroup;
Query OK, 3 rows affected (0.06 sec)
// Re-enable foreign key checks
mysql> SET FOREIGN_KEY_CHECKS=1;
Query OK, 0 rows affected (0.00 sec)
- Restart ambari-server
[root@node1 ~]# ambari-server restart
Using python /usr/bin/python
Restarting ambari-server
Waiting for server stop...
Ambari Server stopped
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start...................................................................................
Server started listening on 8080
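- Add the Kafka service
- "Kafka service already exists" error
After configuration was complete, the install phase reported that the Kafka service already exists. The clusterservices table in MySQL did indeed still contain a KAFKA row. The likely cause is that Ambari did not update its metadata when Kafka was deleted, so I removed the metadata by hand.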
mysql> select * from clusterservices;
+-------------------+------------+-----------------+
| service_name | cluster_id | service_enabled |
+-------------------+------------+-----------------+
| AMBARI_INFRA_SOLR | 2 | 0 |
| AMBARI_METRICS | 2 | 0 |
| HBASE | 2 | 0 |
| HDFS | 2 | 0 |
| HIVE | 2 | 0 |
| KAFKA | 2 | 0 |
| KNOX | 2 | 0 |
| MAPREDUCE2 | 2 | 0 |
| PIG | 2 | 0 |
| RANGER | 2 | 0 |
| RANGER_KMS | 2 | 0 |
| SPARK2 | 2 | 0 |
| SQOOP | 2 | 0 |
| STORM | 2 | 0 |
| TEZ | 2 | 0 |
| YARN | 2 | 0 |
| ZOOKEEPER | 2 | 0 |
+-------------------+------------+-----------------+
17 rows in set (0.00 sec)
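// The delete below fails because of a foreign key constraint. The options are to disable constraint checks and delete, or to delete the related rows as well.
// Since leftover related rows might cause other problems later, I chose to delete them too.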
mysql> delete from clusterservices where service_name = 'KAFKA'
-> ;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`ambari`.`servicedesiredstate`, CONSTRAINT `servicedesiredstateservicename` FOREIGN KEY (`service_name`, `cluster_id`) REFERENCES `clusterservices` (`service_name`, `cluster_id`))
// servicedesiredstate and servicecomponentdesiredstate both reference this row, so delete the related rows in those two tables first
mysql> delete from servicedesiredstate where service_name = 'KAFKA';
Query OK, 1 row affected (0.03 sec)
mysql> delete from servicecomponentdesiredstate where service_name = 'KAFKA';
Query OK, 1 row affected (0.02 sec)
mysql> delete from clusterservices where service_name = 'KAFKA';
Query OK, 1 row affected (0.02 sec)
After deleting the data above, the reinstall succeeded.
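Deleting every row with FOREIGN_KEY_CHECKS=0 worked here, but it would also remove config groups of other services if any existed. It also leaves behind the configgrouphostmapping rows that the ERROR 1451 message pointed at. A narrower alternative is to delete only the Kafka rows, children before parents, with constraint checks left on. The sketch below only prints such statements and does not touch the database. It assumes the table and column names exactly as they appear in the query output and error messages above, and a single cluster; with several clusters a cluster_id filter would be needed. KafkaCleanupPlan is a hypothetical name, and other tables not mentioned in this article may hold further references.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it builds SQL text and executes nothing.
class KafkaCleanupPlan {
    // Returns DELETE statements ordered so that referencing (child) rows go first.
    static List<String> statements(String serviceName) {
        // Service names such as KAFKA or RANGER_KMS are upper-case identifiers;
        // reject anything else so the value is safe to embed in SQL text.
        if (!serviceName.matches("[A-Z0-9_]+")) {
            throw new IllegalArgumentException("unexpected service name: " + serviceName);
        }
        String groupIds =
            "(SELECT group_id FROM configgroup WHERE service_name = '" + serviceName + "')";
        List<String> plan = new ArrayList<>();
        // Children of configgroup, then configgroup itself.
        plan.add("DELETE FROM confgroupclusterconfigmapping WHERE config_group_id IN " + groupIds);
        plan.add("DELETE FROM configgrouphostmapping WHERE config_group_id IN " + groupIds);
        plan.add("DELETE FROM configgroup WHERE service_name = '" + serviceName + "'");
        // Children of clusterservices, then clusterservices itself.
        plan.add("DELETE FROM servicecomponentdesiredstate WHERE service_name = '" + serviceName + "'");
        plan.add("DELETE FROM servicedesiredstate WHERE service_name = '" + serviceName + "'");
        plan.add("DELETE FROM clusterservices WHERE service_name = '" + serviceName + "'");
        return plan;
    }

    public static void main(String[] args) {
        for (String sql : statements("KAFKA")) {
            System.out.println(sql + ";");
        }
    }
}
```

The printed statements can be reviewed and then pasted into the mysql client while ambari-server is stopped. If one of them still fails with ERROR 1451, the message names the next table that has to be cleaned first. Taking a dump of the ambari database before running any of them keeps the change reversible.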
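Summary
Many Ambari workflows depend on the database, so inconsistent data there breaks many features. In a follow-up I plan to write up the structure of Ambari's metadata in MySQL and publish it as a document.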