Migrating kudu-master nodes

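Problem encountered with Kudu: the master nodes were running on low-spec machines and had to be moved to more powerful ones. The migration is fairly involved, so it is recorded here.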

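Migration approach:

1) First add the new kudu-master (initialize the data directories on the new node, copy the metadata over from an existing master, update the Raft configuration, and start all masters)

2) Then remove the master being migrated away (stop all processes, remove the target master, rewrite the master Raft configuration on the new node, then start everything again)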

Preparation before migration

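1. Identify the storage directories. A Kudu master, like a tablet server, is configured with two kinds of directories:

  • fs_wal_dir: write-ahead log directory, /data/kudu/master/wal
  • fs_data_dirs: data directories, /data/kudu/master/data (in production: /data1/kudu/master/data …)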

2. Identify the master RPC port; the default is 7051

3. Identify the UUID of each master

Open the kudu-master web UI: http://node102.bigdata.dmp.local.com:8051/masters

UUID of each master:

85e0c097fcf747d286f59acf2ae3cfef LEADER node102.bigdata.dmp.local.com
c4fc5ceda4454e00ad1257a6489cedcf FOLLOWER node101.bigdata.dmp.local.com
c7873360d8404fe8bcb3b999b8bd3c2a FOLLOWER node103.bigdata.dmp.local.com

rpc_addresses { host: "node102.bigdata.dmp.local.com" port: 7051 }
rpc_addresses { host: "node101.bigdata.dmp.local.com" port: 7051 }
rpc_addresses { host: "node103.bigdata.dmp.local.com" port: 7051 }
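
The later `rewrite_raft_config` commands take each master as a `uuid:host:port` argument. As a minimal sketch, the list above can be turned into that form with a small helper. `build_peers` is a hypothetical name, not part of the kudu CLI, and the input format (`uuid role host` per line) is assumed to match what was copied from the /masters page.

```shell
# Build the "uuid:host:port" peer arguments used by
# `kudu local_replica cmeta rewrite_raft_config`.
# build_peers is a hypothetical helper, not a kudu command.
# stdin: lines of "<uuid> <role> <hostname>"; $1: master RPC port (default 7051).
build_peers() {
  peer_port="${1:-7051}"
  awk -v port="$peer_port" '
    NF >= 3 { printf "%s%s:%s:%s", sep, $1, $3, port; sep = " " }
    END     { print "" }'
}

masters='85e0c097fcf747d286f59acf2ae3cfef LEADER node102.bigdata.dmp.local.com
c4fc5ceda4454e00ad1257a6489cedcf FOLLOWER node101.bigdata.dmp.local.com
c7873360d8404fe8bcb3b999b8bd3c2a FOLLOWER node103.bigdata.dmp.local.com'

# Print the three peers on one line, separated by spaces.
peers="$(printf '%s\n' "$masters" | build_peers 7051)"
echo "$peers"
```

The resulting string can be pasted after the tablet ID in the rewrite commands, which avoids typos when copying 32-character UUIDs by hand.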

Migration

1. Stop all Kudu processes

2. Format the data directories on the new node

mkdir -p /data/kudu/master/
sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data
Output:
I1031 15:38:54.215034 3151 env_posix.cc:1460] Not raising process file limit of 1000000; it is already as high as it can go
I1031 15:38:54.215198 3151 file_cache.cc:463] Constructed file cache lbm with capacity 400000
I1031 15:38:54.218797 3151 fs_manager.cc:377] Generated new instance metadata in path /data/kudu/master/data/instance:
uuid: "607c73cbf5484411a6be7fb0fc0b1554"
format_stamp: "Formatted at 2018-10-31 07:38:54 on node104.bigdata.dmp.local.com"
I1031 15:38:54.220115 3151 fs_manager.cc:377] Generated new instance metadata in path /data/kudu/master/wal/instance:
uuid: "607c73cbf5484411a6be7fb0fc0b1554"
format_stamp: "Formatted at 2018-10-31 07:38:54 on node104.bigdata.dmp.local.com"

sudo -u kudu kudu fs dump uuid --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 2>/dev/null

Output:
607c73cbf5484411a6be7fb0fc0b1554
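
If the format log was saved, the new UUID can also be pulled out of it rather than copied by hand. The sketch below is only an illustration: `extract_uuid` is a hypothetical helper, and it assumes the log text (which the kudu tool writes to stderr, hence the `2>/dev/null` above) was captured to a file or variable first.

```shell
# extract_uuid is a hypothetical helper, not a kudu command.
# stdin: saved log text from `kudu fs format`; prints each distinct UUID once.
extract_uuid() {
  sed -n 's/^uuid: "\([0-9a-f]\{32\}\)"$/\1/p' | sort -u
}

# Sample lines taken from the format output shown earlier.
sample_log='I1031 15:38:54.218797 3151 fs_manager.cc:377] Generated new instance metadata in path /data/kudu/master/data/instance:
uuid: "607c73cbf5484411a6be7fb0fc0b1554"
format_stamp: "Formatted at 2018-10-31 07:38:54 on node104.bigdata.dmp.local.com"
I1031 15:38:54.220115 3151 fs_manager.cc:377] Generated new instance metadata in path /data/kudu/master/wal/instance:
uuid: "607c73cbf5484411a6be7fb0fc0b1554"'

new_uuid="$(printf '%s\n' "$sample_log" | extract_uuid)"
echo "$new_uuid"
```

On a live node, `kudu fs dump uuid` as shown above remains the simpler route; the helper is mainly useful when only the log is at hand.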

3. Rewrite the master Raft configuration on the new node

sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 85e0c097fcf747d286f59acf2ae3cfef:node102.bigdata.dmp.local.com:7051 c4fc5ceda4454e00ad1257a6489cedcf:node101.bigdata.dmp.local.com:7051 c7873360d8404fe8bcb3b999b8bd3c2a:node103.bigdata.dmp.local.com:7051

Running this directly fails with:
Not found: /data/kudu/master/data/consensus-meta/00000000000000000000000000000000: No such file or directory (error 2)

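Copy the file over from a healthy master with scp, then fix its ownership on the new node:
scp /data/kudu/master/data/consensus-meta/00000000000000000000000000000000 root@node104.bigdata.dmp.local.com:/data/kudu/master/data/consensus-meta/
chown kudu:kudu /data/kudu/master/data/consensus-meta/00000000000000000000000000000000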

Output when run again:
I1031 15:56:30.394467 7935 env_posix.cc:1460] Not raising process file limit of 1000000; it is already as high as it can go
I1031 15:56:30.394745 7935 file_cache.cc:463] Constructed file cache lbm with capacity 400000
I1031 15:56:30.398227 7935 fs_report.cc:345] Block manager report
--------------------
1 data directories: /data/kudu/master/data/data
Total live blocks: 0
Total live bytes: 0
Total live bytes (after alignment): 0
Total number of LBM containers: 0 (0 full)
Did not check for missing blocks
Did not check for orphaned blocks
Total full LBM containers with extra space: 0 (0 repaired)
Total full LBM container extra space in bytes: 0 (0 repaired)
Total incomplete LBM containers: 0 (0 repaired)
Total LBM partial records: 0 (0 repaired)
I1031 15:56:30.398293 7935 fs_manager.cc:263] Time spent opening block manager: real 0.001s user 0.000s sys 0.001s
I1031 15:56:30.398685 7935 fs_manager.cc:266] Opened local filesystem: /data/kudu/master/data,/data/kudu/master/wal
uuid: "607c73cbf5484411a6be7fb0fc0b1554"
format_stamp: "Formatted at 2018-10-31 07:38:54 on node104.bigdata.dmp.local.com"
I1031 15:56:30.401140 7935 tool_action_local_replica.cc:257] Backed up current config to /data/kudu/master/data/consensus-meta/00000000000000000000000000000000.pre_rewrite.1540972590398731
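
The "No such file or directory" failure above can be caught before running the rewrite. A minimal sketch, assuming the consensus metadata lives under `consensus-meta/` in the data directory as the error message indicates; `check_cmeta` is a hypothetical helper name.

```shell
# Tablet ID of the master's system tablet, as used throughout this post.
MASTER_TABLET_ID=00000000000000000000000000000000

# check_cmeta is a hypothetical helper, not a kudu command.
# $1: data directory that should contain consensus-meta/<tablet id>.
# Returns 0 and prints the path if the file exists, 1 otherwise.
check_cmeta() {
  cmeta_path="$1/consensus-meta/$MASTER_TABLET_ID"
  if [ ! -f "$cmeta_path" ]; then
    echo "missing: $cmeta_path" >&2
    return 1
  fi
  echo "found: $cmeta_path"
}

# Demonstration against a throwaway directory instead of /data/kudu/master/data.
demo_dir="$(mktemp -d)"
check_cmeta "$demo_dir" 2>/dev/null || echo "copy the file from a healthy master first"
mkdir -p "$demo_dir/consensus-meta"
: > "$demo_dir/consensus-meta/$MASTER_TABLET_ID"
check_cmeta "$demo_dir"
```

On the new node the call would be `check_cmeta /data/kudu/master/data`, run just before `rewrite_raft_config`.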

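4. Start the existing masters

5. Copy the master data to each new master with the following command. Run it on every new master; it only needs to connect to one existing master:

This step is critical.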

Before running it, make sure everything under /data/kudu/master/ is owned by kudu:kudu

chown -R kudu:kudu /data/kudu/master/
sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 node102.bigdata.dmp.local.com:7051

Output:
I1031 16:08:49.705765 11161 env_posix.cc:1460] Not raising process file limit of 1000000; it is already as high as it can go
I1031 16:08:49.706120 11161 file_cache.cc:463] Constructed file cache lbm with capacity 400000
I1031 16:08:49.709882 11161 fs_report.cc:345] Block manager report
--------------------
1 data directories: /data/kudu/master/data/data
Total live blocks: 0
Total live bytes: 0
Total live bytes (after alignment): 0
Total number of LBM containers: 0 (0 full)
Did not check for missing blocks
Did not check for orphaned blocks
Total full LBM containers with extra space: 0 (0 repaired)
Total full LBM container extra space in bytes: 0 (0 repaired)
Total incomplete LBM containers: 0 (0 repaired)
Total LBM partial records: 0 (0 repaired)
I1031 16:08:49.709954 11161 fs_manager.cc:263] Time spent opening block manager: real 0.002s user 0.000s sys 0.001s
I1031 16:08:49.710440 11161 fs_manager.cc:266] Opened local filesystem: /data/kudu/master/data,/data/kudu/master/wal
uuid: "607c73cbf5484411a6be7fb0fc0b1554"
format_stamp: "Formatted at 2018-10-31 07:38:54 on node104.bigdata.dmp.local.com"
I1031 16:08:49.733937 11161 tablet_copy_client.cc:166] T 00000000000000000000000000000000 P 607c73cbf5484411a6be7fb0fc0b1554: Tablet Copy client: Beginning tablet copy session from remote peer at address node102.bigdata.dmp.local.com:7051
I1031 16:08:49.755112 11161 tablet_copy_client.cc:422] T 00000000000000000000000000000000 P 607c73cbf5484411a6be7fb0fc0b1554: Tablet Copy client: Starting download of 908 data blocks...
I1031 16:08:51.402842 11161 tablet_copy_client.cc:385] T 00000000000000000000000000000000 P 607c73cbf5484411a6be7fb0fc0b1554: Tablet Copy client: Starting download of 1 WAL segments...
I1031 16:08:52.008639 11161 tablet_copy_client.cc:292] T 00000000000000000000000000000000 P 607c73cbf5484411a6be7fb0fc0b1554: Tablet Copy client: Tablet Copy complete. Replacing tablet superblock.

6. Rewrite the master Raft configuration on both the new and the existing masters (stop the masters first, as in step 7 of the production record below)

This step is critical.

sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 85e0c097fcf747d286f59acf2ae3cfef:node102.bigdata.dmp.local.com:7051 c4fc5ceda4454e00ad1257a6489cedcf:node101.bigdata.dmp.local.com:7051 c7873360d8404fe8bcb3b999b8bd3c2a:node103.bigdata.dmp.local.com:7051 607c73cbf5484411a6be7fb0fc0b1554:node104.bigdata.dmp.local.com:7051

7. Add the kudu-master role in Cloudera Manager (CM) and start all masters

At this point the new master has been added, and the masters that are no longer wanted can be removed.

8. Stop all processes and remove the old kudu-master in CM

9. Rewrite the master Raft configuration on both the new and the remaining old masters (leaving out the removed one)

sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data/kudu/master/data 00000000000000000000000000000000 85e0c097fcf747d286f59acf2ae3cfef:node102.bigdata.dmp.local.com:7051 c4fc5ceda4454e00ad1257a6489cedcf:node101.bigdata.dmp.local.com:7051 607c73cbf5484411a6be7fb0fc0b1554:node104.bigdata.dmp.local.com:7051

10. Start the masters, then the tablet servers

Production migration record

-- Preparation before migration

36c7f0a6a98a45ce8472fa747d5ac1dd FOLLOWER
rpc_addresses { host: "node1.ikh.bigdata.dmp.com" port: 7051 }

7b0e8b0afc934038ac4afccb05372bb7 LEADER
rpc_addresses { host: "node3.ikh.bigdata.dmp.com" port: 7051 }

86961f7799e94afa97c2f2be6773141d FOLLOWER
rpc_addresses { host: "node2.ikh.bigdata.dmp.com" port: 7051 }

-- Migration

1. Stop all Kudu processes
2. Format the data directories on the new nodes
mkdir -p /data/kudu/master/
chown -R kudu:kudu /data/kudu/master/

mkdir -p /data1/kudu/master/data/
mkdir -p /data2/kudu/master/data/
mkdir -p /data3/kudu/master/data/
mkdir -p /data4/kudu/master/data/
mkdir -p /data5/kudu/master/data/
mkdir -p /data6/kudu/master/data/
mkdir -p /data7/kudu/master/data/
mkdir -p /data8/kudu/master/data/
mkdir -p /data9/kudu/master/data/
mkdir -p /data10/kudu/master/data/
mkdir -p /data11/kudu/master/data/
mkdir -p /data12/kudu/master/data/

chown -R kudu:kudu /data1/kudu/master/data/
chown -R kudu:kudu /data2/kudu/master/data/
chown -R kudu:kudu /data3/kudu/master/data/
chown -R kudu:kudu /data4/kudu/master/data/
chown -R kudu:kudu /data5/kudu/master/data/
chown -R kudu:kudu /data6/kudu/master/data/
chown -R kudu:kudu /data7/kudu/master/data/
chown -R kudu:kudu /data8/kudu/master/data/
chown -R kudu:kudu /data9/kudu/master/data/
chown -R kudu:kudu /data10/kudu/master/data/
chown -R kudu:kudu /data11/kudu/master/data/
chown -R kudu:kudu /data12/kudu/master/data/

sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data1/kudu/master/data/,/data2/kudu/master/data/,/data3/kudu/master/data/,/data4/kudu/master/data/,/data5/kudu/master/data/,/data6/kudu/master/data/,/data7/kudu/master/data/,/data8/kudu/master/data/,/data9/kudu/master/data/,/data10/kudu/master/data/,/data11/kudu/master/data/,/data12/kudu/master/data/
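
The twelve near-identical `mkdir`/`chown` lines and the long `--fs_data_dirs` value follow one pattern, so they can be generated instead of typed. A minimal sketch, assuming the directories are always `/data<N>/kudu/master/data/` for N from 1 upward; `join_data_dirs` is a hypothetical helper name.

```shell
# join_data_dirs is a hypothetical helper, not a kudu command.
# $1: number of data disks. Prints /data1/kudu/master/data/,...,/data<N>/kudu/master/data/
join_data_dirs() {
  disk_count="$1"
  i=1
  joined=""
  while [ "$i" -le "$disk_count" ]; do
    # ${joined:+$joined,} adds the comma separator only after the first entry.
    joined="${joined:+$joined,}/data$i/kudu/master/data/"
    i=$((i + 1))
  done
  printf '%s\n' "$joined"
}

DATA_DIRS="$(join_data_dirs 12)"
echo "$DATA_DIRS"

# On a real node (needs root, so left commented out here):
# for d in $(echo "$DATA_DIRS" | tr ',' ' '); do mkdir -p "$d" && chown -R kudu:kudu "$d"; done
# sudo -u kudu kudu fs format --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs="$DATA_DIRS"
```

Reusing the same `$DATA_DIRS` variable in the later `rewrite_raft_config` and `copy_from_remote` commands keeps the directory list identical across every step.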

UUIDs taken from the output:
c6a414aeddaa469b9953df0c71fbe245 node51.ikh.bigdata.dmp.com
da6b236ddd48464eb064c2b8e859ce1e node52.ikh.bigdata.dmp.com
86ec16fbeaca455b840c800631ec14c1 node53.ikh.bigdata.dmp.com

3. Copy the Raft configuration over from a healthy master with scp:
scp /data1/kudu/master/data/consensus-meta/00000000000000000000000000000000 root@node51.ikh.bigdata.dmp.com:/data1/kudu/master/data/consensus-meta/
scp /data1/kudu/master/data/consensus-meta/00000000000000000000000000000000 root@node52.ikh.bigdata.dmp.com:/data1/kudu/master/data/consensus-meta/
scp /data1/kudu/master/data/consensus-meta/00000000000000000000000000000000 root@node53.ikh.bigdata.dmp.com:/data1/kudu/master/data/consensus-meta/

chown kudu:kudu /data1/kudu/master/data/consensus-meta/00000000000000000000000000000000

4. Write the existing Raft configuration onto the new master nodes
sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data1/kudu/master/data/,/data2/kudu/master/data/,/data3/kudu/master/data/,/data4/kudu/master/data/,/data5/kudu/master/data/,/data6/kudu/master/data/,/data7/kudu/master/data/,/data8/kudu/master/data/,/data9/kudu/master/data/,/data10/kudu/master/data/,/data11/kudu/master/data/,/data12/kudu/master/data/ 00000000000000000000000000000000 36c7f0a6a98a45ce8472fa747d5ac1dd:node1.ikh.bigdata.dmp.com:7051 7b0e8b0afc934038ac4afccb05372bb7:node3.ikh.bigdata.dmp.com:7051 86961f7799e94afa97c2f2be6773141d:node2.ikh.bigdata.dmp.com:7051

5. Start the existing masters in CM

6. Copy the block data from an existing master to the new masters; connecting to just one existing master is enough
sudo -u kudu kudu local_replica copy_from_remote --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data1/kudu/master/data/,/data2/kudu/master/data/,/data3/kudu/master/data/,/data4/kudu/master/data/,/data5/kudu/master/data/,/data6/kudu/master/data/,/data7/kudu/master/data/,/data8/kudu/master/data/,/data9/kudu/master/data/,/data10/kudu/master/data/,/data11/kudu/master/data/,/data12/kudu/master/data/ 00000000000000000000000000000000 node1.ikh.bigdata.dmp.com:7051

7. Stop the masters

8. Rewrite the Raft configuration on all masters, new and old; the UUIDs of the new masters are the ones from step 2
sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data1/kudu/master/data/,/data2/kudu/master/data/,/data3/kudu/master/data/,/data4/kudu/master/data/,/data5/kudu/master/data/,/data6/kudu/master/data/,/data7/kudu/master/data/,/data8/kudu/master/data/,/data9/kudu/master/data/,/data10/kudu/master/data/,/data11/kudu/master/data/,/data12/kudu/master/data/ 00000000000000000000000000000000 36c7f0a6a98a45ce8472fa747d5ac1dd:node1.ikh.bigdata.dmp.com:7051 7b0e8b0afc934038ac4afccb05372bb7:node3.ikh.bigdata.dmp.com:7051 86961f7799e94afa97c2f2be6773141d:node2.ikh.bigdata.dmp.com:7051 c6a414aeddaa469b9953df0c71fbe245:node51.ikh.bigdata.dmp.com:7051 da6b236ddd48464eb064c2b8e859ce1e:node52.ikh.bigdata.dmp.com:7051 86ec16fbeaca455b840c800631ec14c1:node53.ikh.bigdata.dmp.com:7051

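9. Add the new kudu-master roles in CM and start all masters

10. Stop all processes and remove the old kudu-master roles in CM

11. Rewrite the master Raft configuration on the new and old masters (excluding the removed ones)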
sudo -u kudu kudu local_replica cmeta rewrite_raft_config --fs_wal_dir=/data/kudu/master/wal --fs_data_dirs=/data1/kudu/master/data/,/data2/kudu/master/data/,/data3/kudu/master/data/,/data4/kudu/master/data/,/data5/kudu/master/data/,/data6/kudu/master/data/,/data7/kudu/master/data/,/data8/kudu/master/data/,/data9/kudu/master/data/,/data10/kudu/master/data/,/data11/kudu/master/data/,/data12/kudu/master/data/ 00000000000000000000000000000000 c6a414aeddaa469b9953df0c71fbe245:node51.ikh.bigdata.dmp.com:7051 da6b236ddd48464eb064c2b8e859ce1e:node52.ikh.bigdata.dmp.com:7051 86ec16fbeaca455b840c800631ec14c1:node53.ikh.bigdata.dmp.com:7051

12. Start the masters, then the tablet servers


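-- Note:
After the migration, the Kudu "external tables" previously created in Impala can no longer be read, because by default they point at the old kudu-master addresses.

Solution:

- Drop these "external tables" in Impala (the drop may take a few minutes)
- Then recreate them

drop table prod.ods_kudu_liyue_dsp_issue_log_1d;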

create EXTERNAL table prod.ods_kudu_liyue_dsp_issue_log_1d stored as kudu
TBLPROPERTIES('EXTERNAL'='TRUE','kudu.table_name' = 'ods_kudu_liyue_dsp_issue_log_1d')
;
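
When many external tables are affected, the drop/recreate pair above can be generated per table. A minimal sketch, assuming the Impala table name matches the Kudu table name as in the example; `recreate_sql` is a hypothetical helper, and the statements are only printed, not executed.

```shell
# recreate_sql is a hypothetical helper; it only prints SQL text.
# $1: Impala database, $2: table name (assumed identical in Impala and Kudu).
recreate_sql() {
  sql_db="$1"
  sql_tbl="$2"
  printf 'DROP TABLE %s.%s;\n' "$sql_db" "$sql_tbl"
  printf "CREATE EXTERNAL TABLE %s.%s STORED AS KUDU TBLPROPERTIES('EXTERNAL'='TRUE','kudu.table_name' = '%s');\n" \
    "$sql_db" "$sql_tbl" "$sql_tbl"
}

# Print the statements for the table used in the example above.
recreate_sql prod ods_kudu_liyue_dsp_issue_log_1d
```

The output can be reviewed and then fed to the Impala client of choice. Dropping an external table in Impala removes only the Impala mapping, so the data in Kudu stays in place.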
© 2019 GuoYL's Notes All Rights Reserved.