2020-04-19 | Operations

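[OpenStack] Redeployment Notes

~~After a power outage in the datacenter cluster, OpenStack could no longer connect to the VMs. The system had been deployed by my predecessors and left little documentation, so debugging was painful. Since almost nobody was using the instances, I simply reinstalled it.~~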

UPDATE 2020/04/25

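To fix a rear-view mirror, I ended up taking the whole car apart.

I am writing this down so it can serve as a reference the next time something goes wrong.

Network Architecture

The main difficulty in the earlier work was having no documentation of the overall network architecture: every part had to be probed bit by bit, like the blind men feeling the elephant. Combined with my lack of a systematic understanding of OpenStack, this led to many detours.

With the epidemic easing, I finally got the system's network architecture documents from the school, and thanks to tripleZ's work the whole system could at last be restored smoothly.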

network structure

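Related concepts

Network definitions:

Provider Network: the channel for public IP addresses; once a public IP is bound, an instance can reach the Internet directly.

Cluster External Network: the cluster's internal LAN, which can reach the Internet directly.

Tunnel Network: the channel between instances and the system's network components; VLANs provide layer-2 isolation between different self-service networks.

Management Network: used for internal communication between OpenStack system components.

Storage Network: a dedicated high-speed storage link, for a better experience when using instances.

Virtual network components:

Virtual bridges are the main way layer-2 SDN is implemented.

br-int: the integration bridge. For a host it is the core of the whole SDN implementation and can be thought of as one large internal virtual switch.

br-tun: the tunnel bridge, which carries network traffic between instances.

br-ex: the external bridge, attached directly to a NIC that can reach the external network; instances reach the external network through it.

Datacenter network topology: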

network topology in DC

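OpenStack Redeployment

To deploy from scratch, follow the Quick Start. Since most of the environment was already configured, I avoided extra trouble by working with the tools my predecessors left behind, trying not to introduce new problems and to finish quickly.

Removing the containers

$ cd /root/Kolla-ansible. In this directory:
- multinode is the Ansible inventory and records each node's role. Because nodes compute11-13 are offline, back it up first (multinode-datetime.bak), then remove those three nodes from multinode.
- scripts holds scripts for common Ansible operations; they write their logs to logs. Using these scripts is generally recommended over running the commands directly.
$ ./scripts/stop. During stop I hit the error fatal: nova_libvirt containers are running; if you see something similar, see the Troubleshoot section at the end.
$ ./scripts/destroy, which removes the containers so they can be redeployed.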
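
The backup-then-remove step for the inventory could be scripted roughly as follows. This is a minimal sketch, not the tooling used on the cluster: the function name prune_inventory is made up for illustration, and it assumes every host sits on its own inventory line with the host name as the first field.

```shell
#!/usr/bin/env bash
# Back up an Ansible inventory, then drop the given hosts from it.
# Assumption: one host per line, host name is the first whitespace-separated field.
prune_inventory() {
    local inventory=$1; shift
    local backup="${inventory}-$(date +%Y%m%d-%H%M%S).bak"
    cp -p -- "$inventory" "$backup" || return 1
    local host
    for host in "$@"; do
        # Keep every line whose first field is not exactly this host name.
        awk -v h="$host" '$1 != h' "$inventory" > "${inventory}.tmp" &&
            mv -- "${inventory}.tmp" "$inventory" || return 1
    done
    echo "$backup"   # print the backup path so the caller can record it
}

# Usage (not run here):
#   cd /root/Kolla-ansible && prune_inventory multinode compute11 compute12 compute13
```

Group headers such as [compute] and blank lines are left untouched, and a host listed in several groups is removed from all of them.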

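Deploying with Kolla

$ cd /root/Kolla-ansible
$ ./scripts/prechecks checks every node for problems before deployment. (Normally kolla-ansible -i multinode bootstrap-servers is run before this step to install dependencies on all nodes, but that had already been done, so I skipped it to avoid odd problems from updated dependency versions.) Resolve every error reported in this step before moving on.
$ ./scripts/deploy. Note that gathering facts collects data from every node, so this step is extremely slow and the task recurs several times (do not assume something is wrong when the log stream stalls). You can either turn gathering facts off (not recommended) or use an Ansible plugin, https://github.com/jlafon/ansible-profile , to cache the collected information and greatly reduce the wait. ~~The simpler approach: leave it running and go do something else, which is also why the scripts are recommended over the raw commands.~~
$ ./scripts/post-deploy. After the long wait, the containers are running normally on every node; the admin user and environment variables now need to be set up.
After running this step, $ source /etc/kolla/admin-openrc.sh to obtain operator privileges.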
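
Because each step should only start once the previous one has finished cleanly, the sequence can be wrapped in a small helper. The sketch below is an illustration only: run_in_order is a hypothetical name, and it assumes the scripts under scripts exit with a non-zero status when the underlying playbook fails.

```shell
#!/usr/bin/env bash
# Run the given commands one after another and stop at the first failure.
run_in_order() {
    local step
    for step in "$@"; do
        echo ">>> $step"
        "$step" || { echo "step failed: $step" >&2; return 1; }
    done
}

# Usage (not run here):
#   cd /root/Kolla-ansible
#   run_in_order ./scripts/prechecks ./scripts/deploy ./scripts/post-deploy
```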

Run the init-runonce script to create the example networks, images, and so on:

EXT_NET_CIDR='211.65.102.1/24'
EXT_NET_RANGE='start=211.65.102.42,end=211.65.102.69'
EXT_NET_GATEWAY='211.65.102.1'

Run it:

. /usr/share/kolla-ansible/init-runonce

Create the external network:

$ openstack network create --external --provider-physical-network physnet1 \
    --provider-network-type flat external

Create the subnets:

$ openstack subnet create --no-dhcp \
    --allocation-pool start=10.10.0.100,end=10.10.5.255 --network external \
    --subnet-range 10.10.0.0/16 --gateway 10.10.0.1 external-share

$ openstack subnet create --no-dhcp \
    --allocation-pool start=211.65.102.42,end=211.65.102.69 --network external \
    --subnet-range 211.65.102.0/24 --gateway 211.65.102.1 public-subnet
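
To see how many addresses each allocation pool above actually provides, the start and end addresses can be converted to integers and subtracted. This is a small worked example; ip_to_int and pool_size are helper names invented here, not part of the OpenStack CLI.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to its integer value.
ip_to_int() {
    local IFS=.
    set -- $1    # split the address into four octets on "."
    echo "$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))"
}

# Number of addresses in an inclusive range: end - start + 1.
pool_size() {
    echo "$(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))"
}

pool_size 10.10.0.100 10.10.5.255       # external-share: prints 1436
pool_size 211.65.102.42 211.65.102.69   # public-subnet:  prints 28
```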

Create the internal network:

$ openstack network create --provider-network-type vxlan internal
$ openstack subnet create --subnet-range 192.168.0.1/24 --network internal \
    --gateway 192.168.0.1 --dns-nameserver 8.8.8.8 internal-subnet

Create the router:

$ openstack router create admin-router
$ openstack router add subnet admin-router internal-subnet
$ openstack router set --external-gateway external admin-router

Create a floating IP:

$ openstack floating ip create --subnet external-share --project admin external
Import the images; the image files are in /openstack-images:
$ openstack image create --disk-format qcow2 --container-format bare --public \
    --file ./${IMAGE} ${IMAGE_NAME}
For building images, see another article: OpenStack Image Customization
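
When several images need importing, ${IMAGE_NAME} can be derived from each file name. The sketch below only prints the commands (a dry run) so they can be reviewed before being executed. The helper names and the example file name centos7.qcow2 are hypothetical, and it assumes the images end in .qcow2 and have no spaces in their paths.

```shell
#!/usr/bin/env bash
# Derive an image name from a file path: drop the directory and the .qcow2 suffix.
image_name_for() {
    local file=${1##*/}
    echo "${file%.qcow2}"
}

# Print one "openstack image create" command per qcow2 file in a directory.
print_image_commands() {
    local dir=$1 image
    for image in "$dir"/*.qcow2; do
        [ -e "$image" ] || continue   # the glob matched nothing
        printf 'openstack image create --disk-format qcow2 --container-format bare --public --file %s %s\n' \
            "$image" "$(image_name_for "$image")"
    done
}

# Usage (not run here):
#   print_image_commands /openstack-images
```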

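References

Datacenter deployment materials

Troubleshoot

`stop/destroy` fails: nova_libvirt containers are running:

While running $ kolla-ansible -i multinode stop, the containers on some nodes could not be stopped because instances were still running in them. Usually you can delete the instances first with $ openstack server delete INSTANCE_NAME and then rerun the stop command; in special cases you can also remove them by hand (see below).

The error output: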

TASK [stop : Stopping Kolla containers] *****************************************************************************************************************************************************************
fatal: [compute02]: FAILED! => {"changed": true, "cmd": ["/tmp/kolla-stop/tools/stop-containers"], "delta": "0:00:00.031605", "end": "2020-04-17 19:36:07.141247", "msg": "non-zero return code", "rc": 1, "start": "2020-04-17 19:36:07.109642", "stderr": "", "stderr_lines": [], "stdout": "Some qemu processes were detected.\nDocker will not be able to stop the nova_libvirt container with those running.\nPlease clean them up before rerunning this script.", "stdout_lines": ["Some qemu processes were detected.", "Docker will not be able to stop the nova_libvirt container with those running.", "Please clean them up before rerunning this script."]}
fatal: [compute03]: FAILED! => {"changed": true, "cmd": ["/tmp/kolla-stop/tools/stop-containers"], "delta": "0:00:00.032837", "end": "2020-04-17 07:36:07.289828", "msg": "non-zero return code", "rc": 1, "start": "2020-04-17 07:36:07.256991", "stderr": "", "stderr_lines": [], "stdout": "Some qemu processes were detected.\nDocker will not be able to stop the nova_libvirt container with those running.\nPlease clean them up before rerunning this script.", "stdout_lines": ["Some qemu processes were detected.", "Docker will not be able to stop the nova_libvirt container with those running.", "Please clean them up before rerunning this script."]}

......omit......

PLAY RECAP **********************************************************************************************************************************************************************************************
cinder01                   : ok=5    changed=1    unreachable=0    failed=0
cinder02                   : ok=4    changed=0    unreachable=0    failed=1
cinder03                   : ok=4    changed=0    unreachable=0    failed=1
cinder04                   : ok=4    changed=0    unreachable=0    failed=1
cinder05                   : ok=4    changed=0    unreachable=0    failed=1
cinder06                   : ok=4    changed=0    unreachable=0    failed=1
cinder07                   : ok=5    changed=1    unreachable=0    failed=0
cinder08                   : ok=4    changed=0    unreachable=0    failed=1
cinder09                   : ok=4    changed=0    unreachable=0    failed=1
cinder11                   : ok=5    changed=1    unreachable=0    failed=0
cinder12                   : ok=4    changed=0    unreachable=0    failed=1
cinder13                   : ok=5    changed=1    unreachable=0    failed=0
cinder14                   : ok=5    changed=1    unreachable=0    failed=0
cinder15                   : ok=5    changed=1    unreachable=0    failed=0
cinder16                   : ok=5    changed=1    unreachable=0    failed=0
cinder17                   : ok=5    changed=1    unreachable=0    failed=0
cinder18                   : ok=5    changed=1    unreachable=0    failed=0
cinder19                   : ok=5    changed=1    unreachable=0    failed=0
cinder20                   : ok=4    changed=0    unreachable=0    failed=1
cinder21                   : ok=5    changed=1    unreachable=0    failed=0
cinder22                   : ok=4    changed=0    unreachable=0    failed=1
compute01                  : ok=5    changed=4    unreachable=0    failed=0
compute02                  : ok=4    changed=3    unreachable=0    failed=1
compute03                  : ok=4    changed=3    unreachable=0    failed=1
compute04                  : ok=4    changed=3    unreachable=0    failed=1
compute05                  : ok=4    changed=3    unreachable=0    failed=1
compute06                  : ok=4    changed=3    unreachable=0    failed=1
compute07                  : ok=5    changed=4    unreachable=0    failed=0
compute08                  : ok=4    changed=3    unreachable=0    failed=1
compute09                  : ok=4    changed=3    unreachable=0    failed=1
compute10                  : ok=5    changed=4    unreachable=0    failed=0
compute14                  : ok=5    changed=4    unreachable=0    failed=0
compute15                  : ok=5    changed=4    unreachable=0    failed=0
compute16                  : ok=4    changed=3    unreachable=0    failed=1
compute17                  : ok=5    changed=4    unreachable=0    failed=0
compute18                  : ok=5    changed=4    unreachable=0    failed=0
compute19                  : ok=5    changed=4    unreachable=0    failed=0
compute20                  : ok=5    changed=4    unreachable=0    failed=0
compute21                  : ok=5    changed=4    unreachable=0    failed=0
compute22                  : ok=5    changed=4    unreachable=0    failed=0
compute23                  : ok=5    changed=4    unreachable=0    failed=0
compute24                  : ok=4    changed=3    unreachable=0    failed=1
compute25                  : ok=5    changed=4    unreachable=0    failed=0
compute26                  : ok=4    changed=3    unreachable=0    failed=1
controller01               : ok=5    changed=4    unreachable=0    failed=0

Command failed ansible-playbook -i ./multinode -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla  /usr/share/kolla-ansible/ansible/stop.yml
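
With this many nodes, it helps to pull the failed hosts out of the PLAY RECAP rather than reading it by eye. A minimal sketch, assuming the recap format shown above (host name, a colon, then key=value counters); failed_hosts is a name chosen here for illustration.

```shell
#!/usr/bin/env bash
# Read Ansible output on stdin and print every host whose recap line has failed >= 1.
failed_hosts() {
    awk '$2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^failed=[1-9]/) print $1
    }'
}

# Demo on three recap lines taken from the output above.
failed_hosts <<'EOF'
cinder01                   : ok=5    changed=1    unreachable=0    failed=0
cinder02                   : ok=4    changed=0    unreachable=0    failed=1
compute02                  : ok=4    changed=3    unreachable=0    failed=1
EOF
# prints:
# cinder02
# compute02
```

Feeding the saved stop log into failed_hosts gives the list of hosts that still need the manual cleanup described next.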

Solution:

Log in to the affected host and enter the nova_libvirt container: $ docker exec -it nova_libvirt /bin/sh
Then remove the running instances by hand:

(nova-libvirt)[root@Compute02 /]$ virsh list
 Id    Name                           State
----------------------------------------------------
 2     instance-000000f5              running
 3     instance-0000010e              running
 4     instance-0000011d              running
(nova-libvirt)[root@Compute02 /]$ virsh destroy 2
Domain 2 destroyed

(nova-libvirt)[root@Compute02 /]$ virsh destroy 3
Domain 3 destroyed

(nova-libvirt)[root@Compute02 /]$ virsh destroy 4
Domain 4 destroyed

(nova-libvirt)[root@Compute02 /]$ virsh list
 Id    Name                           State
----------------------------------------------------

(nova-libvirt)[root@Compute02 /]$ exit
exit
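
On hosts with many running domains, the per-ID commands in the session above can be replaced by a loop. This is a sketch under the assumption that it is run inside the nova_libvirt container, where virsh is available; destroy_running_domains is a made-up name. Note that virsh destroy forcibly powers a domain off, so use it only when the instances are expendable, as they were here.

```shell
#!/usr/bin/env bash
# Force off every running libvirt domain reported by "virsh list --name".
destroy_running_domains() {
    local name
    virsh list --name | while read -r name; do
        [ -n "$name" ] || continue            # skip the trailing blank line virsh prints
        virsh destroy "$name" </dev/null      # keep virsh from consuming the loop's input
    done
}

# Usage (inside the nova_libvirt container, not run here):
#   destroy_running_domains && virsh list
```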
