MPI Installation and Testing on CentOS
Server list
10.105.79.74 master1 CentOS Linux release 7.6.1810 (Core), 2 cores / 4 GB RAM
10.105.90.27 worker1 CentOS Linux release 7.6.1810 (Core), 2 cores / 4 GB RAM
10.154.6.123 worker2 CentOS Linux release 7.6.1810 (Core), 2 cores / 4 GB RAM
Host configuration
Add the host list below on all three servers, so the nodes can later be referred to by name.
Edit /etc/hosts (vim /etc/hosts) and add the following lines:
10.105.79.74 master1
10.105.90.27 worker1
10.154.6.123 worker2
Or run these commands instead:
echo "10.105.79.74 master1" >> /etc/hosts
echo "10.105.90.27 worker1" >> /etc/hosts
echo "10.154.6.123 worker2" >> /etc/hosts
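The three echo commands can also be written as one loop. A minimal sketch: hosts.demo is a stand-in target so the snippet can run without root; on a real node the output would go to /etc/hosts instead.

```shell
# Generate the host entries with a loop.
# hosts.demo is a demo stand-in for /etc/hosts (run as root against
# /etc/hosts on the real nodes).
HOSTS_FILE=hosts.demo
: > "$HOSTS_FILE"                 # start from an empty file
while read -r ip name; do
  echo "$ip $name" >> "$HOSTS_FILE"
done <<'EOF'
10.105.79.74 master1
10.105.90.27 worker1
10.154.6.123 worker2
EOF
cat "$HOSTS_FILE"
```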
Firewall, swap, and SELinux
echo "stop firewall"
systemctl stop firewalld && systemctl disable firewalld
echo "disable swap"
swapoff -a || true
echo "disable SELINUX "
setenforce 0 || true
Alternatively, disable SELinux permanently:
vim /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled (takes effect after a reboot; setenforce 0 only lasts until the next one).
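The manual edit above can be scripted with sed. A sketch, run here against a local demo copy so it works without root; on a real node the target file is /etc/selinux/config.

```shell
# Create a demo copy of the config; on a real node operate on
# /etc/selinux/config directly.
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > selinux-config.demo
# Flip enforcing -> disabled in place.
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' selinux-config.demo
grep '^SELINUX=' selinux-config.demo
# prints SELINUX=disabled
```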
Compiler installation
yum install epel-release -y
yum groupinstall "Development Tools" -y
Passwordless SSH between nodes
On the master, run:
ssh-keygen -t rsa
ssh-copy-id root@master1
ssh-copy-id root@worker1
ssh-copy-id root@worker2
Test from the master:
ssh root@worker1
ssh root@worker2
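All three logins can be verified in one loop. This is printed as a dry run (note the leading echo) since the snippet cannot reach the cluster; remove the echo to execute for real. -o BatchMode=yes makes ssh fail instead of prompting, so a missing key shows up immediately.

```shell
# Dry run: print the verification command for each node.
# Remove the leading echo to actually run the checks.
for node in master1 worker1 worker2; do
  echo ssh -o BatchMode=yes "root@$node" hostname
done
```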
NFS setup
A shared directory lets every node run the same binaries and configuration.
The master's /opt directory is exported and mounted on the workers.
Install the NFS server on the master node (10.105.79.74); data directory: /opt
(1) Stop the firewall
$ systemctl stop firewalld.service
$ systemctl disable firewalld.service
(2) Install NFS
$ yum -y install nfs-utils rpcbind
(3) Create the shared directory and set its permissions:
$ mkdir -p /opt
$ chmod 755 /opt
(4) Configure NFS
$ vi /etc/exports
/opt *(rw,sync,no_root_squash)
Option descriptions:
/opt: the directory being shared
*: any host may connect; this can also be a subnet, a single IP, or a domain name
rw: read-write access
sync: writes are committed to stable storage before the server replies
no_root_squash: a client user who is root keeps root privileges on the share; without this option (root_squash, the default), root is mapped to the anonymous user nobody
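For reference, two stricter /etc/exports variants (a sketch; adjust addresses to your network). Since worker2 here sits on a different subnet from the other nodes, the per-host form fits this cluster better than a single subnet:

```
# allow only one subnet (hypothetical range)
/opt 10.105.0.0/16(rw,sync,no_root_squash)
# or list individual hosts
/opt master1(rw,sync,no_root_squash) worker1(rw,sync,no_root_squash) worker2(rw,sync,no_root_squash)
```

After editing /etc/exports, re-export with exportfs -ra.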
(5) Start the services and check their status
$ systemctl start rpcbind.service
$ systemctl enable rpcbind
$ systemctl status rpcbind
$ systemctl start nfs.service
$ systemctl enable nfs
$ systemctl status nfs
(6) Check that NFS is registered with rpcbind
[root@t1 ~]# rpcinfo -p|grep nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049 nfs_acl
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 3 udp 2049 nfs_acl
(7) Check the export table
[root@t1 ~]# cat /var/lib/nfs/etab
/opt *(rw,sync,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,no_subtree_check,secure_locks,acl,no_pnfs,anonuid=65534,anongid=65534,sec=sys,rw,secure,no_root_squash,no_all_squash)
(8) Test the mount from another machine (optional)
On worker1 and worker2:
$ yum -y install nfs-utils rpcbind
$ systemctl start rpcbind.service
$ systemctl enable rpcbind.service
$ systemctl start nfs.service
$ systemctl enable nfs.service
[root@t2 ~]# showmount -e master1
Export list for master1:
/opt *
[root@t2 ~]# mount -t nfs master1:/opt /opt
[root@t2 ~]# cd /opt
[root@t2 opt]# ls
[root@t2 opt]# touch test.txt
[root@t2 opt]# echo "111" >> test.txt
Check that test.txt and its contents show up under /opt on the master.
Building MPI and setting environment variables
(1) Download MPICH
cd /opt
wget http://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2.tar.gz
tar -xzvf mpich-3.3.2.tar.gz
cd mpich-3.3.2
./configure --prefix=/opt/mpich
make && make install
(2) Set the PATH (required on all three machines)
vim /etc/profile, append:
export PATH=$PATH:/opt/mpich/bin
source /etc/profile
(3) Single-node test
cd /opt/mpich-3.3.2/examples
[root@master1 examples]# mpirun -np 4 ./cpi
Process 0 of 4 is on master1
Process 3 of 4 is on master1
Process 1 of 4 is on master1
Process 2 of 4 is on master1
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.011902
[root@master1 examples]# mpirun -np 1 ./cpi
Process 0 of 1 is on master1
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000073
[root@master1 examples]# mpirun -np 2 ./cpi
Process 0 of 2 is on master1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000572
Process 1 of 2 is on master1
(4) Cluster test
Create the node list (one hostname:slots entry per line, where slots is the number of processes to place per round):
vim machinefile
[root@master1 opt]# cat machinefile
master1:1
worker1:2
worker2:2
[root@master1 opt]# mpirun -n 20 -f machinefile ./cpi
Process 0 of 20 is on master1
Process 5 of 20 is on master1
Process 10 of 20 is on master1
Process 15 of 20 is on master1
Process 13 of 20 is on worker2
Process 18 of 20 is on worker2
Process 1 of 20 is on worker1
Process 4 of 20 is on worker2
Process 16 of 20 is on worker1
Process 9 of 20 is on worker2
Process 11 of 20 is on worker1
Process 8 of 20 is on worker2
Process 17 of 20 is on worker1
Process 14 of 20 is on worker2
Process 2 of 20 is on worker1
Process 3 of 20 is on worker2
Process 6 of 20 is on worker1
Process 19 of 20 is on worker2
Process 7 of 20 is on worker1
Process 12 of 20 is on worker1
pi is approximately 3.1415926544231279, Error is 0.0000000008333347
wall clock time = 0.242260
The official documentation recommends mpiexec: mpiexec -n 20 -f machinefile ./cpi
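The machinefile declares 1 slot on master1 and 2 on each worker, 5 slots in total; with -n 20, Hydra cycles through the file, placing 4 processes per slot, which matches the output above (ranks 0, 5, 10, 15 landing on master1). A quick sketch for totaling the declared slots (the :slots suffix defaults to 1 when omitted):

```shell
# Recreate the machinefile from the cluster setup above and
# sum its slot counts (a missing :slots suffix counts as 1).
printf 'master1:1\nworker1:2\nworker2:2\n' > machinefile
awk -F: '{ total += ($2 == "" ? 1 : $2) } END { print total }' machinefile
# prints 5
```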
(5) Compiling C and C++ MPI programs
// compile the C examples
cd /opt/mpich-3.3.2/examples/
mpicc -o cpi cpi.c
mpicc -o hellow hellow.c
cp cpi hellow /opt/
// compile the C++ example
cd /opt/mpich-3.3.2/examples/cxx
mpicxx -o cxxpi cxxpi.cxx -DHAVE_CXX_IOSTREAM -DHAVE_NAMESPACE_STD
cp cxxpi /opt/
mpiexec -n 4 -f machinefile ./cpi
mpiexec -n 4 -f machinefile ./cxxpi
(6) Other invocation forms
// run on a specific node
mpiexec -n 2 -host worker2 ./cpi
mpiexec -n 2 -host worker2 ./cxxpi
// run two programs across the listed nodes (MPMD)
mpiexec -f machinefile -n 4 ./cpi : -n 4 ./hellow
// launch via mpiexec.hydra; Hydra is MPICH's process manager
mpiexec.hydra -f machinefile -n 10 ./cpi
mpiexec.hydra -f machinefile -n 4 ./cpi : -n 4 ./hellow
// select the network interface used for MPI communication
mpiexec -f machinefile -iface eth0 -n 4 ./hellow
// integrate with a resource manager (here PBS)
mpiexec -rmk pbs ./cpi
For more, see:
https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
(7) What is the difference between mpiexec and mpirun?
http://blog.sina.com.cn/s/blog_46422d810101g9w8.html
mpiexec is recommended: it is the launcher defined by the MPI standard and has the fuller feature set, while mpirun is implementation-specific.
(8) What is mpiexec.hydra?
It is the launcher of the Hydra process manager. In MPICH, mpiexec is normally a link to mpiexec.hydra, so the two commands are equivalent:
mpiexec.hydra -f machinefile -n 10 ./cpi
(9) Extra: auto-mounting the NFS share with autofs
Install autofs on every client node:
yum -y install autofs
Append the line below to /etc/auto.master; /opt is the mount point and /etc/auto.nfs holds the mapping rules:
vim /etc/auto.master
/opt /etc/auto.nfs
Create /etc/auto.nfs with the following contents, which auto-mounts directories under master1:/opt onto the local mount point:
touch /etc/auto.nfs
vim /etc/auto.nfs
# /etc/auto.nfs
* -fstype=nfs master1:/opt/&
Start and enable the service
$ systemctl start autofs
$ systemctl enable autofs
Test it:
Create a directory on the master:
mkdir -p /opt/test
touch /opt/test/t.txt
On a worker node:
cd /opt/test
The share is mounted automatically (autofs only triggers the mount when the path is actually accessed).
df -Th   # verify the mount
References
http://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2-userguide.pdf
http://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2-installguide.pdf
https://blog.csdn.net/smart9527_zc/article/details/85174102