Installing and Testing MPI on CentOS

Server list

10.105.79.74 master1  CentOS Linux release 7.6.1810 (Core)  2 cores / 4 GB RAM
10.105.90.27 worker1  CentOS Linux release 7.6.1810 (Core)  2 cores / 4 GB RAM
10.154.6.123 worker2  CentOS Linux release 7.6.1810 (Core)  2 cores / 4 GB RAM

Hosts configuration

Configure the hosts list below on all three servers, so that the nodes can later be referred to by name.
Run vim /etc/hosts and add the following lines:
10.105.79.74 master1
10.105.90.27 worker1
10.154.6.123 worker2

Alternatively, use the following commands:
echo 10.105.79.74 master1 >> /etc/hosts
echo 10.105.90.27 worker1 >> /etc/hosts
echo 10.154.6.123 worker2 >> /etc/hosts
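Run twice, the echo commands above append duplicate entries. A re-runnable sketch (the add_hosts helper is illustrative; point it at /etc/hosts on the real machines):

```shell
#!/bin/sh
# add_hosts FILE: append each "IP name" pair to FILE only if the
# hostname is not already present, so re-running is harmless.
add_hosts() {
    file="$1"
    for entry in "10.105.79.74 master1" "10.105.90.27 worker1" "10.154.6.123 worker2"; do
        name="${entry#* }"                  # hostname part, e.g. master1
        grep -qw "$name" "$file" || echo "$entry" >> "$file"
    done
}

# On the real machines: add_hosts /etc/hosts
```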

Firewall and SELinux settings

echo "stop firewall"
systemctl stop firewalld && systemctl disable firewalld

echo "disable swap"
swapoff -a || true

echo "disable SELinux (temporary, effective until reboot)"
setenforce 0 || true

Or, to disable it permanently:
vim /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled (takes effect after a reboot).
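The manual edit can also be scripted. A sketch (the disable_selinux helper is illustrative; the config path is a parameter so you can try it on a copy of the file first):

```shell
#!/bin/sh
# disable_selinux FILE: switch SELINUX=enforcing or SELINUX=permissive
# to SELINUX=disabled in an SELinux config file.
disable_selinux() {
    sed -i -e 's/^SELINUX=enforcing/SELINUX=disabled/' \
           -e 's/^SELINUX=permissive/SELINUX=disabled/' "$1"
}

# On the real machines: disable_selinux /etc/selinux/config, then reboot
```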

Compiler installation

yum install epel-release -y 
yum groupinstall "Development Tools" -y

Passwordless access between nodes

Run on the master machine:
ssh-keygen -t rsa
ssh-copy-id root@master1
ssh-copy-id root@worker1
ssh-copy-id root@worker2

Test from the master:
ssh root@worker1
ssh root@worker2

NFS setup

The goal is a shared directory, so that all nodes run the same programs and configuration.

The /opt directory on the master machine is exported and mounted on the workers.

Install the NFS server on the master node 10.105.79.74; the shared data directory is /opt.

(1) Stop the firewall
$ systemctl stop firewalld.service
$ systemctl disable firewalld.service
(2) Install and configure NFS
$ yum -y install nfs-utils rpcbind
(3) Create the shared directory and set its permissions:
$ mkdir -p /opt
$ chmod 755 /opt
(4) Configure the NFS export
$ vi /etc/exports
/opt  *(rw,sync,no_root_squash)
Explanation of the options:
/opt: the exported data directory
*: any host may connect; this can also be a subnet, a single IP, or a domain name
rw: read and write access
sync: writes are committed to disk before the server replies, rather than only cached in memory
no_root_squash: a client accessing the share as root keeps root privileges. (With the default root_squash, root would instead be mapped to the anonymous user, whose UID and GID normally become nobody.)
(5) Start the services and check their status
$ systemctl start rpcbind.service
$ systemctl enable rpcbind
$ systemctl status rpcbind
$ systemctl start nfs.service
$ systemctl enable nfs
$ systemctl status nfs
(6) Check that NFS is registered with rpcbind
[root@t1 ~]# rpcinfo -p|grep nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    3   tcp   2049  nfs_acl
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    3   udp   2049  nfs_acl
(7) Check the export table
[root@t1 ~]# cat /var/lib/nfs/etab
/opt       *(rw,sync,wdelay,hide,nocrossmnt,secure,no_root_squash,no_all_squash,no_subtree_check,secure_locks,acl,no_pnfs,anonuid=65534,anongid=65534,sec=sys,rw,secure,no_root_squash,no_all_squash)
(8) Test that the export can be mounted from another machine (optional)
On the worker1 and worker2 nodes:
$ yum -y install nfs-utils rpcbind
$ systemctl start rpcbind.service 
$ systemctl enable rpcbind.service 
$ systemctl start nfs.service    
$ systemctl enable nfs.service
[root@t2 ~]# showmount -e master1
Export list for master1:
/opt *
[root@t2 ~]# mount -t nfs master1:/opt /opt
[root@t2 ~]# cd /opt
[root@t2 opt]# ls
[root@t2 opt]# touch test.txt
[root@t2 opt]# echo  "111" >> test.txt 
Check that the corresponding file, with the same content, appears in the master's /opt directory.

Building MPICH and setting environment variables

(1) Download and build MPICH

cd /opt
wget http://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2.tar.gz
tar -xzvf mpich-3.3.2.tar.gz
cd mpich-3.3.2
./configure --prefix=/opt/mpich
make && make install

(2) Configure the PATH (needed on all three machines)

vim /etc/profile and append the line
export PATH=$PATH:/opt/mpich/bin
source /etc/profile
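Since the profile must be edited on every node and passwordless SSH is already in place, the change can be pushed from the master. A sketch (append_line is an illustrative helper that only adds the line when it is missing; the ssh loop in the comment assumes the hosts configured earlier):

```shell
#!/bin/sh
# append_line FILE LINE: add LINE to FILE unless it is already there,
# so running the script repeatedly does not duplicate the entry.
append_line() {
    grep -qxF "$2" "$1" || echo "$2" >> "$1"
}

# On the real cluster, push the PATH setting to every node:
#   for h in master1 worker1 worker2; do
#       ssh root@"$h" "grep -qxF 'export PATH=\$PATH:/opt/mpich/bin' /etc/profile \
#           || echo 'export PATH=\$PATH:/opt/mpich/bin' >> /etc/profile"
#   done
```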

(3) Single-machine test

cd /opt/mpich-3.3.2/examples

[root@master1 examples]# mpirun -np 4 ./cpi
Process 0 of 4 is on master1
Process 3 of 4 is on master1
Process 1 of 4 is on master1
Process 2 of 4 is on master1
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.011902


[root@master1 examples]# mpirun -np 1 ./cpi
Process 0 of 1 is on master1
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000073


[root@master1 examples]# mpirun -np 2 ./cpi
Process 0 of 2 is on master1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000572
Process 1 of 2 is on master1

(4) Cluster test


Create the host list (machinefile) in the shared /opt directory; each line has the form host:slots, where slots is the number of processes placed on that host per allocation round:
touch machinefile 

[root@master1 opt]# cat machinefile 
master1:1
worker1:2
worker2:2

[root@master1 opt]# mpirun -n 20 -f machinefile ./cpi
Process 0 of 20 is on master1
Process 5 of 20 is on master1
Process 10 of 20 is on master1
Process 15 of 20 is on master1
Process 13 of 20 is on worker2
Process 18 of 20 is on worker2
Process 1 of 20 is on worker1
Process 4 of 20 is on worker2
Process 16 of 20 is on worker1
Process 9 of 20 is on worker2
Process 11 of 20 is on worker1
Process 8 of 20 is on worker2
Process 17 of 20 is on worker1
Process 14 of 20 is on worker2
Process 2 of 20 is on worker1
Process 3 of 20 is on worker2
Process 6 of 20 is on worker1
Process 19 of 20 is on worker2
Process 7 of 20 is on worker1
Process 12 of 20 is on worker1
pi is approximately 3.1415926544231279, Error is 0.0000000008333347
wall clock time = 0.242260

The official documentation recommends using mpiexec -n 20 -f machinefile ./cpi
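The machinefile above can be generated and sanity-checked with a small script. A sketch (make_machinefile and total_slots are illustrative helpers matching the 1+2+2 layout above; when -n exceeds the slot total, Hydra wraps around the list, which is why master1 received ranks 0, 5, 10 and 15 in the 20-process run):

```shell
#!/bin/sh
# make_machinefile FILE: write the host:slots list used in this guide.
make_machinefile() {
    printf '%s\n' "master1:1" "worker1:2" "worker2:2" > "$1"
}

# total_slots FILE: sum the slot counts (the number after each colon).
total_slots() {
    awk -F: '{ s += $2 } END { print s }' "$1"
}

# On the cluster:
#   make_machinefile /opt/machinefile
#   mpiexec -n "$(total_slots /opt/machinefile)" -f /opt/machinefile ./cpi
```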

(5) Compiling C and C++ parallel programs

# compile the C parallel programs
cd /opt/mpich-3.3.2/examples/
mpicc -o cpi cpi.c 
mpicc -o hellow hellow.c 
cp cpi hellow /opt/
# compile the C++ parallel program
cd /opt/mpich-3.3.2/examples/cxx
mpicxx -o cxxpi cxxpi.cxx -DHAVE_CXX_IOSTREAM -DHAVE_NAMESPACE_STD 
cp cxxpi /opt/

mpiexec -n 4 -f machinefile ./cpi
mpiexec -n 4 -f machinefile ./cxxpi

(6) Other forms of the launch command

# run on a specified compute node
mpiexec -n 2 -host worker2 ./cpi
mpiexec -n 2 -host worker2 ./cxxpi
# run two programs on the specified nodes (MPMD mode, programs separated by a colon)
mpiexec -f machinefile -n 4  ./cpi : -n 4 ./hellow
# launch with mpiexec.hydra; Hydra is the process manager
mpiexec.hydra -f machinefile -n 10 ./cpi
mpiexec.hydra -f machinefile -n 4  ./cpi : -n 4 ./hellow
# specify the network interface used for communication
mpiexec -f machinefile -iface eth0 -n 4 ./hellow
# specify the resource manager (here PBS)
mpiexec -rmk pbs ./cpi

For more details, see:
https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager

(7) What is the difference between mpiexec and mpirun?

http://blog.sina.com.cn/s/blog_46422d810101g9w8.html
mpiexec is recommended; its functionality is more complete.

(8) What is mpiexec.hydra?

mpiexec.hydra is the launcher of Hydra, MPICH's default process manager; mpiexec is normally just a link to it, so the two can be used interchangeably:

mpiexec.hydra -f machinefile -n 10 ./cpi

(9) Extra: automatic NFS mounting with autofs

All client machines need autofs installed:

yum -y install autofs

Add the following line at the bottom of /etc/auto.master; /opt is the mount point and /etc/auto.nfs contains the mount rules:

vim /etc/auto.master

/opt   /etc/auto.nfs

Create /etc/auto.nfs with the configuration below; it automatically mounts subdirectories of the master's /opt at the local mount point:

touch /etc/auto.nfs
vim /etc/auto.nfs

# /etc/auto.nfs
*  -fstype=nfs  master1:/opt/&

Start and enable the service:

$ systemctl start autofs
$ systemctl enable autofs

Test it:

On the master, create a directory and a file:

mkdir -p /opt/test 
touch /opt/test/t.txt

On the worker nodes:
cd /opt/test
The cd itself triggers the automount (autofs only mounts a directory when it is first accessed).

df -Th  # check the mounts with this command

References

http://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2-userguide.pdf
http://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2-installguide.pdf
https://blog.csdn.net/smart9527_zc/article/details/85174102