Difference between revisions of "Eosc/easybuild/andreku"
Revision as of 15:36, 21 November 2020
Important stuff to remember
export EB_PYTHON=python3
module load EasyBuild/4.3.0
Playground on Saga: /cluster/shared/nlpl/software/easybuild_ak
export EASYBUILD_ROBOT_PATHS=/cluster/software/EasyBuild/4.3.0/easybuild/easyconfigs:/cluster/shared/nlpl/software/easybuild_ak
(or just source PATH.local)
Repository: https://source.coderefinery.org/nlpl/easybuild/-/tree/ak-dev
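Before running EasyBuild it can help to confirm that every entry in EASYBUILD_ROBOT_PATHS actually exists, since a mistyped entry is easy to miss. The following is only a minimal sketch: the helper name check_robot_paths is hypothetical and not part of EasyBuild, and it only tests that each colon-separated entry is a directory.

```shell
# check_robot_paths [PATHLIST]
# Print each entry of a colon-separated path list together with whether it
# exists as a directory. Defaults to $EASYBUILD_ROBOT_PATHS when no argument
# is given. Returns non-zero if at least one entry is missing.
# (Hypothetical helper, not an EasyBuild command.)
check_robot_paths() {
    local paths="${1:-${EASYBUILD_ROBOT_PATHS:-}}"
    local status=0
    local entry
    local IFS=':'          # split the list on colons inside this function only
    for entry in $paths; do
        if [ -d "$entry" ]; then
            echo "ok      $entry"
        else
            echo "missing $entry"
            status=1
        fi
    done
    return "$status"
}
```

After the export above (or after sourcing PATH.local), calling check_robot_paths without arguments lists the configured entries.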
Status
03/11/2020: successfully built cython-0.29.21-foss-2019b-Python-3.7.4, numpy-1.18.1-foss-2019b-Python-3.7.4, SciPy-bundle-2020.03-foss-2019b-Python-3.7.4, Bazel-0.26.1-foss-2019b, h5py-2.10.0-foss-2019b-Python-3.7.4.
04/11/2020: TensorFlow 1.15.2 successfully built and installed, using CUDA 10.1.243
19/11/2020: gomkl toolchain built with Intel MKL 2019.1.144
21/11/2020: successfully built everything (including TensorFlow 1.15.2) with the gomkl toolchain.
To use:
module use -a /cluster/shared/nlpl/software/easybuild_ak/easybuild/install/modules/all/
module load NLPL-TensorFlow/1.15.2-gomkl-2019b-Python-3.7.4
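Once the module is loaded, a quick import check shows whether the installation is usable. This is a sketch under assumptions: the helper name tf_smoke_test is hypothetical, and it assumes the loaded module puts a suitable python3 on the PATH. It only imports TensorFlow and prints the version and whether the build reports CUDA support.

```shell
# tf_smoke_test [PYTHON]
# Import TensorFlow with the given interpreter (default: python3) and print
# its version and whether the build was compiled with CUDA support.
# The exit status is that of the interpreter, so a failed import is non-zero.
# (Hypothetical helper.)
tf_smoke_test() {
    local py="${1:-python3}"
    "$py" -c 'import tensorflow as tf; print(tf.__version__, tf.test.is_built_with_cuda())'
}
```

Run tf_smoke_test in the same shell session in which the module was loaded. The reported version should match the one in the module name.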
Remaining issues
- In order to keep the modules NLPL-branded, environment variables must be added manually to the module files (see https://source.coderefinery.org/nlpl/easybuild/-/issues/4#note_13895). Without that, the modules load fine but cannot be used as dependencies when building other modules.
- TensorFlow is built with CUDA 10.1.243, not CUDA 10.0.130. Attempts to use the latter failed (see https://source.coderefinery.org/nlpl/easybuild/-/issues/9). A way to make EasyBuild look for CUDA in a non-standard location still needs to be found.
- Check whether using gompi instead of gompic (with CUDA) leads to problems with multi-node training (see https://source.coderefinery.org/nlpl/easybuild/-/issues/9#note_13894). Multi-GPU training on a single node is confirmed to work.
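For the first issue, the manual addition could be scripted roughly as follows. This is a sketch under several assumptions that the page itself does not confirm: that the missing variables are the EBROOT<NAME> and EBVERSION<NAME> variables that EasyBuild-generated modules normally define for the unbranded software name, that the module files are Lua files read by Lmod, and that the name contains no characters such as "-", "+" or "." that would need special handling. The helper name print_eb_setenv is hypothetical.

```shell
# print_eb_setenv NAME VERSION ROOT
# Print Lua setenv() lines that define EBROOT<NAME> and EBVERSION<NAME>
# for an unbranded software name (upper-cased). The lines are meant to be
# reviewed and appended by hand to a Lua module file.
# (Hypothetical helper; which variables are needed is an assumption.)
print_eb_setenv() {
    local name="$1"
    local version="$2"
    local root="$3"
    local upper
    upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
    printf 'setenv("EBROOT%s", "%s")\n' "$upper" "$root"
    printf 'setenv("EBVERSION%s", "%s")\n' "$upper" "$version"
}
```

For example, print_eb_setenv TensorFlow 1.15.2 /path/to/install prints two lines for EBROOTTENSORFLOW and EBVERSIONTENSORFLOW, where /path/to/install is a placeholder for the real installation directory.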