For my master's thesis I'm using (Py)NEST for the first time, and I have some questions about parallel computing. First, a short intro to give you the context:
My research group has access to 4 clusters: 2 with a focus on RAM/CPU and 2 with a focus on GPU. These are shared systems; resources are dynamically allocated depending on the number of users doing calculations.
The way these clusters are set up allows me to simply set "local_num_threads" to 10. If no one else is using the cluster, the cluster manager reports that I get 1000% CPU power, and the simulation runs about 8 times faster, as expected. (In case you find that number confusing: the manager rates 1 core as 100%, so 56 cores translates to 5600% CPU power.)
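To spell out the arithmetic behind those numbers, here is a minimal sketch. The function names are hypothetical helpers of mine, not part of NEST or of the cluster manager; the only facts used are the ones above (1 core = 100%, about 8-fold speedup on 10 threads).

```python
def percent_to_cores(cpu_percent):
    """Convert the manager's CPU percentage to a core count (1 core = 100%)."""
    return cpu_percent / 100.0


def parallel_efficiency(speedup, num_threads):
    """Fraction of the ideal speedup actually achieved with num_threads threads."""
    return speedup / num_threads


print(percent_to_cores(5600))      # 56.0 cores
print(percent_to_cores(1000))      # 10.0 cores
print(parallel_efficiency(8, 10))  # 0.8, i.e. about 80% of ideal scaling
```

So an 8-fold speedup on 10 threads corresponds to roughly 80% parallel efficiency when the machine is otherwise idle.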
However, problems start to occur when others are using the cluster as well. I might get 500% CPU power one moment and just 300% the next. At this point it is actually FASTER to request only 1 thread and get the full 100% power. I'm guessing this is due to the architecture: something like nodes A and B waiting on node C, which has been temporarily allocated to my colleague?
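To make that guess concrete, here is a toy model in Python. It rests on two assumptions of mine, not on anything I know about NEST's internals: the work of each step is split evenly across threads, and every thread has to wait for the slowest one before the next step can start. The CPU shares are made-up illustration values, not measurements.

```python
def step_time(total_work, cpu_shares):
    """Time for one synchronized step.

    total_work is split evenly over len(cpu_shares) threads; each thread runs
    at its CPU share (1.0 = a full core), and the step ends only when the
    slowest thread has finished.
    """
    work_per_thread = total_work / len(cpu_shares)
    return max(work_per_thread / share for share in cpu_shares)


# One thread with a full core to itself.
one_thread = step_time(1.0, [1.0])

# Ten threads sharing 300% CPU evenly (0.3 of a core each).
even = step_time(1.0, [0.3] * 10)

# Ten threads sharing the same 300% CPU unevenly: five threads are starved.
uneven = step_time(1.0, [0.55] * 5 + [0.05] * 5)

print(one_thread, even, uneven)
print("uneven sharing slower than 1 thread:", uneven > one_thread)
```

In this toy model, evenly shared CPU time still beats a single thread, but as soon as a few threads are starved, the whole step is held up by them and 10 threads end up slower than 1. Whether this is what actually happens on our cluster is exactly what I am unsure about.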
So, the questions:
1. Does NEST prefer CPU or GPU? Is it RAM intensive? I couldn't find this on your website, or anywhere really. Are there ballpark numbers for the memory required for 10^9 connections?
2. Am I abusing the "local_num_threads" setting by applying it to a cluster like this? Why is 1 thread sometimes faster than 10 threads?
3. Would switching to MPI help?
4. If so: the documentation says I need to run $NEST_SOURCE_DIR/configure --with-mpi, but there is no "configure" folder or command to be found anywhere, and this line fails. What and where is this "configure"?
Any help would be greatly appreciated. Thank you for your time.