Hi,
There is no such thing as "the right cluster". No matter what you buy, there is always another bottleneck. Currently (Q3/2016) the best price/performance ratio I have found for Telemac is in the Xeon E5-26?? family. Don't go for more than 10 cores per CPU, as the RAM won't be able to feed the additional data eaters.
The v3 and v4 generations will give you a speed-up of up to 50% over v2, but at a street price that is up to 5 times higher. If you're not running it 24 hours a day, 365 days a year, the older generation might give you much better economic efficiency.
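To put rough numbers on that trade-off, here is a minimal sketch in Python. It only uses the two worst-case figures from above (up to 50% faster, up to 5 times the price); the function name and the normalisation to v2 = 1.0 are just for illustration, not measured data.

```python
def performance_per_price(speedup, price_factor):
    """Relative performance per unit of money, with the v2 baseline at 1.0."""
    if price_factor <= 0:
        raise ValueError("price_factor must be positive")
    return speedup / price_factor


# v2 is the baseline; v3/v4 use the "up to" figures quoted above.
v2 = performance_per_price(speedup=1.0, price_factor=1.0)
newer = performance_per_price(speedup=1.5, price_factor=5.0)

print(f"v2:    {v2:.2f} (relative performance per Euro)")
print(f"v3/v4: {newer:.2f} (relative performance per Euro)")
```

With these inputs the newer generation delivers less than a third of the performance per Euro. Real street prices and speed-ups will shift that ratio, so plug in your own quotes.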
I'm currently setting up several small clusters with 64 to 128 cores (Xeon E5) for 2 customers and my own office.
Above 8 nodes (= 128 cores) the electricity supply in my office is not strong enough: 3600 W max. I would need a new cable ...
My verdict for T2D and T3D (strongly varying with your model's parameters!):
Small problems, let's say < 2000 mesh nodes per CPU core, are strongly dependent on memory speed and memory channel width.
But once you are on quad channel memory, the frequency is only important for the really small problems.
Large problems (> 750 000) are indeed CPU frequency dependent. But this can be solved by just using ... more CPUs ...
In between, it is mostly the network latency that breaks your performance.
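A quick way to see where a model sits relative to the small-problem limit is to divide the mesh size by the core count. A minimal sketch in Python: the function names and the 200 000-node example mesh are made up for illustration, and only the 2000 nodes-per-core figure comes from the verdict above.

```python
SMALL_PROBLEM_LIMIT = 2000  # mesh nodes per core, rule of thumb from above


def mesh_nodes_per_core(mesh_nodes, cores):
    """Average number of mesh nodes each core has to handle."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return mesh_nodes / cores


def is_small_problem(mesh_nodes, cores):
    """True if the per-core load is below the small-problem limit."""
    return mesh_nodes_per_core(mesh_nodes, cores) < SMALL_PROBLEM_LIMIT


# Hypothetical mesh with 200 000 nodes on 4 and 8 nodes of 16 cores each.
for cores in (64, 128):
    load = mesh_nodes_per_core(200_000, cores)
    print(f"{cores} cores: {load:.0f} mesh nodes per core, "
          f"small problem: {is_small_problem(200_000, cores)}")
```

Note that the same mesh can change regime just by adding cores, so the bottleneck you hit depends on how far you split the model.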
I tried 1G Ethernet vs. 10G Ethernet vs. QDR4 Infiniband.
The first two differ only slightly in latency; the last is 10 times better. Practical results: medium-size problems can be accelerated by 30% or more with QDR Infiniband, but the effect is best around 3 to 8 nodes with 16 cores each.
Infiniband is the worst configuration hell ... Use Mellanox, not QLogic, as the former has far more resources on the web.
Partel and Gretel perform much better on SSDs with high IOPS rates than on RAID systems, even though the bandwidth is equal. The same goes for Paraview and other postprocessors.
Think about PCIe-to-M.2 SSD cards!
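Why 10G Ethernet helps so little here can be sketched with the usual latency-plus-bandwidth cost model for a single message. All numbers in the Python snippet below are illustrative assumptions, not measurements: round latencies picked to reflect the roughly 10x gap mentioned above, nominal link data rates, and a made-up 512-byte message standing in for a small exchange between subdomains.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_gbit):
    """Simple cost model: time = latency + size / bandwidth (microseconds)."""
    bytes_per_us = bandwidth_gbit * 1e9 / 8 / 1e6  # Gbit/s -> bytes per microsecond
    return latency_us + message_bytes / bytes_per_us


# (latency in microseconds, data rate in Gbit/s), assumed values for illustration
links = {
    "1G Ethernet": (50.0, 1.0),
    "10G Ethernet": (45.0, 10.0),
    "QDR Infiniband": (5.0, 32.0),
}

message_bytes = 512  # hypothetical small message

for name, (latency_us, bandwidth_gbit) in links.items():
    t = transfer_time_us(message_bytes, latency_us, bandwidth_gbit)
    print(f"{name:15s} {t:6.1f} us per message")
```

For messages this small the latency term dominates, so ten times the bandwidth buys almost nothing, while ten times lower latency shows up almost one to one.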
Parallel file systems like GlusterFS are very easy to install.
This is great if you want to write out many time steps for videos.
If you are lucky, you can get a 4- or 8-node server (64 to 128 Xeon E5 cores) for 3500€ to 7000€: used, lease-return hardware with a 1-year manufacturer warranty. Not a big risk.
Testing suitable hardware has a lot to do with trial and error. I ordered and returned almost everything twice ... Expect a lot of configuration problems. Or just ignore that you spent thousands of Euros on the wrong components.
If you need more information, just contact me.
Kind regards,
Uwe