Hi Lian,
I had the same question before, but from what I now understand, TELEMAC (like most other hydrodynamic solvers) isn't well suited to massive parallelization. There is too much interdependence between nodes, both within each mesh partition and across partition boundaries, and the boundary part introduces overhead because the sub-processes have to communicate with each other.
For example, I timed the same simulation on 1, 2, 3, ..., 12 cores. The domain was relatively small, and in that case the fastest runs used 5 or 6 cores.
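To illustrate why there can be a sweet spot like that, here is a toy model in Python. It is only a sketch, not how TELEMAC works internally, and the numbers are made up rather than my measurements: I assume the compute work splits evenly across cores while the communication cost grows linearly with the number of cores. The function `run_time` and the parameters `compute` and `comm_per_core` are hypothetical names chosen for the example.

```python
def run_time(cores, compute=100.0, comm_per_core=3.5):
    """Toy run time: evenly split compute plus a communication penalty.

    Both parameters are invented, in arbitrary time units.
    """
    return compute / cores + comm_per_core * (cores - 1)


# Evaluate the toy model for 1 to 12 cores, as in the test described above.
times = {p: run_time(p) for p in range(1, 13)}
best = min(times, key=times.get)

for p, t in times.items():
    print(f"{p:2d} cores: {t:6.1f}")
print("fastest with", best, "cores")
```

With these invented parameters the minimum falls at a moderate core count and the run time climbs again after that. A larger domain would correspond to a bigger `compute` relative to `comm_per_core`, which pushes the optimum towards more cores.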
That said, my understanding of GPU vs CPU could well be incorrect. Maybe each mesh node could be treated in parallel on a GPU? I'm not a computer scientist...
André Renault