
TOPIC: Artemis and direct solver in parallel (MUMPS)

Artemis and direct solver in parallel (MUMPS) 9 years 6 days ago #18952

  • cyamin
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Oops! I just realized that I had forgotten to change an environment variable that pointed to the v7p0r1 version. I switched to the trunk version and now parallel (SOLVER=3) works. I have been so preoccupied with the compilation issues that I forgot simple things. I apologise for the trouble.

Costas

Artemis and direct solver in parallel (MUMPS) 9 years 6 days ago #18954

  • cyamin
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hello all,

I believe I have good news. ScaLAPACK is compiled and all built-in tests passed. MUMPS is compiled and a simple test completed successfully. The test case ile_para seems to run OK:

File Attachment:

File Name: output_smpd_2015-11-19-3.txt
File Size: 8 KB

Will do more tests tomorrow, to confirm.

Regards,
Costas

Artemis and direct solver in parallel (MUMPS) 9 years 5 days ago #18958

  • cyamin
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hello,

I can confirm that MUMPS is working. I tested on a real case and got the following solution times (Intel Corei7 860):

YALE solver (option 8) in 1 core: 5157 s
MUMPS solver (option 9) in 2 cores: 950 s
MUMPS solver (option 9) in 4 cores: 738 s

The speed increase is substantial, which means that the MUMPS solver is considerably more efficient than the YALE solver.
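For reference, the solver options above correspond to the SOLVER keyword of the ARTEMIS steering file; the lines below are only a hedged sketch (option numbers as reported in this thread, keyword spelling to be checked against the dictionary of the version in use):

/ 8 = YALE direct solver, 9 = MUMPS
SOLVER = 9
PARALLEL PROCESSORS = 4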

On the other hand, I am facing a memory leak problem with the MUMPS implementation. The longer the computation runs, the more memory it consumes (it grows from 200 MB to over 3 GB per core, depending on the computation duration). So it works, but it is not done yet.

I am open to suggestions.

Regards,
Costas

Artemis and direct solver in parallel (MUMPS) 9 years 5 days ago #18960

  • c.coulet
  • OFFLINE
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi Costas

Thanks for this good news about MUMPS.

On the downside, this means there is probably a memory allocation problem somewhere in the code. This will need more investigation to find out where it occurs.

If you want to investigate this particular problem further, you could look at the allocation/deallocation in the code, as this kind of phenomenon occurs when something is allocated at each time step without being deallocated.
Nevertheless, this is sometimes machine or compiler dependent: we had seen a similar problem for a long time on our cluster with TELEMAC-3D in parallel (with Intel MPI), and it was not observed at EDF on the same model...

As a blue-sky objective, we would like to integrate tools like Valgrind into our tests to identify memory leaks, but this is not straightforward, particularly for parallel computations...

Kind regards
Christophe

Artemis and direct solver in parallel (MUMPS) 9 years 5 days ago #18961

  • cyamin
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hello,

Since the MUMPS implementation has been sorted out quite recently, can you verify that your installation is "memory leak free"? The ile_para case is too short to show this problem; you will need a larger domain.

Can you also verify that the ARTEMIS interface to MUMPS is compatible with the latest version (v5.0.1)?

My setup consists of openBLAS-0.2.15, LAPACK-3.6.0, ScaLAPACK-2.0.2 and MUMPS-5.0.1.

Regards,
Costas

Artemis and direct solver in parallel (MUMPS) 9 years 2 days ago #18969

  • cyamin
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Hello,

I did some reading on the MUMPS solver, and it appears that the increasing memory consumption, although it looks like a 'memory leak', is in fact the way this type of solver works. Increased performance comes at the cost of memory consumption.

Some notes:
A test on a node with less memory soon resulted in heavy disk swapping and an eventual computation stop with a memory-related error. Using METIS for reordering of the sparse matrix seems to reduce the maximum memory used, but results in increased computation time (almost twice the time in my case).
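For reference, this kind of ordering choice is made through the MUMPS control array rather than in ARTEMIS itself. The lines below are only a minimal sketch of the plain DMUMPS Fortran interface (standard MUMPS 5.x usage, not taken from solve_mumps_par.F):

      PROGRAM ORDERING_SKETCH
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INCLUDE 'dmumps_struc.h'
      TYPE (DMUMPS_STRUC) :: ID
      INTEGER :: IERR
      CALL MPI_INIT(IERR)
      ID%COMM = MPI_COMM_WORLD
!     SYM=0: unsymmetric matrix; PAR=1: the host also takes part in the work
      ID%SYM = 0
      ID%PAR = 1
!     JOB=-1 initialises the instance and sets the default controls
      ID%JOB = -1
      CALL DMUMPS(ID)
!     Fill-reducing ordering for the analysis phase:
!     5 = METIS, 7 = automatic choice (MUMPS user guide)
      ID%ICNTL(7) = 5
!     ... describe the matrix and right-hand side here, then run the
!     analysis, factorisation and solution phases (JOB = 1, 2, 3 or 6) ...
!     JOB=-2 terminates the instance and frees MUMPS internal memory
      ID%JOB = -2
      CALL DMUMPS(ID)
      CALL MPI_FINALIZE(IERR)
      END PROGRAM ORDERING_SKETCH

The memory/time trade-off of a given ordering (and the memory relaxation controls) is best checked against the MUMPS user guide for the exact version in use.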

Regards,
Costas

Artemis and direct solver in parallel (MUMPS) 9 years 2 days ago #18973

  • jmhervouet
Hello Costas,

We can accept that MUMPS needs a lot of memory, but it should not increase with time when it always solves a linear system of the same size. I think the problem is in solve_mumps_par.F, where the authors have left a comment and some commented-out DEALLOCATE statements. As it is programmed, I think a general SAVE is missing. Without SAVE, we should always deallocate the arrays, which are probably allocated at every call because they are 'forgotten' or destroyed due to the absence of SAVE. So I would rather add a general SAVE or, alternatively, add at the end:

DEALLOCATE(TEMP1)
DEALLOCATE(TEMP2)
DEALLOCATE(TEMP3)
and the deallocate of the MUMPS structure components as suggested by the authors.
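As a minimal sketch of this pattern with hypothetical names (not the actual arrays or routine of solve_mumps_par.F):

      SUBROUTINE SOLVE_ONE_SYSTEM(N, RHS)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: N
      DOUBLE PRECISION, INTENT(INOUT) :: RHS(N)
!     Local workspace, allocated anew at every call of the routine
      DOUBLE PRECISION, ALLOCATABLE :: TEMP1(:), TEMP2(:), TEMP3(:)
      ALLOCATE(TEMP1(N), TEMP2(N), TEMP3(N))
!     ... assemble and solve the linear system here ...
!     Explicit cleanup before returning, as suggested above; the
!     alternative is a general SAVE with allocation on the first call
!     only, so that the workspace is reused instead of recreated
      DEALLOCATE(TEMP1)
      DEALLOCATE(TEMP2)
      DEALLOCATE(TEMP3)
      RETURN
      END SUBROUTINE SOLVE_ONE_SYSTEM

Either remedy keeps the memory footprint of repeated calls bounded: SAVE reuses a single allocation, while the explicit DEALLOCATE releases the workspace at the end of each call.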

If you see a difference of behaviour by doing this we can look more closely at what should be deallocated.

With best regards,

Jean-Michel Hervouet

Artemis and direct solver in parallel (MUMPS) 9 years 2 days ago #18976

  • cyamin
  • OFFLINE
  • openTELEMAC Guru
  • Posts: 997
  • Thank you received: 234
Aah, you are right. Un-commenting the deallocation statements at the end did stop the memory leak issue (which it turned out to be after all :( ). Now the memory consumption is comparable to the other solvers and, above all, constant.

Thank you for your help. If you make any more improvements, please let me know.

Best Regards,
Costas
The following user(s) said Thank You: sebourban

Artemis and direct solver in parallel (MUMPS) 9 years 2 days ago #18978

  • sebourban
  • OFFLINE
  • Administrator
  • Principal Scientist
  • Posts: 814
  • Thank you received: 219
A-ha !

This is brilliant news. Thank you all, but especially Costas, who made the effort of investigating the Windows/Intel/MSMPI option.

Sébastien.

Artemis and direct solver in parallel (MUMPS) 9 years 2 days ago #18980

  • jmhervouet
Hello,

OK, I will commit this correction for version 7.1.

Regards,

JMH
