TOPIC: mask3d error in parallel run

mask3d error in parallel run 8 years 10 months ago #19263

  • victor
Good afternoon

I'm happy to be running v7p0r1 now. I found an error when using mask3d in a 3D simulation.

When I run on only 1 core it runs fine, but with more cores it stops with the following error:

APPEL DE MASK3D
_____________
runcode::main:
:
|runCode: Fail to run
|/usr/bin/mpiexec -wdir /home/victor/Ronda1/Modpreliminar/tel/cas00_2016-01-05-16h28min22s -n 4 /home/victor/Ronda1/Modpreliminar/tel/cas00_2016-01-05-16h28min22s/out_tel3dv700
|~~~~~~~~~~~~~~~~~~
|Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
|
|Backtrace for this error:
|
|Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
|
|Backtrace for this error:
|
|Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
|
|Backtrace for this error:
|
|Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
|
|Backtrace for this error:
|#0 0x7FDC6D47A777
|#1 0x7FDC6D47AD7E
|#2 0x7FDC6CBB6D3F
|#3 0x6895BD in ovbd_
|#4 0x40A789 in mask3d_
|#5 0x40D830 in telemac3d_
|#6 0x479AC5 in MAIN__ at homere_telemac3d.f:?
|#0 0x7F143149D777
|#0 0x7F036A73E777
|#1 0x7F143149DD7E
|#1 0x7F036A73ED7E
|#2 0x7F1430BD9D3F
|#2 0x7F0369E7AD3F
|#3 0x6895BD in ovbd_
|#3 0x6895BD in ovbd_
|#4 0x40A789 in mask3d_
|#4 0x40A789 in mask3d_
|#5 0x40D830 in telemac3d_
|#5 0x40D830 in telemac3d_
|#6 0x479AC5 in MAIN__ at homere_telemac3d.f:?
|#6 0x479AC5 in MAIN__ at homere_telemac3d.f:?
|#0 0x7FD62AA4F777
|#1 0x7FD62AA4FD7E
|#2 0x7FD62A18BD3F
|#3 0x6895BD in ovbd_
|#4 0x40A789 in mask3d_
|#5 0x40D830 in telemac3d_
|#6 0x479AC5 in MAIN__ at homere_telemac3d.f:?
|
|mpiexec noticed that process rank 1 with PID 6548 on node victor-T5500 exited on signal 11 (Segmentation fault).
|
|~~~~~~~~~~~~~~~~~~
victor@victor-T5500:~/Ronda1/Modpreliminar/tel$

I read similar posts on the forum but could not find a solution.

These are the modified subroutines that I use with the keyword mask=1. I want to mask all land elements.



Any suggestion or idea will be welcome.

Tlazocamati (thanks in Mexican Nahuatl)
The administrator has disabled public write access.

mask3d error in parallel run 8 years 10 months ago #19264

  • victor
Fortran files

File Attachment:

File Name: mask3dvrs.f
File Size: 8 KB

mask3d error in parallel run 8 years 10 months ago #19280

  • jmhervouet
Hello Victor,

If you can send your case, or a similar simplified case, I could try it and see what happens.

All the best, and happy New Year!

Jean-Michel

mask3d error in parallel run 8 years 10 months ago #19295

  • victor
Hello Jean-Michel,

It's nice to have a reply from you.

These are the files. I'm using a particular boundary condition, so you need to tell Telemac the location of the BC files uhycom.csv and vhycom.csv in lines 220 and 221 of the Fortran file. All my changes are commented "VRS". In the maskob file I wrote a condition to mask the land elements above 1 m. The cas file is set to 1 core and runs fine; when I change it to 2 or more, I get the error.

Hope to see you soon

Thanks

Victor

mask3d error in parallel run 8 years 10 months ago #19296

  • victor
I have to send the files in parts; they are a little too big.

files 1/3


File Attachment:

File Name: OpenTelHycom.zip
File Size: 1,394 KB

mask3d error in parallel run 8 years 10 months ago #19297

  • victor
I have to send the files in parts; they are a little too big.

files 2/3

mask3d error in parallel run 8 years 10 months ago #19298

  • victor
In the end I simplified the case to avoid complex boundary conditions.

If you set the number of cores to 1 it runs fine; with 2 or more it stops.


File Attachment:

File Name: OpenTelMask.zip
File Size: 1,263 KB


Thanks

Victor

mask3d error in parallel run 8 years 10 months ago #19300

  • jmhervouet
Hello Victor,

I looked at your case. Not a single compiler here accepts what you are doing. You cannot have the module declarations_telemac3d in the middle of your Fortran file, because the subroutines in the library may have been compiled with the other copy of the module, the one in the library. This causes memory-address mismatches that can lead to crashes, and this may be your problem in parallel. You can put your extra variables in another module, placed at the beginning of your Fortran file, and write USE MY_MODULE wherever you need your new variables. Only after this is done can we start debugging, if the problem remains.
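A minimal sketch of that layout, under stated assumptions: MY_MODULE is the name used above, but ZLAND_MAX, FLAG_LAND and the DEMO driver are hypothetical names invented for illustration, and the convention that 0 masks an element and 1 keeps it is assumed here rather than taken from the attached files. It is not the content of the user file in this thread, only the general pattern of a separate module declared first and then brought in with USE.

```fortran
!     Hypothetical module holding user-added variables. It sits at the
!     very top of the user Fortran file, before any subroutine.
      MODULE MY_MODULE
        IMPLICIT NONE
!       ZLAND_MAX is an illustrative name, not a TELEMAC variable:
!       bottom elevation (m) above which an element counts as land.
        DOUBLE PRECISION :: ZLAND_MAX = 1.D0
      END MODULE MY_MODULE

!     Stand-in for a user subroutine: it reads the extra variable
!     through USE MY_MODULE instead of a redefined library module.
      SUBROUTINE FLAG_LAND(ZFE, MASKEL, NELEM)
        USE MY_MODULE
        IMPLICIT NONE
        INTEGER, INTENT(IN) :: NELEM
        DOUBLE PRECISION, INTENT(IN) :: ZFE(NELEM)
        DOUBLE PRECISION, INTENT(OUT) :: MASKEL(NELEM)
        INTEGER :: IELEM
        DO IELEM = 1, NELEM
!         Assumed convention: 0 masks the element, 1 keeps it.
          IF (ZFE(IELEM) .GT. ZLAND_MAX) THEN
            MASKEL(IELEM) = 0.D0
          ELSE
            MASKEL(IELEM) = 1.D0
          END IF
        END DO
      END SUBROUTINE FLAG_LAND

!     Small driver so the sketch compiles and runs on its own.
      PROGRAM DEMO
        IMPLICIT NONE
        DOUBLE PRECISION :: ZFE(3), MASKEL(3)
        ZFE = (/ -5.D0, 0.5D0, 2.D0 /)
        CALL FLAG_LAND(ZFE, MASKEL, 3)
        PRINT *, MASKEL
      END PROGRAM DEMO
```

The point of the layout is that the library modules are left untouched: only the new module is compiled from the user file, so the library subroutines and the user subroutines agree on every shared declaration.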

With best regards,

Jean-Michel

mask3d error in parallel run 8 years 10 months ago #19310

  • victor
Thanks for your reply.

I have now removed Declarations_Telemac3d from the Fortran file and the problem remains. I would like to note that maskob.f is a telemac2d source file, and that I added the variable ZFE both to the call in mask3d.f and to the argument list of the subroutine in maskob.f. Could this be causing the problem?

I attach the new Fortran file (Declarations_Telemac3d removed) and the updated cas01.


File Attachment:

File Name: tel3dv702.f
File Size: 108 KB




Have a nice evening

Victor

mask3d error in parallel run 8 years 10 months ago #19328

  • jmhervouet
Hello Victor,

I have now reproduced the problem, and it is probably our mistake. The array NELBOR is seemingly not initialised for boundary points followed by a segment in another subdomain. However, I have to look further: this issue is normally well known and well handled, so I need to understand why it shows up again in your case. You will need to wait a while. Scalar mode is correct.
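To illustrate the kind of failure described, here is a small standalone sketch, not the TELEMAC fix. Everything in it is an assumption made for the example: NELBOR_DEMO is made-up data rather than the library array, and the value 0 is an assumed placeholder for a boundary point whose segment lies in another subdomain. It only shows how checking an element number before using it as an array index avoids an invalid memory reference.

```fortran
!     Sketch only: NELBOR_DEMO and its contents are made-up data,
!     not the array built by the TELEMAC library.
      PROGRAM GUARD_DEMO
        IMPLICIT NONE
        INTEGER, PARAMETER :: NPTFR = 4, NELEM = 3
        INTEGER :: NELBOR_DEMO(NPTFR), K, IELEM
        DOUBLE PRECISION :: ZFE(NELEM)
        ZFE = (/ -2.D0, 0.5D0, 3.D0 /)
!       0 stands for "no local element": an assumed placeholder for
!       a boundary segment that belongs to another subdomain.
        NELBOR_DEMO = (/ 1, 3, 0, 2 /)
        DO K = 1, NPTFR
          IELEM = NELBOR_DEMO(K)
!         Only index ZFE when the element number is valid locally.
          IF (IELEM .GE. 1 .AND. IELEM .LE. NELEM) THEN
            PRINT *, 'POINT', K, 'ELEMENT', IELEM, 'ZFE', ZFE(IELEM)
          ELSE
            PRINT *, 'POINT', K, 'SKIPPED, NO LOCAL ELEMENT'
          END IF
        END DO
      END PROGRAM GUARD_DEMO
```

A range check like this catches zero or out-of-range entries, but it cannot detect an uninitialised entry that happens to hold a valid-looking element number, so the array still has to be filled correctly in the first place.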

Jean-Michel