
TOPIC: Parallel Segmentation Fault

Parallel Segmentation Fault 13 years 3 months ago #2019

  • olslewfoot
  • Senior Boarder
  • Posts: 132
  • Thank you received: 3
Dear all

I have a model domain over which I have successfully run a 2D simulation with several different variations, in single and parallel modes.
I modified the boundary conditions by programming the Bord subroutine. The new model compiles, links and runs perfectly in single mode, but in parallel mode, although it compiles and links successfully, it fails with a segmentation fault on the first iteration. The message is "forrtl: severe (174): SIGSEGV, segmentation fault occurred".

The partel log suggests that the mesh division has been successfully achieved and there are no other error messages.

Can you suggest where the problem could lie or how I may debug this further?

Thanks

John
The administrator has disabled public write access.

Re:Parallel Segmentation Fault 13 years 3 months ago #2022

  • jmhervouet
Hello,

This is a classic case: you probably use global point numbers in Bord, and in parallel mode these numbers are changed. The numbering of boundary points is also changed, so programming Bord in parallel is a bit tricky. You can rely on the array KNOLG: MESH%KNOLG%I(N) gives you the original global number of the point which has number N in the local sub-domain. This generally costs a loop over all points or over all boundary points. Suppose you applied some treatment to original global point 10352; you would now have:

!     loop on local boundary points
      DO K=1,NPTFR
        IF(MESH%KNOLG%I(NBOR(K)).EQ.10352) THEN
!
!         here boundary point K is in your sub-domain:
!         you can apply your boundary treatment
!
        ENDIF
      ENDDO

There is no equivalent of KNOLG for boundary points, so it is better to identify points by their global node number (the last-but-one column of the boundary conditions file; the last column is the boundary point number, which is simply the line number in that file).
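To make the local/global renumbering concrete, here is a stand-alone sketch of the loop above, assuming plain integer arrays: KNOLG maps a local node number to its original global number (playing the role of MESH%KNOLG%I) and NBOR maps a local boundary point to its local node number. The module and function names are hypothetical and are not part of TELEMAC. The function returns the local boundary index K of a given global node, or 0 when that node is not a boundary point of the current sub-domain.

```fortran
! Hypothetical stand-alone sketch: KNOLG and NBOR are plain integer arrays
! here, standing in for MESH%KNOLG%I and NBOR inside a TELEMAC subroutine.
MODULE KNOLG_SKETCH
  IMPLICIT NONE
CONTAINS
  ! Returns the local boundary index K whose node has the original global
  ! number IGLOB, or 0 if that node is not a boundary point of this
  ! sub-domain (for instance because it belongs to another processor).
  FUNCTION FIND_BOUNDARY_POINT(IGLOB, NPTFR, NBOR, KNOLG) RESULT(KFOUND)
    INTEGER, INTENT(IN) :: IGLOB          ! original global node number
    INTEGER, INTENT(IN) :: NPTFR          ! number of local boundary points
    INTEGER, INTENT(IN) :: NBOR(NPTFR)    ! boundary point -> local node
    INTEGER, INTENT(IN) :: KNOLG(:)       ! local node -> global node
    INTEGER :: KFOUND
    INTEGER :: K
    KFOUND = 0
    DO K = 1, NPTFR
      IF (KNOLG(NBOR(K)) .EQ. IGLOB) THEN
        KFOUND = K
        RETURN
      ENDIF
    ENDDO
  END FUNCTION FIND_BOUNDARY_POINT
END MODULE KNOLG_SKETCH
```

Returning 0 instead of stopping lets each processor simply skip the treatment when the point is not in its sub-domain. A node lying on an interface between sub-domains may be found by more than one processor, which is the expected behaviour for a boundary treatment applied pointwise.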

I hope this will shed some light on your problem.

With best regards,

Jean-Michel Hervouet

Re:Parallel Segmentation Fault 13 years 3 months ago #2025

  • olslewfoot
Thank you, Jean-Michel.

That is a very clear description of the problem and how I may solve it.

John

Re:Parallel Segmentation Fault 4 years 3 months ago #36559

  • rosparsw
Hello Jean-Michel,

I am facing the same issue as John, so I tried to follow your advice and wrote the following code (I am trying to initialize a thermohaline stratification):

!     BEGIN OF PART SPECIFIC TO TETRA OR STRATIFICATION CASE
!
      DO K=1,NPTFR2
	    IF(MESH%KNOLG%I(NBOR(K)).EQ.KENT) THEN
          DO NP=1,NPLAN
            IBORD = (NP-1)*NPTFR2+K
            IF(LITABL%ADR(1)%P%I(IBORD).EQ.KENT) THEN
!! BEGINNING OF SPECIFIC TO TETRA CASE
!!            IF(NP.LE.4) THEN
!!              TABORL%ADR(1)%P%R(IBORD) = 40.D0
!!            ELSE
!!              TABORL%ADR(1)%P%R(IBORD) = 30.D0
!!            ENDIF
!! END OF SPECIFIC TO TETRA CASE
!!
!! BEGINNING OF SPECIFIC TO STRATIFICATION CASE
!!           STRATIFICATION PUT AT THE ENTRANCE
              IF(NP.GT.1) THEN
                TABORL%ADR(1)%P%R(IBORD) = 3.9E-3 !Salinite ups
			    TABORL%ADR(2)%P%R(IBORD) = 10.D0 !Temperature deg C
			  ELSEIF(NP.GT.2) THEN
                TABORL%ADR(1)%P%R(IBORD) = 3.9E-3
			    TABORL%ADR(2)%P%R(IBORD) = 10.D0
			  ELSEIF(NP.GT.3) THEN
                TABORL%ADR(1)%P%R(IBORD) = 3.9E-3
			    TABORL%ADR(2)%P%R(IBORD) = 10.D0
			  ELSEIF(NP.GT.4) THEN
                TABORL%ADR(1)%P%R(IBORD) = 7.8E-3
			    TABORL%ADR(2)%P%R(IBORD) = 13.D0
			  ELSEIF(NP.GT.5) THEN
                TABORL%ADR(1)%P%R(IBORD) = 12E-3
			    TABORL%ADR(2)%P%R(IBORD) = 14.D0
			  ELSEIF(NP.GT.6) THEN
                TABORL%ADR(1)%P%R(IBORD) = 15E-3
			    TABORL%ADR(2)%P%R(IBORD) = 14.D0
			  ELSEIF(NP.GT.7) THEN
                TABORL%ADR(1)%P%R(IBORD) = 17E-3
			    TABORL%ADR(2)%P%R(IBORD) = 17.D0
			  ELSEIF(NP.GT.8) THEN
                TABORL%ADR(1)%P%R(IBORD) = 23E-3
			    TABORL%ADR(2)%P%R(IBORD) = 18.D0
			  ELSEIF(NP.GT.9) THEN
                TABORL%ADR(1)%P%R(IBORD) = 23E-3
			    TABORL%ADR(2)%P%R(IBORD) = 20.D0
			  ELSEIF(NP.GT.10) THEN
                TABORL%ADR(1)%P%R(IBORD) = 23E-3
			    TABORL%ADR(2)%P%R(IBORD) = 22.D0
			  ELSE
                TABORL%ADR(1)%P%R(IBORD) = 23E-3
			    TABORL%ADR(2)%P%R(IBORD) = 26.D0
              ENDIF
!! END OF SPECIFIC TO STRATIFICATION CASE
!          ENDIF
!        ENDDO
!      ENDDO
!      DO K=1,NPTFR2
!        DO NP=1,NPLAN
!          IBORD = (NP-1)*NPTFR2+K
!          IF(LITABL%ADR(1)%P%I(IBORD).EQ.KENT) THEN
! BEGINNING OF SPECIFIC TO TETRA CASE
!            IF(NP.LE.4) THEN
!              TABORL%ADR(1)%P%R(IBORD) = 40.D0
!            ELSE
!              TABORL%ADR(1)%P%R(IBORD) = 30.D0
!            ENDIF
! END OF SPECIFIC TO TETRA CASE
!
! BEGINNING OF SPECIFIC TO STRATIFICATION CASE
!           STRATIFICATION PUT AT THE ENTRANCE
!            IF(NP.GT.18) THEN
!              TABORL%ADR(1)%P%R(IBORD) = 28.D0
!            ENDIF
! END OF SPECIFIC TO STRATIFICATION CASE

            ENDIF
		  ENDDO
		ENDIF
	  ENDDO

However, when I try to run it, I get the following errors:

[Screenshot of the error messages: Sanstitre.png]



I also attached my PARTEL log, even though everything seems OK on that side.


Could you have a look at my code?


Regards


William

Re:Parallel Segmentation Fault 4 years 3 months ago #36564

  • pham
  • Administrator
  • Posts: 1559
  • Thank you received: 602
Hello William,

After many years working for TELEMAC, Jean-Michel retired a few years ago and no longer works on TELEMAC. That is why he appears in grey on this forum.

Anyway, I do not think you are facing the same issue as John.
Please read the error message carefully:
"Error: Symbol 'mesh' at (1) has no IMPLICIT type" means that the variable MESH has not been declared in the current subroutine; since the subroutine contains an IMPLICIT NONE statement, every variable must be declared.
If you are starting to write FORTRAN, I would strongly recommend reading a FORTRAN manual, because this forum is not intended for FORTRAN lessons.

I wonder why you want to use KNOLG, as you do not seem to need any global node numbers. What you wrote compares a global node number with a code that tells what kind of boundary condition you have (KENT = 5).

I think you just have to start from the stratification or tetra examples, as you did, and adapt them to what you want to do.

Do not forget that NP is an integer, so a comparison with .GT. is equivalent to a comparison with .GE. on the next integer (e.g. NP.GT.4 is the same as NP.GE.5, which may be easier to read). You can also directly use .EQ. to define the tracer values for some specific horizontal planes.
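A related point explains why the posted chain cannot work as intended: Fortran evaluates an IF/ELSEIF chain from top to bottom and executes only the first branch whose test is true. Since the first test is NP.GT.1, every plane above the first takes that branch, and the NP.GT.2 ... NP.GT.10 branches are never reached. The stand-alone sketch below is not TELEMAC code; the module and the function BRANCH_TAKEN are hypothetical, and the chain is abridged to three of its tests. It reproduces that ordering and returns the number of the branch taken.

```fortran
! Hypothetical stand-alone sketch: same test order as the posted chain,
! abridged, returning which branch is executed for a given plane NP.
MODULE ELSEIF_ORDER_SKETCH
  IMPLICIT NONE
CONTAINS
  ! Returns 1 for the NP.GT.1 branch, 2 for NP.GT.2, 10 for NP.GT.10,
  ! and 0 for the final ELSE branch.
  FUNCTION BRANCH_TAKEN(NP) RESULT(IBR)
    INTEGER, INTENT(IN) :: NP
    INTEGER :: IBR
    IF (NP .GT. 1) THEN
      IBR = 1
    ELSEIF (NP .GT. 2) THEN
      IBR = 2    ! never reached: any NP > 2 already satisfies NP > 1
    ELSEIF (NP .GT. 10) THEN
      IBR = 10   ! never reached either, for the same reason
    ELSE
      IBR = 0    ! only NP <= 1 ends up here
    ENDIF
  END FUNCTION BRANCH_TAKEN
END MODULE ELSEIF_ORDER_SKETCH
```

For any plane NP from 2 upwards the function returns 1, so every plane above the first receives the values of the first branch. Ordering the tests from the highest plane down, or using .EQ. and explicit ranges as in the lines below, avoids this.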
You can adapt the following lines. I have not tested them, so they may not even compile.
I think you can also define plane number 1 explicitly; otherwise, with your implementation, it falls into the final ELSE branch and gets:
TABORL%ADR(1)%P%R(IBORD) = 23.D-3
TABORL%ADR(2)%P%R(IBORD) = 26.D0
Here are the lines to adapt:
      DO K=1,NPTFR2
        DO NP=1,NPLAN
          IBORD = (NP-1)*NPTFR2+K
          IF(LITABL%ADR(1)%P%I(IBORD).EQ.KENT) THEN
            IF(NP.GT.1.AND.NP.LE.4) THEN
              TABORL%ADR(1)%P%R(IBORD) = 3.9D-3 !Salinite ups
              TABORL%ADR(2)%P%R(IBORD) = 10.D0 !Temperature deg C
            ELSEIF(NP.EQ.5) THEN
              TABORL%ADR(1)%P%R(IBORD) = 7.8D-3
              TABORL%ADR(2)%P%R(IBORD) = 13.D0
            ELSEIF(NP.EQ.6) THEN
              TABORL%ADR(1)%P%R(IBORD) = 12.D-3
              TABORL%ADR(2)%P%R(IBORD) = 14.D0
            ELSEIF(NP.EQ.7) THEN
              TABORL%ADR(1)%P%R(IBORD) = 15.D-3
              TABORL%ADR(2)%P%R(IBORD) = 14.D0
            ELSEIF(NP.EQ.8) THEN
              TABORL%ADR(1)%P%R(IBORD) = 17.D-3
              TABORL%ADR(2)%P%R(IBORD) = 17.D0
            ELSEIF(NP.EQ.9) THEN
              TABORL%ADR(1)%P%R(IBORD) = 23.D-3
              TABORL%ADR(2)%P%R(IBORD) = 18.D0
            ELSEIF(NP.EQ.10) THEN
              TABORL%ADR(1)%P%R(IBORD) = 23.D-3
              TABORL%ADR(2)%P%R(IBORD) = 20.D0
            ELSEIF(NP.GT.10) THEN
              TABORL%ADR(1)%P%R(IBORD) = 23.D-3
              TABORL%ADR(2)%P%R(IBORD) = 22.D0
            ELSE
              TABORL%ADR(1)%P%R(IBORD) = 23.D-3
              TABORL%ADR(2)%P%R(IBORD) = 26.D0
            ENDIF
          ENDIF
        ENDDO
      ENDDO
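For comparison, the same plane-by-plane values can be written with a SELECT CASE construct, where each plane number or range of planes appears once; the compiler rejects overlapping cases, so the ordering trap of an ELSEIF chain cannot occur. This is a stand-alone sketch, not tested inside TELEMAC: the module name and the procedures PLANE_VALUES and IBORD_3D are hypothetical. IBORD_3D only restates the formula IBORD = (NP-1)*NPTFR2+K used in the loops above.

```fortran
! Hypothetical stand-alone sketch of the plane-by-plane tracer values above,
! written with SELECT CASE, plus the 3D boundary index formula.
MODULE STRATIFICATION_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: DP = KIND(1.D0)   ! double precision kind
CONTAINS
  ! Boundary point K of the 2D boundary, on plane NP -> 3D boundary number
  PURE INTEGER FUNCTION IBORD_3D(K, NP, NPTFR2)
    INTEGER, INTENT(IN) :: K, NP, NPTFR2
    IBORD_3D = (NP-1)*NPTFR2 + K
  END FUNCTION IBORD_3D

  ! Salinity (SAL) and temperature (TEMP, deg C) imposed on plane NP
  PURE SUBROUTINE PLANE_VALUES(NP, SAL, TEMP)
    INTEGER, INTENT(IN) :: NP
    REAL(DP), INTENT(OUT) :: SAL, TEMP
    SELECT CASE (NP)
    CASE (2:4)
      SAL = 3.9D-3 ; TEMP = 10.D0
    CASE (5)
      SAL = 7.8D-3 ; TEMP = 13.D0
    CASE (6)
      SAL = 12.D-3 ; TEMP = 14.D0
    CASE (7)
      SAL = 15.D-3 ; TEMP = 14.D0
    CASE (8)
      SAL = 17.D-3 ; TEMP = 17.D0
    CASE (9)
      SAL = 23.D-3 ; TEMP = 18.D0
    CASE (10)
      SAL = 23.D-3 ; TEMP = 20.D0
    CASE (11:)
      SAL = 23.D-3 ; TEMP = 22.D0
    CASE DEFAULT              ! plane 1 (and any NP below 1)
      SAL = 23.D-3 ; TEMP = 26.D0
    END SELECT
  END SUBROUTINE PLANE_VALUES
END MODULE STRATIFICATION_SKETCH
```

Inside the K/NP loops, one would then call PLANE_VALUES(NP, SAL, TEMP) and copy SAL and TEMP into TABORL%ADR(1)%P%R(IBORD) and TABORL%ADR(2)%P%R(IBORD), with SAL and TEMP declared as DOUBLE PRECISION in the calling subroutine.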

Hope this helps (and please stop using tab characters, as already mentioned in another topic: they make your FORTRAN code unreadable, and sooner or later they will stop it from compiling),

Chi-Tuan

Re:Parallel Segmentation Fault 4 years 3 months ago #36567

  • rosparsw
Hello Chi-Tuan,

I didn't know that J-M Hervouet had retired, thanks for telling me.

Actually, my stratification case works well (I used the routines of the stratification case in the telemac3d examples folder) when I run it in serial mode. However, when I run the same code in TELEMAC's parallel mode, I get a SIGSEGV error, so it is the same issue as John's, isn't it?

That's why I tried to modify my lines as J-M Hervouet suggested at the time, but I was not very sure how to code it, as I'm quite a beginner in Fortran.
As you pointed out, my definition of the planes was not very clear, so I changed it.

Also, I don't really understand your remark about tabs: I'm not using any tabs, just spaces (with the space bar).



Regards.

William

Re:Parallel Segmentation Fault 4 years 3 months ago #36570

  • rosparsw
Edit to my previous post: I now understand what you meant about the tabs, but I don't know how they got into my Fortran file.


William

Re:Parallel Segmentation Fault 4 years 3 months ago #36574

  • rosparsw
After taking your advice into account, here is the error message I get:

================================================================================

ITERATION        0 TIME    0 D  0 H  0 MN   0.0000 S   (          0.0000 S)
================================================================================

--------------------------------------------------------------------------------

                MASS BALANCE

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

job aborted:
rank: node: exit code[: error message]
0: J0E94ET-5.intranet.cabinet-merlin.fr: 123
1: J0E94ET-5.intranet.cabinet-merlin.fr: 123
2: J0E94ET-5.intranet.cabinet-merlin.fr: 3: process 2 exited without calling finalize
3: J0E94ET-5.intranet.cabinet-merlin.fr: 123
Traceback (most recent call last):
  File "c:\opentelemac-mascaret\v8p1r1\scripts\python3\telemac3d.py", line 7, in <module>
    main('telemac3d')
  File "c:\opentelemac-mascaret\v8p1r1\scripts\python3\runcode.py", line 272, in main
    run_study(cas_file, code_name, options)
  File "c:\opentelemac-mascaret\v8p1r1\scripts\python3\execution\run_cas.py", line 157, in run_study
    run_local_cas(my_study, options)
  File "c:\opentelemac-mascaret\v8p1r1\scripts\python3\execution\run_cas.py", line 65, in run_local_cas
    my_study.run(options)
  File "c:\opentelemac-mascaret\v8p1r1\scripts\python3\execution\study.py", line 610, in run
    self.run_local()
  File "c:\opentelemac-mascaret\v8p1r1\scripts\python3\execution\study.py", line 445, in run_local
    run_code(self.run_cmd, self.sortie_file)
  File "c:\opentelemac-mascaret\v8p1r1\scripts\python3\execution\run.py", line 182, in run_code
    raise TelemacException('Fail to run\n'+exe)
utils.exceptions.TelemacException: Fail to run
C:\opentelemac-mascaret\mpich2\bin\mpiexec.exe -wdir C:\Users\invite-0108\Desktop\04-TELEMAC\TELEMAC_Abidjan\02-Telemac_simulation_files\01-Modele_partiel\03-TELEMAC_3D\t3d_Abidjan.cas_2020-08-12-14h33min25s -n 4 C:\Users\invite-0108\Desktop\04-TELEMAC\TELEMAC_Abidjan\02-Telemac_simulation_files\01-Modele_partiel\03-TELEMAC_3D\t3d_Abidjan.cas_2020-08-12-14h33min25s\out_user_fortran.exe

Re:Parallel Segmentation Fault 4 years 3 months ago #36575

  • pham
Hello William,

When you get a segmentation fault or NaN rather than a PLANTE with an error message (a foreseen error), the best way to investigate is to run with a debug configuration. You should add one to your configuration file (e.g. systel.cfg). You can find some help using the search feature of this forum.

In practice, depending on the compiler and OS you use, you can add some options such as:
- for gfortran: -g -Wall -fcheck=all -fbacktrace -fbounds-check -finit-integer=-1 -finit-real=nan -ffpe-trap=invalid,zero,overflow
- for intel: -debug all -check all -traceback

This works on distributions like Debian (you can have a look at systel.edf.cfg in the configs directory). For Windows, I do not know whether all options are available (I do not use Windows). Try them all, and if the compiler reports that it does not recognise one of them, remove it.

The debug options check many things, in particular NaNs, segmentation faults, bad initialisations, etc., and a debug run is advised when starting a new model, just to be sure it is free of all the issues described above. The drawback is that the executable runs more slowly because of all the checks, so once the debug run has gone well for a few time steps, you should switch back to the standard (release) configuration.

Hope this helps,

Chi-Tuan