
TOPIC: Memory problem with this compiler

Memory problem with this compiler 12 years 8 months ago #3935

  • nilsoberg
Hello,

I've installed the Perl version of Telemac on a Linux cluster using Intel MPI and Fortran. The cluster is managed by Sun Grid Engine (SGE). It appears that the installation is working but the simulation crashes right away. Here is a partial output:
================================================================================
ITERATION        0 TIME    0 D  0 H  0 MN   0.0000 S   (          0.0000 S)
================================================================================
An mpd is already running with console at /tmp/mpd2.console_noberg on node02.
Start mpd with the -n option for a second mpd on same host.
--------------------------------------------------------------------------------
                MASS BALANCE
 INITIAL MASS OF WATER IN THE DOMAIN :   7783727.18915503
 THE LIQUID BOUNDARIES FILE CONTAINS
         721  LINES WITH:
 Q(2)    Q(3)    SL(1)
 STREAMLINE: USING PARALLEL VERSION OF CHARACTERISTICS 6.1
  @STREAMLINE::ORG_CHARAC_TYPE:
  MEMORY PROBLEM WITH THIS COMPILER:
  ILB=  1275068429  NOT EQUAL TO CH_DELTA(1)=           0
  OR
  IUB= -1744828372  NOT EQUAL TO CH_DELTA(12)=         136

I looked at the code and this seems to be coming from SUBROUTINE ORG_CHARAC_TYPE1. I don't know much about MPI so I'm not sure why this is happening. I'm hoping that someone will be able to provide me with some advice. Attached is my systel.ini file.

Thanks,

Nils
The administrator has disabled public write access.

Re: Memory problem with this compiler 12 years 8 months ago #3937

  • jmhervouet
Hello,

This is the first time we have seen this. It is related to an assumption we make about how a compiler stores the numbers of a Fortran structure in memory. We will look into it further and report back. Which compiler do you use? (There is perhaps a compiler option to force the numbers to be contiguous in memory.)
This can only happen in parallel, so a test in scalar mode would be useful in the meantime.

With best regards,

Jean-Michel Hervouet

Re: Memory problem with this compiler 12 years 8 months ago #3941

  • ails
Hello,

We already encountered this problem when switching from MPI-1 to MPI-2 to deal with 64-bit addresses. So this may be a machine-dependent problem.

How many cores are you using? Is this bug reproducible with different numbers of cores (sequential, 2, 4...)?
Which release of Intel and Intel-MPI are you using?

Best regards,

Fabien

PS: More information at: publib.boulder.ibm.com/infocenter/clresc...oc%2Fam107_igadd.htm

For Fortran 64-bit codes, an INTEGER may not be enough to represent the upper bound. When the upper bound is known to be representable by an INTEGER, this subroutine remains usable at your own risk. New codes should always use MPI_TYPE_GET_EXTENT.
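To illustrate the quoted note, here is a minimal sketch of what happens when a 64-bit bound is forced into a 32-bit signed integer, which is what a default Fortran INTEGER usually is. It is written in Python only for illustration and is not TELEMAC or MPI code; the helper `to_int32` and the sample value are hypothetical. MPI_TYPE_GET_EXTENT avoids the problem by returning its bounds as INTEGER(KIND=MPI_ADDRESS_KIND).

```python
def to_int32(value: int) -> int:
    """Keep only the low 32 bits of value and read them as a signed integer."""
    low = value & 0xFFFFFFFF
    return low - 0x1_0000_0000 if low >= 0x8000_0000 else low


# Hypothetical 64-bit bound, chosen only for illustration.
bound_64 = 0x0000_7FFF_8000_0010

# A small bound survives the conversion unchanged...
small_ok = to_int32(136) == 136

# ...but a bound beyond the 32-bit range comes back as an unrelated number.
truncated = to_int32(bound_64)
large_ok = truncated == bound_64

print("small bound preserved:", small_ok)
print("large bound preserved:", large_ok, "->", truncated)
```

A bound that no longer matches after such a conversion is the kind of mismatch that a consistency check on lower and upper bounds would report.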

Re: Memory problem with this compiler 12 years 8 months ago #3943

  • c.coulet
Hi,
I agree with Fabien. This looks like the problem I encountered in version 6.0 when we tried to run TELEMAC on a 64-bit machine with the code using MPI-1 compatibility.
It was solved in version 6.1 by upgrading the parallel programming to MPI-2 compatibility, which handles 64-bit addresses.
It is strange that you get this message on your cluster if you are using a recent version of the Intel compiler...

Maybe you could try two different tests:
  1. Try changing the characteristic method for velocities in the steering file.
  2. Focus on the first message in the screen output you copied. It may be related to the mpd problem (I am not sure about this point, but we have never observed such a message on our cluster).
Hope this helps,
Christophe

Re: Memory problem with this compiler 12 years 8 months ago #3980

  • nilsoberg
First, thank you very much for the help. I've tried some of the things you suggested but I get the same error with 4 and 2 cores. I'm trying 1 core right now. However, is 1 core the same as running in serial? It still says "USING PARALLEL VERSION OF CHARACTERISTICS". If not, do I need to do a serial installation of telemac in addition to my parallel one?

My compilers are older. Fortran is 10.1 and MPI is 3.1:

Intel(R) MPI Library for Linux Version 3.1
Build 20080320 Platform Intel(R) 64 64-bit applications
Copyright (C) 2003-2008 Intel Corporation. All rights reserved

Unfortunately for me, upgrading is expensive and not an option at this point.

The MPD error message seems to be present with all of my Intel MPI jobs, but I'm working on removing it. I will also try changing the characteristic method for velocities and report back.

Thanks again.

Re: Memory problem with this compiler 12 years 8 months ago #3982

  • jmhervouet
Hello,

Partial answers:

The difference between:

PARALLEL PROCESSORS : 0 and 1

was initially meant to test the scalar and parallel branches in the algorithms and check that they gave no differences, but as most algorithms do tests like:

IF(NCSIZE.GT.1) THEN...

it should make no difference, except perhaps in calling CARACT or SCARACT for the method of characteristics, which may explain your message 'USING PARALLEL...', unless you have Thompson boundary conditions, in which case SCARACT is always called (in the near future CARACT will be removed anyway).
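As a minimal sketch of the point above, the test IF(NCSIZE.GT.1) can be mirrored in a few lines. It is written in Python only for illustration; the function name `takes_parallel_branch` is hypothetical, and the sketch says nothing about which of CARACT or SCARACT is actually called.

```python
def takes_parallel_branch(ncsize: int) -> bool:
    """Mirror the Fortran test IF(NCSIZE.GT.1)."""
    return ncsize > 1


# PARALLEL PROCESSORS : 0 and 1 both fall through to the scalar branch.
for ncsize in (0, 1, 2, 4):
    branch = "parallel" if takes_parallel_branch(ncsize) else "scalar"
    print(f"NCSIZE = {ncsize}: {branch} branch")
```

With a test of this form, values 0 and 1 are indistinguishable, so any difference between them has to come from code that checks the value in another way.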

Regarding the memory problem, if you look at subroutine org_charac_type1.f, there is an assumption that an integer is stored in memory with the same size as a REAL 8 number. This could be a hint, and it could be controlled by a compiler option. This subroutine is very tricky and tries to work around the fact that Fortran does not specify how numbers are stored in a structure.
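The layout question can be illustrated with a small sketch, using Python's ctypes module as a stand-in for a compiler laying out a structure. This is only an analogy under stated assumptions: the structure `Mixed` is hypothetical, it follows C layout rules on the machine where it runs, and it is not the derived type used in org_charac_type1.f.

```python
import ctypes


class Mixed(ctypes.Structure):
    # One 4-byte integer followed by one 8-byte real, with native alignment.
    _fields_ = [("i", ctypes.c_int32), ("x", ctypes.c_double)]


# Offset the real would have if the members were simply packed end to end.
naive_offset = ctypes.sizeof(ctypes.c_int32)

# Offset and total size actually chosen on this platform.
actual_offset = Mixed.x.offset
total_size = ctypes.sizeof(Mixed)
padding = actual_offset - naive_offset

print("naive offset of x :", naive_offset)
print("actual offset of x:", actual_offset)
print("padding after i   :", padding)
print("size of structure :", total_size)
```

Any code that computes displacements by hand from assumed member sizes only works while the padding it assumes matches what the compiler actually inserted, which is why the result can differ from one compiler or option set to another.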

With best regards,

Jean-Michel Hervouet

JMH

Re: Memory problem with this compiler 12 years 8 months ago #4040

  • nilsoberg
Some more information:

1. Running with PARALLEL PROCESSORS = 1, the code ran fine (I ran it for a week and then stopped it)

2. I tried setting -i8 (integer size = 8) for ifort but that didn't work

3. Setting velocity characteristics = 2 allowed the code to run.

Now, though, I've recompiled telemac with gfortran+Intel MPI which avoids this issue for me.

Thanks for the help.

PS: Every time I install, I have to edit the makefiles in parallel and spartacus2d, duplicating the all: block as a parallel: block. I think this was supposed to be fixed, per another post I saw on the forum, but the fix apparently isn't in the Subversion repository.