
TOPIC: uncontroled error from python:: OSError(17, 'File exists')

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10907

  • 716469
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
Dear All,

I have tried to run postel3d in parallel on a cluster but got errors. The cluster is Blue Ice 2 HPC (platform: Linux, MPI library: OpenMPI, scripting language: Python, compiler: gfortran). Please see the attached error files. Could anyone advise, please? Thanks.
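For reference, OSError(17, 'File exists') is Python's errno EEXIST, which os.mkdir and os.makedirs raise when the path already exists. The attached error files are not reproduced in this thread, so the sketch below only assumes that the launcher failed while creating a directory, such as its temporary run directory; make_run_dir and the directory name are hypothetical and not taken from runcode.py.

```python
import errno
import os
import tempfile


def make_run_dir(path):
    """Create `path`; return True if created, False if it already existed.

    Hypothetical helper for illustration. On Python 3,
    os.makedirs(path, exist_ok=True) gives the same tolerance in one call;
    the try/except form below also works on Python 2.
    """
    try:
        os.makedirs(path)
        return True
    except OSError as exc:
        # Only swallow "already exists" when the existing thing is a directory.
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            return False
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "p3d_tmp")  # hypothetical name
    os.mkdir(target)
    try:
        os.mkdir(target)  # a second plain mkdir reproduces the thread's error
    except OSError as exc:
        print(exc.errno, exc.strerror)  # 17 File exists (on Linux)
    print(make_run_dir(target))  # False: already present, no exception
```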

Kind Regards!

Violet
Attachments:

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10908

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi Violet

I'm not a postel3d specialist, but given this error message, I wonder whether postel3d is able to run in parallel at all.
Do you get a similar message if you run it in parallel mode but with only 1 processor?

Regards
Christophe

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10909

  • 716469
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
Hello,

Thanks for getting back to me. When I ran it in parallel mode with 1 processor, I got only the postel3d error file and no output files at all. I think there might be a missing link somewhere, perhaps some extra environment variables.

Kind Regards!

Violet
Attachments:

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10910

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi
Really strange!
Could you copy the command line you use to run postel3d in both cases?

In your parallel test, could you check whether the partitioning step completes correctly in your temporary directory?

In a normal run, the files are copied by 1 processor and then partitioned (also by 1 process), so that each file gets an extension corresponding to the number of parallel processes. After that, the mpirun command appears, but no further copies are made from that point on.
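To check the partitioning step described above, one can list the temporary directory and compare it with the expected per-process file names. A minimal sketch, assuming TELEMAC's usual naming of partitioned files as the base name followed by NCSIZE-1 and the process rank, each zero-padded to 5 digits (for example T3DGEO00003-00000 for rank 0 of 4 processes); treat that convention and the base names as assumptions and compare against what actually appears in your temporary directory.

```python
import os


def partition_names(base, nprocs):
    """Expected names of the partitioned copies of `base` for `nprocs` processes.

    Assumed convention: <base><nprocs-1, 5 digits>-<rank, 5 digits>.
    """
    return ["%s%05d-%05d" % (base, nprocs - 1, rank) for rank in range(nprocs)]


def missing_partitions(tmpdir, bases, nprocs):
    """Return the expected partition files that are absent from `tmpdir`."""
    present = set(os.listdir(tmpdir))
    missing = []
    for base in bases:
        for name in partition_names(base, nprocs):
            if name not in present:
                missing.append(name)
    return missing
```

For example, missing_partitions("/path/to/tmpdir", ["T3DGEO", "T3DCLI"], 4) returning an empty list would suggest the partitioning finished; the path and base names there are placeholders.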

But once again, after a quick look at the source file, I'm not sure that postel3d can run in parallel.

Regards
Christophe

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10911

  • 716469
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
Hello,

Thanks for your reply. In my PBS script (for both cases) I have:
${MPIRUN} -machinefile $PBS_NODEFILE -np ${NPROCS} runcode.py postel3d -s p3d.cas

For some reason, python runcode.py .... has never worked for me, even on my Linux desktop PC; it only works if I enter runcode.py telemac3d..... directly.

In the terminal I normally just enter: qsub myrunall-pbs.sh
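One possible reading of the OSError, offered as an assumption rather than a diagnosis: runcode.py issues the mpirun command itself (Christophe notes that the mpirun command appears after the partitioning), so wrapping it in ${MPIRUN} -np ${NPROCS} may start NPROCS independent copies of the Python driver, each trying to prepare the same working directory. The sketch below imitates that with plain subprocesses (no MPI involved): exactly one copy creates the directory and every other copy gets errno 17. The directory name is hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Code run by each child: a stand-in for one copy of a driver script started
# by an outer mpirun. It tries to create the shared directory and reports back.
CHILD = r"""
import os, sys
try:
    os.mkdir(sys.argv[1])
    print("created")
except OSError as exc:
    print("errno %d" % exc.errno)
"""


def launch_copies(workdir, ncopies):
    """Start `ncopies` Python processes at once, all creating `workdir`."""
    procs = [
        subprocess.Popen([sys.executable, "-c", CHILD, workdir],
                         stdout=subprocess.PIPE, universal_newlines=True)
        for _ in range(ncopies)
    ]
    # communicate() waits for each child and returns its captured stdout.
    return sorted(p.communicate()[0].strip() for p in procs)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "shared_tmp")  # hypothetical name
    print(launch_copies(target, 4))
    # ['created', 'errno 17', 'errno 17', 'errno 17'] on Linux
```

If this is what happens on the cluster, calling runcode.py without the outer mpirun and letting it launch MPI itself would avoid the collision; that is an inference, not something confirmed in this thread.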

I am sorry, but my IT knowledge is limited, so I am not sure about the partitioning step. However, I just added qsub to my PATH and, after running with 1 processor, got another error message: P_INIT: FILE PARAL NOT FOUND. It started to run and produced some files, but it does not produce any results because of this error (please see attached).

I run all modules, postel included, on Linux on my desktop PC with no problems. But my files are now so large (a very fine mesh over a large domain) that I cannot process them on my PC anymore and have to move to parallel processing. I can run TELEMAC cas files with a very fine mesh on the cluster, but I need to extract cross sections. Even if I save the result file on an external drive and run postel on my PC from that drive, the terminal just shuts down because it cannot cope, always with the same message: RunCas: fail to run.
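To judge whether a result file is simply too big for the desktop, a rough size estimate helps. A sketch assuming a single-precision SELAFIN result file written as Fortran sequential records (a 4-byte length marker before and after each record), with one record for the time value and one record of NPOIN values per variable at every saved time step; the header and mesh are ignored, and the numbers in the example are invented.

```python
def selafin_data_bytes(npoin, nvars, nframes, real_size=4):
    """Approximate size in bytes of the time-dependent part of a SELAFIN file.

    npoin:   number of nodes (for a 3D mesh, 2D nodes times number of planes)
    nvars:   number of variables written per time step
    nframes: number of saved time steps
    """
    marker = 8  # 4-byte record marker before and after each record
    time_record = real_size + marker
    var_record = npoin * real_size + marker
    return nframes * (time_record + nvars * var_record)


if __name__ == "__main__":
    # Invented example: 200 000 nodes per plane x 10 planes, 8 variables, 500 outputs.
    size = selafin_data_bytes(200000 * 10, 8, 500)
    print(round(size / 1e9, 1), "GB")  # 32.0 GB
```

Comparing such an estimate with the desktop's memory and disk gives a first idea of whether RunCas: fail to run could be a plain size problem; it says nothing about how much postel itself loads at once.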

I am having the same postel problem with the new version v6p3 as well.

Thank you. I still have to figure out how to get the cross sections from the cluster.

Kind Regards!

Violet
Attachments:

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10912

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi
It looks a little bit better.
The problem with python runcode.py does not matter if it works with the runcode.py command directly.

Maybe my advice is not relevant, but could you try having just runcode.py postel3d -s p3d.cas in your PBS script? I hope this will run postel3d in scalar mode.
The PARAL problem seems to confirm that postel cannot run in parallel.

Good luck
Christophe

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10913

  • 716469
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
Thank you very much for trying to help me. I have changed the script but still get some errors. I have attached the script and the new errors. It could be anything; I would not be surprised if I missed some extra path setup or special compilation. Just in case, I also attach my config file, as it is quite possible that I made a mistake there too.

Thanks a mil. Have a nice evening! I would not want to keep you late at work trying to sort out my problems.

Kind Regards!

Violeta
Attachments:

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10916

  • 716469
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
I just realised that I had not attached the config file.

Kind Regards!

Violet
Attachments:

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10919

  • c.coulet
  • Moderator
  • Posts: 3722
  • Thank you received: 1031
Hi Violeta

It seems that the problem is in the runcode.py command in your PBS script.
I think this is due to the $ character.
Are you running your case in your root directory?
Maybe you should have a command like $HOME/runcode.py or $HOME/<PATH>/runcode.py, where <PATH> is the actual location where you stored your case file.
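Before editing the PBS script, one can check what the job would actually find. A minimal sketch: it expands $VARIABLES and a leading ~ (roughly what the shell does), checks that the result is an executable file, and separately asks which runcode.py comes first on PATH. The template is whatever path you intend to use ($HOME/<PATH>/runcode.py with <PATH> filled in); locate_script is a hypothetical helper. Batch jobs may start with a different environment from the login shell, so this is most informative when run from inside the job itself.

```python
import os
import shutil


def locate_script(template, name="runcode.py"):
    """Check an explicit script path and the PATH lookup for `name`.

    template: intended path, possibly containing $VARS or a leading ~,
              e.g. "$HOME/<PATH>/runcode.py" with <PATH> filled in.
    Returns (expanded path, whether it is an executable file,
             first match for `name` on PATH or None).
    """
    # expandvars leaves unknown variables untouched, so a typo stays visible.
    expanded = os.path.expanduser(os.path.expandvars(template))
    usable = os.path.isfile(expanded) and os.access(expanded, os.X_OK)
    return expanded, usable, shutil.which(name)


if __name__ == "__main__":
    # Hypothetical location; replace with wherever runcode.py actually lives.
    print(locate_script("$HOME/runcode.py"))
```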

About your config file: you have 2 different configurations in it. How did you choose between the two?
I don't see any -c option in the launch script, so I suppose the choice was set in your profile. Is that right?
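To see which configurations a config file declares, and therefore whether a run without -c has more than one to choose from, the file can be read with Python's configparser. A sketch assuming the usual systel.cfg layout, with a [Configurations] section whose configs: entry lists the configuration names and one section per configuration; the sample text is illustrative and not the file attached to this thread.

```python
import configparser

# Illustrative only: a stripped-down systel.cfg-style file, not the one
# attached to this thread. Real files contain many more keys per section.
SAMPLE = """
[Configurations]
configs: ubugfortrans ubugfopenmpi

[ubugfortrans]
cmd_obj: gfortran -c <mods> <incs> <f95name>

[ubugfopenmpi]
cmd_obj: gfortran -c <mods> <incs> <f95name>
"""


def list_configs(cfg_text):
    """Return the names listed under [Configurations] configs:, in order."""
    # interpolation=None so that any '%' in compiler flags is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("Configurations", "configs").split()


if __name__ == "__main__":
    names = list_configs(SAMPLE)
    print(names)  # ['ubugfortrans', 'ubugfopenmpi']
    if len(names) > 1:
        print("More than one configuration: select one explicitly with -c <name>.")
```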

Finally, did you try to run postel without qsub, or is that impossible?

Good luck
Christophe

uncontroled error from python:: OSError(17, 'File exists') 11 years 2 weeks ago #10920

  • 716469
  • Expert Boarder
  • Posts: 303
  • Thank you received: 6
Dear Coulet (sorry, I am not sure whether that is your first name),

You are right that the machine is confused about which configuration to pick. I kept both just in case, as I need gfortran as well. I have now removed ubugfortrans and left only ubugfopenmpi; I did not use -c in my compilation or runcode command. When I compiled, it compiled both configurations, but when running the case the system was probably confused.

Also, I was never sure what to enter on the "cmd_obj:" line in the config file for the ubugfopenmpi configuration. I left it as gfortran, but shouldn't it be mpi90, the same as for cmd_exe:? The root of my problem could be in the configuration file; I hope so, because then everything would be easy :). Thanks a mil.
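To compare which program a configuration uses to compile (cmd_obj) and to link (cmd_exe), the first word of each command can be pulled out of the config file. A sketch assuming a systel.cfg-style layout with one section per configuration and shared defaults in a [general] section; the example text is illustrative. It does not decide what is correct for this cluster: with OpenMPI built on gfortran, mpif90 is a wrapper that calls gfortran with the MPI include and library paths added, so a plain gfortran cmd_obj can only work if those MPI include files are found some other way, which is worth checking rather than assuming.

```python
import configparser
import shlex

# Illustrative only; not the configuration attached to this thread.
EXAMPLE = """
[Configurations]
configs: ubugfopenmpi

[ubugfopenmpi]
cmd_obj: gfortran -c <mods> <incs> <f95name>
cmd_exe: mpif90 -o <exename> <objs> <libs>
"""


def _lookup(parser, config, key):
    """Value of `key` in the configuration's own section, else in [general].

    Falling back to [general] reflects the assumption that it holds
    defaults shared by all configurations.
    """
    for section in (config, "general"):
        if parser.has_option(section, key):
            return parser.get(section, key)
    return None


def compiler_commands(cfg_text, config):
    """Program invoked by cmd_obj (compile) and cmd_exe (link) for `config`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    result = {}
    for key in ("cmd_obj", "cmd_exe"):
        value = _lookup(parser, config, key)
        result[key] = shlex.split(value)[0] if value else None
    return result


if __name__ == "__main__":
    print(compiler_commands(EXAMPLE, "ubugfopenmpi"))
    # {'cmd_obj': 'gfortran', 'cmd_exe': 'mpif90'}
```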

After keeping only one configuration, recompiling and running postel3d, I got another error. I get the same error whether I run it with the PBS script through qsub or directly on the command line without qsub. I can run other modules directly from the cluster terminal without qsub, but not postel3d; as you mentioned earlier, that might not be possible at all.

But I have another ongoing problem that I am trying to fix as well, and I think they might all have the same root cause: my telemac3d and telemac2d jobs run in parallel, but not really well. They go into the queue, start to run and then get stuck, but at least I can run them from the cluster terminal by executing runcode.py telemac3d .... I have been persistent in sorting out the postel problem because I need to get cross sections from my result files either way, parallel or serial. I attach the config and error files.

Thanks again; your help and advice are much appreciated.

Kind Regards!

Violeta
Attachments: