Hi all,
Some time back we looked into optimising partel. We found that the partel partitioning tool processes the partitions sequentially and re-reads the entire input dataset from disk for each one. This leads to very long runtimes when the number of partitions is large (hundreds). A straightforward optimisation in parres.f allocates a very large array that holds the entire input data file in RAM and populates it while the first partition is processed. This keeps the code changes minimal; in the excerpt below, the additions are essentially the lines involving INPUTBUF:
[…]
      DOUBLE PRECISION,ALLOCATABLE :: VAL(:),VAL_INP(:)
!     VERY LARGE MEMORY BUFFER TO AVOID RE-READING THE INPUT DATA
      DOUBLE PRECISION,ALLOCATABLE :: INPUTBUF(:,:,:)
!     GEOMETRY INFORMATION
      INTEGER NPOIN_GEO,TYP_ELM_GEO,NELEM_GEO,NPTFR_GEO,NPTIR_GEO,
     &        NDP_GEO,NPLAN_GEO
[…]
      ALLOCATE(VAL(NPOIN_P),STAT=IERR)
      CALL CHECK_ALLOCATE(IERR,'PARRES:VAL')
!     FIRST PARTITION: ALLOCATE THE BUFFER FOR THE INPUT DATA
      IF ( IPART .EQ. 1 ) THEN
        WRITE(LU,*) 'ALLOCATING LARGE MEMORY BUFFER WITH',
     &              NPOIN_INP*NVAR_INP*NTIMESTEP,
     &              'ELEMENTS'
        ALLOCATE(INPUTBUF(NPOIN_INP, NVAR_INP, NTIMESTEP), STAT=IERR)
        CALL CHECK_ALLOCATE(IERR,'PARRES:INPUTBUF')
      END IF
!     LOOP ON THE TIMESTEPS AND VARIABLES OF THE INP FILE
      DO ITIME=1,NTIMESTEP
        CALL GET_DATA_TIME(INPFORMAT,NINP,ITIME-1,TIMES,IERR)
        CALL CHECK_CALL(IERR,'PARRES:GET_DATA_TIME:NINP')
        WRITE(LU,*) ' -- WRITING TIMESTEP',ITIME-1,' AT',REAL(TIMES)
!       LOOP ON ALL THE VARIABLES
        DO IVAR=1,NVAR_INP
!         POPULATE THE BUFFER WHILE PROCESSING THE FIRST PARTITION
          IF ( IPART .EQ. 1 ) THEN
            CALL GET_DATA_VALUE(INPFORMAT,NINP,ITIME-1,
     &                          VARLIST(IVAR)(1:16),VAL_INP,
     &                          NPOIN_INP,IERR)
            INPUTBUF(:, IVAR, ITIME) = VAL_INP(:)
          ENDIF
!         GETTING THE VALUES NEEDED FOR THIS PARTITION FROM THE BUFFER
          IF(NPLAN_INP.GT.1) THEN
            DO I=1,NPOIN_P
              VAL(I) = INPUTBUF(KNOLG3D(I), IVAR, ITIME)
            ENDDO
          ELSE
            DO I=1,NPOIN_P
              VAL(I) = INPUTBUF(KNOLG(I), IVAR, ITIME)
            ENDDO
          ENDIF
          CALL ADD_DATA(INPFORMAT,NINP_PAR,VARLIST(IVAR),TIMES,
     &                  ITIME-1,IVAR.EQ.1,VAL,NPOIN_P,IERR)
          CALL CHECK_CALL(IERR,'PARRES:ADD_DATA:NINP_PAR')
        ENDDO
      ENDDO
The machine running partel will, of course, need enough RAM to hold the entire input file, which can be demanding for high-resolution meteorology files with long time series, but the speed-up is significant.
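Since a failed multi-GiB allocation is easier to diagnose when the requested size is stated up front, the expected footprint could also be reported in GiB rather than in element counts. A minimal sketch using the variable names from the excerpt above (MEM_GIB is a hypothetical local; the arithmetic is done in double precision to avoid integer overflow for very large meshes):

!     REPORT THE EXPECTED BUFFER FOOTPRINT BEFORE ALLOCATING
!     (THE DECLARATION BELONGS WITH THE OTHER LOCALS OF PARRES)
      DOUBLE PRECISION MEM_GIB
!     8 BYTES PER DOUBLE PRECISION VALUE, CONVERTED TO GIB
      MEM_GIB = 8.D0*DBLE(NPOIN_INP)*DBLE(NVAR_INP)*DBLE(NTIMESTEP)
     &        / 1024.D0**3
      WRITE(LU,*) 'INPUT BUFFER NEEDS ABOUT',REAL(MEM_GIB),'GiB'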
For example, running partel in its original and optimised versions to split a 1.4 GiB atmospheric data file, holding 48 timesteps of temperature, pressure, and two wind-component fields on 1,842,047 mesh nodes and 3,614,079 triangular elements, into 200 partitions gave the following runtimes:
• Original version: 1259 s
• Optimised version: 20 s (63× speed-up)
Both versions produced bit-identical output files. The optimised version required around 8 bytes × 1,842,047 mesh nodes × 4 fields × 48 timesteps = 2.6 GiB of memory, roughly twice the size of the atmospheric data file: the file stores its data in single precision, while partel operates in double precision internally. A more elaborate implementation of the data caching mechanism would avoid this doubling of the RAM requirement, at the expense of more significant code changes.
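One possible direction, sketched here under the assumption that buffering in single precision is acceptable and that KIND=4 denotes single precision (true for gfortran and the Intel compilers); handling input files of arbitrary precision generically is what would make a production implementation more involved. Since the input file stores single precision values, narrowing them for the cache and widening them again on extraction should lose nothing, so the output ought to remain bit-identical:

!     SINGLE PRECISION BUFFER: HALVES THE MEMORY FOOTPRINT
      REAL(KIND=4),ALLOCATABLE :: INPUTBUF(:,:,:)
[…]
!     POPULATE: NARROW THE VALUES READ BY GET_DATA_VALUE
      INPUTBUF(:, IVAR, ITIME) = REAL(VAL_INP(:), KIND=4)
[…]
!     EXTRACT: WIDEN BACK TO DOUBLE PRECISION FOR ADD_DATA
      DO I=1,NPOIN_P
        VAL(I) = DBLE(INPUTBUF(KNOLG(I), IVAR, ITIME))
      ENDDO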