Distributed Computing - How to?!


14 replies to this topic

#1 MojDa01

MojDa01

    Member

  • Members
  • 13 posts

Posted 17 May 2018 - 04:07 PM

Dear All,

 

At the moment I have limited computing capacity with my 4-core laptop. I am running covariate searches and bootstraps with the "Local_MPI_4" option. As one can imagine, this takes quite some time.
I am interested in the grid computing presented here:

 

https://www.youtube.com/watch?v=SLWuczDIZhM

 

I aim to set up a local multi-core server for computing, but the question is: what do I need? And what are the requirements of Phoenix NLME?

 

* Is there a limit to the number of cores that can work simultaneously?

* Does it matter whether I choose Windows or Unix?

* How do you set up the local server to run the computations that I send from my laptop?

 

--> Grid computing is quite new to me and I haven't set up such a system yet.

 

Hope you can help!

 

Best,

Daniel


Edited by MojDa01, 17 May 2018 - 04:10 PM.


#2 f_yc

f_yc

    Newbie

  • Members
  • 6 posts

Posted 18 May 2018 - 01:15 AM

Dear Daniel,

 

The Phoenix help file describes this setup.

For Phoenix 8.0, the relevant section is "Job Control Setup" in the "Phoenix NLME User's Guide".
 
 
Best,
f_yc


#3 MojDa01

MojDa01

    Member

  • Members
  • 13 posts

Posted 18 May 2018 - 06:54 AM

Hi f_yc,

 

thanks! Reading at the moment...

 

best

Daniel



#4 bwendt@certara.com

bwendt@certara.com

    Advanced Member

  • Administrators
  • 282 posts

Posted 18 May 2018 - 07:35 AM

Hi Daniel,

 

we just had a webinar around this topic yesterday:

 

youtu.be/pGOnRqKaDS0?a 

 

 

Let me know if the descriptions in the manual are sufficient. We can talk over the phone if there are questions.

 

 

Bernd



#5 Gilles TUFFAL

Gilles TUFFAL

    Member

  • Members
  • 14 posts

Posted 27 May 2019 - 02:54 PM

Hi Bernd, hi all,

 

It seems I may be given THE chance to test and use Phoenix in a Multiprocessing environment.

 

I think the key point is that our IT folks understood that only a Linux-compiled NLME module is replicated on the Linux cluster (that is our understanding of the Phoenix docs).

 

During the discussion between (my) internal IT support and the (outsourced) Linux cluster provider, they exchanged the following.

 

 

____________________________________________

 

The key question is how your client and the cluster need to be connected.
 
We have several constraints due to Sanofi security requirements. One of them is that a direct connection from the Sanofi network to the YYYYYYY platform is not possible: if your job submission goes through any protocol other than http/https, it won't work. FTP is not possible.
 
Does Certara have technical documentation regarding the connection between Phoenix and the grid?
 
________________________________________________
 
Do you have any documents other than the ones available through the help?
 
Many Thanks.
 
Gilles


#6 Simon Davis

Simon Davis

    Advanced Member

  • Administrators
  • 1,329 posts

Posted 29 May 2019 - 01:50 PM

Hi Gilles, Phoenix uses sftp (SSH File Transfer Protocol), so this should be possible. Does this help your IT group set up an SSH connection between client and server?
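Since sftp runs over SSH, a first thing IT can verify is whether the client machine can open a TCP connection to the server's SSH port at all. The following is only a minimal sketch: the function name is made up for illustration, and port 22 is assumed because it is the SSH default (your cluster may use a different one). A successful connect shows the port is reachable; it says nothing about authentication or about how Phoenix itself is configured.

```python
import socket


def can_open_tcp(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Port 22 is the SSH/sftp default and is an assumption here.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_open_tcp("your.cluster.host")` returning False from inside the company network would point to a firewall or proxy rule rather than to Phoenix.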

 

   Simon.



#7 0521

0521

    Advanced Member

  • Members
  • 46 posts

Posted 10 June 2019 - 08:46 AM

Hi All,

I tried to build a Linux node myself, but I encountered problems.

 

My environment:

Phoenix client:

Operating system: Windows 10

Phoenix version: Phoenix 8.1

 

 

Linux server:

Operating system: CentOS Linux 7

IP: 192.168.31.130

Installed software: epel-release, gcc, R, ksh, libxml2-devel, nfs-utils, rpcbind, torque-4.2.9.tar.gz, openssl-devel, boost-devel, libtool-y

R version: 3.5.2

The following R packages are installed: batchtools, XML, reshape, Certara.NLME8

Mounted shared directory: mount -t nfs 192.168.31.130:/var/tmp/nlme /mnt

The TORQUE job control software is installed.

 

[root@master /]# qnodes

cn1

     state = free

     np = 2

     ntype = cluster

     status = rectime=1560150776,varattr=,jobs=,state=free,netload=491676,gres=,loadave=0.00,ncpus=2,physmem=3865308kb,availmem=5562644kb,totmem=5962456kb,idletime=655,nusers=2,nsessions=4,sessions=1543 1555 1605 1677,uname=Linux cn1 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64,opsys=linux

     mom_service_port = 15002

     mom_manager_port = 15003

 

master

     state = free

     np = 2

     ntype = cluster

     status = rectime=1560150774,varattr=,jobs=,state=free,netload=284510000,gres=,loadave=0.00,ncpus=2,physmem=3865308kb,availmem=4256656kb,totmem=5962456kb,idletime=74353,nusers=3,nsessions=3,sessions=1654 1498 29642,uname=Linux master 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64,opsys=linux

     mom_service_port = 15002

     mom_manager_port = 15003
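For anyone who wants to check the TORQUE node list from a script rather than by eye, here is a minimal sketch that parses `qnodes` output like the listing above. The function names and the shortened `SAMPLE` text are illustrative assumptions; only the `name`, `state`, and `np` layout is taken from the output shown.

```python
SAMPLE = """\
cn1
     state = free
     np = 2
     ntype = cluster
master
     state = free
     np = 2
     ntype = cluster
"""


def parse_qnodes(text: str) -> dict:
    """Parse qnodes output into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines separate node records
        if not raw[0].isspace():
            # An unindented line starts a new node record.
            current = raw.strip()
            nodes[current] = {}
        elif current is not None and " = " in raw:
            # Split on the first " = " only, so values such as the long
            # status string (which contains "=") stay intact.
            key, value = raw.strip().split(" = ", 1)
            nodes[current][key] = value
    return nodes


def free_slots(nodes: dict) -> int:
    """Sum np over all nodes whose state is exactly 'free'."""
    return sum(
        int(attrs.get("np", 0))
        for attrs in nodes.values()
        if attrs.get("state") == "free"
    )
```

With the two nodes above, `free_slots(parse_qnodes(SAMPLE))` gives 4, which matches the two nodes with np = 2 each.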

 

Scenario 1:

Configuration: 192.168.31.130|Linux|MultiCore|test3||/mnt|/bin/R|2|

 

1.1 In the "simple" and "Predictive" run modes, I can submit the NLME task to Linux. Linux completes the calculation, and the result is returned to Phoenix.

 

1.2 In the "Bootstrap" run mode, I can submit the NLME task to Linux and Linux completes the calculation, but the Phoenix client keeps reporting the job as running and the result is never returned to Phoenix.

You can see all the results of the calculation in the Linux directory.

1.png

 

But the Phoenix client always shows "Running NLME on system"

2.png

 

View the file "DME_BO~1.619-496/NlmeRemote.LOG" in the Linux shared directory to get the following information:

nohup: ignoring input

/usr/bin/R

Rscript /mnt/InstallDirNLME/bootstrap.r MultiCore /mnt /mnt/DME_BO~1.619-496 3 1000 2 2 test.mdl cols1.txt data1.txt 9316 nlmeargs.txt nlmeargs.txt test.mdl nlmeargs.txt cols1.txt data1.txt test.mdl 2 95

WORKING_DIR=/mnt/NLME173ac7f6fe137/NLME173ac12b7b585,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files

-------------------------------------------------------------

--------------------  Translating  --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1408304074 /L ./test.mdl ./Work

Done

-------------------------------------------------------------

------------------- Compliling *.c  -------------------------

-------------------------------------------------------------

----------------------- Linking -----------------------------

-------------------------------------------------------------

ln: failed to create symbolic link "/mnt/NLME173ac7f6fe137/NLME173ac12b7b585/NLME7.exe": File exists

NULL

Warning messages:

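1: In stuff[row] <- currentList : number of items to replace is not a multiple of replacement length

2: In stuff[row] <- currentList : number of items to replace is not a multiple of replacement length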

 

 

1.3 In the "Cov.Srch.Stepwise" run mode, I can submit the NLME task to Linux and Linux completes the calculation, but the Phoenix client keeps reporting the job as running and the result is never returned to Phoenix.

You can see all the results of the calculation in the Linux directory.

3.png

 

 

But the Phoenix client always shows "Running NLME on system"

 

View the file "DME_SI~1.512-480/NlmeRemote.LOG" in the Linux shared directory to get the following information:

nohup: ignoring input

/usr/bin/R

Rscript /mnt/InstallDirNLME/stepwise_covarsrch.r MultiCore /mnt/InstallDirNLME /mnt /mnt/DME_SI~1.512-480 test.mdl nlmeargs.txt test.mdl cols1.txt data1.txt nlmeargs.txt 3 V-wt V-apgr Ke-wt  -2LL:1,1,1 0.01 0.001 2 Pheno Model

WORKING_DIR=/mnt/NLME16aa821e9ed02/NLME16aa87e37534d,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files

-------------------------------------------------------------

--------------------  Translating  --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1408304609 /L ./test.mdl ./Work

Done

-------------------------------------------------------------

------------------- Compliling *.c  -------------------------

-------------------------------------------------------------

----------------------- Linking -----------------------------

-------------------------------------------------------------

unix2dos: converting file /mnt/NLME16aa821e9ed02/NLME16aa87e37534d/jobs/01/1//out000.txt to DOS format ...

unix2dos: converting file /mnt/NLME16aa821e9ed02/NLME16aa87e37534d/jobs/02/2//out100.txt to DOS format ...

unix2dos: converting file /mnt/NLME16aa821e9ed02/NLME16aa87e37534d/jobs/03/3//out010.txt to DOS format ...

unix2dos: converting file /mnt/NLME16aa821e9ed02/NLME16aa87e37534d/jobs/04/4//out001.txt to DOS format ...

WORKING_DIR=/mnt/NLME16aa8771a8725/NLME16aa83f361009,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files

-------------------------------------------------------------

--------------------  Translating  --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1408304609 /L ./test.mdl ./Work

Done

-------------------------------------------------------------

------------------- Compliling *.c  -------------------------

-------------------------------------------------------------

----------------------- Linking -----------------------------

-------------------------------------------------------------

unix2dos: converting file /mnt/NLME16aa8771a8725/NLME16aa83f361009/jobs/01/1//out110.txt to DOS format ...

unix2dos: converting file /mnt/NLME16aa8771a8725/NLME16aa83f361009/jobs/02/2//out101.txt to DOS format ...

[1] "/mnt/NLME16aa821e9ed02" "/mnt/NLME16aa8771a8725"

 

 

Scenario 2:

Configuration: 192.168.31.130|Linux|TORQUE|test4||/mnt|/bin/R|2|

In this scenario, no run mode completes successfully.

2.1 In the "simple" and "Predictive" run modes, I can submit the NLME task to Linux and Linux completes the calculation, but the Phoenix client keeps reporting the job as running and the result is never returned to Phoenix.

You can see all the results of the calculation in the Linux directory.

4.png

 

But the Phoenix client always shows "Running NLME on system"

 

5.png

 

DME_SI~1.113-909/NlmeRemote.LOG:

/usr/bin/R

Rscript /mnt/InstallDirNLME/generic_run.r COVAR_SEARCH TORQUE /mnt/InstallDirNLME /mnt /mnt/DME_SI~1.113-909 nlmeControlFile.txt 2 SingleNlme

Loading required package: data.table

No readable configuration file found

Created registry in '/mnt/NLME157711491531c/NLME15771147abe8c/registry' using cluster functions 'Interactive'

WORKING_DIR=/mnt/NLME157711491531c/NLME15771147abe8c,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files

-------------------------------------------------------------

--------------------  Translating  --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1408305253 /L ./test.mdl ./Work

Done

-------------------------------------------------------------

------------------- Compliling *.c  -------------------------

-------------------------------------------------------------

----------------------- Linking -----------------------------

-------------------------------------------------------------

Adding 1 jobs ...

Submitting 1 jobs in 1 chunks using cluster functions 'TORQUE' ...

unix2dos: converting file /mnt/NLME157711491531c/NLME15771147abe8c/../out000001.txt to DOS format ...

unix2dos: converting file /mnt/NLME157711491531c/NLME15771147abe8c/../nlme7engine.log to DOS format ...

[1] "removeRegistry() AGAIN"

[1] "/mnt/NLME157711491531c"

 

 

What caused this problem?
How can I solve it?
 
Best,
0521



#8 fsoltanshahi

fsoltanshahi

    Newbie

  • Members
  • 4 posts

Posted 17 June 2019 - 07:50 PM

Can you attach the content of progress.xml?

 

Cheers,

 

Fred



#9 0521

0521

    Advanced Member

  • Members
  • 46 posts

Posted 18 June 2019 - 02:52 AM


Hi Fred,

Thank you very much for replying to me.

This is the corresponding "progress.xml" file for 1.2:
<progress><MachineName>LocalHost</MachineName><ParallelProtocol>MultiCore</ParallelProtocol><StartTime>6月 2019 10 07时55分01秒</StartTime><EndTime>6月 2019 10 07时55分09秒</EndTime><Status>Finished</Status><NumOfSamples>2</NumOfSamples><NumOfSamplesCompleted>2</NumOfSamplesCompleted><NumOfSamplesFailed>0</NumOfSamplesFailed><NumOfSamplesExpired>0</NumOfSamplesExpired><NumOfSamplesErrored>0</NumOfSamplesErrored><ProgressStage/><DetailInfoLine1>Preparing files for Bootstrap run</DetailInfoLine1><DetailInfoLine2/><DetailInfoLine3/></progress>

I attached all the files under the "DME~" folders corresponding to scenarios 1.2 and 2.1.

Best,
0521


Edited by 0521, 18 June 2019 - 02:52 AM.


#10 fsoltanshahi

fsoltanshahi

    Newbie

  • Members
  • 4 posts

Posted 18 June 2019 - 03:51 PM


Please check the progress.xml file on the desktop machine (it should be in %TMP%/Phoenix/DME_xxxxx) to see whether it matches progress.xml on the Linux side.

 

I would also check the Phoenix log files for any connection errors. This sounds to me like Phoenix is losing its connection to the remote system and cannot get updates on the job's status (completion).
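If comparing the two copies by eye is tedious, the comparison can be scripted. This is only a sketch under stated assumptions: the element names are the ones visible in the progress.xml posted earlier in this thread, the function names are made up for illustration, and you would read the two files yourself and pass their contents in as strings.

```python
import xml.etree.ElementTree as ET

# Element names as they appear in the progress.xml shown in this thread.
FIELDS = (
    "ParallelProtocol",
    "Status",
    "NumOfSamples",
    "NumOfSamplesCompleted",
    "NumOfSamplesFailed",
)


def read_progress(xml_text: str) -> dict:
    """Extract the fields of interest from one progress.xml document."""
    root = ET.fromstring(xml_text)
    # findtext returns None when an element is missing; normalise to "".
    return {name: (root.findtext(name) or "") for name in FIELDS}


def diff_progress(local_xml: str, remote_xml: str) -> dict:
    """Return {field: (local_value, remote_value)} for fields that differ."""
    local = read_progress(local_xml)
    remote = read_progress(remote_xml)
    return {k: (local[k], remote[k]) for k in FIELDS if local[k] != remote[k]}
```

An empty result means the two copies agree on these fields, so the client did receive the final status and the problem lies in how it is read rather than in the transfer.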



#11 0521

0521

    Advanced Member

  • Members
  • 46 posts

Posted 18 June 2019 - 04:22 PM


Hi Fred,

 

The "progress.xml" in the Windows "%TMP%/Phoenix/DME_xxxxx"  is this:

<progress>
  <MachineName>LocalHost</MachineName>
  <ParallelProtocol>TORQUE</ParallelProtocol>
  <StartTime>6月 2019 18 16时10分22秒</StartTime>
  <EndTime>6月 2019 18 16时10分29秒</EndTime>
  <Status>Finished</Status>
  <NumOfSamples>1</NumOfSamples>
  <NumOfSamplesCompleted>1</NumOfSamplesCompleted>
  <NumOfSamplesFailed>0</NumOfSamplesFailed>
  <NumOfSamplesExpired>0</NumOfSamplesExpired>
  <NumOfSamplesErrored>0</NumOfSamplesErrored>
  <ProgressStage></ProgressStage>
  <DetailInfoLine1></DetailInfoLine1>
  <DetailInfoLine2></DetailInfoLine2>
  <DetailInfoLine3></DetailInfoLine3>

 

</progress>
2019-06-19_00-12-44.png

 

Phoenix log files:

2019-06-19 00:07:38.1600|Error|Cannot load C:\Users\HASEE\AppData\Local\Temp\Phoenix\DME_PR~1.492\progress.xml 该字符串未被识别为有效的 DateTime。|Application|||||

 

2019-06-19_00-11-01.png

 

I have attached all the files in the Windows directory "C:\Users\*****\AppData\Local\Temp\Phoenix\DME_PredCheck_12-05-50.492__234ae21c-2ae8-46b3-b49d-562946965154" .

 

"该字符串未被识别为有效的 DateTime" in English is "The string was not recognized as a valid DateTime".

Is the error caused by the Chinese characters in my date?

Which system determines the format of the date?
1. The Windows machine where Phoenix is installed?
2. The Linux host?
3. The Linux node?

 

Thanks

0521

 


Edited by 0521, 18 June 2019 - 04:46 PM.


#12 fsoltanshahi

fsoltanshahi

    Newbie

  • Members
  • 4 posts

Posted 18 June 2019 - 06:07 PM

 


Progress.xml is created on the Linux side.  The date is acquired in R by calling:

format(as.POSIXlt(Sys.time(), "UTC"),"%b %Y %d %X")

You should be able to change your Linux settings to report date/time in English.
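To see why the localized timestamp breaks things, here is a small Python illustration of the same idea. It is only an analogy: Python stands in for whatever parser the Phoenix client uses (the thread does not show it), and `parse_stamp` is a made-up name. The format string is the one from the R call above. In R, the month abbreviation produced by `%b` and the time produced by `%X` follow the LC_TIME locale category, so that is the setting to look at on the Linux side.

```python
import locale
from datetime import datetime
from typing import Optional

# Same conversion specifiers as the R call quoted above.
STAMP_FORMAT = "%b %Y %d %X"


def parse_stamp(text: str) -> Optional[datetime]:
    """Parse a progress.xml timestamp the way an English-locale reader would.

    Returns None when the text does not match English month names and
    HH:MM:SS times.
    """
    # Force the portable "C" time locale so %b means Jan..Dec and
    # %X means HH:MM:SS, regardless of the machine's settings.
    locale.setlocale(locale.LC_TIME, "C")
    try:
        return datetime.strptime(text, STAMP_FORMAT)
    except ValueError:
        return None
```

Here `parse_stamp("Jun 2019 18 16:10:22")` yields a valid datetime, while `parse_stamp("6月 2019 18 16时10分22秒")` yields None, which mirrors the "not recognized as a valid DateTime" error in the Phoenix log.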

 

Good luck,

 

Fred



#13 0521

0521

    Advanced Member

  • Members
  • 46 posts

Posted 19 June 2019 - 04:46 AM


Thank you very much, Fred!
The problems in 1.2, 1.3, and 2.1 have been resolved.

But there are new problems:
In scenario 2, the "Bootstrap" and "Cov.Srch.Stepwise" run modes report an error.

 

 

Scenario 2:

2.2 In the "Bootstrap" run mode, I can submit NLME tasks to Linux. An error occurred during the calculation on Linux (part of the iterations had completed), and Phoenix showed this error message:

---------------------------

Execution Error

---------------------------

There was an error while executing Workflow.Pheno Stdev Covar

 

Model execution failed.

Unable to run bootstrap

See Remote Execution Log for possible explaination

---------------------------

OK  

---------------------------

 

Linux host file

2019-06-19_12-30-03.png

 

/NlmeRemote.LOG:

nohup: ignoring input

 

/usr/bin/R

 

Rscript /mnt/InstallDirNLME/bootstrap.r TORQUE /mnt /mnt/DME_BO~1.818-288 3 1000 10 2 test.mdl cols1.txt data1.txt 28844 nlmeargs.txt nlmeargs.txt test.mdl nlmeargs.txt cols1.txt data1.txt test.mdl 2 95

 

Loading required package: data.table

 

No readable configuration file found

 

Created registry in '/mnt/NLMEd65e1f1dc513/NLMEd65e1e13f372/registry' using cluster functions 'Interactive'

 

No readable configuration file found

 

Created registry in '/mnt/NLMEd65e1f1dc513/NLMEd65e39cd6bf8/registry' using cluster functions 'Interactive'

 

WORKING_DIR=/mnt/NLMEd65e1f1dc513/NLMEd65e39cd6bf8,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

 

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

 

Deleting files

 

-------------------------------------------------------------

 

--------------------  Translating  --------------------------

 

/mnt/InstallDirNLME/TDL4 /hash 1376165901 /L ./test.mdl ./Work

 

Done

 

-------------------------------------------------------------

 

------------------- Compliling *.c  -------------------------

 

-------------------------------------------------------------

 

----------------------- Linking -----------------------------

 

-------------------------------------------------------------

 

Adding 1 jobs ...

 

Submitting 1 jobs in 1 chunks using cluster functions 'TORQUE' ...

 

Adding 10 jobs ...

 

[1] "Failed to performBootstrap()"

 

[1] "Error is :  Error in chunkIds(reg = gridRegistry, ids = findJobs(reg = gridRegistry), : could not find function \"chunkIds\"\n"

 

          used (Mb) gc trigger (Mb) max used (Mb)

 

Ncells  734206 39.3    1318958 70.5  1318958 70.5

 

Vcells 1373525 10.5    8388608 64.0  2072528 15.9

 

 

I checked the Phoenix log and there was no error log in "Application_Error.txt".

 

 

I attached all the files under the Linux host.

 

 

 

 

2.3 In the "Cov.Srch.Stepwise" run mode, I can submit NLME tasks to Linux. An error occurred during the calculation on Linux, and Phoenix showed this error message:

---------------------------

Execution Error

---------------------------

There was an error while executing Workflow.Pheno Stdev Covar

 

Model execution failed.

Failed to run stepwise covariate search

See Remote Execution Log for possible explaination

---------------------------

OK  

---------------------------

 

Linux host file

/NlmeRemote.LOG:

nohup: ignoring input

 

/usr/bin/R

 

Rscript /mnt/InstallDirNLME/stepwise_covarsrch.r TORQUE /mnt/InstallDirNLME /mnt /mnt/DME_SI~1.588-338 test.mdl nlmeargs.txt test.mdl cols1.txt data1.txt nlmeargs.txt 3 V-wt Ke-wt V-apgr  -2LL:1,1,1 0.01 0.001 2 Pheno Stdev Covar

 

Loading required package: data.table

 

No readable configuration file found

 

Created registry in '/mnt/NLME11f757a053d04/NLME11f751f32ab47/registry' using cluster functions 'Interactive'

 

WORKING_DIR=/mnt/NLME11f757a053d04/NLME11f751f32ab47,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

 

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

 

Deleting files

 

-------------------------------------------------------------

 

--------------------  Translating  --------------------------

 

/mnt/InstallDirNLME/TDL4 /hash 1376170193 /L ./test.mdl ./Work

 

Done

 

-------------------------------------------------------------

 

------------------- Compliling *.c  -------------------------

 

-------------------------------------------------------------

 

----------------------- Linking -----------------------------

 

-------------------------------------------------------------

 

Adding 4 jobs ...

 

<simpleError in chunkIds(reg = gridRegistry, ids = findJobs(reg = gridRegistry),     n.chunks = numberOfChunks): could not find function "chunkIds">

 

Error in get("jobsDirectoryRoot", envir = nlmeEnv) :

 

  object 'jobsDirectoryRoot' not found

 

Calls: print ... performStepwiseCovarSearch -> summarizeStepwiseCovarSearch -> get

 

Execution halted

 

 

 

I checked the Phoenix log and there was no error log in "Application_Error.txt".

 

I attached all the files under the Linux host.



Edited by 0521, 19 June 2019 - 05:04 AM.


#14 fsoltanshahi

fsoltanshahi

    Newbie

  • Members
  • 4 posts

Posted 19 June 2019 - 01:35 PM

 


You need to downgrade the R package batchtools from 0.9.11 to 0.9.10. This is explained in PHX-6979 and is fixed in Phoenix 8.2.
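For others who hit the same symptoms, a quick scan of NlmeRemote.LOG can point to the two causes found in this thread. The sketch below is illustrative only: the function name and the hint texts are made up, and the patterns are simply the `could not find function "chunkIds"` message from the log above plus a check for non-ASCII characters as a sign of a non-English server locale.

```python
import re

# Matches both the plain and the backslash-escaped form seen in the logs.
CHUNKIDS = re.compile(r'could not find function \\?"chunkIds\\?"')
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def scan_log(text: str) -> list:
    """Return (line_number, hint) pairs for known symptoms in a log."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if CHUNKIDS.search(line):
            findings.append(
                (number, "chunkIds missing: check the installed batchtools version")
            )
        if NON_ASCII.search(line):
            findings.append(
                (number, "non-ASCII output: the server locale may not be English")
            )
    return findings
```

Reading the log with `open(path, encoding="utf-8").read()` and passing the text to `scan_log` returns an empty list when neither symptom is present.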

Fred



#15 0521

0521

    Advanced Member

  • Members
  • 46 posts

Posted 19 June 2019 - 04:13 PM


Hi Fred,

 

The problem is solved, thank you very much!

 

Cheers,

0521





