Patch for using several TSI from one XNJS

Patch for using several TSI from one XNJS

by Clement COUSSIRAT

Hello Bernd,

I have found some errors in the TSI code; they are corrected by
"bugfix-tsi.patch". There are two typos in the startup script: an if
syntax error and a stray quote mark. There is also an error in
Initialisation.pm: the regexp that should extract the port number from
the socket message does not match. The first part of the regexp,
'[\w+]', matches a single word character or '+' rather than a whole
word, which is '\w+'.
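
To illustrate the regexp problem described above, here is a minimal sketch in Java, whose regex engine treats '[\w+]' and '\w+' the same way Perl does for this case. The class name and the sample message text are assumptions made for the example; they are not taken from the TSI sources.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PortRegexDemo {
    // Pattern before the fix: [\w+] is a character class that matches exactly
    // one word character or a literal '+'.
    static final Pattern BROKEN = Pattern.compile("^[\\w+]\\s(\\w+)");

    // Pattern after the fix: \w+ matches a whole word of one or more characters.
    static final Pattern FIXED = Pattern.compile("^\\w+\\s(\\w+)");

    // Returns the captured second token, or null when the pattern does not match.
    static String capture(Pattern pattern, String message) {
        Matcher m = pattern.matcher(message);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical message: a command word followed by a port number.
        String message = "newtsiprocess 7654";
        System.out.println("broken: " + capture(BROKEN, message));
        System.out.println("fixed:  " + capture(FIXED, message));
    }
}
```

With a multi-character first word, the old pattern fails at the second character, so nothing is captured; it only matches when the first word is a single character.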


I have made another patch for the XNJS. It is based on my previous
patch but has been adapted to the current trunk.
"xnjs-core-multipleTSI.patch" allows you to specify a space-separated
list of TSI hosts in xnjs_legacy.xml, like this:

<eng:Property name="CLASSICTSI.machine" value="TSI1 TSI2 TSI3"/>
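
As a rough sketch of how such a property value can be turned into a host list: the patch itself calls machine.split(" "), while the variant below splits on runs of whitespace, which is an assumption of this example and tolerates repeated or leading spaces. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TsiHostList {
    // Splits a value such as "TSI1 TSI2 TSI3" into host names.
    // Splitting on runs of whitespace (not a single space) avoids empty
    // entries when the value has repeated, leading, or trailing spaces.
    static List<String> parse(String machine) {
        List<String> hosts = new ArrayList<>();
        for (String token : machine.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                hosts.add(token);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        System.out.println(parse("TSI1 TSI2 TSI3"));
        System.out.println(parse("  TSI1   TSI2 "));
    }
}
```

Resolving each name with InetAddress.getByName, as the patch does, would follow this step.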

The TSI addresses are stored in a pool, and each new TSIConnection is
created on the next TSI from this pool, chosen by a round-robin
algorithm. Successive requests may be executed on different TSIs, so
all TSIs must use a shared filespace and a batch scheduler. For
example, the TSI_PUTFILES command appends data to the file being
transferred. By default each call stores only 1MB, so transferring a
1GB file takes about 1000 calls, which may run on different TSIs.
Submitting a job and getting its status cause no problems except with
the NO_BATCH scheduler. That problem arises because the
$main::qstat_cmd and $main::pspid_cmd commands use the local system
process list. It can easily be solved by running the ps commands
through ssh, like this:

$main::qstat_cmd = "ssh unicore\@TSI1 ps -e -os,args; ssh unicore\@TSI2 ps -e -os,args";
$main::pspid_cmd = "ps -e -opid,args";
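
The round-robin selection over the address pool can be sketched as follows. This is a simplified illustration that uses host names instead of InetAddress objects; the class name is hypothetical, and the synchronized keyword is an addition of this sketch that the patch does not contain.

```java
import java.util.ArrayList;
import java.util.List;

final class RoundRobinPool {
    private final List<String> hosts = new ArrayList<>();

    RoundRobinPool(List<String> initial) {
        if (initial.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        hosts.addAll(initial);
    }

    // Takes the head of the list and re-appends it at the tail, the same
    // remove(0)/add rotation the patch performs in signalShepherd.
    synchronized String next() {
        String host = hosts.remove(0);
        hosts.add(host);
        return host;
    }

    public static void main(String[] args) {
        RoundRobinPool pool = new RoundRobinPool(List.of("TSI1", "TSI2", "TSI3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(pool.next());
        }
    }
}
```

With three hosts, four successive calls return TSI1, TSI2, TSI3 and then TSI1 again.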

This patch also corrects a bug in plain socket transfer where the
listening socket isn't opened and TSI connections are refused.
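
The plain socket bug comes down to the difference between the two java.net.ServerSocket constructors: the no-argument form creates an unbound socket, while the form taking a port binds and listens. A minimal sketch, with a hypothetical class name and port 0 (any free port) standing in for the configured reply port:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

final class ListenSocketDemo {
    // True when the no-argument constructor leaves the socket unbound,
    // so nothing can connect to it until bind() is called.
    static boolean unboundByDefault() {
        try (ServerSocket s = new ServerSocket()) {
            return !s.isBound();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True when the port constructor yields a bound, listening socket.
    // Port 0 asks the operating system for any free port.
    static boolean boundWithPort(int port) {
        try (ServerSocket s = new ServerSocket(port)) {
            return s.isBound() && s.getLocalPort() > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no-arg constructor unbound: " + unboundByDefault());
        System.out.println("port constructor bound:     " + boundWithPort(0));
    }
}
```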




Regards,
Clément.


Index: src/main/java/de/fzj/unicore/xnjs/legacy/TSISocketFactory.java
===================================================================
--- src/main/java/de/fzj/unicore/xnjs/legacy/TSISocketFactory.java (revision 5556)
+++ src/main/java/de/fzj/unicore/xnjs/legacy/TSISocketFactory.java (working copy)
@@ -49,7 +49,7 @@
  }
 
  protected void createPlainServer()throws IOException{
- server=new ServerSocket();
+ server=new ServerSocket(myPort);
  }
 
  protected void createSSLServer()throws Exception{
Index: src/main/java/de/fzj/unicore/xnjs/legacy/TSIConnectionFactory.java
===================================================================
--- src/main/java/de/fzj/unicore/xnjs/legacy/TSIConnectionFactory.java (revision 5556)
+++ src/main/java/de/fzj/unicore/xnjs/legacy/TSIConnectionFactory.java (working copy)
@@ -61,6 +61,7 @@
 
  private final List<TSIConnection> pool=new ArrayList<TSIConnection>();
  private InetAddress source_addr=null;
+ private final List<InetAddress> source_addr_pool=new ArrayList<InetAddress>();
 
  private TSISocketFactory server=null;
 
@@ -194,6 +195,10 @@
  }
 
  private void signalShepherd(String message) throws Exception {
+ // get an other TSI from the pool
+ source_addr = source_addr_pool.remove(0);
+ source_addr_pool.add(source_addr);
+
  // Signal TSID that we want a new TSI process
  if(log.isDebugEnabled()){
  log.debug("Signalling TSI at "+source_addr+":"+port
@@ -237,8 +242,14 @@
  port=Integer.parseInt(portS);
  String replyportS=getConfiguration().getProperty(TSI_MYPORT);
  replyport=Integer.parseInt(replyportS);
- source_addr = InetAddress.getByName(machine);
+ String [] list = machine.split(" ");
 
+ // parse machine to extract TSI addresses
+ for(int i = 0; i < list.length; ++i) {
+ source_addr = InetAddress.getByName(list[i]);
+ source_addr_pool.add(source_addr);
+ }
+
  bssuser=getConfiguration().getProperty(TSI_BSSUSER);
 
  log.info("\"Legacy TSI\" connection factory starting:\n" +

Index: trunk/tsi/SHARED/Initialisation.pm
===================================================================
--- trunk/tsi/SHARED/Initialisation.pm (revision 5556)
+++ trunk/tsi/SHARED/Initialisation.pm (working copy)
@@ -145,7 +145,7 @@
                                 if (!$njs_port) {
                                     # if $njs_port keeps invalid, try to read it from the NJS
                                     initial_report("Reading NJS port from NJS message");
-                                    $message =~ /^[\w+]\s(\w+)/;
+                                    $message =~ /^\w+\s(\w+)/;
                                     $njs_port = $1;
                                     # if NJS sends a name, try to get port with /etc/services
                                     if($njs_port =~ /\D/) {$njs_port = getservbyname($njs_port, 'tcp')};
Index: trunk/bin/start_tsi
===================================================================
--- trunk/bin/start_tsi (revision 5556)
+++ trunk/bin/start_tsi (working copy)
@@ -167,8 +167,8 @@
   if [ "${TRUSTSTORE}" != "" ]
   then
     echo "Found Truststore File $TRUSTSTORE"
-  done
-done
+  fi
+fi
 echo ""
 
 date=`date +_%Y_%m_%d`
@@ -187,7 +187,7 @@
   echo "perl -d $TSI/tsi $NJS_HOST $NJS_PORT $MY_PORT $KEYSTORE $TRUSTSTORE"
  perl -d $TSI/tsi $NJS_HOST $NJS_PORT $MY_PORT $KEYSTORE $TRUSTSTORE
 else
-  echo "nohup perl $TSI/tsi $NJS_HOST $NJS_PORT $MY_PORT $KEYSTORE $TRUSTSTORE" > $tsilog 2>&1 &"
+  echo "nohup perl $TSI/tsi $NJS_HOST $NJS_PORT $MY_PORT $KEYSTORE $TRUSTSTORE" > $tsilog 2>&1 &
         nohup perl "$TSI/tsi" "$NJS_HOST" "$NJS_PORT" "$MY_PORT" "$KEYSTORE" "$TRUSTSTORE"> $tsilog 2>&1 &
   echo $! >> $TSI_CONF/LAST_TSI_PIDS
 

_______________________________________________
Unicore-devel mailing list
Unicore-devel@...
https://lists.sourceforge.net/lists/listinfo/unicore-devel

Re: Patch for using several TSI from one XNJS

by Bernd Schuller

hi Clément,

the multi-tsi support is nice stuff, and thanks for spotting the typos!
Both are committed to SVN.

Best regards.
Bernd.

On Tue, 2009-10-20 at 18:11 +0200, Clement COUSSIRAT wrote:

> [...]
--
Dr. Bernd Schuller
Distributed Systems and Grid Computing
Juelich Supercomputing Centre, http://www.fz-juelich.de/jsc
Phone: +49 246161-8736 (fax -8556)
Personal blog: www.jroller.com/page/gridhaus


