Entry  Wed May 20 20:10:34 2020, rana, John Z, Update, Computer Scripts / Programs, NDS2 server / conf updated - seems OK now 
    Reply  Thu May 21 15:31:26 2020, gautam, Update, Computer Scripts / Programs, NDS2 service restarted 
       Reply  Fri May 22 10:37:41 2020, rana, Update, Computer Scripts / Programs, NDS2 service restarted 
          Reply  Mon May 25 10:54:41 2020, rana, Update, Computer Scripts / Programs, NDS2 service restarted 
Message ID: 15346     Entry time: Mon May 25 10:54:41 2020     In reply to: 15345
Author: rana 
Type: Update 
Category: Computer Scripts / Programs 
Subject: NDS2 service restarted 

So far it has run through the weekend with no problems (except that the log files are huge, as usual).

I have started setting up monit on megatron to watch this process. In principle, this would send us alerts when things break and also give us a web interface for checking monit's status. I'm not sure how to do web port forwarding between megatron and nodus, so for now it's terminal-only, e.g.:

monit>sudo monit status
Monit 5.25.1 uptime: 4m

System 'megatron'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  load average                 [0.15] [0.22] [0.25]
  cpu                          0.6%us 1.0%sy 0.2%wa
  memory usage                 1001.4 MB [25.0%]
  swap usage                   107.2 MB [1.9%]
  uptime                       40d 17h 55m
  boot time                    Tue, 14 Apr 2020 17:47:49
  data collected               Mon, 25 May 2020 11:43:03

Process 'nds2'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          25007
  parent pid                   1
  uid                          4666
  effective uid                4666
  gid                          4666
  uptime                       3d 1h 22m
  threads                      53
  children                     0
  cpu                          0.0%
  cpu total                    0.0%
  memory                       19.4% [776.1 MB]
  memory total                 19.4% [776.1 MB]
  security attribute           unconfined
  disk read                    0 B/s [2.3 GB total]
  disk write                   0 B/s [17.9 MB total]
  data collected               Mon, 25 May 2020 11:43:03
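
Until the web interface is reachable, the terminal output above could be checked from a script. The following is a minimal sketch, assuming the two-column text layout shown above (field name, a run of spaces, then the value); the function name `parse_monit_status` and the abbreviated `SAMPLE` text are hypothetical and used only for illustration, and this is not part of monit itself.

```python
import re

# Abbreviated copy of the `monit status` output above, used as test input.
SAMPLE = """\
Monit 5.25.1 uptime: 4m

System 'megatron'
  status                       OK
  load average                 [0.15] [0.22] [0.25]
  memory usage                 1001.4 MB [25.0%]

Process 'nds2'
  status                       OK
  monitoring status            Monitored
  pid                          25007
  uptime                       3d 1h 22m
  memory                       19.4% [776.1 MB]
"""

# A block header looks like:  Process 'nds2'
HEADER = re.compile(r"^(\w+) '([^']+)'\s*$")


def parse_monit_status(text):
    """Return {(kind, name): {field: value}} parsed from `monit status` text."""
    blocks = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # Start a new block, e.g. ("Process", "nds2")
            current = blocks.setdefault((match.group(1), match.group(2)), {})
            continue
        # Field lines are indented; skip anything else (banner, blank lines)
        if current is None or not line.startswith(" "):
            continue
        # Field name and value are separated by a run of two or more spaces
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            current[parts[0]] = parts[1]
    return blocks


status = parse_monit_status(SAMPLE)
nds2 = status[("Process", "nds2")]
print(nds2["status"], nds2["memory"])
```

In practice the text would come from running `sudo monit status` (for example via `subprocess.run`) rather than from a pasted string; all values are kept as strings, so any numeric comparison (memory, thread count) would need its own parsing.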

 
