Message ID: 17173     Entry time: Thu Oct 6 07:29:30 2022     In reply to: 17172
Author: Chris 
Type: Update 
Category: Computers 
Subject: Successful takeover attempt with the new front ends 

[JC, Chris]

Last night’s CDS upgrade attempt succeeded in taking over the IFO. If the IFO users are willing, let’s try to run with it today.

The new system was left in control of the IFO hardware overnight, to check its stability. All looks OK so far.

The next step will be to connect the new FEs, fb1, and chiara to the martian network, so they’re directly accessible from the control room workstations (currently the system remains quarantined on the teststand network). We’ll also need to bring over the changes to models, scripts, etc. that have been made since Tega’s last sync of the filesystem on chiara.
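For reference, a minimal sketch of what that sync could look like (the hostname and path are assumptions, to be adjusted for the actual layout), run from the teststand copy:

  # dry run first to see what would change, then the real sync of the rtcds tree
  rsync -avn chiara:/opt/rtcds/ /opt/rtcds/
  rsync -av chiara:/opt/rtcds/ /opt/rtcds/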

The previous elog noted a mysterious broken state of the OneStop link between FE and IO chassis, in which all the green LEDs on the OneStop board in the IO chassis light up except the four next to the fiber link connector. This was seen on c1sus and c1iscex. It was recoverable last night on c1iscex by fully powering down both the FE computer and the chassis, waiting a bit, and then powering the chassis and computer back up. Currently c1sus is running with a copper OneStop cable because of the fiber link troubles we had, but the same power-cycle procedure should be tried there to see if one of the fiber links can be made to work after all.

In order to string the short copper OneStop cable for c1sus, we had to move the teststand rack closer to the IO chassis, up against the back of 1X6/1X7. This is a temporary state while we prepare to move the FEs to their new rack. It hopefully also allows sufficient clearance to the exit door to pass the upcoming fire inspection.

At first, we connected the teststand rack’s power cables to the receptacle in 1X7, but this eventually tripped 1X7’s circuit breaker in the wall panel. Now, half of the teststand rack is on the receptacle in 1X6, and the other half is on 1X7 (these are separate circuits).

After the breaker trip, daqd couldn’t start. It turned out that no data was flowing to it, because the power cycle caused the DAQ network switch to forget a setting I had applied to enable jumbo frames on the network. The configuration has now been saved so that it should apply automatically on future restarts. For future reference, the web interface of this switch is available by running firefox on fb1 and navigating to 10.0.113.254.
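As a quick post-restart sanity check that jumbo frames are still enabled (the interface name and target hostname below are assumptions), something like this from fb1 should work:

  # confirm the local DAQ interface carries a 9000-byte MTU
  ip link show eth1
  # push a max-size packet across the switch with fragmentation forbidden;
  # 8972 bytes of payload + 28 bytes of IP/ICMP headers = 9000 bytes
  ping -M do -s 8972 -c 3 c1lsc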

When the FE machines are restarted, the GPS timing offset in /sys/kernel/gpstime/offset sometimes fails to initialize. This shows up as an incorrect GPS time in /proc/gps and on the GDS_TP MEDM screens, and prevents the data from being timestamped properly for the DAQ. This needs to be looked at and fixed soon. In the meantime, it can be worked around by setting the offset manually: look up the value on one of the FEs that got it right, and apply it using sudo sh -c "echo CORRECT_OFFSET >/sys/kernel/gpstime/offset".
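For example, something along these lines (the donor host c1lsc is just an assumption; use any FE whose /proc/gps looks right):

  # copy the GPS offset from a known-good FE to the broken one
  OFFSET=$(ssh c1lsc cat /sys/kernel/gpstime/offset)
  sudo sh -c "echo $OFFSET >/sys/kernel/gpstime/offset"
  cat /proc/gps   # should now agree with the donor FE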

In the first ~30 minutes after the system came up last night, there were transient IPC errors, caused by drifting timestamps while the GPS cards in the FEs got themselves resynced to the satellites. Since then, timing has remained stable, and no further errors occurred overnight. However, the timing status is still reported as red in the IOP state vectors. This doesn’t seem to be an operational problem and perhaps can be ignored, but we should check it out later to make sure.

Also, the DAC cards in c1ioo and c1iscey reported FIFO EMPTY errors, triggering their DACKILL watchdogs. This situation may have existed, undetected, in the old system as well. To bypass the watchdog, I’ve added the optimizeIO=1 flag to the IOP models on those systems, which makes them skip the empty-FIFO check. This too should be investigated further when we get a chance.
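For the record, the change amounts to adding the line optimizeIO=1 to the cdsParameters block of each affected IOP model and then rebuilding, roughly as follows (the model name c1x03 is a placeholder, and the standard rtcds workflow is assumed):

  rtcds build c1x03
  rtcds install c1x03
  rtcds restart c1x03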
