40m Log
Entry  Thu Aug 4 19:01:59 2022, Tega, Update, Computers, Front-end machine in supermicro boxes
    Reply  Mon Aug 8 17:16:51 2022, Tega, Update, Computers, Front-end machine setup
       Reply  Wed Aug 10 20:51:14 2022, Tega, Update, Computers, CDS upgrade Front-end machine setup
          Reply  Tue Aug 16 18:22:59 2022, Tega, Update, Computers, c1teststand rack mounting for CDS upgrade
             Reply  Wed Aug 17 11:10:51 2022, rana, Update, Computers, c1teststand rack mounting for CDS upgrade
             Reply  Mon Aug 22 19:02:15 2022, Tega, Update, Computers, c1teststand rack mounting for CDS upgrade II
                Reply  Tue Aug 23 22:30:24 2022, Tega, Update, Computers, c1teststand OS upgrade - I
                Reply  Fri Aug 26 14:05:09 2022, Tega, Update, Computers, rack reshuffle proposal for CDS upgrade
                   Reply  Sun Aug 28 23:14:22 2022, Jamie, Update, Computers, rack reshuffle proposal for CDS upgrade
                      Reply  Mon Aug 29 15:15:46 2022, Tega, Update, Computers, 3 FEs from LLO got delivered today
                         Reply  Tue Aug 30 15:21:27 2022, Tega, Update, Computers, 3 FEs from LHO got delivered today
                   Reply  Mon Sep 19 20:21:06 2022, Tega, Update, Computers, 1X7 and 1X6 work
                      Reply  Tue Sep 20 23:06:23 2022, Tega, Update, Computers, Setup the 6 new front-ends to boot off the FB1 clone
                         Reply  Wed Sep 21 17:16:14 2022, Tega, Update, Computers, Setup the 6 new front-ends to boot off the FB1 clone
                            Reply  Thu Sep 22 20:57:16 2022, Tega, Update, Computers, build, install and start 40m models on teststand
                               Reply  Fri Sep 23 19:07:03 2022, Tega, Update, Computers, Work to improve stability of 40m models running on teststand
                            Reply  Thu Sep 29 15:12:02 2022, JC, Update, Computers, Setup the 6 new front-ends to boot off the FB1 clone
                               Reply  Tue Oct 4 21:00:49 2022, Chris, Update, Computers, Failed takeover attempt with the new front ends
                                  Reply  Thu Oct 6 07:29:30 2022, Chris, Update, Computers, Successful takeover attempt with the new front ends
Message ID: 17148     Entry time: Tue Sep 20 23:06:23 2022     In reply to: 17144     Reply to this: 17151
Author: Tega 
Type: Update 
Category: Computers 
Subject: Setup the 6 new front-ends to boot off the FB1 clone 

[Tega, Radhika, JC]

We wired the front-ends for power, DAQ and martian network connections. We then moved the I/O chassis from the bottom of the rack to the middle, just above the KVM switch, so that we can leave the top of the I/O chassis open for access to the ports of the OSS target adapter card when testing the extension fiber cables.

Attachment 1 shows the front-ends in the rack (top to bottom):

  • c1sus2
  • c1iscey
  • c1iscex
  • c1ioo
  • c1sus
  • c1lsc


When I powered on the test stand with the new front-ends, the power to 1X7 was cut off after a few minutes, presumably due to an overload. This brought down nodus, chiara and FB1. After Paco reset the tripped switch, everything came back up without us doing anything else, which is an interesting observation.


After this event, I moved the test stand power plug to the side wall rail socket. This seems fine so far. I then brought chiara (clone) and FB1 (clone) online. Here are some changes I made to get things going:

Chiara (clone)

  • Edited '/etc/dhcp/dhcpd.conf' to update the MAC addresses of the front-ends to match the new machines, then ran
    sudo service isc-dhcp-server restart
    and restarted the front-ends.
  • Edited '/etc/hosts' on chiara to include c1iscex and c1iscey, as these were missing.
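For reference, each front-end's entry in '/etc/dhcp/dhcpd.conf' is an ISC dhcpd host stanza that ties its MAC address to a fixed IP. Below is a minimal sketch of generating one, with a MAC-format check so that a mistyped address is caught before restarting isc-dhcp-server. The helper name is hypothetical, the values in the example are made up, and it assumes the netboot options for booting off FB1 already live at the group or subnet level of the existing config, so only the address lines change per machine.

```shell
#!/bin/sh
# Hypothetical helper: print an ISC dhcpd host stanza for one front-end.
# Usage: dhcp_host_entry <hostname> <mac> <ip>
# Assumes netboot options (next-server, filename) are set elsewhere in dhcpd.conf.
dhcp_host_entry() {
    name=$1 mac=$2 ip=$3
    # Accept only six colon-separated hex octets, e.g. 00:25:90:ab:cd:ef
    if ! printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "invalid MAC for $name: $mac" >&2
        return 1
    fi
    # Normalise the MAC to lower case before printing the stanza
    mac=$(printf '%s' "$mac" | tr 'A-F' 'a-f')
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' \
        "$name" "$mac" "$ip"
}
```

For example, 'dhcp_host_entry c1lsc 00:25:90:AB:CD:EF 10.0.0.5' prints a four-line 'host c1lsc { ... }' block with the MAC lower-cased, while a truncated MAC is rejected with a non-zero exit status.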

 

FB1 (clone)

Getting the new front-ends booting off FB1 clone:

1. I found that the KVM screen was flooded with setup messages about the dolphin cards on the LLO machines, which prevented logging in through the KVM switch on two of them. Strangely, one of them, c1sus, seemed to be fine (see Attachment 2); I guessed this was because its dolphin network had already been configured earlier, when we were testing the dolphin communications. So I decided to configure the remaining dolphin cards, as follows.

Dolphin Configuration:

1a. Ideally, running

sudo /opt/DIS/sbin/dis_mkconf -fabrics 1 -sclw 8 -stt 1 -nodes c1lsc c1sus c1ioo c1iscex c1iscey c1sus2 -nosessions

should set up all the nodes, but this did not happen. In fact, I could no longer use the '/opt/DIS/sbin/dis_admin' GUI after running this command and restarting 'dis_networkmgr.service' via

sudo systemctl restart dis_networkmgr.service

so I logged into each front-end and configured its dolphin adapter locally using

sudo /opt/DIS/sbin/dis_config

After that, I shut down FB1 (clone), since restarting it earlier had not helped, waited a few minutes and then started it again. Everything was fine afterward, although I am not sure what actually solved the issue, since I tried a few things. I was just glad to see the problem go!

1b. After configuring all the dolphin nodes, I found that 2 of them failed the '/opt/DIS/sbin/dis_diag' test with an error message listing three possible causes, one of which was 'faulty cable'. I looked at the units in question and found that swapping both cables with the remaining spares solved the problem, so these cables seem to be faulty (this needs to be double-checked). Attachment 3 shows the current state of the dolphin nodes on the front-ends and the dolphin switch.


2. I noticed that the NFS mount service for the mount points '/opt/rtcds' and '/opt/rtapps' listed in '/etc/fstab' had exited with an error, so I ran

sudo mount -a

3. Edited '/etc/hosts' to include c1iscex and c1iscey, as these were missing.
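The per-node steps in this list (running dis_config or dis_diag on every front-end, then re-checking the NFS mounts and '/etc/hosts' on FB1) could be collected into a small helper script like the sketch below. This is an assumption-laden sketch, not what was actually run: the function names, the FRONTENDS and SSH variables, and passwordless ssh from FB1 to the front-ends are all hypothetical. The mount and hosts checks only read the files they are given and never modify them.

```shell
#!/bin/sh
# Hypothetical helpers for the FB1 (clone) steps above (POSIX sh).

# Front-end list taken from the dis_mkconf command above; override via env.
FRONTENDS=${FRONTENDS:-"c1lsc c1sus c1ioo c1iscex c1iscey c1sus2"}

# Run one command on every front-end. 'ssh -t' allocates a terminal so that
# interactive tools such as dis_config work; set SSH=echo for a dry run.
run_on_frontends() {
    for fe in $FRONTENDS; do
        echo "== $fe"
        ${SSH:-ssh -t} "$fe" "$@"
    done
}

# Print each given mount point that is NOT listed in the mount table file
# (e.g. /proc/mounts, whose second field is the mount point).
missing_mounts() {
    table=$1
    shift
    for mp in "$@"; do
        awk -v mp="$mp" '$2 == mp { found = 1 } END { exit !found }' "$table" \
            || echo "$mp"
    done
}

# Print each given hostname that does not appear on a non-comment line of
# the hosts file (fields 2..NF of each line are the names and aliases).
missing_hosts() {
    hosts=$1
    shift
    for h in "$@"; do
        awk -v h="$h" '
            !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }
        ' "$hosts" || echo "$h"
    done
}
```

For example, 'run_on_frontends sudo /opt/DIS/sbin/dis_diag' would repeat the dolphin test on all six nodes (prefix 'SSH=echo' to only print the commands), 'missing_mounts /proc/mounts /opt/rtcds /opt/rtapps' would flag a failed NFS mount like the one that 'sudo mount -a' fixed here, and 'missing_hosts /etc/hosts c1iscex c1iscey' would confirm the hosts entries.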

 

Front-ends

To test the PCIe extension fiber cables that connect the front-ends to their respective I/O chassis, we run the following command (after booting the machine with the cable connected): 

controls@c1lsc:~$ lspci -vn | grep 10b5:3
    Subsystem: 10b5:3120
    Subsystem: 10b5:3101

If we see the output above, then both the cable and the OSS card are fine (we know from previous tests that the OSS card on the I/O chassis is good). Since we only have one I/O chassis, we repeat this step 8 times, cycling through the six new front-ends as we go so that we also test the installed OSS host adapter cards. I was able to test 4 cables and 4 OSS host cards (c1lsc, c1sus, c1ioo, c1sus2), but the remaining results were inconclusive and need to be redone tomorrow. They seemed to suggest that 3 of the remaining 5 fiber cables are faulty, which would be unfortunate in itself, but I began to doubt the reliability of the test when I went back to check the 2 remaining OSS host cards using a cable that had passed earlier, and it did not pass. After a few retries, I decided to call it a day before I lost my mind.
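The pass criterion above (both OSS subsystem IDs, 10b5:3120 and 10b5:3101, present in the 'lspci -vn' output) can be scripted so that each cable and host-card combination gives an unambiguous pass or fail, which may help when repeating the inconclusive tests. A minimal sketch; the function name is hypothetical, and it only encodes the criterion stated above, not any deeper diagnosis of the PCIe link.

```shell
#!/bin/sh
# Hypothetical helper: read 'lspci -vn' output on stdin and check that both
# OSS subsystem IDs seen with a working cable + adapter pair are present.
oss_link_ok() {
    listing=$(cat)
    for id in 10b5:3120 10b5:3101; do
        if ! printf '%s\n' "$listing" | grep -q "Subsystem: $id"; then
            echo "missing subsystem $id" >&2
            return 1
        fi
    done
    echo "link OK"
}
```

On a front-end freshly booted with the cable attached, 'lspci -vn | oss_link_ok' prints "link OK" when both IDs show up, and otherwise names the missing ID and exits non-zero, so the result can be logged per cable and per host.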

 

Note: We were unable to lay the cables today because these tests were not complete, so we are a bit behind the plan. We will see if we can catch up tomorrow.

 

Quote:

Plan for the remainder of the week

Tuesday

  • Setup the 6 new front-ends to boot off the FB1 clone.
  • Test PCIe I/O cables by connecting them btw the front-ends and teststand I/O chassis one at a time to ensure they work
  • Then lay the fiber cables to the various I/O chassis.

 

Attachment 1: IMG_20220921_084220465.jpg  4.248 MB  Uploaded Wed Sep 21 09:45:49 2022
Attachment 2: dolphin_err_init_state.png  78 kB  Uploaded Wed Sep 21 09:46:43 2022
Attachment 3: dolphin_final_state.png  51 kB  Uploaded Wed Sep 21 09:46:50 2022