40m Log
ID | Date | Author | Type | Category | Subject
  16002 | Tue Apr 6 21:17:04 2021 | Koji | Summary | General | PSL HEPA investigation

- Last week we found both of the PSL HEPA units were not running.

- I replaced the capacitor of the north unit, but it did not solve the issue. (Note: I reverted the cap back later)
- It was found that the fans ran if the variac was removed from the chain.
- But I'm not certain that we can run the fans unattended in this configuration, considering the fire hazard.

@3AM: UPON LEAVING the lab, I turned off the HEPA. The AC cable was not warm, so it's probably OK, but we should hold off on continuous operation until we replace the scorched AC cable.


The capacitor replacement was not successful, so the voltages on the fan were checked more carefully. The fan has three switch states (HIGH/OFF/LOW). With no load (SW: OFF), the variac output was as expected. With the switch at LOW or HIGH, it looked as if the motor were shorted (i.e., no voltage difference between the two wires).

I thought the motors might have been shorted, but when the load resistance was measured with the Fluke meter, it showed finite resistance:

- North Unit: SW LOW 4.6Ohm / HIGH 6.0Ohm
- South Unit: SW LOW 6.0Ohm / HIGH 4.6Ohm (I believe the internal connection is incorrect here)

I believe the motors are alive! The fans were then switched on with the variac removed... they ran. So I set the switch to LOW for the north unit and HIGH for the south unit.

Then I inspected the variac:

  • The AC output has some liquid leaking (oil?) (Attachment 1)
  • The AC plug on the variac out has a scorch mark (Attachments 2/3)

So this scorched AC plug/cable is connected directly to the AC line right now. I'm not 100% confident about the safety of this configuration.
Also I am not sure what was wrong with the system.

  • Did the variac fail first, perhaps because of heat? I believe the HEPA was running at ~30% most of the time. Maybe the damage was already there at the failure in Nov 2020?
  • Or did a motor stop at some point, causing the variac to fail?
  • Or was the cable bad, and its heating caused the variac to fail? (Then the problem is still there.)

So, while I'm in the lab today, I'll keep the HEPA running, but when I leave I'll turn it off. We'll discuss what to do in the meeting tomorrow.

 

Attachment 1: 20210406211741_IMG_0554.jpeg
Attachment 2: 20210406211840_IMG_0555.jpeg
Attachment 3: 20210406211850_IMG_0556.jpeg
  16025 | Wed Apr 14 12:27:10 2021 | gautam | Update | General | Lab left open again

Once again, I found the door to the outside in the control room open when I came in ~1215pm. I closed it.

  16026 | Wed Apr 14 13:12:13 2021 | Anchal | Update | General | Sorry, it was me

Sorry about that. It must have been me. I'll make sure it doesn't happen again. I was careless not to check back; no further explanation.

  16028 | Wed Apr 14 14:52:42 2021 | gautam | Update | General | IFO State

The C1:IFO-STATE variable is actually a bunch of bits (16 to be precise), and the 16-bit word they form (2 bytes), converted to decimal, is what is written to the EPICS channel. It was reported on the call today that the nominal value of the variable when the IMC is locked was "8", while it has become "10" today. In fact, this has nothing to do with the IMC. You can see that the "PMC locked" bit is set in Attachment #1. This is done in the AutoLock.sh PMC autolocker script, which was run a few days ago. Nominally, I just lock the PMC by moving some sliders, and I neglect to set/unset this bit.

Basically, there is no anomalous behavior. This is not to say that the situation cannot be improved. Indeed, we should get rid of the obsolete states (e.g. FSS Locked, MZ locked), and add some other states like "PRMI locked". While there is nothing wrong with setting these bits at the end of execution of some script, a better way would be to configure the EPICS record to automatically set / unset itself based on some diagnostic channels. For example, the "PMC locked" bit should be set if (i) the PMC REFL is < 0.1 AND (ii) PMC TRANS is > 0.65 (the exact thresholds are up for debate). Then we are truly recording the state of the IFO and not relying on some script to write to the bit (I haven't thought through whether there are some edge cases where we would need an unreasonable number of diagnostic channels to determine if we are in a certain state or not).
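For concreteness, here is a minimal sketch (an illustration, not the existing autolocker or any deployed code) of how a small script could derive the "PMC locked" bit from diagnostic channels using pyepics. The PMC channel names and the bit position below are assumptions for illustration only.

# Sketch: derive the "PMC locked" bit of C1:IFO-STATE from PMC diagnostics.
# The channel names and the bit index are assumptions, not the real configuration.
import time
import epics  # pyepics

PMC_REFL = "C1:PSL-PMC_REFL_DC"    # assumed PMC REFL DC channel name
PMC_TRANS = "C1:PSL-PMC_TRANS_DC"  # assumed PMC TRANS channel name
IFO_STATE = "C1:IFO-STATE"
PMC_LOCKED_BIT = 1                 # assumed bit position of "PMC locked"

def update_pmc_bit(refl_max=0.1, trans_min=0.65):
    """Set/clear the PMC-locked bit based on the thresholds quoted above."""
    refl = epics.caget(PMC_REFL)
    trans = epics.caget(PMC_TRANS)
    state = int(epics.caget(IFO_STATE))
    if refl < refl_max and trans > trans_min:
        state |= (1 << PMC_LOCKED_BIT)   # declare PMC locked
    else:
        state &= ~(1 << PMC_LOCKED_BIT)  # declare PMC unlocked
    epics.caput(IFO_STATE, state)

while True:
    update_pmc_bit()
    time.sleep(1)  # poll once per second

(In practice this logic would better live in the EPICS database itself, e.g. a calc record, so that no script has to keep running.)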

Attachment 1: IFOSTATE.png
  16029 | Wed Apr 14 15:30:29 2021 | rana | Update | General | Sorry, it was me

Maybe tighten the tensioner on the door closer so that it closes by itself even in the low velocity case. Or maybe just use the front door like everyone else?

  16030 | Wed Apr 14 16:46:24 2021 | Anchal | Update | General | IFO State

That makes sense. I assumed that IFO-STATE was already configured the way you propose. This could be implemented later.

Quote:
 

a better way would be to configure the EPICS record to automatically set / unset itself based on some diagnostic channels. For example, the "PMC locked" bit should be set if (i) the PMC REFL is < 0.1 AND (ii) PMC TRANS is >0.65 (the exact thresholds are up for debate). Then we are truly recording the state of the IFO and not relying on some script to write to the bit (I haven't thoguht through if there are some edge cases where we need an unreasonable number of diagnostic channels to determine if we are in a certain state or not).

 

  16039 | Fri Apr 16 00:21:52 2021 | Koji | Update | General | Glue Freezer completely frozen

I was looking at the laser head/amp and somehow decided to open the glue freezer. It was stuck. I managed to open it, but the upper compartment was completely frozen.
Some of the batteries were embedded in a block of ice. I think we should throw them out.

Can the person who comes in the morning work on defrosting?

- Coordinate with Yehonathan and move the amps and the wooden crate so that you can move the freezer.

- Remove the contents to somewhere (it's OK to be room temp for a while)

- Unplug the freezer

- Leave the freezer outside with the door open. After a while, the ice will fall off on its own.

- At the end of the day, move it back to the lab. Continue defrosting another day if ice remains.

Attachment 1: P_20210416_000906.jpg
Attachment 2: P_20210416_000850.jpg
  16040 | Fri Apr 16 10:58:16 2021 | Yehonathan | Update | General | Glue Freezer completely frozen

{Paco, Anchal, Yehonathan}

We emptied the fridge and moved the amplifier equipment on top of the amplifier crate. We unplugged the freezer and moved it out of the lab to defrost (attachment).

Quote:

I was looking at the laser head/amp and somehow decided to open the glue freezer. It was stuck. I managed to open it, but the upper compartment was completely frozen.
Some of the batteries were embedded in a block of ice. I think we should throw them out.

Can the person who comes in the morning work on defrosting?

- Coordinate with Yehonathan and move the amps and the wooden crate so that you can move the freezer.

- Remove the contents to somewhere (it's OK to be room temp for a while)

- Unplug the freezer

- Leave the freezer outside with the door open. After a while, the ice will fall off on its own.

- At the end of the day, move it back to the lab. Continue defrosting another day if ice remains.

 

Attachment 1: 20210416_105048.jpg
  16045 | Fri Apr 16 19:07:31 2021 | Yehonathan | Update | General | Glue Freezer completely frozen

There is still a huge chunk of unmelted ice in the fridge. I moved the contents of that fridge into the main fridge and put up "do not eat" warning signs.

I returned the fridge to the lab and plugged it back in to prevent flooding.

Defrosting will have to continue on Monday.

  16048 | Mon Apr 19 10:52:27 2021 | Yehonathan | Update | General | Glue Freezer completely frozen

{Anchal, Paco, Yehonathan}

We took the glue fridge outside.

  16051 | Mon Apr 19 19:40:54 2021 | Yehonathan | Update | General | Glue Freezer completely frozen

{Paco, Yehonathan}

We broke the last chunk of ice and cleaned the fridge. We moved the fridge back inside and plugged it into the wall. The glues were moved back from the main fridge.

The batteries that were found soaking wet are now somewhat dry and were left on the cabinet drawers for future recycling.

Quote:

{Anchal, Paco, Yehonathan}

We took the glue fridge outside.

 

  16058 | Wed Apr 21 05:48:47 2021 | Chub | Update | General | PSL HEPA Maintenance

Yikes!  That's ONE filter.  I'll get another from storage.

  16065 | Wed Apr 21 13:10:12 2021 | Koji | Update | General | PSL HEPA Maintenance

It's probably too late to say, but there are/were two boxes (just for the record).

 

  16113 | Mon May 3 18:59:58 2021 | Anchal | Summary | General | Weird gas leakage kind of noise in 40m control room

For the past few days, a weird sound, like a decaying gas leak, has been coming into the 40m control room from the southwest corner of the ceiling. Attached is an audio capture. It occurs about every 10 min or so.

Attachment 1: 40mNoiseFinal.mp3
  16115 | Mon May 3 23:28:56 2021 | Koji | Summary | General | Weird gas leakage kind of noise in 40m control room

I also noticed some sound in the control room. (didn't open the MP3 yet)

I'm afraid that the hard disk in the control room iMac is dying.

 

  16119 | Tue May 4 19:14:43 2021 | Yehonathan | Update | General | OSEMs from KAGRA

I put the box containing the untested OSEMs from KAGRA near the south flow bench on the floor.

  16121 | Wed May 5 13:05:07 2021 | Chub | Update | General | chassis delivery from De Leone

Assembled chassis from De Leone placed in the 40 Meter Lab, along the west wall and under the display pedestal table.  The leftover parts are in smaller Really Useful boxes, also on the parts pile along the west wall.

Attachment 1: de_leone_del_5-5-21.jpg
  16156 | Mon May 24 10:19:54 2021 | Paco | Update | General | Zita IOO strip

Updated IOO.strip on Zita to show WFS2 pitch and yaw trends (C1:IOO-WFS2_PIY_OUT16 and C1:IOO-WFS2_YAW_OUT16) and changed the colors slightly to have all pitch trends in the yellow/brown band and all yaw trends in the pink/purple band.

No one says, "Here I am attaching a cool screenshot, becuz else where's the proof? Am I right or am I right?"

Mon May 24 18:10:07 2021 [Update]

After waiting for some traces to fill the screen, here is a cool screenshot (Attachment 1). At around 2:30 PM the MC unlocked, and the BS_Z (vertical) seismometer readout jumped. It has stayed like this for the whole afternoon... The MC eventually caught its lock and we even locked XARM without any issue, but something happened in the 10-30 Hz band. We will keep an eye on it during the evening...

Tue May 25 08:45:33 2021 [Update]

At approximately 02:30 UTC (so 07:30 PM yesterday) the 10-30 Hz seismic step dropped back... It lasted 5 hours, mostly causing BS motion along Z (vertical) as seen by the minute trend data in Attachment 2. Could the MM library have been shaking? Was the IFO snoring during its afternoon nap?

Attachment 1: Screenshot_from_2021-05-24_18-09-37.png
Attachment 2: 24and25_05_2021_PEM_BS_10_30.png
  16206 | Wed Jun 16 19:34:18 2021 | Koji | Update | General | HVAC

I made a flow sensor with a stick and tissue paper to check the airflow.

- The HVAC indicator was not lit, but it was just a bulb problem. The replacement bulb is inside the gray box.

- I went to the south arm. There are two big vent ducts for the outlets and intakes. Neither is flowing air.
  The current temp at 7pm was ~30degC. Max and min were 31degC and 18degC.

- Then I went to the vertex and the east arm. The outlets and intakes are flowing.

Attachment 1: HVAC_Power.jpeg
Attachment 2: South_Arm.jpeg
Attachment 3: South_End_Tenperature.jpeg
Attachment 4: Vertex.jpeg
Attachment 5: East_Arm.jpeg
  16234 | Thu Jul 1 11:37:50 2021 | Paco | Update | General | restarted c0rga

Physically rebooted the c0rga workstation after failing to ssh into it (even though it responded to ping...). The RGA seems to be off, though. The last log with data in it appears to date back to 2020 Nov 10, but reasonable spectra only appear in the logs from before 11-05. Gautam verified that the RGA was intentionally turned off then.

  16240 | Tue Jul 6 17:40:32 2021 | Koji | Summary | General | Lab cleaning

We held the lab cleaning for the first time since the campus reopening (Attachment 1).
Now we can use some of the desks for the people to live! Thanks for the cooperation.

We relocated a lot of items into the lab.

  • The entrance area was cleaned up. We believe that there is no 40m lab stuff left.
    • BHD BS optics was moved to the south optics cabinet. (Attachment 2)
    • DSUB feedthrough flanges were moved to the vacuum area (Attachment 3)
  • Some instruments were moved into the lab.
    • The Zurich instrument box
    • KEPCO HV supplies
    • Matsusada HV supplies
  • We moved the large pile of SUPERMICRO boxes within the lab. They are now around MC2, while the PPE boxes that were there were moved behind the beam tube in the MC2 area. (Attachment 4)
  • We moved the PPE boxes behind the beam tube on the X arm, behind the SUPERMICRO computer boxes. (Attachment 7)
  • Leftover ISC/WFS components were moved to the pile of BHD electronics.
    • Front panels (Attachment 5)
    • Components in the boxes (Attachment 6)

We still want to do some more cleaning:

- Electronics workbenches
- Stray setup (cart/wagon in the lab)
- Some leftover on the desks
- Instruments scattered all over the lab
- Ewaste removal

Attachment 1: P_20210706_163456.jpg
Attachment 2: P_20210706_161725.jpg
Attachment 3: P_20210706_145210.jpg
Attachment 4: P_20210706_161255.jpg
Attachment 5: P_20210706_145815.jpg
Attachment 6: P_20210706_145805.jpg
Attachment 7: PXL_20210707_005717772.jpg
  16245 | Wed Jul 14 16:19:44 2021 | gautam | Update | General | Brrr

Since the repair work, the temperature is significantly cooler. Surprisingly, even at the vertex (to be more specific, inside the PSL enclosure, which for the time being is the only place where we have a logged temperature sensor; this is not attributable to any change in the HEPA speed), the temperature is a good 3 deg C cooler than it was before the HVAC work (even though Koji's wind vane suggested the vents at the vertex were working). Was the setpoint for the entire lab modified? What should the setpoint even be?

Quote:
 

- I went to the south arm. There are two big vent ducts for the outlets and intakes. Neither is flowing air.
  The current temp at 7pm was ~30degC. Max and min were 31degC and 18degC.

- Then I went to the vertex and the east arm. The outlets and intakes are flowing.

Attachment 1: rmTemp.pdf
  16246 | Wed Jul 14 19:21:44 2021 | Koji | Update | General | Brrr

Jordan reported on Jun 18, 2021:
"HVAC tech came today, and replaced the thermostat and a coolant tube in the AC unit. It is working now and he left the thermostat set to 68F, which was what the old one was set to."

  16250 | Sat Jul 17 00:52:33 2021 | Koji | Update | General | Canon camera / small silver tripod / macro zoom lens / LED ring light borrowed -> QIL

Canon camera / small silver tripod / macro zoom lens / LED ring light borrowed -> QIL

Attachment 1: P_20210716_213850.jpg
  16255 | Sun Jul 25 18:21:10 2021 | Koji | Update | General | Canon camera / small silver tripod / macro zoom lens / LED ring light returned / Electronics borrowed

Camera and accessories returned

One HAM-A coildriver and one sat amp borrowed -> QIL

https://nodus.ligo.caltech.edu:8081/QIL/2616

 

  16265 | Wed Jul 28 20:20:09 2021 | Yehonathan | Update | General | The temperature sensors and function generator have arrived in the lab

I put the temperature sensors box on Anchal's table (attachment 1) and the function generator on the table in front of the c1auxey Acromag chassis (attachment 2).

 

Attachment 1: 20210728_201313.jpg
Attachment 2: 20210728_201607.jpg
  16269 | Wed Aug 4 18:19:26 2021 | paco | Update | General | Added infrasensing temperature unit to martian network

[ian, anchal, paco]

We hooked up the infrasensing unit to power and changed its default IP address from 192.168.11.160 (factory default) to 192.168.113.240 on the martian network. The sensor is online at that IP address, with the user controls and the usual password for most workstations.

  16270 | Thu Aug 5 14:59:31 2021 | Anchal | Update | General | Added temperature sensors at Yend and Vertex too

I've added the other two temperature sensor modules on the Y end (on 1Y4, IP: 192.168.113.241) and in the vertex (on 1X2, IP: 192.168.113.242). I've updated the martian host table accordingly. From inside the martian network, one can point a browser at the IP address to see the temperature sensor status. These sensors can be set to trigger an alarm and send emails/SMS etc. if the temperature goes out of a defined range.

I feel something is off though. The vertex sensor shows a temperature of ~28 degrees C, the Xend says 20 degrees C, and the Yend says 26 degrees C. I believe these sensors might need calibration.

Remaining tasks are the following:

  • Modbus TCP solution:
    • If we get it right, this will be the easiest solution.
    • We just need to add these sensors as streaming devices in some slow EPICS machine's .cmd file and add the temperature sensing channels in a corresponding database file.
  • Python workaround:
    • Might be faster but dirty.
    • We run a python script on megatron which requests temperature values every second or so from the IP addresses and writes them to a soft EPICS channel.
    • We would still need to create a soft EPICS channel for this and add it to the frame builder data acquisition list.
    • An even shorter workaround for the near future could be to just write the temperature every 30 min to a log file in some location.

[anchal, paco]

We made a script under scripts/PEM/temp_logger.py and ran it on megatron. The script uses the requests package to query the latest sensor data from the three sensors every 10 minutes as JSON and logs it accordingly. This is not a permanent solution.
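For reference, here is a minimal sketch of that kind of logger (the real one is scripts/PEM/temp_logger.py; the HTTP endpoint path and JSON layout below are assumptions, since the exact URL depends on the sensor firmware).

# Minimal sketch of a temperature logger along the lines of scripts/PEM/temp_logger.py.
# The endpoint path and JSON keys are assumptions; adjust them to the sensor firmware.
import json
import time
import requests

SENSORS = {
    "Xend":   "192.168.113.240",
    "Yend":   "192.168.113.241",
    "Vertex": "192.168.113.242",
}

def read_sensor(ip):
    """Query one sensor unit over HTTP and return its parsed JSON payload."""
    r = requests.get("http://%s/sensors.json" % ip, timeout=5)  # hypothetical path
    r.raise_for_status()
    return r.json()

while True:
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for name, ip in SENSORS.items():
        try:
            print(stamp, name, json.dumps(read_sensor(ip)))
        except requests.RequestException as err:
            print(stamp, name, "read failed:", err)
    time.sleep(600)  # query every 10 minutes, as described above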

  16274 | Tue Aug 10 17:24:26 2021 | paco | Update | General | Five day trend

Attachment 1 shows a five and a half day minute-trend of the three temperature sensors. Logging started last Thursday ~2 pm, when all sensors were finally deployed. While it appears that there is a 7 degree gradient along the XARM, it seems like the "vertex" (more like ITMX) sensor was just placed on top of a network switch (which feels lukewarm to the touch), so this needs to be fixed. A similar situation is observed for the ETMY sensor. I shall do this later today.


Done. The temperature reading should now be more independent from nearby instruments.


Wed Aug 11 09:34:10 2021 I updated the plot with the full trend before and after rearranging the sensors.

Attachment 1: six_day_minute_trend.png
  16277 | Thu Aug 12 11:04:27 2021 | Paco | Update | General | PSL shutter was closed this morning

Thu Aug 12 11:04:42 2021 Arrived to find the PSL shutter closed. Why? Who? When? How? No elog, no fun. I opened it, IMC is now locked, and the arms were restored and aligned.

  16278 | Thu Aug 12 14:59:25 2021 | Koji | Update | General | PSL shutter was closed this morning

What I was afraid of was the vacuum interlock. And indeed there was a pressure surge this morning. Is this real? Why didn't we receive the alert?

Attachment 1: Screen_Shot_2021-08-12_at_14.58.59.png
  16279 | Thu Aug 12 20:52:04 2021 | Koji | Update | General | PSL shutter was closed this morning

I did a bit more investigation on this.

- I checked P1~P4, PTP2/3, N2, TP2, and TP3, but found that only P1a and P2 were affected.

- Looking at the min/mean/max of P1a and P2 (Attachment 1), the signal had a large fluctuation. It is impossible for P1a to go from 0.004 to 0 instantaneously.

- Looking at the raw data of P1a and P2 (Attachment 2), the value was not steadily large. Instead, it looks like fluctuating noise.

So my conclusion is that, for an unknown reason, an unknown noise coupled only into P1a and P2 and tripped the PSL shutter. I still don't know the status of the mail alert.

Attachment 1: Screen_Shot_2021-08-12_at_20.51.19.png
Attachment 2: Screen_Shot_2021-08-12_at_20.51.34.png
  16288 | Mon Aug 23 11:51:26 2021 | Koji | Update | General | Campus Wide Power Glitch Reported: Monday, 8/23/21 at 9:30am

Campus Wide Power Glitch Reported: Monday, 8/23/21 at 9:30am (more like 9:34am according to nodus log)

nodus: rebooted. ELOG/apache/svn is running. (looks like Anchal worked on it)

chiara: survived the glitch thanks to UPS

fb1: not responding -> @1pm open to login / seemed rebooted only at 9:34am (network path recovered???)

megatron: not responding

optimus: no route to host

c1aux: ping ok, ssh not responding -> needed to use telnet (vme / vxworks)
c1auxex: ssh ok
c1auxey: ping ok, ssh not responding -> needed to use telnet (vme / vxworks)
c1psl: ping NG, power cycled the switch on 1X2 -> ssh OK now
c1iscaux: ping NG -> rebooted the machine -> ssh recovered

c1iscaux2: does not exist any more
c1susaux: ping NG -> responds after 1X2 switch reboot

c1pem1: telnet ok (vme / vxworks)
c1iool0: does not exist any more

c1vac1: ethernet service restarted locally -> responding
ottavia: does not exist?
c1teststand: ping ok, ssh not responding

3:20PM we started restarting the RTS

  16290 | Mon Aug 23 19:00:05 2021 | Koji | Update | General | Campus Wide Power Glitch Reported: Monday, 8/23/21 at 9:30am

Restarting the RTS was unsuccessful because of the timing discrepancy error between the RT machines and the FB. This time, no matter how we tried to set the time, the IOPs would not run with "DC status" green. (We kept getting 0x4000.)

We then decided to work on the recovery without the data being recorded. After some burtrestores, the IMC was locked and the spot appeared on the AS port. However, IPC seemed down and no WFS could run.

  16291 | Mon Aug 23 22:51:44 2021 | Anchal | Update | General | Time synchronization efforts

Related elog thread: 16286


I didn't really achieve anything but I'm listing what I've tried.

  • I know now that timesyncd isn't working because systemd-timesyncd is known to have issues when running on a read-only file system. In particular, the service does not have privileges to change the clock or drift settings at /run/systemd/clock or /etc/adjtime.
  • The workarounds to these problems are poorly rated/reviewed on Stack Exchange and require changing the /etc/systemd/timesyncd.conf file, but I'm unable to edit this file.
  • I know that Paco was able to change these files earlier, as the files are now changed and configured to follow a Debian ntp pool server, which won't work as the FEs do not have internet access. So the conf file needs to be restored to using ntpserver as the ntp server.
  • From system messages, the ntpserver is recognized by the service, as shown in the second part of 16285. I really think the issue is in file permissions. The file /etc/adjtime has not been updated since 2017.
  • I got help from Paco on how to edit files for the FE machines. The FE machine directories are exported from fb1:/diskless/root.jessie/
  • I restored the /etc/systemd/timesyncd.conf file to how it was before, with just the servers=ntpserver line. I restarted the timesyncd service on all FEs, but the synchronization did not happen.
  • I tried a few suggestions from Stack Exchange but none of them worked. The only rated solution creates a tmpfs directory outside of the read-only filesystem and uses that to run timesyncd. So, in my opinion, timesyncd would never work on our diskless, read-only file system FE machines.
  • One issue in an Arch Linux discussion ended with the questioner resorting to openntpd from the OpenBSD distribution. The user claimed that openntpd is simple enough that it can run ntp synchronization on a read-only file system.
  • Somewhat painfully, I 'kind of' installed the openntpd tool in the fb1:/diskless/root.jessie directory following directions from here. I had to manually add a user and group for the FEs (which I might not have done correctly). I was not able to get the openntpd daemon to start properly after some tries.
  • I restored everything back to how it was and restarted timesyncd on c1sus, even though it would not really do anything.
Quote:

This time, no matter how we tried to set the time, the IOPs would not run with "DC status" green. (We kept getting 0x4000.)

 

  16292 | Tue Aug 24 09:22:48 2021 | Anchal | Update | General | Time synchronization working now

Jamie told me to use chroot to log into the chroot jail of the Debian OS that is exported to the FEs and install ntp there. I took the following steps, at the end of which all FEs now have NTP synchronized.

  • I logged into fb1 through nodus.
  • chroot /diskless/root.jessie /bin/bash took me to a bash shell in the Debian OS that is exported to all FEs.
  • Here, I ran sudo apt-get install ntp which ran without any errors.
  • I then edited the file /etc/ntp.conf: I removed the default servers and added the following server lines (fb1 and nodus IP addresses):
    server 192.113.168.201
    server 192.113.168.201
  • I logged into each FE machine and ran following commands:
    sudo systemctl stop systemd-timesyncd.service; sudo systemctl status systemd-timesyncd.service;
    timedatectl; sleep 2;sudo systemctl daemon-reload;  sudo systemctl start ntp; sleep 2; sudo systemctl status ntp; timedatectl
    sudo hwclock -s
    • The first line ensures that systemd-timesyncd.service is not running anymore. I did not uninstall timesyncd and left its configuration file as it was.
    • The second line first shows the times of the local and RTC clocks, then reloads the daemon services to get ntp registered, then starts ntp.service and shows its status. Finally, the timedatectl command shows the synchronized clocks and that NTP synchronization has occurred.
    • The last line sets the local clock to the RTC clock. Even though this wasn't required, as the clocks already agreed to the second, I just wanted a point where all the local clocks are synchronized to the ntp server.
  • Hopefully, this will resolve our issue of having to restart the models anytime some glitch happens or when we need to update something in one of them.

Edit Tue Aug 24 10:19:11 2021:

I also disabled timesyncd on all FEs using sudo systemctl disable systemd-timesyncd.service

I've added this wiki page for summarizing the NTP synchronization knowledge.

  16293 | Tue Aug 24 18:11:27 2021 | Paco | Update | General | Time synchronization not really working

tl;dr: NTP servers and clients were never synchronized, are not synchronizing even with ntp... nodus is synchronized but uses chronyd; should we use chronyd everywhere?


Spent some time investigating the ntp synchronization. In the morning, after Anchal set up all the ntp servers / FE clients, I tried restarting the rts IOPs with no success. Later, with Tega, we tried the usual manual matching of the date between the c1iscex and fb1 machines, iterating over different n-second offsets from -10 to +10, also without success.

This afternoon, I tried debugging the FE and fb1 timing differences. For this I inspected the ntp configuration file under /etc/ntp.conf in both fb1 and /diskless/root.jessie/etc/ntp.conf (for the FE machines) and tried different combinations with and without nodus, with and without restrict lines, all while looking at the output of sudo journalctl -f on c1iscey. Every time I changed the ntp config file, I restarted the service using sudo systemctl restart ntp.service. Looking through some online forums, people suggested basic pinging to see if the ntp servers were up (and broadcasting their times over the local network), but this failed to run (read-only filesystem), so I went into fb1 and ran sudo chroot /diskless/root.jessie/ /bin/bash to allow me to change file permissions. The test was first done with /bin/ping, which couldn't even open a socket (root access needed); after running chmod 4755 /bin/ping, ssh-ing into c1iscey and pinging the fb1 machine worked. After this, I ran chmod 4755 /usr/sbin/ntpd so that the ntp daemon would have no problem reaching the server, in case this was blocking the synchronization. I exited the chroot shell and restarted the ntp daemon on c1iscey, but ntpstat still showed an unsynchronised status.

I also learned that in the output of an ntp query, ntpq -p, an asterisk marks the peer the client has actually synchronized to. This was not the case on any FE machine... and looking at fb1, this was also not true. Although the fb1 peers are correctly listed as nodus, the caltech ntp server, and a broadcast (.BCST.) server from local time (meant to serve the FE machines), none appears to have synchronized... Going one level further, on nodus I checked the time synchronization servers by running chronyc sources; the output shows

controls@nodus|~> chronyc sources
210 Number of sources = 4
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^* testntp1.superonline.net      1  10   377   280  +1511us[+1403us] +/-   92ms
^+ 38.229.59.9                   2  10   377   206  +8219us[+8219us] +/-  117ms
^+ tms04.deltatelesystems.ru     2  10   377   23m    -17ms[  -17ms] +/-  183ms
^+ ntp.gnc.am                    3  10   377   914  -8294us[-8401us] +/-  168ms

I then ran chronyc clients to see whether fb1 was listed (as I would have expected), but the output shows this --

Hostname                   Client    Peer CmdAuth CmdNorm  CmdBad  LstN  LstC
=========================  ======  ======  ======  ======  ======  ====  ====
501 Not authorised

So clearly chronyd succeeded in synchronizing nodus' time to whatever server it was pointed at, but downstream from there, neither fb1 nor any FE machine seems to be synchronizing properly. It may be as simple as figuring out the correct ntp configuration file, or switching to chronyd for all machines (for the sake of homogeneity?).
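As a quick way to script the check described above (ntpq -p marks the peer it has actually synchronized to with a '*' tally code at the start of that line), something like the following sketch could be run on each FE; it only assumes ntpq is installed and on the PATH.

# Sketch: report whether ntpd on this host has selected a synchronization peer.
import subprocess

def ntp_synced():
    """True if any peer line in `ntpq -p` carries the '*' (selected peer) tally."""
    out = subprocess.check_output(["ntpq", "-p"]).decode()
    return any(line.startswith("*") for line in out.splitlines())

if __name__ == "__main__":
    print("synchronized" if ntp_synced() else "NOT synchronized")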

  16295 | Tue Aug 24 22:37:40 2021 | Anchal | Update | General | Time synchronization not really working

I attempted to install chrony and run it on one of the FE machines. It didn't work and in doing so, I lost the working NTP client service on the FE computers as well. Following are some details:

  • I added the following two mirrors in the apt source list of root.jessie at /etc/apt/sources.list
    deb http://ftp.us.debian.org/debian/ jessie main contrib non-free
    deb-src http://ftp.us.debian.org/debian/ jessie main contrib non-free
  • Then I installed chrony in the root.jessie using
    sudo apt-get install chrony
    • I was getting an error E: Can not write log (Is /dev/pts mounted?) - posix_openpt (2: No such file or directory) . To fix this, I had to run:
      sudo mount -t devpts none "$rootpath/dev/pts" -o ptmxmode=0666,newinstance
      sudo ln -fs "pts/ptmx" "$rootpath/dev/ptmx"
    • Then, I had another error to resolve.
      Failed to read /proc/cmdline. Ignoring: No such file or directory
      start-stop-daemon: nothing in /proc - not mounted?
      To fix this, I had to exit to fb1 and run:
      sudo mount --bind /proc /diskless/root.jessie/proc
    • With these steps, chrony was finally installed, but I immediately saw an error message saying:
      Starting /usr/sbin/chronyd...
      Could not open NTP sockets
  • I figured this must be due to ntp running in the FE machines.  I logged into c1iscex and stopped and disabled the ntp service:
    sudo systemctl stop ntp
    sudo systemctl disable ntp
    • I saw some error messages from the above commands, as the FEs are read-only file systems:
      Synchronizing state for ntp.service with sysvinit using update-rc.d...
      Executing /usr/sbin/update-rc.d ntp defaults
      insserv: fopen(.depend.stop): Read-only file system
      Executing /usr/sbin/update-rc.d ntp disable
      update-rc.d: error: Read-only file system
    • So I went back to the chroot in fb1 and ran the two commands above that failed:
      /usr/sbin/update-rc.d ntp defaults
      /usr/sbin/update-rc.d ntp disable
    • The last line gave the output:
      insserv: warning: current start runlevel(s) (empty) of script `ntp' overrides LSB defaults (2 3 4 5).
      insserv: warning: current stop runlevel(s) (2 3 4 5) of script `ntp' overrides LSB defaults (empty).
    • I ignored this and moved forward.
  • I copied the chronyd.service from nodus to the chroot in fb1 and configured it to use nodus as the server. Then I started the chronyd service and checked its status:

    sudo systemctl status chronyd.service

    but got the same issue with NTP sockets:

    ● chronyd.service - NTP client/server
       Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled)
       Active: failed (Result: exit-code) since Tue 2021-08-24 21:52:30 PDT; 5s ago
      Process: 790 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=1/FAILURE)

    Aug 24 21:52:29 c1iscex systemd[1]: Starting NTP client/server...
    Aug 24 21:52:30 c1iscex chronyd[790]: Could not open NTP sockets
    Aug 24 21:52:30 c1iscex systemd[1]: chronyd.service: control process exited, code=exited status=1
    Aug 24 21:52:30 c1iscex systemd[1]: Failed to start NTP client/server.
    Aug 24 21:52:30 c1iscex systemd[1]: Unit chronyd.service entered failed state.

  • I tried a few things to resolve this, but couldn't get it to work. So I gave up on using chrony and decided to go back to the ntp service, at least.

  • I stopped, disabled and checked status of chrony:
    sudo systemctl stop chronyd
    sudo systemctl disable chronyd
    sudo systemctl status chronyd
    This gave the output:

    ● chronyd.service - NTP client/server
       Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled)
       Active: failed (Result: exit-code) since Tue 2021-08-24 22:09:07 PDT; 25s ago

    Aug 24 22:09:07 c1iscex systemd[1]: Starting NTP client/server...
    Aug 24 22:09:07 c1iscex chronyd[2490]: Could not open NTP sockets
    Aug 24 22:09:07 c1iscex systemd[1]: chronyd.service: control process exited, code=exited status=1
    Aug 24 22:09:07 c1iscex systemd[1]: Failed to start NTP client/server.
    Aug 24 22:09:07 c1iscex systemd[1]: Unit chronyd.service entered failed state.
    Aug 24 22:09:15 c1iscex systemd[1]: Stopped NTP client/server.

  • I went back to the fb1 chroot, removed the chrony package, and deleted the configuration files and systemd service files:
    sudo apt-get remove chrony

  • But when I started the ntp daemon service back on c1iscex, it gave an error:
    sudo systemctl restart ntp
    Job for ntp.service failed. See 'systemctl status ntp.service' and 'journalctl -xn' for details.

  • Status shows:

    sudo systemctl status ntp
    ● ntp.service - LSB: Start NTP daemon
       Loaded: loaded (/etc/init.d/ntp)
       Active: failed (Result: exit-code) since Tue 2021-08-24 22:09:56 PDT; 9s ago
      Process: 2597 ExecStart=/etc/init.d/ntp start (code=exited, status=5)

    Aug 24 22:09:55 c1iscex systemd[1]: Starting LSB: Start NTP daemon...
    Aug 24 22:09:56 c1iscex systemd[1]: ntp.service: control process exited, code=exited status=5
    Aug 24 22:09:56 c1iscex systemd[1]: Failed to start LSB: Start NTP daemon.
    Aug 24 22:09:56 c1iscex systemd[1]: Unit ntp.service entered failed state.

  • I tried to re-enable the ntp service with sudo systemctl enable ntp. I got similar read-only filesystem error messages as earlier.
    Synchronizing state for ntp.service with sysvinit using update-rc.d...
    Executing /usr/sbin/update-rc.d ntp defaults
    insserv: warning: current start runlevel(s) (empty) of script `ntp' overrides LSB defaults (2 3 4 5).
    insserv: warning: current stop runlevel(s) (2 3 4 5) of script `ntp' overrides LSB defaults (empty).
    insserv: fopen(.depend.stop): Read-only file system
    Executing /usr/sbin/update-rc.d ntp enable
    update-rc.d: error: Read-only file system

    • I went back to chroot in fb1 and ran:
      /usr/sbin/update-rc.d ntp defaults
      insserv: warning: current start runlevel(s) (empty) of script `ntp' overrides LSB defaults (2 3 4 5).
      insserv: warning: current stop runlevel(s) (2 3 4 5) of script `ntp' overrides LSB defaults (empty).
      and
      /usr/sbin/update-rc.d ntp enable

  • I came back to c1iscex and tried restarting the ntp service, but got the same error messages as above, with exit code 5.

  • I checked c1sus; ntp was running there. I tested the configuration by restarting the ntp service, and it then failed with the same error message. So the remaining three FEs (c1lsc, c1ioo, and c1iscey) have a running ntp service, but it won't survive a restart.

  • As a last try, I rebooted c1iscex to see if ntp comes back online nicely, but it doesn't.

Bottom line, I went to try chrony in the FEs, and I ended up breaking the ntp client services on the computers as well. We have no NTP synchronization in any of the FEs.

Even though Paco and I are learning about the ntp and cds stuff, I think it's time we get help from someone with real experience. The lab has not been in a good state for far too long.

Quote:

tl;dr: NTP servers and clients were never synchronized, are not synchronizing even with ntp... nodus is synchronized but uses chronyd; should we use chronyd everywhere?

 

  16306 | Wed Sep 1 21:55:14 2021 | Koji | Summary | General | Towards the end upgrade

- Sat amp mod and test: ongoing (Tega)
- Coil driver mod and test: ongoing (Tega)

- Acromag: almost ready (Yehonathan)

- IDC10-DB9 cable / D2100641 / IDC10F for ribbon in hand / Dsub9M ribbon brought from Downs / QTY 2 for two ends -> Made 2 (stored in the DSUB connector plastic box)
- IDC40-DB9 cable / D2100640 / IDC40F for ribbon in hand / DB9F solder brought from Downs  / QTY 4 for two ends -> Made 4 0.5m cables (stored in the DSUB connector plastic box)

- DB15-DB9 reducer cable / ETMX2+ETMY2+VERTEX16+NewSOS14 = 34 / to be ordered

- End DAC signal adapter with Dewhitening (with DIFF/SE converter) / to be designed & built
- End ADC adapter (with SE/DIFF converter) / to be designed & built


MISC Ordering

  • 3.5 x Sat Amp Adapter made (order more DSUB25 conns)
    • -> Gave 2 to Tega, 1.5 in the DSUB box
    • 5747842-4 A32100-ND -> ‎5747842-3‎ A32099-ND‎ Qty40
    • 5747846-3 A32125-ND -> ‎747846-3‎ A23311-ND‎ Qty40
  • Tega's sat amp components
    • 499Ω P499BCCT-ND 78 -> Backorder -> ‎RG32P499BCT-ND‎ Qty 100
    • 4.99KΩ TNPW12064K99BEEA 56 -> Qty 100
    • 75Ω YAG5096CT-ND 180 -> Qty 200
    • 1.82KΩ P18391CT-ND 103 -> Qty 120
    • 68 nF P10965-ND 209
  • Order more DB9s for Tega's sat amp adapter 4 units (look at the AA IO BOM) 
    • 4x 8x 5747840-4 DB9M PCB A32092-ND -> 6-747840-9‎ A123182-ND‎ Qty 35
    • 4x 5x 5747844-4 A32117-ND -> Qty 25
    • 4x 5x DB9M ribbon MMR09K-ND -> 8209-8000‎ 8209-8000-ND‎ Qty 25
    • 4x 5x 5746861-4 DB9F ribbon 5746861-4-ND -> 400F0-09-1-00 ‎LFR09H-ND‎ Qty 35
  • Order 18bit DAC AI -> 16bit DAC AI components 4 units
    • 4x 4x 5747150-8 DSUB9F PCB A34072-ND -> ‎D09S24A4PX00LF‎609-6357-ND‎ Qty 20
    • 4x 1x 787082-7 CONN D-TYPE RCPT 68POS R/A SLDR (SCSI Female) A3321-ND -> ‎5787082-7‎ A31814-ND‎ Qty 5
    • 4x 1x 22-23-2021 Connector Header Through Hole 2 position 0.100" (2.54mm)    WM4200-ND -> Qty5

 

 

  16317 | Wed Sep 8 19:06:14 2021 | Koji | Update | General | Backup situation

Tega mentioned in the meeting that it could be safer to separate some of nodus's functions from the martian file system.
That's an interesting thought. The summary pages and other web services are linked to the user dir. This has high traffic and can cause an issue for the internal network if we crash the disk.
Or, if the internal system crashes, we still want to be able to use the elogs as the source of recovery info. Also, we currently have no backup of the elog. This is dangerous.

We can reduce some of the risks by adding two identical 2TB disks to nodus to accommodate svn/elog/web and their daily backup.

host | file system or contents | condition | note
nodus | root | none or unknown |
nodus | home (svn, elog) | none |
nodus | web (incl summary pages) | backed up | linked to /cvs/cds
chiara | root | maybe | need to check with Jon/Anchal
chiara | /home/cds | local copy | The backup disk is smaller than the main disk.
chiara | /home/cds | remote copy - stalled | we used to have, but stalled since 2017/11/17
fb1 | root | maybe | need to check with Jon/Anchal
fb1 | frame | rsync pulled from LDAS | according to Tega

 

  16319 | Mon Sep 13 04:12:01 2021 | Tega | Update | General | Added temperature sensors at Yend and Vertex too

I finally got the modbus part working on chiara, so we can now view the temperature data on any machine on the martian network, see Attachment 1. 

I also updated the entries in /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini, as suggested by Koji, to include the SensorGateway temperature channels, but I still don't see their EPICS channels on https://ldvw.ligo.caltech.edu/ldvw/view. This means the channels are not available via nds, so I think the temperature data is not being written to frame files on the framebuilder, but I am not sure what this entails, since I assumed C0EDCU.ini is the framebuilder daq channel list.

When the EPICS channels are available via nds, we should be able to display the temperature data on the summary pages.
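As a sanity check that the new slow channels actually make it into the acquisition, one could also query the NDS server directly; below is a minimal sketch using the nds2-client Python bindings (the channel-name pattern and the port number are assumptions).

# Sketch: ask the framebuilder's NDS server whether the temperature channels exist.
# The glob pattern and the port are assumptions for illustration.
import nds2

conn = nds2.connection("fb1", 8088)
chans = conn.find_channels("C1:PEM-*TEMP*")
if chans:
    for ch in chans:
        print(ch.name, ch.sample_rate)
else:
    print("No matching temperature channels found in the NDS channel list.")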

Quote:

I've added the other two temperature sensor modules on the Y end (on 1Y4, IP: 192.168.113.241) and in the vertex (on 1X2, IP: 192.168.113.242). I've updated the martian host table accordingly. From inside the martian network, one can point a browser at the IP address to see the temperature sensor status. These sensors can be set to trigger an alarm and send emails/SMS etc. if the temperature goes out of a defined range.

I feel something is off though. The vertex sensor shows a temperature of ~28 degrees C, the Xend says 20 degrees C, and the Yend says 26 degrees C. I believe these sensors might need calibration.

Remaining tasks are the following:

  • Modbus TCP solution:
    • If we get it right, this will be the easiest solution.
    • We just need to add these sensors as streaming devices in some slow EPICS machine's .cmd file and add the temperature sensing channels in a corresponding database file.
  • Python workaround:
    • Might be faster but dirty.
    • We run a python script on megatron which requests temperature values every second or so from the IP addresses and writes them to a soft EPICS channel.
    • We would still need to create a soft EPICS channel for this and add it to the frame builder data acquisition list.
    • An even shorter workaround for the near future could be to just write the temperature every 30 min to a log file in some location.

[anchal, paco]

We made a script under scripts/PEM/temp_logger.py and ran it on megatron. The script uses the requests package to query the latest sensor data from the three sensors every 10 minutes as JSON and logs it accordingly. This is not a permanent solution.

 

Attachment 1: Screen_Shot_2021-09-13_at_4.16.22_AM.png
  16334 | Wed Sep 15 23:53:54 2021 | Koji | Summary | General | Towards the end upgrade

Ordered components are in.

- Made 36 more Sat Amp internal boards (Attachment 1). Now we can install the adapters to all the 19 sat amp units.

- Gave Tega the components for the sat amp adapter units. (Attachment 2)

- Gave Tega the components for the sat amp / coil driver modifications.

- Made 5 PCBs for the 16bit DAC AI rear panel interface (Attachment 3)

Attachment 1: P_20210915_231308.jpg
Attachment 2: P_20210915_225039.jpg
Attachment 3: P_20210915_224341.jpg
  16335 | Thu Sep 16 00:00:20 2021 | Koji | Update | General | RIO Planex 1064 Lasers in the south cabinet

RIO Planex 1064 Lasers in the south cabinet

Property Number C30684/C30685/C30686/C30687

Attachment 1: P_20210915_232426.jpg
  16336 | Thu Sep 16 01:16:48 2021 | Koji | Update | General | Frozen 2

It happened again. Defrosting required.

Attachment 1: P_20210916_003406_1.jpg
  16337 | Thu Sep 16 10:07:25 2021 | Anchal | Update | General | Melting 2

Put outside.

Quote:

It happened again. Defrosting required.

 

Attachment 1: PXL_20210916_170602832.jpg
  16340 | Thu Sep 16 20:18:13 2021 | Anchal | Update | General | Reset

Fridge brought back inside.

Quote:

Put outside.

Quote:

It happened again. Defrosting required.

 

 

Attachment 1: PXL_20210917_031633702.jpg
  16341 | Fri Sep 17 00:56:49 2021 | Koji | Update | General | Awesome

The Incredible Melting Man!

 

  16403 | Thu Oct 14 16:38:26 2021 | Ian MacMillan | Update | General | Kicking optics in freeSwing measurement

[Ian, Anchal]

We are going to kick the optics tonight at 2am.

The optics we will kick are PRM, BS, ITMX, ITMY, ETMX, and ETMY.

We will kick each one once and record for 2000 seconds; the log files will be placed in users/ian/20211015_FreeSwingTest/logs.

  16405 | Thu Oct 14 20:16:22 2021 | Yehonathan | Update | General | PRMI free swinging

{Yehonathan, Raj}

We aligned the IFO in the PRMI state and let it swing freely.

  16406 | Fri Oct 15 12:14:27 2021 | Ian MacMillan | Update | General | Kicking optics in freeSwing measurement

[Ian, Anchal]

We ran the free-swinging test last night, and the results match the previous values to within 1/10th of a Hz. We found the peaks using the getPeakFreqs2 script, and they are close to the previous values from 2016.

In attachment 1 you will see the results of the test for each optic.

The peak values are as follows:

Optic | POS (Hz) | PIT (Hz) | YAW (Hz) | SIDE (Hz)
PRM | 0.94 | 0.96 | 0.99 | 0.99
MC2 | 0.97 | 0.75 | 0.82 | 0.99
ETMY | 0.98 | 0.98 | 0.95 | 0.95
MC1 | 0.97 | 0.68 | 0.80 | 1.00
ITMX | 0.95 | 0.68 | 0.68 | 0.98
ETMX | 0.96 | 0.73 | 0.85 | 1.00
BS | 0.99 | 0.74 | 0.80 | 0.96
ITMY | 0.98 | 0.72 | 0.72 | 0.98
MC3 | 0.98 | 0.77 | 0.84 | 0.97

The results from 2016 can be found at: /cvs/cds/rtcdt/caltech/c1/scripts/SUS/PeakFit/parameters2.m
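For anyone reproducing the peak identification, the idea is just to estimate a power spectrum of each free-swinging sensor signal and pick the local maxima in the ~0.5-1.1 Hz pendulum band. Below is a rough stand-alone sketch with scipy (not the actual getPeakFreqs2 script).

# Rough sketch of the peak identification (not the actual getPeakFreqs2 script):
# estimate a PSD of a free-swinging time series and return the strongest peak(s)
# in the pendulum band.
import numpy as np
from scipy import signal

def find_resonances(x, fs, fmin=0.5, fmax=1.1, n_peaks=1):
    """Return the frequencies of the strongest PSD peaks between fmin and fmax."""
    f, psd = signal.welch(x, fs=fs, nperseg=int(512 * fs))
    band = (f >= fmin) & (f <= fmax)
    idx, _ = signal.find_peaks(psd[band], prominence=np.median(psd[band]))
    strongest = sorted(idx, key=lambda i: psd[band][i], reverse=True)[:n_peaks]
    return sorted(f[band][strongest])

# Example with synthetic data: a 0.95 Hz damped oscillation plus noise.
fs = 16.0
t = np.arange(0, 2000, 1.0 / fs)
x = np.exp(-t / 500.0) * np.sin(2 * np.pi * 0.95 * t) + 0.01 * np.random.randn(t.size)
print(find_resonances(x, fs))  # should print a value close to 0.95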

Attachment 1: 20211015_Kicktest_plot.pdf