ID | Date | Author | Type | Category | Subject
  16214 | Fri Jun 18 14:53:37 2021 | Anchal | Summary | PEM | Temperature sensor network proposal

I propose we set up a temperature sensor network as described in attachment 1.

There are two types of units:

  • BASE-GATEWAY
    • Holds the processor to talk to the network through Modbus TCP/IP protocol.
    • This unit itself has a temperature sensor in it. It is powered by a power adaptor or PoE from the switch.
    • Each base unit can support at most 2 extended temperature probes (ENV-TEMP).
  • ENV-TEMP
    • This is just a temperature probe with no other capabilities.
    • It is powered via PoE from the BASE-GATEWAY unit.

The proposal is:

  • Put 2 ENV-TEMP sensors (#1 and #2) along the Y-arm, at the end and midway. These are powered and read through a BASE-GATEWAY (#A) at the vertex. The BASE-GATEWAY (#A) will serve as the temperature sensor for the vertex.
  • Put one ENV-TEMP (#3) at the X-end, powered and read through another BASE-GATEWAY (#B) at the midpoint of the X-arm.
  • Both BASE-GATEWAY units are connected by ethernet cables to the network switch. That's it.

These sensors can be configured over the network by going to their assigned IP addresses. I'm not sure at the moment how to configure the .db files to get them to write to slow EPICS channels.

We will have an unused port on the BASE-GATEWAY (#B) which can be used to run another temperature sensor, maybe at an important rack, PSL table or somewhere else.

In future, if more sensors are required, there are expansion (network switch like) options for BASE-GATEWAY or daisy-chain options for the probes.



Edit Fri Jun 18 16:28:13 2021 :

See this [wiki page](https://wiki-40m.ligo.caltech.edu/Physical_Environment_Monitoring/Thermometers) for updated plan and final quote.

Attachment 1: 40mTempSensors.pdf
  16222 | Wed Jun 23 09:05:02 2021 | Anchal | Update | SUS | MC lock acquired back again

The MC was unable to acquire lock because the WFS offsets had been cleared to zero at some point, leaving the MC too misaligned to catch lock again. In such cases, one wants the WFS to start accumulating offsets as soon as a minimal lock is attained so that the mode cleaner can be aligned automatically. So I did the following, which worked (a scripted version of these steps is sketched after the list):

  • Changed C1:IOO-WFS_TRIG_WAIT_TIME (delay in the WFS trigger) from 3 s to 0 s.
  • Reduced C1:IOO-WFS_TRIGGER_THRESH_ON (switch-on threshold) from 5000 to 1000.
  • Then, as soon as a TEM00 mode was locked, even with poor coupling, the WFS loops started aligning the optics to bring the MC back to a good lock.
  • After a robust lock was acquired, I restored the two settings changed above.
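A minimal sketch of how these two steps could be scripted with pyepics (the channel names and values are taken from the list above; the wait time and the use of pyepics are assumptions, not the actual procedure used):

    import time
    from epics import caget, caput

    # Temporary WFS trigger settings (values from this elog entry)
    changes = {
        'C1:IOO-WFS_TRIG_WAIT_TIME': 0,        # delay in WFS trigger: 3 s -> 0 s
        'C1:IOO-WFS_TRIGGER_THRESH_ON': 1000,  # switch-on threshold: 5000 -> 1000
    }

    saved = {ch: caget(ch) for ch in changes}  # remember the current settings
    for ch, val in changes.items():
        caput(ch, val)

    # Wait for the WFS loops to accumulate offsets and realign the MC
    # (placeholder duration; in practice one would watch the MC transmission)
    time.sleep(600)

    for ch, val in saved.items():              # restore the original settings
        caput(ch, val)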
Quote:

 


In the end, since the MC has trouble catching lock after opening the PSL shutter, I tried burt-restoring the ioo to 2021/Jun/17/06:19/c1iooepics.snap, but the problem persists.

 

  16231 | Wed Jun 30 15:31:35 2021 | Anchal | Summary | Optical Levers | Centered optical levers on ITMY, BS, PRM and ETMY

When both arms were locked, we found that the ITMY optical lever was very off-center. This seems to have happened after the c1susaux rebooting we did on June 17th. I opened the ITMY table and realigned the oplev beam to the center while the arm was locked. I repeated this process for BS, PRM and ETMY. I did PRM because I knew we had been keeping its oplev off. The reason was clear once I opened the table: the oplev reflection beam was hitting the PD box instead of the PD. After correcting this, I was able to switch on the PRM oplev loops and saw normal functioning.

  16232 | Wed Jun 30 18:44:11 2021 | Anchal | Summary | LSC | Tried fixing ETMY QPD

I worked at the Y-end station, trying to get the ETMY QPD to work properly. When I started, only one of the 4 quadrants (quadrant #3) was seeing any light. By just adjusting the beam splitter that reflects some light off to the QPD, I was able to get some amount of light onto quadrant #2. However, no amount of steering would show any light in the other quadrants.

The only reason I could think of is that the incoming beam gets partially clipped, as it seems to be hitting the beam splitter near the top edge. For this to work properly, a mirror upstream needs to be adjusted, which would change the alignment onto the TRY photodiode. Without light on the TRY photodiode there is no lock, and hence no light, so one can't steer this beam without losing lock.

I tried one trick in which I changed the YARM lock trigger to the POY DC signal. I got the lock going even when TRY was covered by a beam finder card. However, this lock was still a bit finicky and would lose lock very frequently. It didn't seem worth it to potentially break the YARM locking system for the ETMY QPD this late in the evening and before running it by anyone. So I reset everything to how it was (except the beam splitter that reflects light to the ETMY QPD; that now has equal light falling on quadrants #2 and #3).

The settings I temporarily changed were:

  • C1:LSC-TRIG_MTRX_7_10 changed from 0 to -1 (uses POY DC as trigger)
  • C1:LSC-TRIG_MTRX_7_13 changed from 1 to 0 (stops using TRY DC as trigger)
  • C1:LSC-YARM_TRIG_THRESH_ON changed from 0.3 to -22
  • C1:LSC-YARM_TRIG_THRESH_OFF changed from 0.1 to -23.6
  • C1:LSC-YARM_FM_TRIG_THRESH_ON changed from 0.5 to -22
  • C1:LSC-YARM_FM_TRIG_THRESH_OFF changed from 0.1 to -23.6

All of these were manually reverted to their previous values at the end.

 

  16236 | Thu Jul 1 16:55:21 2021 | Anchal | Summary | Optical Levers | Fixed Centering optical levers PRM

This was a mistake. When the arms are locked, PRM is misaligned by setting a -800 offset in the PIT dof of PRM. The oplev is set up to function in the normal state, not in this misaligned configuration. I undid my changes today by switching off the offset, realigning the oplev to the center, and then restoring the single-arm locked state. The PRM oplev loops are off now.

Quote:

 PRM because I knew we had been keeping its oplev off. The reason was clear once I opened the table. The oplev reflection beam was hitting the PD box instead of the PD. After correcting this, I was able to switch on the PRM oplev loops and saw normal functioning.

 

  16242 | Fri Jul 9 15:39:08 2021 | Anchal | Summary | ALS | Single Arm Actuation Calibration with IR ALS Beat [Correction]

I did this analysis again by doing the demodulation on 5 s time segments of the 60 s excitation signal. The major difference is that earlier I was not summing up the sine/cosine-multiplied signals, so the associated error was a lot larger. If I simply multiply the whole beatnote signal with a digital LO created at the excitation frequency, divide it up into 12 segments of 5 s each, sum each segment individually, and then take the mean and standard deviation, I get:
\frac{6.88 \pm 0.05}{f^2} nm/cts, as opposed to the \frac{7.32 \pm 0.03}{f^2} nm/cts that was calculated using the MICH signal earlier by gautam in 13984.

Attachment 1 shows the scatter plot for the complex calibration factors found for the 12 segments.

My aim in the previous post, however, was to get a time series of the complex calibration factor from which I can take a noise spectral density measurement of the calibration. I'll still look into how to do that. I'll have to add a low-pass filter to integrate the signal; then the noise spectrum up to the low-pass pole frequency would be available. But what would this noise spectrum really mean? I still have to think a bit about it. I'll put up another post soon.
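For reference, a minimal sketch of the segment-wise demodulation described above (the variable names, the omitted Hz-to-nm conversion, and the segment handling are assumptions; this is not the actual analysis code):

    import numpy as np

    def segment_calibration(beat, fs, f_exc, a_exc, seg_len=5.0):
        # beat: high-passed beatnote time series; fs: sampling rate [Hz];
        # f_exc, a_exc: excitation frequency [Hz] and amplitude [cts].
        t = np.arange(len(beat)) / fs
        lo = np.exp(-2j * np.pi * f_exc * t)          # digital LO at f_exc
        demod = beat * lo                             # mix down to DC
        n_seg = int(seg_len * fs)
        n = len(demod) // n_seg
        # Average each 5 s segment; the factor of 2 recovers the peak amplitude
        # of a real sinusoid from the complex demodulation.
        seg_amp = 2 * np.mean(demod[: n * n_seg].reshape(n, n_seg), axis=1)
        cal = seg_amp / a_exc   # beatnote units per count of excitation at f_exc
        # The conversion to nm and the f^2 scaling are applied elsewhere.
        return np.mean(np.abs(cal)), np.std(np.abs(cal))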

Quote:

We attempted to simulate "oscillator based realtime calibration noise monitoring" in an offline analysis with python. This helped us find a factor of sqrt(2) that we were missing earlier in 16171. We measured C1:ALS-BEATX_FINE_PHASE_OUT_HZ_DQ when the X-arm was locked to the main laser and the X-end green laser was locked to the XARM. An excitation signal of amplitude 600 was sent at 619 Hz to C1:ITMX_LSC_EXC.

Signal analysis flow:

  • C1:ALS-BEATX_FINE_PHASE_OUT_HZ_DQ is calibrated to give the beatnote frequency in Hz, but we are interested in the fluctuations of this value at the excitation frequency. So the beatnote signal is first high-passed with a 50 Hz cut-off. This cut-off can be reduced a lot in the realtime system; we only took 60 s of data and had to remove the first 2 s to reject transients, so we did not reduce it further here.
  • The high-passed beatnote signal is demodulated at the excitation frequency, and the I and Q outputs are combined to get a complex beatnote amplitude at the excitation frequency.
  • This signal is divided by the counts amplitude of the excitation and multiplied by the square of the excitation frequency to get the calibration factor for ITMX, i.e. the K in the K/f^2 nm/cts form.
  • The noise spectrum of the absolute value of the calibration factor is plotted in attachment 1, along with its RMS. The calibration factor was detrended linearly so that the DC value was removed before taking the spectrum.
  • So attachment 1 is the spectrum of the noise in the calibration factor when measured with this method. The shaded region is the 15.865% - 84.135% percentile region around the solid median curves.

We got a value of \frac{7.3 \pm 3.9}{f^2}\, \frac{nm}{cts}. The calibration factor currently in use is \frac{7.32}{f^2} nm/cts, from 13984.

Next steps could be to budget this noise while we set up some way of generating this calibration factor in realtime using oscillators in an FE model. Calibrating the actuation of a single optic in a single arm is easy, so this is a good test setup for getting a noise budget of this calibration method.

 

Attachment 1: ITMX_calibration_With_ALS_Beat.pdf
  16261 | Tue Jul 27 23:04:37 2021 | Anchal | Update | LSC | 40 meter party

[ian, anchal, paco]

After our second attempt at locking PRFPMI tonight, we tried to restore the XARM and YARM locks to IR by clicking IFO_CONFIGURE>Restore XARM (POX) and IFO_CONFIGURE>Restore YARM (POY), but the arms did not lock. The green lasers were locked to the arms at maximum power, so the relative alignment of each cavity was ok. We were also able to lock PRMI using IFO_CONFIGURE>Restore PRMI carrier.

This was very weird to us. We were pretty sure that the alignment was correct, so we decided to check the POX/POY signal chain. There was essentially no signal coming in at POX11, and there was a -100 offset on it. We could see some PDH signal on POY11, but not enough to catch lock.

We tried running IFO_CONFIGURE>LSC OFFSETS to cancel out any dark current DC offsets. The changes made by the script are shown in attachment 1.

We went to check the tables and found no light visible on beam finder cards at POX11 or POY11. We found that ITMX was stuck on one of the coils. We unstuck it using the shaking method. The oplevs on ITMX could not be switched on after this, as the oplev servos were railing to their limits. But when we ran Restore XARM (POX) again, they started working fine. Something is done by this script that we are not aware of.

We're stopping here. We still cannot lock either of the single arms.


Wed Jul 28 11:19:00 2021 Update:

[gautam, paco]

Gautam found that restoring POX/POY failed to restore the whitening filter gains on POX11/POY11. These are meant to be restored to 30 dB and 18 dB for POX11 and POY11 respectively, but were set to 0 dB, to the detriment of any POX/POY triggering/locking. The reason these are lowered is to avoid saturating the speakers during lock acquisition. Yesterday, burt-restore didn't work because we restored c1lscepics.snap, but said gains actually live in c1lscaux.snap. After manually restoring the POX11 and POY11 whitening filter gains, gautam ran the LSCOffsets script. The XARM and YARM were able to lock quickly after we restored these settings.

The root of our issue may be that we didn't run the CARM & DARM watch script (which can be accessed from the ALS/Watch Scripts in medm). Gautam added a line on the Transition_IR_ALS.py script to run the watch script instead.

Attachment 1: Screenshot_2021-07-27_22-19-58.png
  16264 | Wed Jul 28 17:10:24 2021 | Anchal | Update | LSC | Schnupp asymmetry

[Anchal, Paco]

I redid the measurement of Schnupp asymmetry today and found it to be 3.8 cm \pm 0.9 cm.


Method

  • One of the arms is misaligned at both the ITM and the ETM.
  • The other arm is locked and aligned using ASS.
  • The SRCL oscillator's output is redirected to the ETM of the chosen arm.
  • The demodulation of the SRCL oscillator in the AS55_Q channel is configured (phase corrected) so that all the signal appears in C1:CAL-SENSMAT_SRCL_AS55_Q_DEMOD_I_OUT.
  • The rotation (demodulation phase) angle of the AS55 RFPD is scanned, and C1:CAL-SENSMAT_SRCL_AS55_Q_DEMOD_I_OUT is averaged over 10 s after waiting 5 s to let the transients pass.
  • This data is used to find the zero crossing of AS55_Q signal when light is coming from one particular arm only.
  • The same is repeated for the other arm.
  • The difference in the zero crossing phase angles is twice the phase accumulated by a 55 MHz signal in travelling the length difference between the arm cavities i.e. the Schnupp Asymmetry.

I measured a phase difference of 5 \pm 1 degrees between the two paths.

The uncertainty in this measurement is much larger than in gautam's measurement in 15956. I'm not sure yet why, but I will look into it.
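As a sanity check of the numbers above, here is the conversion from the measured phase difference to the length asymmetry, using the relation stated in the last bullet of the method (a back-of-the-envelope sketch, not the analysis code actually used):

    import numpy as np

    c = 299792458.0            # speed of light [m/s]
    f_sb = 55e6                # 55 MHz sideband frequency
    dphi = np.deg2rad(5.0)     # measured zero-crossing phase difference
    dphi_err = np.deg2rad(1.0)

    # The measured phase difference is twice the phase accumulated by the
    # 55 MHz signal over the length difference: dphi = 2*(2*pi*f_sb/c)*L_sch
    L_sch = dphi * c / (4 * np.pi * f_sb)
    L_err = dphi_err * c / (4 * np.pi * f_sb)
    print("L_sch = {:.1f} +/- {:.1f} cm".format(L_sch * 100, L_err * 100))  # ~3.8 cm

This reproduces the ~3.8 cm quoted at the top of this entry.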

 

Quote:

I used the Valera technique to measure the Schnupp asymmetry to be \approx 3.5 \, \mathrm{cm}, see Attachment #1. The data points are points, and the zero crossing is estimated using a linear fit. I repeated the measurement 3 times for each arm to see if I get consistent results - seems like I do. Subtle effects like possible differential detuning of each arm cavity (since the measurement is done one arm at a time) are not included in the error analysis, but I think it's not controversial to say that our Schnupp asymmetry has not changed by a huge amount from past measurements. Jamie set a pretty high bar with his plot which I've tried to live up to. 

 

Attachment 1: Lsch.pdf
  16268 | Tue Aug 3 20:20:08 2021 | Anchal | Update | Optical Levers | Recentered ETMX, ITMX and ETMY oplevs at good state

Late elog. Original time 08/02/2021 21:00.

I locked both arms and ran ASS to reach the optimum alignment. ETMY PIT was > 10 urad, ITMX PIT was > 10 urad and ETMX PIT was < -10 urad; everything else was ok, with absolute value less than 10 urad. I recentered these three.

Then I locked PRMI, ran ASS on PRCL and MICH, and checked the BS and PRM alignment. They were also less than 10 urad in absolute value.

  16270 | Thu Aug 5 14:59:31 2021 | Anchal | Update | General | Added temperature sensors at Yend and Vertex too

I've added the other two temperature sensor modules, at the Y-end (on 1Y4, IP: 192.168.113.241) and at the vertex (on 1X2, IP: 192.168.113.242). I've updated the martian host table accordingly. From inside the martian network, one can point a browser at the sensor's IP address to see its status. These sensors can be set to trigger alarms and send emails/SMS etc. if the temperature goes out of a defined range.

I feel something is off, though. The vertex sensor shows a temperature of ~28 degrees C, the X-end says 20 degrees C and the Y-end says 26 degrees C. I believe these sensors might need calibration.

The remaining tasks are the following:

  • Modbus TCP solution:
    • If we get it right, this will be the easiest solution.
    • We just need to add these sensors as streaming devices in some slow EPICS machine's .cmd file and add the temperature sensing channels in a corresponding database file.
  • Python workaround:
    • Might be faster, but dirty.
    • We run a python script on megatron which requests temperature values every second or so from the IP addresses and writes them to soft EPICS channels.
    • We would still need to create the soft EPICS channels for this and add them to the framebuilder data acquisition list.
    • An even shorter workaround for the near future could be to just write the temperature every 30 min to a log file in some location.

[anchal, paco]

We made a script at scripts/PEM/temp_logger.py and ran it on megatron. The script uses the requests package to query the latest sensor data from the three sensors every 10 minutes as JSON and logs it accordingly. This is not a permanent solution.
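The actual temp_logger.py is not reproduced here; below is a minimal sketch of such a polling loop. The endpoint path, JSON structure, log file location, and the X-end IP address are placeholders/assumptions (only the Y-end and vertex IPs are given in this thread):

    import json
    import time
    import requests

    SENSORS = {
        'Xend':   '192.168.113.240',   # placeholder; not listed in this entry
        'Yend':   '192.168.113.241',
        'Vertex': '192.168.113.242',
    }
    ENDPOINT = '/state.json'           # placeholder; depends on the sensor firmware
    LOGFILE = '/tmp/40m_temp_log.txt'  # placeholder location

    while True:
        stamp = time.strftime('%Y-%m-%d %H:%M:%S')
        readings = {}
        for name, ip in SENSORS.items():
            try:
                r = requests.get('http://{}{}'.format(ip, ENDPOINT), timeout=5)
                readings[name] = r.json()      # structure depends on the device
            except requests.RequestException as err:
                readings[name] = 'error: {}'.format(err)
        with open(LOGFILE, 'a') as f:
            f.write('{} {}\n'.format(stamp, json.dumps(readings)))
        time.sleep(600)                         # poll every 10 minutes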

  16271 | Fri Aug 6 13:13:28 2021 | Anchal | Update | BHD | c1teststand subnetwork now accessible remotely

c1teststand subnetwork is now accessible remotely. To log into this network, one needs to do following:

  • Log into nodus or pianosa. (This will only work from these two computers)
  • ssh -CY controls@192.168.113.245
  • Password is our usual workstation password.
  • This will log you into c1teststand network.
  • From here, you can log into fb1, chiara, c1bhd and c1sus2  which are all part of the teststand subnetwork.

Just to document the IT work I did: making this connection was a bit less trivial than usual.

  • The martian subnetwork is created by a NAT router which connects only nodus to the outside GC network; all computers within the network have IP addresses 192.168.113.xxx with a subnet mask of 255.255.255.0.
  • The cloned test stand network runs on the same IP address scheme, mostly because fb1 and chiara are clones in this network. So every computer in this network also has an IP address 192.168.113.xxx.
  • I set up a NAT router to connect to the martian network, forwarding ssh requests to the c1teststand computer. My NAT router creates a separate subnet with IP addresses 10.0.1.xxx and subnet mask 255.255.255.0, gated through 10.0.1.1.
  • However, the issue is that c1teststand can now reach two networks which have the same IP addresses, 192.168.113.xxx. So when you try to ssh, it always searches its local c1teststand subnetwork instead of routing through the NAT router to the martian network.
  • To work around this, I had to manually provide an IP route on c1teststand for connecting to two of the computers (nodus and pianosa) in the martian network. This is done by:
    ip route add 192.168.113.200 via 10.0.1.1 dev eno1
    ip route add 192.168.113.216 via 10.0.1.1 dev eno1
  • This gives c1teststand a specific path for ssh requests to/from these computers in the martian network.
  16273 | Mon Aug 9 10:38:48 2021 | Anchal | Update | BHD | c1teststand subnetwork now accessible remotely

I had to add the following two lines to the /etc/network/interfaces file to make the special IP routes persistent across reboots:

post-up ip route add 192.168.113.200 via 10.0.1.1 dev eno1
post-up ip route add 192.168.113.216 via 10.0.1.1 dev eno1

  16282 | Wed Aug 18 20:30:12 2021 | Anchal | Update | ASS | Fixed runASS scripts

Late elog: Original time of work Tue Aug 17 20:30 2021


I locked the arms remotely yesterday and tried running the runASS.py scripts (generally run by clicking the Run ASS buttons on the IFO OVERVIEW screen or the ASC screen). We have known for a few weeks that this script stopped working for some reason. It would start the dithering and optimize the alignment, but then would fail to freeze the state and save the alignment.

I found that caget('C1:LSC-TRX_OUT') and caget('C1:LSC-TRY_OUT') were not working on any of the workstations. This is weird, since caget was able to acquire these fast channel values earlier and we have seen this script work for about a month without any issue.

Anyway, to fix this, I just changed the channel name to 'C1:LSC-TRY_OUT16' in the check the script does at the end to verify that the arm has indeed been aligned; it was only this step that was failing. Now the script is working fine and I tested it on both arms. On the Y-arm, I misaligned the arm by adding a bias in yaw, changing C1:SUS-ITMY_YAW_OFFSET from -8 to 22; the script was able to align the arm back.
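A hedged sketch of the kind of end-of-script check described above (the threshold and function name are placeholders; the actual runASS.py logic may differ):

    from epics import caget

    def arm_aligned(arm='Y', threshold=0.9):
        # Use the 16 Hz EPICS copy of the transmission, since caget on the
        # fast C1:LSC-TR{X,Y}_OUT channels was not returning values.
        tr = caget('C1:LSC-TR{}_OUT16'.format(arm))
        return tr is not None and tr > threshold

    if not arm_aligned('Y'):
        print('ASS did not bring the Y arm back to good transmission')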

  16283 | Thu Aug 19 03:23:00 2021 | Anchal | Update | CDS | Time synchronization not running

I tried to read a bit and understand the NTP synchronization implementation on the FE computers. I'm quite sure that, if timesyncd were running correctly on these computers, 'NTP synchronized' would be 'yes' in the output of timedatectl. As Koji reported in 15791, this is not the case. I logged into c1lsc, c1sus and c1ioo and saw that the RTC has drifted from the software clocks too, which does not happen if NTP synchronization is active. This almost certainly means that if the computers are rebooted, the synchronization will be lost and the models will fail to come online.

My current findings are the following (this should be documented in wiki once we setup everything):

  • nodus is running an NTP server using chronyd. One can check the configuration of this NTP server in /etc/chronyd.conf.
  • fb1 is running an NTP server using ntpd that follows nodus and the IP address 131.215.239.14. This can be seen in /etc/ntp.conf.
  • There are no comments to describe what this other server (131.215.239.14) is. Does the GC network have an NTP server too?
  • c1lsc, c1sus and c1ioo all have systemd-timesyncd.service running with configuration file in /etc/systemd/timesyncd.conf.
  • The configuration file set Servers=ntpserver but echo $ntpserver produces nothing (blank) on these computers and I've been unable to find anyplace where ntpserver is defined.
  • In chiara (our name server), the name server file /etc/hosts does not have any entry for ntpserver either.
  • I think the problem might be that these computers are unable to find the ntpserver as it is not defined anywhere.

The solution to this issue could be as simple as just defining ntpserver in the name server list. But I'm not sure if my understanding of this issue is correct. Comments/suggestions are welcome for future steps.

 

  16285 | Fri Aug 20 00:28:55 2021 | Anchal | Update | CDS | Time synchronization not running

I added ntpserver as a known host name for address 192.168.113.201 (fb1's address where ntp server is running) in the martian host list in the following files in Chiara:

/var/lib/bind/martian.hosts
/var/lib/bind/rev.113.168.192.in-addr.arpa

Note: a host name called ntp was already defined at 192.168.113.11 but I don't know what computer this is.

Then, I restarted the DNS on chiara by doing:

sudo service bind9 restart

Then I logged into c1lsc and c1ioo and ran following:

controls@c1ioo:~ 0$ sudo systemctl restart systemd-timesyncd.service

controls@c1ioo:~ 0$ sudo systemctl status systemd-timesyncd.service -l
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled)
   Active: active (running) since Fri 2021-08-20 07:24:03 UTC; 53s ago
     Docs: man:systemd-timesyncd.service(8)
 Main PID: 23965 (systemd-timesyn)
   Status: "Idle."
   CGroup: /system.slice/systemd-timesyncd.service
           └─23965 /lib/systemd/systemd-timesyncd

Aug 20 07:24:03 c1ioo systemd[1]: Starting Network Time Synchronization...
Aug 20 07:24:03 c1ioo systemd[1]: Started Network Time Synchronization.
Aug 20 07:24:03 c1ioo systemd-timesyncd[23965]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 07:24:35 c1ioo systemd-timesyncd[23965]: Using NTP server 192.168.113.201:123 (ntpserver).
controls@c1ioo:~ 0$ timedatectl
      Local time: Fri 2021-08-20 07:25:28 UTC
  Universal time: Fri 2021-08-20 07:25:28 UTC
        RTC time: Fri 2021-08-20 07:25:31
       Time zone: Etc/UTC (UTC, +0000)
     NTP enabled: yes
NTP synchronized: no
 RTC in local TZ: no
      DST active: n/a

The same output is shown on c1lsc too. The 'NTP synchronized' flag in the output of timedatectl did not change to yes, and the RTC is still 3 seconds ahead of the local clock.

Then I went to c1sus to see what the status output was before restarting the timesyncd service. I got the following output:

controls@c1sus:~ 0$ sudo systemctl status systemd-timesyncd.service -l
● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled)
   Active: active (running) since Tue 2021-08-17 04:38:03 UTC; 3 days ago
     Docs: man:systemd-timesyncd.service(8)
 Main PID: 243 (systemd-timesyn)
   Status: "Idle."
   CGroup: /system.slice/systemd-timesyncd.service
           └─243 /lib/systemd/systemd-timesyncd

Aug 20 02:02:18 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 02:36:27 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 03:10:35 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 03:44:43 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 04:18:51 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 04:53:00 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 05:27:08 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 06:01:16 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 06:35:24 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).
Aug 20 07:09:33 c1sus systemd-timesyncd[243]: Using NTP server 192.168.113.201:123 (ntpserver).

This actually shows that the service was able to find ntpserver correctly at 192.168.113.201 even before I changed the name server files on chiara. So I'm retracting the changes made to the name server; they are probably not required.

The configuration file timesyncd.conf is read-only even with sudo. I tried changing permissions, but that did not work either. Maybe these files are not correctly configured. The man page of timesyncd says to use the field 'NTP' to give the ntp servers; our files are using the field 'Servers'. But since we are not getting any error message, I don't think this is the issue here.

I'll look more into this problem.

  16286 | Fri Aug 20 06:24:18 2021 | Anchal | Update | CDS | Time synchronization not running

I read on some stack exchange that the 'NTP synchronized' indicator turns 'yes' in the output of timedatectl only when the RTC clock has been adjusted at some point. I also read that timesyncd does not make the adjustment if the time difference is too large, roughly more than 3 seconds.

So I logged into all FE machines and ran sudo hwclock -w to synchronize the RTCs to the system clocks, and then waited to see if timesyncd made any corrections to the RTC. It did not. A few hours later, I found the RTC clocks drifting away from the system clocks again. So even if the timesyncd service is running as it should, it is not performing time corrections for whatever reason.

Maybe we should try to use some other service?

Quote:
 

The NTP synchronized flag in output of timedatectl command did not change to yes and the RTC is still 3 seconds ahead of the local clock.

 

  16291 | Mon Aug 23 22:51:44 2021 | Anchal | Update | General | Time synchronization efforts

Related elog thread: 16286


I didn't really achieve anything but I'm listing what I've tried.

  • I now know that timesyncd isn't working because systemd-timesyncd is known to have issues when running on a read-only file system. In particular, the service does not have privileges to change the clock or drift settings at /run/systemd/clock or /etc/adjtime.
  • The workarounds to these problems are poorly rated/reviewed on stack exchange and require me to change the /etc/systemd/timesyncd.conf file, but I'm unable to edit this file.
  • I know that Paco was able to change these files earlier, as the files are now changed and configured to follow a debian ntp pool server, which won't work since the FEs do not have internet access. So the conf file needs to be restored to using ntpserver as the ntp server.
  • From system messages, the ntpserver is recognized by the service, as shown in the second part of 16285. I really think the issue is file permissions; the file /etc/adjtime has not been updated since 2017.
  • I got help from Paco on how to edit files for the FE machines. The FE machines' directories are exported from fb1:/diskless/root.jessie/
  • I restored the /etc/systemd/timesyncd.conf file to how it was before, with just the Servers=ntpserver line, and restarted the timesyncd service on all FEs, but the synchronization did not happen.
  • I tried a few suggestions from stackexchange but none of them worked. The only rated solution creates a tmpfs directory outside of the read-only filesystem and uses that to run timesyncd. So, in my opinion, timesyncd would never work on our diskless, read-only file system FE machines.
  • One issue in an archlinux discussion ended with the questioner resorting to openntpd from the openBSD distribution. The user claimed that openntpd is simple enough that it can run ntp synchronization on a read-only file system.
  • Somewhat painfully, I 'kind of' installed the openntpd tool in the fb1:/diskless/root.jessie directory following directions from here. I had to manually add a user and group for the FEs (which I might not have done correctly). I was not able to get the openntpd daemon to start properly after some tries.
  • I restored everything back to how it was and restarted timesyncd on c1sus, even though it would not really do anything.
Quote:

This time no matter how we try to set the time, the IOPs do not run with "DC status" green. (We kept having 0x4000)

 

  16292 | Tue Aug 24 09:22:48 2021 | Anchal | Update | General | Time synchronization working now

Jamie told me to use chroot to log into the chroot jail of the debian OS that is exported to the FEs and install ntp there. I took the following steps, at the end of which all FEs have NTP synchronized.

  • I logged into fb1 through nodus.
  • chroot /diskless/root.jessie /bin/bash took me to the bash terminal for debian os that is exported to all FEs.
  • Here, I ran sudo apt-get install ntp which ran without any errors.
  • I then edited /etc/ntp.conf: I removed the default servers and added the following lines for the servers (fb1 and nodus ip addresses):
    server 192.168.113.201
    server 192.168.113.201
  • I logged into each FE machine and ran following commands:
    sudo systemctl stop systemd-timesyncd.service; sudo systemctl status systemd-timesyncd.service;
    timedatectl; sleep 2;sudo systemctl daemon-reload;  sudo systemctl start ntp; sleep 2; sudo systemctl status ntp; timedatectl
    sudo hwclock -s
    • The first line ensures that systemd-timesyncd.service is no longer running. I did not uninstall timesyncd and left its configuration file as it is.
    • The second line first shows the times of the local and RTC clocks, then reloads the daemon services to get ntp registered, then starts ntp.service and shows its status. Finally, the timedatectl command shows the synchronized clocks and that NTP synchronization has occurred.
    • The last line sets the local clock to the RTC clock. Even though this wasn't required, as I saw that the clocks were already the same to the second, I just wanted a point where all the local clocks are synchronized to the ntp server.
  • Hopefully, this will resolve our issue of having to restart the models any time some glitch happens or when we need to update something in one of them.

Edit Tue Aug 24 10:19:11 2021:

I also disabled timesyncd on all FEs using sudo systemctl disable systemd-timesyncd.service

I've added this wiki page for summarizing the NTP synchronization knowledge.

  16295 | Tue Aug 24 22:37:40 2021 | Anchal | Update | General | Time synchronization not really working

I attempted to install chrony and run it on one of the FE machines. It didn't work and in doing so, I lost the working NTP client service on the FE computers as well. Following are some details:

  • I added the following two mirrors in the apt source list of root.jessie at /etc/apt/sources.list
    deb http://ftp.us.debian.org/debian/ jessie main contrib non-free
    deb-src http://ftp.us.debian.org/debian/ jessie main contrib non-free
  • Then I installed chrony in the root.jessie using
    sudo apt-get install chrony
    • I was getting an error E: Can not write log (Is /dev/pts mounted?) - posix_openpt (2: No such file or directory) . To fix this, I had to run:
      sudo mount -t devpts none "$rootpath/dev/pts" -o ptmxmode=0666,newinstance
      sudo ln -fs "pts/ptmx" "$rootpath/dev/ptmx"
    • Then, I had another error to resolve.
      Failed to read /proc/cmdline. Ignoring: No such file or directory
      start-stop-daemon: nothing in /proc - not mounted?
      To fix this, I had to exit to fb1 and run:
      sudo mount --bind /proc /diskless/root.jessie/proc
    • With these steps, chrony was finally installed, but I immediately saw an error message saying:
      Starting /usr/sbin/chronyd...
      Could not open NTP sockets
  • I figured this must be due to ntp running in the FE machines.  I logged into c1iscex and stopped and disabled the ntp service:
    sudo systemctl stop ntp
    sudo systemctl disable ntp
    • I saw some error messages from the above command, as the FEs are read-only file systems:
      Synchronizing state for ntp.service with sysvinit using update-rc.d...
      Executing /usr/sbin/update-rc.d ntp defaults
      insserv: fopen(.depend.stop): Read-only file system
      Executing /usr/sbin/update-rc.d ntp disable
      update-rc.d: error: Read-only file system
    • So I went back to the chroot on fb1 and ran the two commands above that failed:
      /usr/sbin/update-rc.d ntp defaults
      /usr/sbin/update-rc.d ntp disable
    • The last line gave the output:
      insserv: warning: current start runlevel(s) (empty) of script `ntp' overrides LSB defaults (2 3 4 5).
      insserv: warning: current stop runlevel(s) (2 3 4 5) of script `ntp' overrides LSB defaults (empty).
    • I ignored this and moved forward.
  • I copied the chronyd.service from nodus to the chroot on fb1 and configured it to use nodus as the server. Then I started the chronyd service and checked its status:

    sudo systemctl status chronyd.service
    but got the same issue with the NTP sockets:

    ● chronyd.service - NTP client/server
       Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled)
       Active: failed (Result: exit-code) since Tue 2021-08-24 21:52:30 PDT; 5s ago
      Process: 790 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=1/FAILURE)

    Aug 24 21:52:29 c1iscex systemd[1]: Starting NTP client/server...
    Aug 24 21:52:30 c1iscex chronyd[790]: Could not open NTP sockets
    Aug 24 21:52:30 c1iscex systemd[1]: chronyd.service: control process exited, code=exited status=1
    Aug 24 21:52:30 c1iscex systemd[1]: Failed to start NTP client/server.
    Aug 24 21:52:30 c1iscex systemd[1]: Unit chronyd.service entered failed state.

  • I tried a few things to resolve this, but couldn't get it to work. So I gave up on using chrony and decided to go back to the ntp service at least.

  • I stopped, disabled and checked status of chrony:
    sudo systemctl stop chronyd
    sudo systemctl disable chronyd
    sudo systemctl status chronyd
    This gave the output:

    ● chronyd.service - NTP client/server
       Loaded: loaded (/usr/lib/systemd/system/chronyd.service; disabled)
       Active: failed (Result: exit-code) since Tue 2021-08-24 22:09:07 PDT; 25s ago

    Aug 24 22:09:07 c1iscex systemd[1]: Starting NTP client/server...
    Aug 24 22:09:07 c1iscex chronyd[2490]: Could not open NTP sockets
    Aug 24 22:09:07 c1iscex systemd[1]: chronyd.service: control process exited, code=exited status=1
    Aug 24 22:09:07 c1iscex systemd[1]: Failed to start NTP client/server.
    Aug 24 22:09:07 c1iscex systemd[1]: Unit chronyd.service entered failed state.
    Aug 24 22:09:15 c1iscex systemd[1]: Stopped NTP client/server.

  • I went back to fb1 chroot and removed chrony package and deleted the configuration files and systemd service files:
    sudo apt-get remove chrony

  • But when I started ntp daemon service back in c1iscex, it gave error:
    sudo systemctl restart ntp
    Job for ntp.service failed. See 'systemctl status ntp.service' and 'journalctl -xn' for details.

  • Status shows:

    sudo systemctl status ntp
    ● ntp.service - LSB: Start NTP daemon
       Loaded: loaded (/etc/init.d/ntp)
       Active: failed (Result: exit-code) since Tue 2021-08-24 22:09:56 PDT; 9s ago
      Process: 2597 ExecStart=/etc/init.d/ntp start (code=exited, status=5)

    Aug 24 22:09:55 c1iscex systemd[1]: Starting LSB: Start NTP daemon...
    Aug 24 22:09:56 c1iscex systemd[1]: ntp.service: control process exited, code=exited status=5
    Aug 24 22:09:56 c1iscex systemd[1]: Failed to start LSB: Start NTP daemon.
    Aug 24 22:09:56 c1iscex systemd[1]: Unit ntp.service entered failed state.

  • I tried to enable back the ntp service by sudo systemctl enable ntp. I got similar error messages of read only filesystem as earlier.
    Synchronizing state for ntp.service with sysvinit using update-rc.d...
    Executing /usr/sbin/update-rc.d ntp defaults
    insserv: warning: current start runlevel(s) (empty) of script `ntp' overrides LSB defaults (2 3 4 5).
    insserv: warning: current stop runlevel(s) (2 3 4 5) of script `ntp' overrides LSB defaults (empty).
    insserv: fopen(.depend.stop): Read-only file system
    Executing /usr/sbin/update-rc.d ntp enable
    update-rc.d: error: Read-only file system

    • I went back to chroot in fb1 and ran:
      /usr/sbin/update-rc.d ntp defaults
      insserv: warning: current start runlevel(s) (empty) of script `ntp' overrides LSB defaults (2 3 4 5).
      insserv: warning: current stop runlevel(s) (2 3 4 5) of script `ntp' overrides LSB defaults (empty).
      and
      /usr/sbin/update-rc.d ntp enable

  • I came back to c1iscex and tried restarting the ntp service but got same error messages as above with exit code 5.

  • I checked c1sus; ntp was running there. I tested the configuration by restarting the ntp service, and then it failed with the same error message. So the remaining three FEs, c1lsc, c1ioo and c1iscey, have a running ntp service, but they won't be able to restart it.

  • As a last try, I rebooted c1iscex to see if ntp comes back online nicely, but it doesn't.

Bottom line: I went to try chrony on the FEs, and I ended up breaking the ntp client services on those computers as well. We have no NTP synchronization on any of the FEs.

Even though Paco and I are learning about the ntp and cds stuff, I think it's time we get help from someone with real experience. The lab is not in a good state for far too long.

Quote:

tl;dr: NTP servers and clients were never synchronized, are not synchronizing even with ntp... nodus is synchronized but uses chronyd; should we use chronyd everywhere?

 

  16322 | Mon Sep 13 15:14:36 2021 | Anchal | Update | LSC | Xend Green laser injection mirrors M1 and M2 not responsive

I was showing some green laser locking to Tega when I noticed that changing the PZT sliders for the M1/M2 angular positions at the X-end had no effect on the locked TEM01 or TEM00 mode. This is odd, as changing these sliders should increase or decrease the mode matching of these modes. I suspect that the controls are not working correctly and the PZTs are either not powered up or not connected. We'll investigate this in the near future, as priorities allow.

  16330 | Tue Sep 14 17:22:21 2021 | Anchal | Update | CDS | Added temp sensor channels to DAQ list

[Tega, Paco, Anchal]

We attempted to reboot the fb1 daqd today to get the new temperature sensor channels recording. However, the FE models got stuck, apparently due to the reasons explained in 40m/16325. Jamie cleared /var/logs on fb1 so that the FEs could reboot. After this work, we were able to reboot the FE machines successfully and get the models running too. During the day, the FE machines were shut down and brought back manually a couple of times, particularly the c1iscex machine. The only change on fb1 is in /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini, where the new channels were added, and some hacking was done by Jamie in the gpstime module (see 40m/16327).

  16337 | Thu Sep 16 10:07:25 2021 | Anchal | Update | General | Melting 2

Put outside.

Quote:

It happened again. Defrosting required.

 

Attachment 1: PXL_20210916_170602832.jpg
  16340 | Thu Sep 16 20:18:13 2021 | Anchal | Update | General | Reset

Fridge brought back inside.

Quote:

Put outside.

Quote:

It happened again. Defrosting required.

 

 

Attachment 1: PXL_20210917_031633702.jpg
  16351 | Tue Sep 21 11:09:34 2021 | Anchal | Summary | CDS | XARM YARM UGF Servo and Oscillators added

I've updated the c1LSC simulink model to add the so-called UGF servos to the XARM and YARM single-arm loops as well; these were previously present only in the DARM, CARM, MICH and PRCL loops. The UGF servo itself serves a larger purpose, but we won't be using that. What we have access to now is the ability to add an oscillator into a single-arm loop and get realtime demodulated signals from just before and just after the point where the oscillator is added. This allows us to get the open loop transfer function, and its uncertainty, at particular frequencies (set by the oscillator), and to build a noise budget for the calibration error of these transfer functions.
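As a rough sketch of how the two demodulated testpoints could be turned into an OLTF estimate (the sign convention, variable names, and averaging here are assumptions for illustration, not the actual c1oaf implementation):

    import numpy as np

    def oltf_from_injection(before, after):
        # 'before' and 'after' are arrays of complex demodulated amplitudes at
        # the oscillator frequency, taken just before and just after the point
        # where the oscillator is summed into the loop. With the convention
        # before = -G * after, the open loop transfer function is G = -before/after.
        before = np.asarray(before, dtype=complex)
        after = np.asarray(after, dtype=complex)
        G = -before / after
        # Mean value and a simple uncertainty estimate from the scatter
        return np.mean(G), np.std(np.abs(G)) / np.sqrt(len(G))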

 

The new model has been committed locally in the 40m/RTCDSmodels git repo. I do not have rights to push to the remote in git.ligo. The model builds, installs and starts correctly.

  16354 | Wed Sep 22 12:40:04 2021 | Anchal | Summary | CDS | XARM YARM UGF Servo and Oscillators shifted to OAF

To reduce the burden on c1lsc, I've moved the added UGF blocks to the c1oaf model. c1lsc had to be modified to allow the addition of an oscillator into the XARM and YARM control loops and to send testpoints from before and after the addition to c1oaf through shared-memory IPC, so that the realtime demodulation is done in the c1oaf model.

The new models built and installed successfully and I've been able to recover both single arm locks after restarting the computers.

 

  16365 | Wed Sep 29 17:10:09 2021 | Anchal | Summary | CDS | c1teststand problems summary

[anchal, ian]

We went and collected some information for the overlords to fix the c1teststand DAQ network issue.


  • From c1teststand, the c1bhd and c1sus2 computers were not accessible through ssh ("No route to host"), so we restarted both computers (the I/O chassis were ON).
  • After the computers restarted, we were able to ssh into c1bhd and c1sus2, and we ran rtcds start c1x06 and rtcds start c1x07.
  • The first page in the attachment shows a screenshot of the GDS_TP screens of the IOP models after this step.
  • Then we started the user models by running rtcds start c1bhd and rtcds start c1su2.
  • The second page shows a screenshot of the GDS_TP screens. You can notice that the DAQ status is red in all the screens and the DC statuses are blank.
  • So we checked whether the daqd_ services were running on the fb computer. They were not, so we started them all with sudo systemctl start daqd_*.
  • The third page shows the status of all services after this step; the daqd_dc.service remained in a failed state.
  • open-mx_stream.service was not even loaded on fb. We started it by running sudo systemctl start open-mx_stream.service.
  • The fourth page shows the status of this service. It started without any errors.
  • However, when we went to check the status of mx_stream.service on c1bhd and c1sus2, they were not loaded, and when we tried to start them they went to a failed state and kept trying to restart every 3 seconds without success (see pages 5 and 6).
  • Finally, we also took a screenshot of the timedatectl output on the three computers (fb, c1bhd, and c1sus2) to show that their times were not synced at all.
  • The ntp service is running on fb, but it probably does not have access to any of the servers it is following.
  • The timesyncd on c1bhd and c1sus2 (FE machines) is also running but shows status 'Idle', which suggests they are unable to find the ntp signal from fb.
  • I believe this issue is similar to what Jamie fixed on fb1 in the martian network in 40m/16302. Since the fb on the c1teststand network was cloned before this fix, it might have this dysfunctional ntp as well.

We will try to get internet access to c1teststand soon. Meanwhile, someone with more experience and knowledge should look into this situation and try to fix it. We need to test c1teststand within a few weeks now.

Attachment 1: c1teststand_issues_summary.pdf
  16367 | Thu Sep 30 14:09:37 2021 | Anchal | Summary | CDS | New way to ssh into c1teststand

Late elog, original time Wed Sep 29 14:09:59 2021

We opened a new port (22220) in the router to the martian subnetwork which is forwarded to port 22 on c1teststand (192.168.113.245) allowing direct ssh access to c1teststand computer from the outside world using:

                                                                       

                                                                                    
 

Check out this wiki page for the unredacted info.

  16368 | Thu Sep 30 14:13:18 2021 | Anchal | Update | LSC | HV supply to Xend Green laser injection mirrors M1 and M2 PZT restored

Late elog, original date Sep 15th

We found that the power switch of the HV supply that powers the PZT drivers for M1 and M2 of the X-end green laser injection alignment was tripped off. We could not find any log of someone doing it; it is a physical switch. Our only explanation is that this supply might have a solenoid mechanism to shut off during power glitches, and it probably did so on Aug 23 (see 40m/16287). We were able to align the green laser using the PZTs again; however, the maximum green transmission power from the X-arm cavity is now about half of what it was before the glitch. Maybe the seed laser at the X end died a little.

  16372 | Mon Oct 4 11:05:44 2021 | Anchal | Summary | CDS | c1teststand problems summary

[Anchal, Paco]

We tried to fix the ntp synchronization on c1teststand today by repeating the steps listed in 40m/16302. Even though the cloned fb1 now has the exact same package version, conf & service files, and status, the FE machines (c1bhd and c1sus2) fail to sync their time; timedatectl shows the same status 'Idle'. We also dug a bit deeper into the error messages of daqd_dc on the cloned fb1 and mx_stream on the FE machines, and have some error messages to report here.


Attempt at fixing the ntp

  • We copied the ntp package (version 1:4.2.6) deb file from /var/cache/apt/archives/ntp_1%3a4.2.6.p5+dfsg-7+deb8u3_amd64.deb on the martian fb1 to the cloned fb1 and ran:
    controls@fb1:~ 0$ sudo dpkg -i ntp_1%3a4.2.6.p5+dfsg-7+deb8u3_amd64.deb
  • We got error messages about the missing dependencies libopts25 and libssl1.1. We downloaded the oldoldstable jessie versions of these packages from here and here. We ensured that these versions are higher than the versions required by ntp. We installed them with:
    controls@fb1:~ 0$ sudo dpkg -i libopts25_5.18.12-3_amd64.deb 
    controls@fb1:~ 0$ sudo dpkg -i libssl1.1_1.1.0l-1~deb9u4_amd64.deb
  • Then we installed the ntp package as described above. It asked us if we want to keep the configuration file, we pressed Y.
  • However, we decided to make the configuration and service files exactly the same as on the martian fb1. We copied the /etc/ntp.conf and /etc/systemd/system/ntp.service files from the martian fb1 to the same locations on the cloned fb1. Then we enabled ntp, reloaded the daemon, and restarted the ntp service:
    controls@fb1:~ 0$ sudo systemctl enable ntp
    controls@fb1:~ 0$ sudo systemctl daemon-reload
    controls@fb1:~ 0$ sudo systemctl restart ntp
  • But of course, since fb1 doesn't have internet access, we got some errors in the status of ntp.service:
    controls@fb1:~ 0$ sudo systemctl status ntp
    ● ntp.service - NTP daemon (custom service)
       Loaded: loaded (/etc/systemd/system/ntp.service; enabled)
       Active: active (running) since Mon 2021-10-04 17:12:58 UTC; 1h 15min ago
     Main PID: 26807 (code=exited, status=0/SUCCESS)
       CGroup: /system.slice/ntp.service
               ├─30408 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:107
               └─30525 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:107
    
    Oct 04 17:48:42 fb1 ntpd_intres[30525]: host name not found: 2.debian.pool.ntp.org
    Oct 04 17:48:52 fb1 ntpd_intres[30525]: host name not found: 3.debian.pool.ntp.org
    Oct 04 18:05:05 fb1 ntpd_intres[30525]: host name not found: 0.debian.pool.ntp.org
    Oct 04 18:05:15 fb1 ntpd_intres[30525]: host name not found: 1.debian.pool.ntp.org
    Oct 04 18:05:25 fb1 ntpd_intres[30525]: host name not found: 2.debian.pool.ntp.org
    Oct 04 18:05:35 fb1 ntpd_intres[30525]: host name not found: 3.debian.pool.ntp.org
    Oct 04 18:21:48 fb1 ntpd_intres[30525]: host name not found: 0.debian.pool.ntp.org
    Oct 04 18:21:58 fb1 ntpd_intres[30525]: host name not found: 1.debian.pool.ntp.org
    Oct 04 18:22:08 fb1 ntpd_intres[30525]: host name not found: 2.debian.pool.ntp.org
    Oct 04 18:22:18 fb1 ntpd_intres[30525]: host name not found: 3.debian.pool.ntp.org
  • But the ntpq command gives the same output as on the martian fb1 (except for the source servers), i.e. the broadcasting is happening in the same manner:
    controls@fb1:~ 0$ ntpq -p
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     192.168.123.255 .BCST.          16 u    -   64    0    0.000    0.000   0.000
    
  • On the FE machines side though, the systemd-timesyncd are still unable to read the time signal from fb1 and show the status as idle:
    controls@c1bhd:~ 3$ timedatectl
          Local time: Mon 2021-10-04 18:34:38 UTC
      Universal time: Mon 2021-10-04 18:34:38 UTC
            RTC time: Mon 2021-10-04 18:34:38
           Time zone: Etc/UTC (UTC, +0000)
         NTP enabled: yes
    NTP synchronized: no
     RTC in local TZ: no
          DST active: n/a
    controls@c1bhd:~ 0$ systemctl status systemd-timesyncd -l
    ● systemd-timesyncd.service - Network Time Synchronization
       Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled)
       Active: active (running) since Mon 2021-10-04 17:21:29 UTC; 1h 13min ago
         Docs: man:systemd-timesyncd.service(8)
     Main PID: 244 (systemd-timesyn)
       Status: "Idle."
       CGroup: /system.slice/systemd-timesyncd.service
               └─244 /lib/systemd/systemd-timesyncd
  • So the time synchronization is still not working. We expected the FE machines to just synchronize to fb1, even though it doesn't have any upstream ntp server to synchronize to, but that didn't happen.
  • I'm (Anchal) working on getting internet access to c1teststand computers.

Digging into mx_stream/daqd_dc errors:

  • We changed the Restart field in /etc/systemd/system/daqd_dc.service on the cloned fb1 to 2. This allows the service to fail and stop restarting after two attempts, which lets us see the real error message instead of the systemd message that the service is restarting too often. We got the following:
    controls@fb1:~ 3$ sudo systemctl status daqd_dc -l
    ● daqd_dc.service - Advanced LIGO RTS daqd data concentrator
       Loaded: loaded (/etc/systemd/system/daqd_dc.service; enabled)
       Active: failed (Result: exit-code) since Mon 2021-10-04 17:50:25 UTC; 22s ago
      Process: 715 ExecStart=/usr/bin/daqd_dc_mx -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.dc (code=exited, status=1/FAILURE)
     Main PID: 715 (code=exited, status=1/FAILURE)
    
    Oct 04 17:50:24 fb1 systemd[1]: Started Advanced LIGO RTS daqd data concentrator.
    Oct 04 17:50:25 fb1 daqd_dc_mx[715]: [Mon Oct  4 17:50:25 2021] Unable to set to nice = -20 -error Unknown error -1
    Oct 04 17:50:25 fb1 daqd_dc_mx[715]: Failed to do mx_get_info: MX not initialized.
    Oct 04 17:50:25 fb1 daqd_dc_mx[715]: 263596
    Oct 04 17:50:25 fb1 systemd[1]: daqd_dc.service: main process exited, code=exited, status=1/FAILURE
    Oct 04 17:50:25 fb1 systemd[1]: Unit daqd_dc.service entered failed state.
    
  • It seemed like the only thing the daqd_dc process doesn't like is that the mx_stream services are in a failed state on the FE computers. So we did the same on the FE machines to get their real error messages:
    controls@fb1:~ 0$ sudo chroot /diskless/root
    fb1:/ 0#
    fb1:/ 0# sudo nano /etc/systemd/system/mx_stream.service
    fb1:/ 0#
    fb1:/ 0# exit
  • Then I ssh'ed into c1bhd to see the error message on mx_stream service properly.
    controls@c1bhd:~ 0$ sudo systemctl daemon-reload
    controls@c1bhd:~ 0$ sudo systemctl restart mx_stream
    controls@c1bhd:~ 0$ sudo systemctl status mx_stream -l
    ● mx_stream.service - Advanced LIGO RTS front end mx stream
       Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
       Active: failed (Result: exit-code) since Mon 2021-10-04 17:57:20 UTC; 24s ago
      Process: 11832 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
     Main PID: 11832 (code=exited, status=1/FAILURE)
    
    Oct 04 17:57:20 c1bhd systemd[1]: Starting Advanced LIGO RTS front end mx stream...
    Oct 04 17:57:20 c1bhd systemd[1]: Started Advanced LIGO RTS front end mx stream.
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: send len = 263596
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: mx_connect failed Nic ID not Found in Peer Table
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: c1x06_daq mmapped address is 0x7f516a97a000
    Oct 04 17:57:20 c1bhd mx_stream_exec[11832]: c1bhd_daq mmapped address is 0x7f516697a000
    Oct 04 17:57:20 c1bhd systemd[1]: mx_stream.service: main process exited, code=exited, status=1/FAILURE
    Oct 04 17:57:20 c1bhd systemd[1]: Unit mx_stream.service entered failed state.
    
  • c1sus2 shows the same error. I'm not sure I understand these errors at all, but they seem to have nothing to do with timing issues. Surprise!

As usual, some help would be appreciated.

  16381 | Tue Oct 5 17:58:52 2021 | Anchal | Summary | CDS | c1teststand problems summary

The open-mx service is running successfully on the fb1 (clone), c1bhd and c1sus2.

Quote:

I don't know anything about mx/open-mx, but you also need open-mx, don't you?


  16382 | Tue Oct 5 18:00:53 2021 | Anchal | Summary | CDS | c1teststand time synchronization working now

Today I got a new router that I used to connect c1teststand, fb1 and chiara. I was able to get internet access on c1teststand and fb1, but not on chiara. I'm not sure why that is the case.

The good news is that the ntp server on fb1 (clone) is working fine now, and both FE computers, c1bhd and c1sus2, are successfully synchronized to the fb1 (clone) ntp server. This resolves any possible timing issues in this DAQ network.

On running the IOP and user models, however, I see the same errors as mentioned in 40m/16372. Something to do with:

Oct 06 00:47:56 c1sus2 mx_stream_exec[21796]: OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)
Oct 06 00:47:56 c1sus2 mx_stream_exec[21796]: mx_connect failed Nic ID not Found in Peer Table
Oct 06 00:47:56 c1sus2 mx_stream_exec[21796]: c1x07_daq mmapped address is 0x7fa4819cc000
Oct 06 00:47:56 c1sus2 mx_stream_exec[21796]: c1su2_daq mmapped address is 0x7fa47d9cc000


Thu Oct 7 17:04:31 2021

I fixed the issue of chiara not getting internet. Now c1teststand, fb1 and chiara all have internet connections. It was an issue with the default gateway, the interface, and finding the DNS. I have found the correct settings now.

  16385 | Wed Oct 6 15:39:29 2021 | Anchal | Summary | SUS | PRM and BS Angular Actuation transfer function magnitude measurements

Note that your tests were done with the output matrix for BS and PRM in the compensated state as done in 40m/16374. The changes made there were supposed to clear out any coil actuation imbalance in the angular degrees of freedom.

  16391 | Mon Oct 11 17:31:25 2021 | Anchal | Summary | CDS | Fixed mounting of mx devices in fb. daqd_dc is running now.
 
 

However, lspci | grep 'Myri' shows the following output on both computers:

controls@fb1:/dev 0$ lspci | grep 'Myri'
02:00.0 Ethernet controller: MYRICOM Inc. Myri-10G Dual-Protocol NIC (rev 01)

This means that the computer detects the card in the PCIe slot.

 

I tried to add this to /etc/rc.local to run this script at every boot, but it did not work. So for now, I'll just do this step manually every time. Once the devices are loaded, we get:

controls@fb1:/etc 0$ ls /dev/*mx*
/dev/mx0  /dev/mx4  /dev/mxctl   /dev/mxp2  /dev/mxp6         /dev/ptmx
/dev/mx1  /dev/mx5  /dev/mxctlp  /dev/mxp3  /dev/mxp7
/dev/mx2  /dev/mx6  /dev/mxp0    /dev/mxp4  /dev/open-mx
/dev/mx3  /dev/mx7  /dev/mxp1    /dev/mxp5  /dev/open-mx-raw

Then, restarting all the daqd_ processes, I found that daqd_dc was running successfully. Here is the status:

controls@fb1:/etc 0$ sudo systemctl status daqd_* -l
● daqd_dc.service - Advanced LIGO RTS daqd data concentrator
   Loaded: loaded (/etc/systemd/system/daqd_dc.service; enabled)
   Active: active (running) since Mon 2021-10-11 17:48:00 PDT; 23min ago
 Main PID: 2308 (daqd_dc_mx)
   CGroup: /daqd.slice/daqd_dc.service
           ├─2308 /usr/bin/daqd_dc_mx -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.dc
           └─2370 caRepeater

Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: mx receiver 006 thread priority error Operation not permitted[Mon Oct 11 17:48:06 2021]
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: mx receiver 005 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] [Mon Oct 11 17:48:06 2021] mx receiver 006 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: mx receiver 007 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] mx receiver 003 thread - label dqmx003 pid=2362
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] mx receiver 003 thread priority error Operation not permitted
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] mx receiver 003 thread put on CPU 0
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: warning:regcache incompatible with malloc
Oct 11 17:48:07 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:48:06 2021] EDCU has 410 channels configured; first=0
Oct 11 17:49:06 fb1 daqd_dc_mx[2308]: [Mon Oct 11 17:49:06 2021] ->4: clear crc

● daqd_fw.service - Advanced LIGO RTS daqd frame writer
   Loaded: loaded (/etc/systemd/system/daqd_fw.service; enabled)
   Active: active (running) since Mon 2021-10-11 17:48:01 PDT; 23min ago
 Main PID: 2318 (daqd_fw)
   CGroup: /daqd.slice/daqd_fw.service
           └─2318 /usr/bin/daqd_fw -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.fw

Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] [Mon Oct 11 17:48:09 2021] Producer thread - label dqproddbg pid=2440
Oct 11 17:48:09 fb1 daqd_fw[2318]: Producer crc thread priority error Operation not permitted
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] [Mon Oct 11 17:48:09 2021] Producer crc thread put on CPU 0
Oct 11 17:48:09 fb1 daqd_fw[2318]: Producer thread priority error Operation not permitted
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread put on CPU 0
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread - label dqprod pid=2434
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread priority error Operation not permitted
Oct 11 17:48:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:09 2021] Producer thread put on CPU 0
Oct 11 17:48:10 fb1 daqd_fw[2318]: [Mon Oct 11 17:48:10 2021] Minute trender made GPS time correction; gps=1318034906; gps%60=26
Oct 11 17:49:09 fb1 daqd_fw[2318]: [Mon Oct 11 17:49:09 2021] ->3: clear crc

● daqd_rcv.service - Advanced LIGO RTS daqd testpoint receiver
   Loaded: loaded (/etc/systemd/system/daqd_rcv.service; enabled)
   Active: active (running) since Mon 2021-10-11 17:48:00 PDT; 23min ago
 Main PID: 2311 (daqd_rcv)
   CGroup: /daqd.slice/daqd_rcv.service
           └─2311 /usr/bin/daqd_rcv -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.rcv

Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1X07_CRC_SUM
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1BHD_STATUS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1BHD_CRC_CPS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1BHD_CRC_SUM
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1SU2_STATUS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1SU2_CRC_CPS
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1SU2_CRC_SUM
Oct 11 17:50:21 fb1 daqd_rcv[2311]: Creating C1:DAQ-NDS0_C1OM[Mon Oct 11 17:50:21 2021] Epics server started
Oct 11 17:50:24 fb1 daqd_rcv[2311]: [Mon Oct 11 17:50:24 2021] Minute trender made GPS time correction; gps=1318035040; gps%120=40
Oct 11 17:51:21 fb1 daqd_rcv[2311]: [Mon Oct 11 17:51:21 2021] ->3: clear crc

Now, even before starting the FE models, I see the DC status as 0x2bad in the CDS screens of the IOP and user models. The mx_stream service remains in a failed state on the FE machines and stays that way even after restarting the service.

controls@c1sus2:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: failed (Result: exit-code) since Mon 2021-10-11 17:50:26 PDT; 15min ago
  Process: 382 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
 Main PID: 382 (code=exited, status=1/FAILURE)

Oct 11 17:50:25 c1sus2 systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 11 17:50:25 c1sus2 systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 11 17:50:25 c1sus2 mx_stream_exec[382]: Failed to open endpoint Not initialized
Oct 11 17:50:26 c1sus2 systemd[1]: mx_stream.service: main process exited, code=exited, status=1/FAILURE
Oct 11 17:50:26 c1sus2 systemd[1]: Unit mx_stream.service entered failed state.

But if I restart the mx_stream service before starting the rtcds models, the mx_stream service starts successfully:

controls@c1sus2:~ 0$ sudo systemctl restart mx_stream
controls@c1sus2:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: active (running) since Mon 2021-10-11 18:14:13 PDT; 25s ago
 Main PID: 1337 (mx_stream)
   CGroup: /system.slice/mx_stream.service
           └─1337 /usr/bin/mx_stream -e 0 -r 0 -w 0 -W 0 -s c1x07 c1su2 -d fb1:0

Oct 11 18:14:13 c1sus2 systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 11 18:14:13 c1sus2 systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: send len = 263596
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: Connection Made

However, the DC status on the CDS screens still shows 0x2bad. As soon as I start the rtcds model c1x07 (the IOP model for c1sus2), the mx_stream service fails:

controls@c1sus2:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
   Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
   Active: failed (Result: exit-code) since Mon 2021-10-11 18:18:03 PDT; 27s ago
  Process: 1337 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
 Main PID: 1337 (code=exited, status=1/FAILURE)

Oct 11 18:14:13 c1sus2 systemd[1]: Starting Advanced LIGO RTS front end mx stream...
Oct 11 18:14:13 c1sus2 systemd[1]: Started Advanced LIGO RTS front end mx stream.
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: send len = 263596
Oct 11 18:14:13 c1sus2 mx_stream_exec[1337]: Connection Made
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: isendxxx failed with status Remote Endpoint Unreachable
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: disconnected from the sender
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: c1x07_daq mmapped address is 0x7fe3620c3000
Oct 11 18:18:03 c1sus2 mx_stream_exec[1337]: c1su2_daq mmapped address is 0x7fe35e0c3000
Oct 11 18:18:03 c1sus2 systemd[1]: mx_stream.service: main process exited, code=exited, status=1/FAILURE
Oct 11 18:18:03 c1sus2 systemd[1]: Unit mx_stream.service entered failed state.

This shows that starting the rtcds model causes mx_stream to fail, possibly because it cannot find the endpoint on fb1. I've again reached the edge of my knowledge here. Maybe the fiber optic connection between fb1 and the network switch that connects to the FEs is bad, or the connection between the switch and the FEs is bad.

But we are just one step away from making this work.

 

 

  16392   Mon Oct 11 18:29:35 2021 AnchalSummaryCDSMoving forward?

The teststand has some non-trivial issue with the Myrinet card (either software or hardware) which even the experts say they don't remember how to fix. CDS with mx was in use more than a decade ago, so it is hard to find support for issues with it now, and that will only get harder in the future. We need to wrap up this test procedure one way or another now, so I see the following two options moving forward:


Direct integration with main CDS and testing

  • We can just connect the c1sus2 and c1bhd FE computers to martian network directly.
  • We'll have to connect c1sus2 and c1bhd to the optical fiber subnetwork as well.
  • On booting, they would get booted through the existing fb1 boot server, which seems to work fine for the other 5 FE machines.
  • We can update the DHCP in chiara and reload it so that we can ssh into these FEs with host names.
  • Hopefully, the presence of these computers won't tank the existing CDS even if they themselves have any issues, as they have no shared memory with other models.
  • If this works, we can do the loop back testing of I/O chassis using the main DAQ network and move on with our upgrade.
  • If this does not work and causes any harm to the existing CDS network, we can disconnect these computers and go back to the existing CDS. Recently, our confidence in rebooting the CDS has increased with its robust performance after some legacy issues were fixed.
  • We would, however, continue to use a CDS which is no longer supported by the current LIGO CDS group.

Testing CDS upgrade on teststand

  • From what I could gather, most of the hardware in the I/O chassis that I could find is still used in the CDS at LLO and LHO, with their recent tests and documents using the same cards and PCBs.
  • There might be some difference in the DAQ network setup that I need to confirm.
  • I've summarised the current c1teststand hardware on this wiki page.
  • If the latest CDS is backwards compatible with our hardware, we can test the new CDS in the c1teststand setup without disrupting our main CDS. We'll have ample help and support for this upgrade from the current LIGO CDS group.
  • We can do the loop back testing of the I/O chassis as well.
  • If the upgrade is successful in the teststand without many hardware changes, we can upgrade the main CDS of the 40m as well, as it has the same hardware as our teststand.
  • The biggest plus point would be that our CDS would be up-to-date and we would be able to get help from the CDS group if any trouble occurs.

So these are the two options we have. We should discuss which one to take in the Mattermost chat or in an upcoming meeting.

  16395   Tue Oct 12 17:10:56 2021 AnchalSummaryCDSSome more information

Chris pointed out some information-displaying scripts that show whether the DAQ network is working or not. I thought it would be nice to log this information here as well.

controls@fb1:/opt/mx/bin 0$ ./mx_info
MX Version: 1.2.16
MX Build: controls@fb1:/opt/src/mx-1.2.16 Mon Aug 14 11:06:09 PDT 2017
1 Myrinet board installed.
The MX driver is configured to support a maximum of:
    8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
===================================================================
Instance #0:  364.4 MHz LANai, PCI-E x8, 2 MB SRAM, on NUMA node 0
    Status:        Running, P0: Link Up
    Network:    Ethernet 10G

    MAC Address:    00:60:dd:45:37:86
    Product code:    10G-PCIE-8B-S
    Part number:    09-04228
    Serial number:    423340
    Mapper:        00:60:dd:45:37:86, version = 0x00000000, configured
    Mapped hosts:    3

                                                        ROUTE COUNT
INDEX    MAC ADDRESS     HOST NAME                        P0
-----    -----------     ---------                        ---
   0) 00:60:dd:45:37:86 fb1:0                             1,0
   1) 00:25:90:05:ab:47 c1bhd:0                           1,0
   2) 00:25:90:06:69:c3 c1sus2:0                          1,0

 

controls@c1bhd:~ 1$ /opt/open-mx/bin/omx_info
Open-MX version 1.5.4
 build: root@fb1:/opt/src/open-mx-1.5.4 Tue Aug 15 23:48:03 UTC 2017

Found 1 boards (32 max) supporting 32 endpoints each:
 c1bhd:0 (board #0 name eth1 addr 00:25:90:05:ab:47)
   managed by driver 'igb'

Peer table is ready, mapper is 00:60:dd:45:37:86
================================================
  0) 00:25:90:05:ab:47 c1bhd:0
  1) 00:60:dd:45:37:86 fb1:0
  2) 00:25:90:06:69:c3 c1sus2:0

 

controls@c1sus2:~ 0$ /opt/open-mx/bin/omx_info
Open-MX version 1.5.4
 build: root@fb1:/opt/src/open-mx-1.5.4 Tue Aug 15 23:48:03 UTC 2017

Found 1 boards (32 max) supporting 32 endpoints each:
 c1sus2:0 (board #0 name eth1 addr 00:25:90:06:69:c3)
   managed by driver 'igb'

Peer table is ready, mapper is 00:60:dd:45:37:86
================================================
  0) 00:25:90:06:69:c3 c1sus2:0
  1) 00:60:dd:45:37:86 fb1:0
  2) 00:25:90:05:ab:47 c1bhd:0

These outputs show that the framebuilder and the FEs are able to see each other on the DAQ network.


Further, the error that we see when the IOP model is started, which crashes the mx_stream service on the FE machines (see 40m/16391), is:

isendxxx failed with status Remote Endpoint Unreachable

This error was seen earlier when Jamie was troubleshooting the current fb1 on the martian network in 40m/11655 in Oct 2015. Unfortunately, I could not find what Jamie did over the following year to fix this issue.

  16396   Tue Oct 12 17:20:12 2021 AnchalSummaryCDSConnected c1sus2 to martian network

I connected c1sus2 to the martian network by splitting the c1sim connection with a 5-way switch. I also ran another ethernet cable from the second port of c1sus2 to the DAQ network switch on 1X7.

Then I logged into chiara and added the following to chiara:/etc/dhcp/dhcpd.conf:

host c1sus2 {
  hardware ethernet 00:25:90:06:69:C2;
  fixed-address 192.168.113.92;
}

And the following line to chiara:/var/lib/bind/martian.hosts:

c1sus2          A    192.168.113.92

Note that entries for c1bhd were already added to these files, probably during some earlier testing by Gautam or Jon. Then I ran the following to restart the DHCP server and nameserver:

~> sudo service bind9 reload
[sudo] password for controls:
 * Reloading domain name service... bind9                                                 [ OK ]
~> sudo service isc-dhcp-server restart
isc-dhcp-server stop/waiting
isc-dhcp-server start/running, process 25764

Now, when I switched on c1sus2 from the front panel, it booted over the network from fb1 like the other FE machines, and I was able to log into it by first logging into fb1 and then sshing to c1sus2.

Next, I copied the simulink models and the medm screens of c1x06, c1x07, c1bhd, and c1su2 from the paths mentioned on this wiki page. I also copied the medm screens from chiara(clone):/opt/rtcds/caltech/c1/medm to the martian network chiara in the appropriate places. I have placed the file /opt/rtcds/caltech/c1/medm/teststand_sitemap.adl which can be used to open the sitemap for the c1bhd and c1sus2 IOP and user models.

Then I logged into c1sus2 (via fb1) and did the make, install, start procedure:

controls@c1sus2:~ 0$ rtcds make c1x07
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### building c1x07...
Cleaning c1x07...
Done
Parsing the model c1x07...
Done
Building EPICS sequencers...
Done
Building front-end Linux kernel module c1x07...
Done
RCG source code directory:
/opt/rtcds/rtscore/branches/branch-3.4
The following files were used for this build:
/opt/rtcds/userapps/release/cds/c1/models/c1x07.mdl

Successfully compiled c1x07
***********************************************
Compile Warnings, found in c1x07_warnings.log:
***********************************************
***********************************************
controls@c1sus2:~ 0$ rtcds install c1x07
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### installing c1x07...
Installing system=c1x07 site=caltech ifo=C1,c1
Installing /opt/rtcds/caltech/c1/chans/C1X07.txt
Installing /opt/rtcds/caltech/c1/target/c1x07/c1x07epics
Installing /opt/rtcds/caltech/c1/target/c1x07
Installing start and stop scripts
/opt/rtcds/caltech/c1/scripts/killc1x07
/opt/rtcds/caltech/c1/scripts/startc1x07
sudo: unable to resolve host c1sus2
Performing install-daq
Updating testpoint.par config file
/opt/rtcds/caltech/c1/target/gds/param/testpoint.par
/opt/rtcds/rtscore/branches/branch-3.4/src/epics/util/updateTestpointPar.pl -par_file=/opt/rtcds/caltech/c1/target/gds/param/archive/testpoint_211012_174226.par -gds_node=24 -site_letter=C -system=c1x07 -host=c1sus2
Installing GDS node 24 configuration file
/opt/rtcds/caltech/c1/target/gds/param/tpchn_c1x07.par
Installing auto-generated DAQ configuration file
/opt/rtcds/caltech/c1/chans/daq/C1X07.ini
Installing Epics MEDM screens
Running post-build script

safe.snap exists 
controls@c1sus2:~ 0$ rtcds start c1x07
Cannot start/stop model 'c1x07' on host c1sus2.
controls@c1sus2:~ 4$ rtcds list

controls@c1sus2:~ 0$ 

One can see that even after making and installing, the model c1x07 is not listed among the available models by rtcds list. The same is the case for c1su2 as well. So I could not proceed with testing.

The good news is that nothing I did affected the current CDS functioning. So we can probably do this testing safely from the main CDS setup.

  16398   Wed Oct 13 11:25:14 2021 AnchalSummaryCDSRan c1sus2 models in martian CDS. All good!

Three extra steps are needed (when adding new models or a new FE):

  • Chris pointed out that the sudo command on c1sus2 was giving the error
    sudo: unable to resolve host c1sus2
    
    This error occurs when the computer cannot figure out its own hostname. Since the FEs are network booted off fb1, we need to update /etc/hosts in /diskless/root every time we add a new FE.
    controls@fb1:~ 0$ sudo chroot /diskless/root
    fb1:/ 0# sudo nano /etc/hosts
    fb1:/ 0# exit
    
    I added the following line to the /etc/hosts file above:
    192.168.113.92  c1sus2 c1sus2.martian
    
    This resolved the issue of sudo giving an error. Now the rtcds make and install steps showed no errors in their outputs.
  • Another thing that needs to be done, as Koji pointed out, is to add the host and models to /etc/rtsystab in /diskless/root of fb1:
    controls@fb1:~ 0$ sudo chroot /diskless/root
    fb1:/ 0# sudo nano /etc/rtsystab
    fb1:/ 0# exit
    
    I added the following line to the /etc/rtsystab file above:
    c1sus2   c1x07  c1su2
    
    This told rtcds what models would be available on c1sus2. Now rtcds list is displaying the right models:
    controls@c1sus2:~ 0$ rtcds list
    c1x07
    c1su2
  • The above steps are still not sufficient for the daqd_ processes to know about the new models. This part is supposed to happen automatically, but apparently does not happen in our CDS. So every time there is a new model, we need to edit the file /opt/rtcds/caltech/c1/target/daqd/master and add the following lines to it:
    # Fast Data Channel lists
    # c1sus2
    /opt/rtcds/caltech/c1/chans/daq/C1X07.ini
    /opt/rtcds/caltech/c1/chans/daq/C1SU2.ini
    
    # test point lists
    # c1sus2
    /opt/rtcds/caltech/c1/target/gds/param/tpchn_c1x07.par
    /opt/rtcds/caltech/c1/target/gds/param/tpchn_c1su2.par
    
    I needed to restart the daqd_ processes on fb1 for them to notice these changes:
    controls@fb1:~ 0$ sudo systemctl restart daqd_*
    
    This finally lit up the DC status channels in C1X07_GDS_TP.adl and C1SU2_GDS_TP.adl. However, the channels C1:DAQ-DC0_C1X07_STATUS and C1:DAQ-DC0_C1SU2_STATUS both had the value 0x2bad, and this persisted on restarting the models. I then simply restarted mx_stream on c1sus2 and, boom, it worked! (See the attached all-green screen, never seen before!) A quick way to poll these status channels from the command line is sketched just after this list.
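A minimal way to poll the DC status channels mentioned above without opening the MEDM screens is with the standard EPICS command-line tools (generic command, no captured output; note that 0x2bad shows up as 11181 in decimal):

caget C1:DAQ-DC0_C1X07_STATUS C1:DAQ-DC0_C1SU2_STATUS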

So now Ian can work on testing the I/O chassis, and after that we would be good to move the c1sus2 FE and I/O chassis to 1Y3. I've also made the following extra changes:

  • Updated CDS_FE_STATUS medm screen to show the new c1sus2 host.
  • Updated the global diag reset script to act on c1x07 and c1su2 as well.
  • Updated mxstream restart script to act on c1sus2 as well.
Attachment 1: CDS_screens_running.png
CDS_screens_running.png
  16407   Fri Oct 15 16:46:27 2021 AnchalSummaryOptical LeversVent Prep

I centered all the optical levers on ITMX, ITMY, ETMX, ETMY, and BS at a position where the single-arm locks on both arms were best aligned. Unfortunately, we are seeing TRX at 0.78 and TRY at 0.76 at the most aligned positions. It seems less power is getting out of the PMC than last month (Attachment 1).

Then I tried to lock PRMI on the carrier, with no luck. But I was able to see flashes of up to 4000 counts in POP_DC. At this position, I centered the PRM optical lever too (Attachment 2).

Attachment 1: Screen_Shot_2021-10-15_at_4.34.45_PM.png
Screen_Shot_2021-10-15_at_4.34.45_PM.png
Attachment 2: Screen_Shot_2021-10-15_at_4.45.31_PM.png
Screen_Shot_2021-10-15_at_4.45.31_PM.png
  16416   Wed Oct 20 11:16:21 2021 AnchalSummaryPEMParticle counter setup near BS Chamber

I have placed a GT321 particle counter on top of the MC1/MC3 chamber next to the BS chamber. The serial cable is connected to the c1psl computer on 1X2 using 2 USB extenders (blue in color) routed over the PSL enclosure and the 1X1 rack.

The main serial communication script for this counter by Radhika is present in 40m/labutils/serial_com/gt321.py.

A 40m-specific application script is present in the new git repo for 40m scripts, at 40m/scripts/PEM/particleCounter.py. Our plan is to slowly migrate the legacy scripts directory to this repo over time. I've cloned this repo in the nfs shared directory at /opt/rtcds/caltech/c1/Git/40m/scripts, which makes the scripts available on all computers and keeps them up to date everywhere.

The particle counter script is running on c1psl through a systemd service, using the service file 40m/scripts/PEM/particleCounter.service. Locally on c1psl, /etc/systemd/system/particleCounter.service is symbolically linked to the service file in the repo.
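I have not reproduced the actual unit file here; a minimal unit of this kind would look roughly like the sketch below (the python interpreter and the Restart policy are my assumptions; the script path is the repo clone mentioned above):

# Sketch only, not the contents of the real particleCounter.service
[Unit]
Description=GT321 particle counter readout near BS chamber
After=network.target

[Service]
Type=simple
User=controls
ExecStart=/usr/bin/python /opt/rtcds/caltech/c1/Git/40m/scripts/PEM/particleCounter.py
Restart=on-failure

[Install]
WantedBy=multi-user.target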

The following channels for the particle counter needed to be created, as I could not find any existing particle counter channels.

[C1:PEM-BS_PAR_CTS_0p3_UM]
[C1:PEM-BS_PAR_CTS_0p5_UM]
[C1:PEM-BS_PAR_CTS_1_UM]
[C1:PEM-BS_PAR_CTS_2_UM]
[C1:PEM-BS_PAR_CTS_5_UM]

These are created from the 40m/softChansModbus/particleCountChans.db database file. The computer optimus is running a docker container that serves as the EPICS server for such soft channels. To add or edit channels, one just needs to add a new database file or edit the database files in this repo, and then on optimus run:

controls@optimus|~> sudo docker container restart softchansmodbus_SoftChans_1
softchansmodbus_SoftChans_1

that's it.
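For reference, the records in such a soft-channel database are ordinary EPICS analog inputs that the particle counter script then writes to over channel access; a sketch of what one record could look like (the field choices are my guess, not copied from particleCountChans.db):

# Sketch only, not the actual contents of particleCountChans.db
record(ai, "C1:PEM-BS_PAR_CTS_0p3_UM")
{
    field(DESC, "0.3 um particle counts near BS chamber")
    field(EGU,  "counts")
    field(PREC, "0")
}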

I've added the above channels to /opt/rtcds/caltech/c1/chans/daq/C0EDCU.ini to record them in the frame builder. Starting from 11:20 am Oct 20, 2021 PDT, the data on these channels is from the BS chamber area. Currently the script is running continuously, which means 0.3 um particles are sampled every minute, 0.5 um twice in 5 minutes, and 1 um, 2 um, and 5 um particles are sampled once in 5 minutes. We can reduce the sampling rate if this seems unnecessary to us.

Attachment 1: PXL_20211020_183728734.jpg
PXL_20211020_183728734.jpg
  16417   Wed Oct 20 11:48:27 2021 AnchalSummaryCDSPower supple configured correctly.

This was horrible! That's my bad; I should have checked the configuration before assuming that it was right.

I fixed the power supply configuration. Now the strip has two rails of +/- 18V and the GND is referenced to power supply earth GND.

Ian should redo the tests.

  16420   Thu Oct 21 11:41:31 2021 AnchalSummaryPEMParticle counter setup near BS Chamber

The particle count channel names were changed yesterday to follow the naming conventions used at the sites. The following are the new names:

C1:PEM-BS_DUST_300NM
C1:PEM-BS_DUST_500NM
C1:PEM-BS_DUST_1000NM
C1:PEM-BS_DUST_2000NM
C1:PEM-BS_DUST_5000NM
 

The legacy count channels are kept alive with C1:PEM-count_full copying C1:PEM-BS_DUST_1000NM channel and C1:PEM-count_half copying C1:PEM-BS_DUST_500NM channel.

Attachment 1 is the particle counter trend since 8:30 am this morning when the HVAC work started. It seems like there was a peak in particle presence around 11 am. The particle counter even registered 8 counts of particles sized above 5 um!

 

Attachment 1: ParticleCountData20211021.pdf
ParticleCountData20211021.pdf
  16424   Mon Oct 25 13:23:45 2021 AnchalSummaryBHDBefore photos of BSC

[Yehonathan, Anchal]

On Thursday Oct 21, 2021, Yehonathan and I opened the door to the BSC and took some photos. We set up the HEPA stand next to the door with anti-static curtains covering all sides. We spent about 15 minutes trying to understand the current layout and taking photos and a video. Any suggestions for improving our technique and approach would be helpful.

Links to photos:

https://photos.app.goo.gl/fkkdu9qAvH1g5boq6

  16425   Mon Oct 25 17:37:42 2021 AnchalSummaryBHDPart I of BHR upgrade - Removed optics from BSC

[Anchal, Paco, Ian]


Clean room etiquettes

  • Two people in coverall suits, head covers, masks and AccuTech ultra clean gloves.
  • One person in just booties to interact with outside "dirty" world.
  • Anything that goes into the chamber is first cleaned outside with a clean cloth and IPA, and then cleaned by the "clean" folks. We followed this for allen keys, the camera, and the beam finder card.
  • Once the chamber cover has been removed, cover the annulus with the donut. We forgot to do this :(

Optics removal and changes

We removed the following optics from the BSC table and stored them in the X-end flow bench with the fan on. See Attachments 1 and 2.

  1. IPPOS SM2
  2. GRX SM2
  3. PRM OL1
  4. PRMOL4
  5. IPPOS SM3
  6. IPANG SM1
  7. PRM OL2
  8. Unidentified optic in between IPPOS45P and IPPOS SM3
  9. Beam block behind PR3
  10. Beam block behind GR PBS
  11. GR PBS
  12. GRPERI1L (Periscope)
  13. PRMOL3
  14. IPPOS45P
  15. Cylindrical counterweight on North-west end of table.
  16. Cheap rectangular mirror on South west end of table (probably used for some camera, but not in use anymore)
  17. IPANGSM2

We also changed the direction of the clamp of MMT1 to move it away from the center of the table (where PRM will be placed).

We screwed in the earthquake stops on PRM and BS from front face and top.

We unscrewed the cable post for the BS and PRM oplevs, moved it in between SR3 and BS, and screwed it down lightly.

We moved the PRM, turned it 90 degrees anti-clockwise, and brought it in between TT2 and BS. Now there is a clear line of sight between TT2 and PR2 on the ITMY table.


Some next steps:

  • We will align the input beam to TT2 by opening the "Injection Chamber" (formerly known as the OMC chamber). While doing so, we'll clear unwanted optics from this table as well.
  • We will open the ITMX chamber and clear some POP optics. If an SOS is ready, we will replace PR2 with the SOS and put it in its new position.
  • Then we'll replace PR3 with an SOS and align the beam to BS.

This is the next few days of work. We need at least one SOS ready by Thursday.


Photos after today's work: https://photos.app.goo.gl/EE7Mvhw5CjgZrQpG6

Attachment 1: rn_image_picker_lib_temp_44cb790a-c3b4-42aa-8907-2f9787a02acd.jpg
rn_image_picker_lib_temp_44cb790a-c3b4-42aa-8907-2f9787a02acd.jpg
Attachment 2: rn_image_picker_lib_temp_0fd8f4fd-64ae-4ccd-8422-cfe929d4eeee.jpg
rn_image_picker_lib_temp_0fd8f4fd-64ae-4ccd-8422-cfe929d4eeee.jpg
  16431   Wed Oct 27 16:27:16 2021 AnchalSummaryBHDPart II of BHR upgrade - Prep

[Anchal, Paco, Ian]

Before we could start working on Part II, which is to relocate TT2 to its new location, we had to clear space in front of the injection chamber door and clean the floor, which was very dusty. This required us to disconnect everything we safely could from the OMC North short electronics rack, remove 10-15 BNC cables and 4-5 power cords, and relocate some fiber optic cables. We didn't have caps for the fiber optic cables handy, so we did not remove them from the rack-mounted unit and just turned it away. At the end, we mopped the floor and dried it with a dry cloth. Before and after photos are in the attachments.

 

Attachment 1: OMCNorthBefore.jpeg
OMCNorthBefore.jpeg
Attachment 2: OMCNorthAfter.jpeg
OMCNorthAfter.jpeg
  16432   Wed Oct 27 16:31:35 2021 AnchalSummaryBHDPart III of BHR upgrade - Removal of PR2 Small Suspension

I went inside the ITMX chamber to read off specs from the PR2 edge. This was required to confirm our calculations of LO power for BHR later. The numbers that I could read from the edge were kind of meaningless: "0.5 088 or 2.0 088". To make this opening of the chamber more worthwhile, we decided to remove the PR2 suspension unit so that the optic can be removed and installed on an SOS in the cleanroom. We covered the optic in clean aluminum foil inside the chamber, then placed it on another aluminum foil to cover it completely. Then I traveled slowly to the C&B room, where I placed it on a flow bench.


Later on, we decided to initially use a dummy fixed-mount mirror with the same substrate thickness in place of PR2, so that we get enough LO power in transmission for alignment. At the very end, we'll swap that with PR2 mounted on an SOS unit.

  16433   Wed Oct 27 16:38:02 2021 AnchalSummaryBHDPart II of BHR upgrade - Relocation of TT2 and MMT1/2 alignment

[Anchal, Paco]

We opened the BSC and Injection Chamber doors. We removed two stacked counterweights from near the center of the BS table, from behind TT2, and placed them on the Xend flow bench. Then we unscrewed TT2 and relocated it to its new BHR layout position. This provided us with the target for the alignment of the MMT1 and MMT2 mirrors.

While aligning MMT1 and MMT2, we realized that the BHR layout underestimated the clearance of the beam from MMT2 to TT2 from the TT1 suspension unit; the TT1 suspension stage was clipping our beam going to TT2. To rectify this, we decided to move the MMT2 mirror mount about a cm South and retry. We were able to align the beam to the TT2 optic, but it is a bit off-center. The reflection from TT2 is now going in the general direction of the ITMX chamber. We stopped our work here as fatigue was setting in. Following are some thoughts and future directions:

  • We realized that the output beam from the mode cleaner moves a lot (by more than a cm at MMT2) between different locks. Maybe that's just because of our presence. But we wonder how much clearance all beams must have from MC3 to TT2.
  • Currently, we think the Faraday Isolator might be less than 2 cm away from the beam between MMT1 and MMT2, and the TT1 suspension is less than 2 cm away from the beam between MMT2 and TT2.
  • Maybe we can fix these by simply changing the alignment on TT1 which was fixed for our purposes.
  • We definitely need to discuss the robustness of our path a bit more before we proceed to the next part of the upgrade.

Thu Oct 28 17:00:52 2021 After Photos: https://photos.app.goo.gl/wNL4dxPyEgYTKQFG9

  16438   Thu Oct 28 17:01:54 2021 AnchalSummaryBHDPart III of BHR upgrade - Adding temp fixed flat mirror for PR2

[Anchal, Paco, Ian]

  • We added a Y1-2037-0 mirror (the former IPPOS SM2 mirror) on a fixed mount in the position where PR2 is supposed to be in the new BHR layout.
  • After turning off all the lights in the lab, we were able to see a transmitted beam on our beam finder card.
  • We aligned the mirror so that it reflects the beam cleanly off to PR3 and the reflection from PR3 hits BS in the center.
  • We were able to see clear Gaussian beams split by the BS going towards ITMX and ITMY.

Photos: https://photos.app.goo.gl/cKdbtLGa9NtkwqQ68

  16440   Fri Oct 29 14:39:37 2021 AnchalSummaryBHD1Y1 cleared. 1Y3 ready for C1SUS2 I/O and FE.

[Anchal, Paco]

We cleared the 1Y1 rack today, removing the following items. This stuff is sitting on the floor about 2 meters east of 1Y3 (see Attachment 1):

  • A VME crate: We disconnected its power cords from the side bus.
  • A NI PXIe-1071 crate with some SMA multiplexer units on it.

We also moved the power relay ethernet strip from the middle of the rack to the bottom of the rack, clearing the space marked "clear" in Koji's schematics. See Attachment 2.

There was nothing to clear in 1Y3. It is ready for installing c1sus2 I/O chassis and FE once the testing is complete.

We also removed some orphaned hanging SMA RG-405 cables between 1Y3 and 1Y1.

Attachment 1: RemovedStuff.jpeg
RemovedStuff.jpeg
Attachment 2: 1Y1.jpeg
1Y1.jpeg
Attachment 3: 1Y3.jpeg
1Y3.jpeg
  16450   Fri Nov 5 12:21:16 2021 AnchalSummaryBHDPart VI of BHR upgrade - Removal of ITMYC optics

Today I opened the ITMY chamber, removed the following optics, and placed them on the Xend flow bench (see Attachments 1-3 for updated photographs):

  • OM1
  • OM2
  • ITMYOL1
  • ITMYOL2
  • SRMOL1
  • SRMOL2
  • POYM1
  • 3 counterweights, one of which was double the height of the others.

I also unscrewed SRM and parked it near the western end of the table where no optical paths would intersect it. Later, we will move it back into place once the alignment of the rest of the optics has been done.

While doing this work, I found two unnoted things on the table:

  • One mirror, mounted in a mount but not on a post, was just sitting next to ITMY. I have removed this and placed it on the Xend flow bench.
  • One horizontal razor or plate on the South end of the table, mounted on what looked like a picomotor. The motor was soldered to wires without any in-line connector, so I could not remove this. It is in the spot of AS4 and will need to be removed later.

Photos: https://photos.app.goo.gl/S5siAYguBM4UnKuf8

Attachment 1: XendFlowBenchLeftEnd.jpg
XendFlowBenchLeftEnd.jpg
Attachment 2: XendFlowBenchMiddle.jpg
XendFlowBenchMiddle.jpg
Attachment 3: XendFlowBenchRightEnd.jpg
XendFlowBenchRightEnd.jpg
  16463   Tue Nov 9 19:02:47 2021 AnchalSummaryBHD1Y0 Populated and 1Y1,1Y0 powered

[Anchal, Paco]

Today we populated 4 Sat Amp boxes for LO1, LO2, AS1, and AS4, 2 BO boxes for C1SU2, and 1 Sat Amp Adaptor box at 1Y0, according to the latest rack plan. We also added 2 Sorensen power supplies in the top slots of 1Y0 to power the +/- 18V DC strips on both 1Y1 and 1Y0. All wiring has been done for these power connections.

ELOG V3.1.3-