As part of the general lab cleanup we removed many unused BNC cables (long and short) from around the SP table. We removed one very long BNC cable which was connected on one side to a PEM input and unterminated on the other side near the 1X2 rack. There were several cables from an old SURF phase camera project which were still attached to a couple of RF amps on the SP tables and running towards the 1X6 rack.
We also removed some unused power cables plugged into a power distribution strip near Megatron.
This afternoon I started setting up the Supermicro 5017A-EP that will replace c1vac1/2. Following Johannes's procedure in 13681, I installed Debian 8.11 (jessie). A more recent stable release, 9.5, has become available since the first acromag machine was assembled, but I stuck to version 8 for consistency - we already know that version works. The setup is sitting on the left side of the electronics bench for now.
Following the procedure in this elog, we effected a reset of the vacuum slow machines. Usually, I just turn the key on these crates to do a power cycle, but Steve pointed out that for the vacuum machines, we should only push the "reset" button.
While TP1 was spun down, we took the opportunity to replace the TP1 controller with a spare unit the company has sent us for use while our unit is sent to them for maintenance. The procedure was in principle simple (I only list the additional steps here; for the various valve closures, see the slow machine reset procedure elog):
However, we were foiled by a Phillips screw on the DB37 connector labelled "MAG BRG", whose head was completely stripped. We had to cut a slot in this screw with a saw blade and use a flathead screwdriver to get it out. Steve suspects this is a metric screw, and will ask the company to send us a new one; we will replace it when re-installing the maintained controller.
Attachments #1 and #2 show the Vacuum MEDM screen before and after the reboot respectively - evidently, the fields that were reading "NO COMM" now read numbers. Attachment #3 shows the main volume pressure during this work.
The problem will be revisited on Monday.
Steve pointed out that some of the vacuum MEDM screen fields were reporting "NO COMM". Koji confirmed that this is a c1vac1 problem, likely the same as reported here and can be fixed using the same procedure.
However, Steve is worried that the interlock won't kick in in case of a vacuum emergency, so we are leaving the PSL shutter closed over the weekend. The problem will be revisited on Monday.
The vacuum and MC are OK
Jon and I stuck an extender card into the eurocrate at 1X8 earlier today (~5pm PT), to see if the box was getting +24V DC from the Sorensen or not. Upon sticking the card in, the FAIL LEDs on all the VME cards came on. We immediately removed the extender card. Without any intervention from us, after ~1 minute, the FAIL LEDs went off again. Judging by the main volume pressure (Attachment #1) and the Vacuum MEDM screen (Attachment #2), this did not create any issues and the c1vac1 computer is still responsive.
But Steve can perhaps run a check in the AM to confirm that this activity didn't break anything.
Is there a reason why extender cards shouldn't be stuck into eurocrates?
- Disk full
I updated the configuration file '/etc/logrotate.d/rsyslog' to set a file size limit of 50M on 'syslog' and 'daemon.log', since these are the two log files that capture caget & caput terminal outputs. I also reduced the number of backup files to 2 (a sketch of the modified stanza follows the listing below).
controls@c1vac:~$ cat /etc/logrotate.d/rsyslog
invoke-rc.d rsyslog rotate > /dev/null
invoke-rc.d rsyslog rotate > /dev/null
invoke-rc.d rsyslog rotate > /dev/null
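Only the postrotate lines of the listing survived above. For reference, a sketch of what the modified stanza could look like, assuming 'size 50M' and 'rotate 2' are the only changes (the daemon.log stanza would be analogous):

/var/log/syslog
{
        rotate 2
        size 50M
        missingok
        notifempty
        compress
        delaycompress
        postrotate
                invoke-rc.d rsyslog rotate > /dev/null
        endscript
}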
- Vacuum gauge
The XGS-600 can handle 6 FRGs and we currently have 5 of them connected. Yes, having a spare would be good. I'll see about placing an order for these then.
- Disk Full: Just use the usual /etc/logrotate thing
I would rather not replace P1a. We used to have Ps and CCs because neither type covered the entire pressure range on its own. However, this new FRG (= Full Range Gauge) does cover from 1 atm down to 4 nTorr.
Why don't we have a couple of FRG spares, instead?
Questions to Tega: How many FRGs can our XGS-600 controller handle?
[Anchal, Paco, Tega]
c1vac was showing the /var disk to be full. We moved all gzipped backup logs to /home/controls/logBackUp, which freed up 36% of the space on /var. Ideally, we should not be logging this much; some solution needs to be found for reducing these log sizes, or for monitoring them and handling them automatically.
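For the record, the cleanup amounted to something like the following (assuming the rotated logs are the usual gzipped files under /var/log):

controls@c1vac:~$ df -h /var
controls@c1vac:~$ mkdir -p /home/controls/logBackUp
controls@c1vac:~$ sudo mv /var/log/*.gz /home/controls/logBackUp/
controls@c1vac:~$ df -h /var    # should now show the freed space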
We were unable to open the PSL shutter, due to the interlock with C1:Vac-P1a_pressure. We found that C1:Vac-P1a_pressure is not being written by the serial_MKS937a service on c1vac. The issue was that the sensor itself has gone bad and needs to be replaced. We believe that "L 0E-04" in the status message (C1:Vac-P1a_status) indicates a malfunctioning sensor.
We removed the writing of C1:Vac-P1a_pressure and C1:Vac-P1a_status from MKS937a and moved them to XGS600, which is using sensor 1 from the main volume. See this commit.
Now we are able to open PSL shutter. The sensor should be replaced ASAP and this commit can be reverted then.
I've been monitoring the status of the pumpdown remotely with ndscope lookbacks of C1:Vac-CC1_pressure. This morning, I saw that the channel was putting out a constant value (a signature of the EPICS server being frozen). caget did not work either. I then tried ssh-ing into c1vac to see if there were any issues, but I was unable to; the machine isn't responding to ping either. The EPICS value has been frozen since ~1030pm PDT 26 May 2019.
I will try to head to campus later today to check on it. Isn't an email alert or something supposed to be sent out in such an event?
The vacuum itself was fine - CC1 gauge reported a pressure of 1.3e-5 torr. Note to self: the C1:Vac-CC1_HORNET_PRESSURE channel, which is the analog readback of the Hornet gauge and which is hooked up to an Acromag ADC in the c1auxex chassis, is independent of the status of the c1vac machine, and so can serve as a diagnostic.
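Since the HORNET analog readback comes through c1auxex, a quick way to separate "c1vac is frozen" from "the vacuum is actually bad" is to compare the two channels from any workstation, e.g.:

controls@pianosa:~$ caget C1:Vac-CC1_pressure C1:Vac-CC1_HORNET_PRESSURE

If the first one is stale while the second still reports ~1e-5 torr, the problem is the c1vac EPICS server and not the pressure.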
However, I was unable to interact with c1vac in any way, the monitor hooked up directly to it was showing a frozen display. So I hard-rebooted the system. It took a few minutes to come back online - but even after 10 minutes of waiting, still no display. In the process of the reboot, several valves were closed off - when the EPICS processes restart, there are momentary instances where the readback channels get an "undefined" value, which prompts the main interlock process to transition to a "SAFE" state.
Running df -h, I saw that the /var partition was completely full. Maybe this was somehow interfering with the machine running smoothly? Two files in particular, daemon.log and daemon.log.1 were ~1GB each. The contents of these files seemed to be just the readbacks for the caget and caput commands. So I cleared both these files, and now the /var partition usage is only 26%. I also got the display back up and running on the physical monitor hooked up to the c1vac machine's VGA port. Let's see if this has improved the stability situation. The CPU load is still high (~6-7), with most of this coming from the modbus process. Why is this so high? c1susaux has more Acromag units but claims a much lower load of 0.71. Is the CPU of the c1vac machine somehow inferior?
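A few commands that are useful for this kind of check (the exact name of the modbus IOC process is whatever shows up in top, so treat that as a placeholder):

controls@c1vac:~$ df -h /var                                # partition usage
controls@c1vac:~$ uptime                                    # 1/5/15-minute load averages
controls@c1vac:~$ top -b -n 1 | head -n 20                  # which process is eating the CPU
controls@c1vac:~$ sudo truncate -s 0 /var/log/daemon.log    # one way to clear a runaway log in place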
In the meantime, I ssh-ed into c1vac and restored the "Vacuum normal" valve config. During this little escapade, the main volume pressure rose to ~6e-5 torr. It's coming back down smoothly.
Unrelated to this work: we had turned the RGA off for the vent, I powered it back on and re-initialized it this morning.
I deleted references to c1vac1 and c1vac2 (which no longer exist) and added c1vac to the autoburt request file list at /opt/rtcds/caltech/c1/burt/autoburt/requestfilelist
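A quick sanity check that the edit took effect:

controls@pianosa:~$ grep -n 'c1vac' /opt/rtcds/caltech/c1/burt/autoburt/requestfilelist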
Setting up c1asy:
Now Yuki can work on copying the simulink model (copy c1asx structure) and implementing the autoalignment servo.
Today I got a new router that I used to connect c1teststand, fb1 and chiara. I was able to get internet access on c1teststand and fb1, but not on chiara. I'm not sure why that is the case.
The good news is that the ntp server on fb1 (clone) is working fine now, and both FE computers, c1bhd and c1sus2, are successfully synchronized to the fb1 (clone) ntpserver. This resolves any possible timing issues in this DAQ network.
On running the IOP and user models, however, I see the same errors as mentioned in 40m/16372. Something to do with:
Oct 06 00:47:56 c1sus2 mx_stream_exec: OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)
Oct 06 00:47:56 c1sus2 mx_stream_exec: mx_connect failed Nic ID not Found in Peer Table
Oct 06 00:47:56 c1sus2 mx_stream_exec: c1x07_daq mmapped address is 0x7fa4819cc000
Oct 06 00:47:56 c1sus2 mx_stream_exec: c1su2_daq mmapped address is 0x7fa47d9cc000
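The "Peer Not Found in the Table" line suggests the Open-MX peer table on the front-ends does not contain fb1's NIC. A possible check, assuming the Open-MX utilities live in the same directory as the fma daemon seen on the other front-ends (an assumption, not verified here):

controls@c1sus2:~$ /opt/3.2.88-csp/open-mx-1.5.4/bin/omx_info    # should list the local board and known peers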
Thu Oct 7 17:04:31 2021
I fixed the issue of chiara not getting internet. Now c1teststand, fb1 and chiara all have internet connections. The problem was with the default gateway, the interface configuration, and finding the DNS; I have found the correct settings now.
c1teststand subnetwork is now accessible remotely. To log into this network, one needs to do the following:
Just to document the IT work I did: setting up this connection was a bit more involved than usual.
I had to add the following two lines to the /etc/network/interfaces file to make the special IP routes persistent across reboots (the containing stanza is sketched after these lines):
post-up ip route add 192.168.113.200 via 10.0.1.1 dev eno1
post-up ip route add 192.168.113.216 via 10.0.1.1 dev eno1
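For completeness, the containing stanza in /etc/network/interfaces would look something like the following; the static address and DNS server below are placeholders, and only the two post-up routes are taken from the actual file:

auto eno1
iface eno1 inet static
    address 10.0.1.10        # placeholder host address on the 10.0.1.x subnet
    netmask 255.255.255.0
    gateway 10.0.1.1         # the new router (assumption)
    dns-nameservers 8.8.8.8  # placeholder; needs the resolvconf package, otherwise set /etc/resolv.conf directly
    post-up ip route add 192.168.113.200 via 10.0.1.1 dev eno1
    post-up ip route add 192.168.113.216 via 10.0.1.1 dev eno1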
c1teststand has been restructured. There is no longer a separate gateway computer called 'c1teststand'. When you ssh into the c1teststand network (using ssh c1teststand from inside martian, or from an outside network using the method mentioned in this wiki page), you will land on the chiara (clone) computer, and you can navigate to any other teststand-network computer from there.
I'll be repurposing the 1U c1teststand computer into the new c1susaux2 slow machine now. All files from the home directory and from the /etc directory of the former c1teststand have been zipped and stored in /home/controls of chiara (clone). Just as an aside, the network configuration of the teststand can be done from inside the teststand network by pointing a browser on either fb1 (clone) or chiara (clone) to 10.0.1.1. The login and password are the same as our usual workstation username and password.
Moved the rack to the location of the test stand just behind 1X7 and plan to remove the other two small test stand racks to create some space there. We then mounted the c1bhd I/O chassis and 4 front-end machines on the test stand (see attachment 1).
Installed the dolphin IX cards on all 4 front-end machines: c1bhd, c1ioo, c1sus, c1lsc. I also removed the dolphin DX card that was previously installed on c1bhd.
Found a single OneStop host card with a mini PCI slot mounting plate in a storage box (see attachment 2). Since this only fits into the dual PCI riser card slot on c1bhd, I swapped out the full-length PCI slot OneStop host card that was on c1bhd and installed it on c1lsc (see attachments 3 & 4).
I keep getting confused about the purpose of the teststand. The view I am adopting going forward is to use it as a platform for testing the compatibility of new hardware upgrades, instead of thinking of it as an independent system that works with old hardware.
The initial idea of clearing 1X7 cannot be carried out for now, because I missed the deadline for providing a detailed enough plan before Monday's power-up of the lab, so we are just going to go ahead and use the new rack as initially intended and get the latest hardware and software tested here.
We mounted the DAQ, subnet and dolphin IX switches, see attachment 1. The mounting ears that came with the dolphin switch did not fit and so could not be used for mounting. We looked around the lab and decided to use one of the NavePoint mounting brackets which we found next to the teststand, see attachment 2.
We plan to move the new rack to the current location of the teststand and use the power connection from there. It is also closer to 1X7, so moving the front-ends and switches to 1X7 should be straightforward after we complete all CDS upgrade testing.
We want to be able to run SimPlant on the teststand, test our new controls algorithms, test watchdogs, and any other software upgrades. Ideally, in the steady state it will run some plants with suspensions and cavities, and we will develop our measurement scripts there as well (e.g. IFOtest).
We went and collected some information for the overlords to fix the c1teststand DAQ network issue.
We will try to get internet access to c1teststand soon. Meanwhile, someone with more experience and knowledge should look into this situation and try to fix it. We need to test the c1teststand within a few weeks now.
We tried to fix the NTP synchronization in c1teststand today by repeating the steps listed in 40m/16302. Even though the cloned fb1 now has the exact same package version, conf & service files, and status, the FE machines (c1bhd and c1sus2) fail to sync their time; timedatectl shows the same status, 'Idle'. We also dug a bit deeper into the error messages of daqd_dc on the cloned fb1 and mx_stream on the FE machines, and have some error messages to report here.
controls@fb1:~ 0$ sudo dpkg -i ntp_1%3a4.2.6.p5+dfsg-7+deb8u3_amd64.deb
controls@fb1:~ 0$ sudo dpkg -i libopts25_5.18.12-3_amd64.deb
controls@fb1:~ 0$ sudo dpkg -i libssl1.1_1.1.0l-1~deb9u4_amd64.deb
controls@fb1:~ 0$ sudo systemctl enable ntp
controls@fb1:~ 0$ sudo systemctl daemon-reload
controls@fb1:~ 0$ sudo systemctl restart ntp
controls@fb1:~ 0$ sudo systemctl status ntp
● ntp.service - NTP daemon (custom service)
Loaded: loaded (/etc/systemd/system/ntp.service; enabled)
Active: active (running) since Mon 2021-10-04 17:12:58 UTC; 1h 15min ago
Main PID: 26807 (code=exited, status=0/SUCCESS)
├─30408 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:107
└─30525 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 105:107
Oct 04 17:48:42 fb1 ntpd_intres: host name not found: 2.debian.pool.ntp.org
Oct 04 17:48:52 fb1 ntpd_intres: host name not found: 3.debian.pool.ntp.org
Oct 04 18:05:05 fb1 ntpd_intres: host name not found: 0.debian.pool.ntp.org
Oct 04 18:05:15 fb1 ntpd_intres: host name not found: 1.debian.pool.ntp.org
Oct 04 18:05:25 fb1 ntpd_intres: host name not found: 2.debian.pool.ntp.org
Oct 04 18:05:35 fb1 ntpd_intres: host name not found: 3.debian.pool.ntp.org
Oct 04 18:21:48 fb1 ntpd_intres: host name not found: 0.debian.pool.ntp.org
Oct 04 18:21:58 fb1 ntpd_intres: host name not found: 1.debian.pool.ntp.org
Oct 04 18:22:08 fb1 ntpd_intres: host name not found: 2.debian.pool.ntp.org
Oct 04 18:22:18 fb1 ntpd_intres: host name not found: 3.debian.pool.ntp.org
controls@fb1:~ 0$ ntpq -p
remote refid st t when poll reach delay offset jitter
192.168.123.255 .BCST. 16 u - 64 0 0.000 0.000 0.000
controls@c1bhd:~ 3$ timedatectl
Local time: Mon 2021-10-04 18:34:38 UTC
Universal time: Mon 2021-10-04 18:34:38 UTC
RTC time: Mon 2021-10-04 18:34:38
Time zone: Etc/UTC (UTC, +0000)
NTP enabled: yes
NTP synchronized: no
RTC in local TZ: no
DST active: n/a
controls@c1bhd:~ 0$ systemctl status systemd-timesyncd -l
● systemd-timesyncd.service - Network Time Synchronization
Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled)
Active: active (running) since Mon 2021-10-04 17:21:29 UTC; 1h 13min ago
Main PID: 244 (systemd-timesyn)
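Since the FE machines use systemd-timesyncd rather than ntpd, it is worth confirming that they are actually pointed at fb1. A sketch of the relevant config on the diskless root (the key is NTP= on newer systemd, Servers= on some older versions; using "fb1" as the server name here reflects the intent, not something verified on this system):

controls@fb1:~ 0$ sudo chroot /diskless/root
fb1:/ 0# cat /etc/systemd/timesyncd.conf
[Time]
NTP=fb1
fb1:/ 0# exit

followed by restarting systemd-timesyncd on c1bhd and c1sus2.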
controls@fb1:~ 3$ sudo systemctl status daqd_dc -l
● daqd_dc.service - Advanced LIGO RTS daqd data concentrator
Loaded: loaded (/etc/systemd/system/daqd_dc.service; enabled)
Active: failed (Result: exit-code) since Mon 2021-10-04 17:50:25 UTC; 22s ago
Process: 715 ExecStart=/usr/bin/daqd_dc_mx -c /opt/rtcds/caltech/c1/target/daqd/daqdrc.dc (code=exited, status=1/FAILURE)
Main PID: 715 (code=exited, status=1/FAILURE)
Oct 04 17:50:24 fb1 systemd: Started Advanced LIGO RTS daqd data concentrator.
Oct 04 17:50:25 fb1 daqd_dc_mx: [Mon Oct 4 17:50:25 2021] Unable to set to nice = -20 -error Unknown error -1
Oct 04 17:50:25 fb1 daqd_dc_mx: Failed to do mx_get_info: MX not initialized.
Oct 04 17:50:25 fb1 daqd_dc_mx: 263596
Oct 04 17:50:25 fb1 systemd: daqd_dc.service: main process exited, code=exited, status=1/FAILURE
Oct 04 17:50:25 fb1 systemd: Unit daqd_dc.service entered failed state.
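The "MX not initialized" error from daqd_dc_mx could mean the mx/open-mx driver is not loaded or the open-mx service is not running on fb1. A couple of sanity checks, with service and unit names taken from elsewhere in this log:

controls@fb1:~ 0$ lsmod | grep -i mx
controls@fb1:~ 0$ sudo systemctl status open-mx
controls@fb1:~ 0$ sudo systemctl restart open-mx && sudo systemctl restart daqd_dc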
controls@fb1:~ 0$ sudo chroot /diskless/root
fb1:/ 0# sudo nano /etc/systemd/system/mx_stream.service
fb1:/ 0# exit
controls@c1bhd:~ 0$ sudo systemctl daemon-reload
controls@c1bhd:~ 0$ sudo systemctl restart mx_stream
controls@c1bhd:~ 0$ sudo systemctl status mx_stream -l
● mx_stream.service - Advanced LIGO RTS front end mx stream
Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
Active: failed (Result: exit-code) since Mon 2021-10-04 17:57:20 UTC; 24s ago
Process: 11832 ExecStart=/etc/mx_stream_exec (code=exited, status=1/FAILURE)
Main PID: 11832 (code=exited, status=1/FAILURE)
Oct 04 17:57:20 c1bhd systemd: Starting Advanced LIGO RTS front end mx stream...
Oct 04 17:57:20 c1bhd systemd: Started Advanced LIGO RTS front end mx stream.
Oct 04 17:57:20 c1bhd mx_stream_exec: send len = 263596
Oct 04 17:57:20 c1bhd mx_stream_exec: OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)
Oct 04 17:57:20 c1bhd mx_stream_exec: mx_connect failed Nic ID not Found in Peer Table
Oct 04 17:57:20 c1bhd mx_stream_exec: c1x06_daq mmapped address is 0x7f516a97a000
Oct 04 17:57:20 c1bhd mx_stream_exec: c1bhd_daq mmapped address is 0x7f516697a000
Oct 04 17:57:20 c1bhd systemd: mx_stream.service: main process exited, code=exited, status=1/FAILURE
Oct 04 17:57:20 c1bhd systemd: Unit mx_stream.service entered failed state.
As usual, any help would be appreciated.
I don't know anything about mx/open-mx, but don't you also need open-mx?
controls@c1ioo:~ 0$ systemctl status *mx*
● open-mx.service - LSB: starts Open-MX driver
Loaded: loaded (/etc/init.d/open-mx)
Active: active (running) since Wed 2021-09-22 11:54:39 PDT; 1 weeks 5 days ago
Process: 470 ExecStart=/etc/init.d/open-mx start (code=exited, status=0/SUCCESS)
└─620 /opt/3.2.88-csp/open-mx-1.5.4/bin/fma -d
● mx_stream.service - Advanced LIGO RTS front end mx stream
Loaded: loaded (/etc/systemd/system/mx_stream.service; enabled)
Active: active (running) since Wed 2021-09-22 12:08:00 PDT; 1 weeks 5 days ago
Main PID: 5785 (mx_stream)
└─5785 /usr/bin/mx_stream -e 0 -r 0 -w 0 -W 0 -s c1x03 c1ioo c1als c1omc -d fb1:0
open-mx service is running successfully on the fb1(clone), c1bhd and c1sus.
[JC, Tega, Chris]
After moving the test stand front-ends, chiara (name server) and fb1 (boot server) to the new rack behind 1X7, we powered everything up and checked that we can reach c1teststand via pianosa and that the front-ends are still able to boot from fb1. After confirming these tests, we decided to start the software upgrade to Debian 10. We installed buster on fb1 and are now in the process of setting up diskless boot. I have been looking around for CDS instructions on how to do this and found the CdsFrontEndDebian10 page, which contains most of the info we require. The page suggests that it may be cleaner to start the Debian 10 installation on a front-end that is connected to an I/O chassis with at least 1 ADC and 1 DAC card, and then move the installation disk to the boot server and continue from there, so I moved the disk from fb1 to one of the front-ends, but I had trouble getting it to boot. I decided to do a clean install on another disk on the c1lsc front-end, which has a host adapter card that can be connected to the c1bhd I/O chassis. We can then mount this disk on fb1 and use it to set up the diskless boot OS.
c1susvme2 has been running just a bit late for about a week. I rebooted it.
The plot shows SRM_FE_SYNC, which is the number of times in the last second that c1susvme2 was late for the 16k cycle. Similarly for ETMX.
The reboot appears to have worked.
When I came in earlier today, I noticed that c1susvme2 was red on the DAQ screens. Since the vme computers always seem to be happier as a set, I hit the physical reset buttons on sosvme, susvme1 and susvme2. I then did the telnet or ssh in as appropriate for each computer in turn. sosvme and susvme1 came back just fine. However, I couldn't cd to /cvs/cds/caltech/target/c1susvme2 while ssh-ed in to susvme2. I could cd to /cvs/cds, and then did an ls, and it came back totally blank. There was nothing at all in the folder.
Yoichi showed me how to do 'df' to figure out what filesystems are mounted, and it looked as though the filesystem was mounted. But then Yoichi tried to unmount the filesystem, and it claimed that it wasn't mounted at all. We then remounted the filesystem, and things were good again. I was able to continue the regular restart procedure, and the computer is back up again.
Recap: c1susvme2 mysteriously got unmounted from /cvs/cds! But it's back, and the computers are all good again.
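For future reference, the check-and-remount sequence above was essentially something like the following (assuming the /cvs/cds NFS mount is defined in the machine's fstab; otherwise mount it with the explicit server:path):

$ df -h /cvs/cds     # looks mounted...
$ umount /cvs/cds    # ...but umount claims it is not
$ mount /cvs/cds     # remount from the fstab entry
$ ls /cvs/cds/caltech/target/c1susvme2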
It got worse again, starting with locking last night, but it has not recovered. Attached is a 3-day trend of SRM cpu load showing the good spell.
Last week, Alex recompiled the c1susvme2 code without the decimation filters for the OUT16 channels, so these channels are now as aliased as the rest of them. This appears to have helped with the timing issues: although it's not completely cured it is much better. Attached is a five day trend.
We've also been having problems with timing for c1susvme2. Attached is a one-hour plot of timing data for this cpu, known as SRM. Each spike is an instance of lateness, and a potential cause of lock loss. This has been going on for quite a while.
Attached is a 3 day trend of SRM CPU timing info. It clearly gets better (though still problematic) at some point, but I don't know why as it doesn't correspond with any work done. I've labeled a reboot, which was done to try to clear out the timing issues. It can also be seen that it gets worse during locking work, but maybe that's a coincidence.
The attached shows the 200 day '10-minute' trend of the CPU meters and also the room temperature.
To my eye there is no correlation between the signals. It's clear that c1susvme2 (SRM LOAD) is going up, but there is no evidence that it's due to temperature.
Today I noticed that the FE SYNC counters of c1susvme1/2 on the RFM network screen were stuck at 16384. I tried to reboot the machines to fix the problem but it didn't work.
The BS watchdog tripped off when I did that, because I had forgotten to disable it. I had to wait for a few minutes before it settled down again.
Later I also re-locked the mode cleaner. But before I could do it, Rana had to reduce the MC_L offset for me.
Power cycling c1dcuepics seems to have fixed the EPICS channel problems, and c1lsc, c1asc, and c1iovme are talking again.
I burt restored c1iscepics and c1Iosepics from the snapshot at 6 am this morning.
However, c1susvme1 never came back after the last power cycle of its crate that it shared with c1susvme2. I connected a monitor and keyboard per the reboot instructions. I hit ctrl-x, and it proceeded to boot, however, it displays that there's a media error, PXE-E61, suggests testing the cable, and only offers an option to reboot. From a cursory inspection of the front, the cables seem to look okay. Also, this machine had eventually come back after the first power cycle and I'm pretty sure no cables were moved in between.
I had a go at trying to bring c1susvme1 back online. The first few times I hit the physical reset button, I saw the same error that Joe mentioned, about needing to check some cables. I tried one round of rebooting c1sosvme, c1susvme2 and c1susvme1, with no success. After a few iterations of jiggle cables/reset button/ctrl-x on c1susvme1, it came back. I ran the startup.cmd script, and re-enabled the suspensions, and Mode Cleaner is now locked. So, all systems are back online, and I'm crossing my fingers and toes that they stay that way, at least for a little while.
c1susvme1 is behaving weirdly. I've restarted it several times but its computation time is hanging out around 260 usec, making it useless for suspension control and locking. I also found a PS/2 keyboard plugged in, which doesn't work, so I unplugged it. It needs to be plugged into a PS/2 keyboard/mouse Y-splitter cable.
Yesterday Gautam and I ran final tests of the eight suspensions controlled by c1susaux, using PyIFOTest. All of the optics pass a set of basic signal-routing tests, which are described in more detail below. The only issue found was with ITMX having an apparent DC bias polarity reversal (all four front coils) relative to the other seven susaux optics. However, further investigation found that ETMX and ETMY have the same reversal, and there is documentation pointing to the magnets being oppositely-oriented on these two optics. It seems likely that this is the case for ITMX as well.
I conclude that all the new c1susaux wiring/EPICS interfacing works correctly. There are of course other tests that can still be scripted, but at this point I'm satisfied that the new Acromag machine itself is correctly installed. PyIFOTest has been morphed into a powerful general framework for automating IFO tests. Anything involving fast/slow IO can now be easily scripted. I highly encourage others to think of more applications this may have at the 40m.
The code is currently located in /users/jon/pyifotest although we should find a permanent location for it. From the root level it is executed as
$ ./IFOTest <PARAMETER_FILE>
where PARAMETER_FILE is the filepath to a YAML config file containing the test parameters. I've created a config file for each of the suspended optics. They are located in the root-level directory and follow the naming convention SUS-<OPTIC>.yaml.
The code climbs a hierarchical "ladder" of actuation/readback-paired tests, with the test at each level depending on signals validated in the preceding level. At the base is the fast data system, which provides an independent reference against which the slow channels are tested. There are currently three scripted tests for the slow SUS channels, listed in order of execution:
I took the c1teststand computer from the teststand and converted it into c1susaux2. To do so, I installed a fresh copy of Debian 10 on it and followed the steps on this wiki page. I did some parts slightly differently, though. The directory /cvs/cds/caltech/c1susaux2 is a repository and contains the service unit file modbusIOC.service as well. A symbolic link is created in /etc/systemd/system to use this service file for creating the modbusIOC service. All db files are generated by parsing the acromag chassis wiring file using this python script.
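The service hookup amounts to something like the following (paths as described above; add a systemctl enable step if it should also start on boot, though enabling a symlinked unit file can be finicky on some systemd versions):

controls@c1susaux2:~$ sudo ln -s /cvs/cds/caltech/c1susaux2/modbusIOC.service /etc/systemd/system/modbusIOC.service
controls@c1susaux2:~$ sudo systemctl daemon-reload
controls@c1susaux2:~$ sudo systemctl start modbusIOC.service
controls@c1susaux2:~$ sudo systemctl status modbusIOC.service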
The service file is running without any errors now and all channels are available. The leftmost bench in the EE shop at the 40m is now ready for LO1 slow controls and monitor testing. If someone gets time today, they can hook up an unused coil driver to the chassis and verify ENABLE switching and monitoring through the optical isolators; a possible quick test is sketched below. We can also drive some voltage on the PD monitors and verify the functioning of our ADCs. Once this test passes, it is straightforward to finish the remaining 6 SOS wiring and we will be good to install the chassis.
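The coil-driver test can be as simple as toggling an enable channel and reading back its monitor, e.g. (the exact channel names should be taken from the generated db files; the ones below just follow the pattern of the LO1 channels mentioned later):

controls@c1susaux2:~$ caput C1:SUS-LO1_UR_ENABLE 1
controls@c1susaux2:~$ caget C1:SUS-LO1_UR_ENABLEMon
controls@c1susaux2:~$ caput C1:SUS-LO1_UR_ENABLE 0
controls@c1susaux2:~$ caget C1:SUS-LO1_UR_ENABLEMon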
Attaching the wiring diagram of the c1susaux2 acromag chassis. Any comments or modification suggestions should come soon, as we'll go ahead and wire it shortly.
Note: While accessing channels using caget on c1susaux2, you might get a warning "Identical process variable names on multiple servers". You can safely ignore it. It just means that the channel is accessible on that particular computer via two different network interfaces (the martian network on eno1 and the acromag subnetwork on eno2), and caget will just pick one of them.
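If the duplicate-PV warning gets annoying, one way to silence it (standard Channel Access environment variables, not something configured here) is to restrict caget to a single interface:

controls@c1susaux2:~$ export EPICS_CA_AUTO_ADDR_LIST=NO
controls@c1susaux2:~$ export EPICS_CA_ADDR_LIST=192.168.113.255   # martian broadcast address, as an example
controls@c1susaux2:~$ caget C1:SUS-LO1_UR_ENABLEMon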
I tried to perform a simple enabling test of the coils using the c1susaux2 modbus channels but failed. I'm able to enable the coils using the Windows GUI for the acromag card, but I cannot do it when the cards are connected to the computer's subnetwork. The issue is two-fold:
There's also an issue in reading back the ENABLE_MON channels. Here we suspect that one of the optical isolator boxes we have been using might have a short on one of its output channels. I'll investigate this more tomorrow. Again, the issue is two-fold: the EPICS channel values do not really change, so there is clearly some issue communicating with the acromag cards.
[Anchal, Yehonathan, Ian]
We installed the c1susaux2 acromag chassis in 1Y0 along with the c1susaux2 computer. We connected the PD monitors, binary inputs, binary outputs, and Run/Acquire RTS signals for 6 of the 7 suspensions; we ran out of DB9 cables to connect PR3. Of the ones that were connected, LO2, AS1, AS4, SR2, and PR2 show no issues in functionality from the chassis. For LO1, everything is working except for the UR EnableMon channel: the enable monitor does not show an ON state for the coil even though the coil driver chassis shows that it is ON via the LED lights. A possible reason could be that a wire got disconnected when we closed the chassis (there are a lot of wires pushing against each other). Another possibility is that the optical isolator ISO10 has developed a bad channel on channel 2. The circuit was tested before closing the chassis, so we are not sure what went wrong after closing it.
PR2 is showing an issue unrelated to the acromag chassis. As soon as we close the loop by enabling the coils, the watchdog trips because the loop is unstable. Not sure what has changed for PR2, but someone should take a look at it.
For the issue with LO1, I suggest we keep a note that the C1:SUS-LO1_UR_ENABLEMon channel is faulty and not take its value seriously. We should diagnose and fix this issue once we have more reason to disconnect the chassis and open it.