ID | Date | Author | Type | Category | Subject |
6175
|
Fri Jan 6 01:00:56 2012 |
kiwamu | Update | CDS | c1scx out of sync | Both the c1scx and its IOP realtime processes became out of sync.
Initially I found that c1scx didn't show any ADC signals, even though the sync indicator was green.
Then I soft-rebooted the c1iscex machine, after which both processes went out of sync.
For tonight this is fine because I am concentrating on the central part anyway. |
4173
|
Thu Jan 20 04:03:02 2011 |
kiwamu | Update | CDS | c1scy error | I found that c1scy was not running due to a daq initialization error.
I couldn't figure out how to fix it, so I am leaving it to Joe.
Here are the error messages from dmesg on c1iscey:
[ 39.429002] c1scy: Invalid num daq chans = 0
[ 39.429002] c1scy: DAQ init failed -- exiting
Before I found this, I rebooted c1iscey in order to recover the synchronization with fb.
The synchronization had probably been lost because I shut down the daqd on fb.
|
4175
|
Thu Jan 20 10:15:50 2011 |
josephb | Update | CDS | c1scy error | This is caused by an insufficient number of active DAQ channels in the C1SCY.ini file located in /opt/rtcds/caltech/c1/chans/daq/. A quick look (grep -v '#' C1SCY.ini) indicates there are no active channels. Experience tells me you need at least 2 active channels.
Taking a look at the activateDAQ.py script in the daq directory, it looks like the C1SCY.ini file is included, but the loop over optics is missing ETMY. This caused the file to be improperly updated when the activateDAQ.py script was run. I have fixed the C1SCY.ini file (ran a modified version of the activate script on just C1SCY.ini).
I have restarted the c1scy front end using the startc1scy script and it is currently working.
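A minimal shell sketch of that check (assuming, as implied above, that inactive channels are simply commented out with '#'; blank lines and the [default] stanza will also show up in the output):
grep -v '^#' /opt/rtcds/caltech/c1/chans/daq/C1SCY.ini | grep -v '^[[:space:]]*$'    # list the non-comment, non-blank lines; at least two channel entries should appear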
Quote: |
Here are the error messages from dmesg on c1iscey:
[ 39.429002] c1scy: Invalid num daq chans = 0
[ 39.429002] c1scy: DAQ init failed -- exiting
|
|
8626
|
Thu May 23 10:24:23 2013 |
Jamie | Summary | CDS | c1scy model continues to run at the hairy edge | c1scy, the controller model at the Y END, is still running very long, typically at 55/60 microseconds, or ~92% of its cycle. It's currently showing a recorded max cycle time (since last restart or reset) of 60, which means that it has actually hit its limit sometime in the very recent past. This is obviously not good, since it's going to inject big glitches into ETMY.
c1scy is actually running a lot less code than c1scx, but c1scx caps out its load at about 46 us. This indicates to me that it must be some hardware configuration setting in the c1iscey computer.
I'll try to look into this more as soon as I can. |
9441
|
Wed Dec 4 21:33:24 2013 |
Koji | Update | CDS | c1scy time-over issue mitigated | c1scy had frequent time-overs. This caused glitches in the OSEM damping servos.
Today Eric Q was annoyed by the glitches while he worked on the green PDH inspection at the Y-end.
In order to mitigate this issue, low-priority RFM channels were moved from c1scy to c1tst.
The moved channels (see Attachment 1) are supposed to be less susceptible to the additional delay.
This modification required the following models to be modified, recompiled, reinstalled, and restarted
in the listed order: c1als, c1sus, c1rfm, c1tst, c1scy
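A minimal sketch of that rebuild/restart sequence (bash, using the rtcds wrapper commands that appear elsewhere in this log; an illustration, not the exact commands that were run):
for m in c1als c1sus c1rfm c1tst c1scy; do
    rtcds make $m && rtcds install $m || break    # stop at the first failed build/install
done
for m in c1als c1sus c1rfm c1tst c1scy; do
    rtcds restart $m                              # restart in the same listed order
done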
Now the models are running. CDS status is all green.
The time consumption of c1scy is now ~30us (previously ~60us) (see Attachment 2).
I am looking at the cavity lock of TEM00 and I have witnessed no glitches any more.
In fact, the OSEM signals have no glitches (see Attachment 3).
We still have c1mcs regularly running over time. Can I remove the WFS->OAF connections temporarily? |
5786
|
Wed Nov 2 17:29:10 2011 |
Katrin | Update | CDS | c1scy.mdl compiled | Slight modifications to that model:
- terminated Q_out of Lockins to be able to compile the old model
- assigned other ADC channels to GCY (green YARM)
|
16728
|
Tue Mar 15 14:10:41 2022 |
Anchal | Summary | CDS | c1su2 model remade, reinstalled, restarted after the update | I have restarted the c1su2 model with the connections of the Run/Acquire switch to the analog filters on the coil drivers. The following steps were taken:
First, ssh to c1sus2 and then:
controls@c1sus2:~ 0$ rtcds make c1su2
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### building c1su2...
Cleaning c1su2...
Done
Parsing the model c1su2...
Done
Building EPICS sequencers...
Done
Building front-end Linux kernel module c1su2...
Done
RCG source code directory:
/opt/rtcds/rtscore/branches/branch-3.4
The following files were used for this build:
/opt/rtcds/userapps/release/cds/common/models/lockin.mdl
/opt/rtcds/userapps/release/cds/common/models/rtbitget.mdl
/opt/rtcds/userapps/release/cds/common/models/rtdemod.mdl
/opt/rtcds/userapps/release/isc/common/models/QPD.mdl
/opt/rtcds/userapps/release/sus/c1/models/c1su2.mdl
/opt/rtcds/userapps/release/sus/c1/models/lib/sus_single_control.mdl
Successfully compiled c1su2
***********************************************
Compile Warnings, found in c1su2_warnings.log:
***********************************************
WARNING *********** No connection to subsystem output named SUS_DAC1_12
WARNING *********** No connection to subsystem output named SUS_DAC1_13
WARNING *********** No connection to subsystem output named SUS_DAC1_14
WARNING *********** No connection to subsystem output named SUS_DAC1_15
WARNING *********** No connection to subsystem output named SUS_DAC2_7
WARNING *********** No connection to subsystem output named SUS_DAC2_8
WARNING *********** No connection to subsystem output named SUS_DAC2_9
WARNING *********** No connection to subsystem output named SUS_DAC2_10
WARNING *********** No connection to subsystem output named SUS_DAC2_11
WARNING *********** No connection to subsystem output named SUS_DAC2_12
WARNING *********** No connection to subsystem output named SUS_DAC2_13
WARNING *********** No connection to subsystem output named SUS_DAC2_14
WARNING *********** No connection to subsystem output named SUS_DAC2_15
***********************************************
controls@c1sus2:~ 0$ rtcds install c1su2
buildd: /opt/rtcds/caltech/c1/rtbuild/release
### installing c1su2...
Installing system=c1su2 site=caltech ifo=C1,c1
Installing /opt/rtcds/caltech/c1/chans/C1SU2.txt
Installing /opt/rtcds/caltech/c1/target/c1su2/c1su2epics
Installing /opt/rtcds/caltech/c1/target/c1su2
Installing start and stop scripts
/opt/rtcds/caltech/c1/scripts/killc1su2
/opt/rtcds/caltech/c1/scripts/startc1su2
Performing install-daq
Updating testpoint.par config file
/opt/rtcds/caltech/c1/target/gds/param/testpoint.par
/opt/rtcds/rtscore/branches/branch-3.4/src/epics/util/updateTestpointPar.pl -par_file=/opt/rtcds/caltech/c1/target/gds/param/archive/testpoint_220315_135808.par -gds_node=26 -site_letter=C -system=c1su2 -host=c1sus2
Installing GDS node 26 configuration file
/opt/rtcds/caltech/c1/target/gds/param/tpchn_c1su2.par
Installing auto-generated DAQ configuration file
/opt/rtcds/caltech/c1/chans/daq/C1SU2.ini
Installing Epics MEDM screens
Running post-build script
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-AS1_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS1_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-AS1_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS1_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-AS1_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS1_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-AS4_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS4_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-AS4_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS4_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-AS4_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_AS4_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-LO1_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO1_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-LO1_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO1_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-LO1_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO1_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-LO2_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO2_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-LO2_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO2_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-LO2_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_LO2_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-PR2_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR2_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-PR2_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR2_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-PR2_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR2_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-PR3_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR3_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-PR3_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR3_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-PR3_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_PR3_TO_COIL_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 4 5 C1:SUS-SR2_INMATRIX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_SR2_INMATRIX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 2 4 C1:SUS-SR2_LOCKIN_INMTRX > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_SR2_LOCKIN_INMTRX_KB.adl
/opt/rtcds/userapps/release/cds/common/scripts/generate_KisselButton.py 5 6 C1:SUS-SR2_TO_COIL --fi > /opt/rtcds/caltech/c1/medm/c1su2/C1SUS_SR2_TO_COIL_KB.adl
safe.snap exists
controls@c1sus2:~ 0$
Then on rossa, run activateSUS2DQ.py, which creates a file C1SU2.ini.NEW. Remove the old backup file C1SU2.ini.bak, rename C1SU2.ini to C1SU2.ini.bak, and rename C1SU2.ini.NEW to C1SU2.ini:
~> cd /opt/rtcds/caltech/c1/chans/daq/
daq>python2 activateSUS2DQ.py
/opt/rtcds/caltech/c1/chans/daq/C1SU2.ini
daq>rm C1SU2.ini.bak
daq>mv C1SU2.ini C1SU2.ini.bak
daq>mv C1SU2.ini.NEW C1SU2.ini
Then ssh back to c1sus2 and restart the rtcds model:
controls@c1sus2:~ 0$ rtcds restart c1su2
### stopping c1su2...
### starting c1su2...
c1su2epics: no process found
Number of ADC cards on bus = 2
Number of DAC16 cards on bus = 3
Number of DAC18 cards on bus = 0
Number of DAC20 cards on bus = 0
Specified filename iocC1.log does not exist.
c1su2epics C1 IOC Server started
c1su2 RT ready in 4
awg_server Version $Id$
channel_client Version $Id$
testpoint_server Version $Id$
/opt/rtcds/caltech/c1/target/gds/bin/awgtpman -s c1su2 -l /opt/rtcds/caltech/c1/target/gds/awgtpman_logs/c1su2.log started on host c1sus2 hostid ffffffffa8c05771
awgtpman Version $Id$
controls@c1sus2:~ 0$
Then restart the daqd services from rossa and burt-restore to the latest snapshot of c1su2epics.snap:
daq>telnet fb 8083
Trying 192.168.113.201...
Connected to fb.martian.
Escape character is '^]'.
daqd> shutdown
OK
Connection closed by foreign host.
daq>burtgooey
>burtwb -f /opt/rtcds/caltech/c1/burt/autoburt/latest/c1su2epics.snap -l /tmp/controls_1220315_140755_0.write.log -o /tmp/controls_1220315_140755_0.nowrite.snap -v <
daq>
All suspensions are back online and everything is the same as before now. I will test the Run/Acquire switch functionality later. |
16726
|
Tue Mar 15 11:52:34 2022 |
Anchal | Summary | CDS | c1su2 model updated for sending Run/Acquire Binary Output to Binary Interface card | I routed the XXX_COIL_DW signals from the 7 SOS blocks in c1su2.mdl (located at /cvs/cds/rtcds/userapps/trunk/sus/c1/models/c1su2.mdl) to the binary outputs from the FE model. The routing is done such that when these binary outputs are routed through the binary interface card mounted on 1Y0, they go to the acromag chassis just installed and from there they go to the binary inputs of the coil drivers together with the acromag controlled coil outputs.
I have not restarted the rtcds models yet. This needs more care and one needs to follow the instructions from 40m/16533. I will do that sometime later, or Koji can follow up on this work. |
16533
|
Wed Dec 22 17:40:22 2021 |
Anchal | Summary | CDS | c1su2 model updated with SUS damping blocks for 7 SOSs | [Anchal, Koji]
I've updated the c1su2 model today with suspension model blocks for the 7 new SOSs (LO1, LO2, AS1, AS4, SR2, PR2 and PR3). The model is running properly now, but we had some difficulty in getting it to run.
Initially, we were getting a 0x2000 error on the c1su2 model CDS screen. The issue was probably the high data transmission rate required for all 7 SOSs in this model. Koji dug up a script /opt/rtcds/caltech/c1/userapps/trunk/cds/c1/scripts/activateDQ.py that has been used historically for updating the data rate on some of the DQ channels in the suspension block. However, this script was not working properly for Koji, so he created a new script at /opt/rtcds/caltech/c1/chans/daq/activateSUS2DQ.py.
[Ed by KA: I could not make this modified script run such that it replaces the input file (i.e. C1SU2.ini). So the output file is named C1SU2.ini.NEW and the original file needs to be replaced manually.]
With this, Koji was able to reduce the acquisition rate of SUSPOS_IN1_DQ, SUSPIT_IN1_DQ, SUSYAW_IN1_DQ, SUSSIDE_IN1_DQ, SENSOR_UL, SENSOR_UR, SENSOR_LL, SENSOR_LR, SENSOR_SIDE, OPLEV_PERROR, OPLEV_YERROR, and OPLEV_SUM to 2048 Sa/s. The script modifies the /opt/rtcds/caltech/c1/chans/daq/C1SU2.ini file, which would get re-written if the c1su2 model is remade and reinstalled. After this modification, the 0x2000 error stopped appearing and the model is running fine.
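A quick way to spot-check the result (a sketch; it assumes the usual layout of these auto-generated .ini files, with each channel as a [C1:...] section followed by acquire= and datarate= lines):
grep -A3 'SUSPOS_IN1_DQ' /opt/rtcds/caltech/c1/chans/daq/C1SU2.ini    # the datarate= lines under these channels should now read 2048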
Should we change the library model part sus_single_control.mdl?
We notice that all our suspension models need to go through this weird python script modifying auto-generated .ini files to reduce the data rate. Ideally, there is a simpler solution: simply add the data rate 2048 in the '#DAQ Channels' block in the model library part /cvs/cds/rtcds/userapps/trunk/sus/c1/models/lib/sus_single_control.mdl, which is the root model in all the suspensions. With this change, the .ini files will automatically be written with the correct data rate and there will be no need for using the activateDQ script. But we couldn't find why this simple solution was not implemented in the past, so we want to know if there is more going on here than we know. Changing the library model would obviously change every suspension model and we don't want a broken CDS system on our heads at the beginning of the holidays, so we'll leave this delicate task for the near future. |
16537
|
Wed Dec 29 20:09:40 2021 |
rana | Summary | CDS | c1su2 model updated with SUS damping blocks for 7 SOSs | We want to maintain the 16 kHz sample rate for the COIL DAQ channels, but there is nothing wrong with reducing the others.
I would suggest setting the DQ sample rates to 256 Hz for the SUS DAMP channels and 1024 Hz for the OPLEV channels (for noise diagnostics).
Maybe you can put these numbers into a new library part and we can have the best of all worlds?
Quote: |
Should we change the library model part sus_single_control.mdl?
We notice that all our suspension models need to go through this weird python script modifying auto-generated .ini files to reduce the data rate. Ideally, there is a simpler solution: simply add the data rate 2048 in the '#DAQ Channels' block in the model library part /cvs/cds/rtcds/userapps/trunk/sus/c1/models/lib/sus_single_control.mdl, which is the root model in all the suspensions. With this change, the .ini files will automatically be written with the correct data rate and there will be no need for using the activateDQ script. But we couldn't find why this simple solution was not implemented in the past, so we want to know if there is more going on here than we know. Changing the library model would obviously change every suspension model and we don't want a broken CDS system on our heads at the beginning of the holidays, so we'll leave this delicate task for the near future.
|
|
7165
|
Mon Aug 13 20:12:29 2012 |
jamie | Update | CDS | c1sup model moved to c1lsc machine | I moved the c1sup simplant model to the c1lsc machine, where there was one remaining available processor. This requires changing a bunch of IPC routing in the c1sus and c1lsp models. I have rebuilt and installed the models, and have restarted c1sup, but have not restarted c1sus and c1lsp since they're currently in use. I'll restart them first thing tomorrow. |
6619
|
Mon May 7 22:39:37 2012 |
Den | Update | CDS | c1sus | [Jenne, Den]
We decided to reboot the c1sus machine in the hope that this would fix the problem with the seismic channels. After the reboot the machine could not connect to the framebuilder. We restarted mx_stream but this did not help. Then we manually executed
/opt/rtcds/caltech/c1/target/fb/mx_stream -s c1x02 c1sus c1mcs c1rfm c1pem -d fb:0 -l /opt/rtcds/caltech/c1/target/fb/mx_stream_logs/c1sus.log
but c1sus still could not connect to fb. This script returned the following error:
controls@c1sus ~ 128$ cat /opt/rtcds/caltech/c1/target/fb/mx_stream_logs/c1sus.log
c1x02
c1sus
c1mcs
c1rfm
c1pem
mmapped address is 0x7fb5ef8cc000
mapped at 0x7fb5ef8cc000
mmapped address is 0x7fb5eb8cc000
mapped at 0x7fb5eb8cc000
mmapped address is 0x7fb5e78cc000
mapped at 0x7fb5e78cc000
mmapped address is 0x7fb5e38cc000
mapped at 0x7fb5e38cc000
mmapped address is 0x7fb5df8cc000
mapped at 0x7fb5df8cc000
send len = 263596
OMX: Failed to find peer index of board 00:00:00:00:00:00 (Peer Not Found in the Table)
mx_connect failed
This looks like a CDS error. We are leaving the WATCHDOGS OFF for the night. |
10135
|
Mon Jul 7 13:44:21 2014 |
Jenne | Update | CDS | c1sus - bad fb connection |
Quote: |
I managed to recover c1sus. It required stopping all the models, and the restarting them one-by-one:
$ rtcds stop all # <-- this does the right thing to stop all the models, with the IOP stopped last, so they will all unload properly.
$ rtcds start iop
$ rtcds start c1sus c1mcs c1rfm
I have no idea why the c1sus models got wedged, or why restarting them in this way fixed the issue.
|
In addition to needing obnoxiously regular mxstream restarts, this afternoon the sus machine was doing something slightly different. Only 1 fb block per core was red (the mxstream symptom is 3 red fb-related blocks per core), and restarting the mxstream didn't help. Anyhow, I was searching through the elog, and this entry to which I'm replying had similar symptoms. However, by the time I went back to the CDS FE screen, c1sus had the regular mxstream symptoms, and an mxstream restart fixed things right up.
So, I don't know what the issue is or was, nor do I know why it is fixed; it's fine for now, but I wanted to make a note for the future. |
3945
|
Thu Nov 18 11:06:20 2010 |
josephb | Update | CDS | c1sus and ADCs | Problem:
ADCs are timing out on c1sus when we have more than 3.
Talked with Rolf:
Alex will be back tomorrow (he took yesterday and today off), so I talked with Rolf.
He said ordering shouldn't make a difference and he's not sure why we would be having a problem. However, when he loads the chassis, he tends to put all the ADCs on the same PCI bus (the back plane apparently contains multiples). Slot 1 is its own bus, slots 2-9 should be the same bus, and slots 10-17 should be the same bus.
He also mentioned that when you use dmesg and see a line like "ADC TIMEOUT # ##### ######", the first number should be the ADC number, which is useful for determining which one is reporting back slowly.
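A minimal check along those lines (sketch only):
dmesg | grep -i 'ADC TIMEOUT'    # the first number after 'ADC TIMEOUT' identifies the slow ADC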
Plan:
Disconnect the c1sus IO chassis completely, pull it out, pull out all the cards, check the connectors, and repopulate it following Rolf's suggestions, keeping this elog in mind.
In regards to the RFM, it looks like one of the fibers had been disconnected from the c1sus chassis RFM card (it's plugged in in the middle of the chassis so it's hard to see) during all the plugging in and out of the cables and cards last night. |
4733
|
Tue May 17 18:09:13 2011 |
Jamie, Kiwamu | Configuration | CDS | c1sus and c1auxey crashed, rebooted | c1sus and c1auxey crashed, required hard reboot
For some reason, we found that c1sus and c1auxey were completely unresponsive. We went out and gave them a hard reset, which brought them back up with no problems.
This appears to be related to a very similar problem reported by Kiwamu just a couple of days ago, where c1lsc crashed after editing the C1LSC.ini and restarting the daqd process, which is exactly what I just did (see my previous log). What could be causing this? |
6737
|
Fri Jun 1 02:33:40 2012 |
Jenne | Update | Computers | c1sus and c1iscex - bad fb connections | Something bad happened to c1sus and c1iscex ~20 min ago. They both have "0x2bad" 's. I restarted the daqd on the framebuilder, and then rebooted c1sus, and nothing changed. The SUS screens are all zeros (the gains seem to be set correctly, but all of the signals are 0's).
If it's not fixed when I get in tomorrow, I'll keep poking at it to make it better. |
6740
|
Fri Jun 1 09:50:50 2012 |
Jamie | Update | Computers | c1sus and c1iscex - bad fb connections |
Quote: |
Something bad happened to c1sus and c1iscex ~20 min ago. They both have "0x2bad" 's. I restarted the daqd on the framebuilder, and then rebooted c1sus, and nothing changed. The SUS screens are all zeros (the gains seem to be set correctly, but all of the signals are 0's).
If it's not fixed when I get in tomorrow, I'll keep poking at it to make it better.
|
This is at least partially related to the mx_stream issue I reported previously. I restarted mx_stream on c1iscex and that cleared up the models on that machine.
Something else is happening with c1sus. Restarting mx_stream on c1sus didn't help. I'll try to fix it when I get over there later. |
6742
|
Fri Jun 1 14:40:24 2012 |
Jamie | Update | Computers | c1sus and c1iscex - bad fb connections |
Quote: |
This is at least partially related to the mx_stream issue I reported previously. I restarted mx_stream on c1iscex and that cleared up the models on that machine.
Something else is happening with c1sus. Restarting mx_stream on c1sus didn't help. I'll try to fix it when I get over there later.
|
I managed to recover c1sus. It required stopping all the models, and the restarting them one-by-one:
$ rtcds stop all # <-- this does the right thing to stop all the models, with the IOP stopped last, so they will all unload properly.
$ rtcds start iop
$ rtcds start c1sus c1mcs c1rfm
I have no idea why the c1sus models got wedged, or why restarting them in this way fixed the issue. |
6738
|
Fri Jun 1 08:01:46 2012 |
steve | Update | Computers | c1sus and c1iscex are down |
Quote: |
Something bad happened to c1sus and c1iscex ~20 min ago. They both have "0x2bad" 's. I restarted the daqd on the framebuilder, and then rebooted c1sus, and nothing changed. The SUS screens are all zeros (the gains seem to be set correctly, but all of the signals are 0's).
If it's not fixed when I get in tomorrow, I'll keep poking at it to make it better.
|
|
4183
|
Fri Jan 21 15:26:15 2011 |
josephb | Update | CDS | c1sus broken yesterday and now fixed | [Joe, Koji]
Yesterday's CDS swap of c1sus and c1iscex left the interferometer in a bad state due to several issues.
The first was the need to actually power down the IO chassis completely (I eventually waited for a green LED to stop glowing and then plugged the power back in) when switching computers. I also unplugged and re-plugged the interface cable between the IO chassis and the computer while powered down. This let the computer actually see the IO chassis (previously the host interface card was glowing just red, no green lights).
Second, the former c1iscex computer, now the new c1sus computer, only has 6 CPUs, not 8 like most of the other front ends. Because it was running 6 models (c1sus, c1mcs, c1rms, c1rfm, c1pem, c1x02) and 1 CPU needed to be reserved for the operating system, 2 models were not actually running (recycling mirrors and PEM). This meant the recycling mirrors were left swinging uncontrolled.
To fix this I merged the c1rms model with the c1sus model. The c1sus model now controls BS, ITMX, ITMY, PRM, SRM. I merged the filter files in the /chans/ directory, and reactivated all the DAQ channels. The master file for the fb in the /target/fb directory had all references to c1rms removed, and then the fb was restarted via "telnet fb 8088" and then "shutdown".
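A rough sketch of that last step (the file name 'master' for the fb configuration file, its one-entry-per-line layout, and the daqd auto-restart behavior are assumptions here, not stated above):
grep -v c1rms /opt/rtcds/caltech/c1/target/fb/master > /tmp/master.no_c1rms    # drop lines referencing c1rms
# review /tmp/master.no_c1rms, then copy it back over the original file
telnet fb 8088    # at the daqd prompt, type 'shutdown'; daqd comes back up and re-reads its configuration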
My final mistake was starting the work late in the day.
So the lesson for Joe is, don't start changes in the afternoon.
Koji has been helping me test the damping and confirm things are really running. We were having some issues with some of the matrix values. Unfortunately I had to add them by hand since the previous snapshots no longer work with the models. |
3653
|
Tue Oct 5 16:58:41 2010 |
josephb, yuta | Update | CDS | c1sus front end status | We moved the filters for the mode cleaner optics over from the C1SUS.txt file in /opt/rtcds/caltech/c1/chans/ to the C1MCS.txt file, and placed SUS_ on the front of all the filter names. This has let us load he filters for the mode cleaner optics.
At the moment, we cannot seem to get testpoints for the optics (i.e. dtt is not working, even the specially installed ones on rosalba). I've asked Yuta to enter in the correct matrix elements and turn the correct filters on, then save with a burt backup. |
6787
|
Thu Jun 7 17:49:09 2012 |
Jamie | Update | CDS | c1sus in weird state, running models but unresponsive otherwise | Somehow c1sus was in a very strange state. It was running models, but EPICS was slow to respond. We could not log into it via ssh, and we could not bring up test points. Since we didn't know what else to do we just gave it a hard reset.
Once it came up, none of the models were running. I think this is a separate problem with the model startup scripts that I need to debug. I logged on to c1sus and ran:
rtcds restart all
(which handles proper order of restarts) and everything came up fine.
Have no idea what happened there to make c1sus freeze like that. Will keep an eye out. |
3946
|
Thu Nov 18 14:05:06 2010 |
josephb, yuta | Update | CDS | c1sus is alive! | Problem:
We broke c1sus by moving ADC cards around.
Solution:
We pulled all the cards out, examined all contacts (which looked fine), found 1 poorly connected cable internally, going between an ADC and ADC timing interface card (that probably happened last night), and one of the two RFM fiber cables pulled out of its RFM card.
We then placed all of the cards back in with a new ordering, tightened down everything, and triple checked all connections were on and well fit.
Gotcha!
Joe forgot that slot 1 and slot 2 of the timing interface boards have their last channels reserved for duotone signals. Thus, they shouldn't be used for any ADCs or DACs that need their last channel (such as MC3_LR sensor input). We saw a perfect timing signal come in through the MC3_LR sensor input, which prevented damping.
We moved the ADC timing interface card out of the 1st slot of the timing interface board and into slot 6 of the timing interface board, which resolved the problem.
Final Configuration:
Timing Interface Board
Slot 1 (Duotone):  None
Slot 2 (Duotone):  DAC interface (can't use last channel)
Slot 3:            ADC interface
Slot 4:            ADC interface
Slot 5:            ADC interface
Slot 6:            ADC interface
Slot 7:            None
Slot 8:            None
Slot 9:            None
Slot 10:           DAC interface
Slot 11:           DAC interface
Slot 12:           None
Slot 13:           None
PCIe Chassis
Slot   PCIe Number   Card
0      Do Not Use    None
1      1             ADC
2      6             DAC
3      5             ADC
4      4             ADC
5      9             ADC
6      8             BO
7      7             BO
8      3             BO
9      2             BO
10     14            DAC
11     13            DAC
12     12            BIO
13     17            RFM
14     16            None
15     15            None
16     11            None
17     10            None
Still having Issues with:
ITM West damps. ITM South damps, but the coil gains are opposite to the other optics in order to damp properly.
We also need to look into switching the channel names for the watchdogs on ITMX/Y in addition to the front end code changes. |
6924
|
Fri Jul 6 01:12:02 2012 |
Jenne | Update | Computers | c1sus is fine |
Quote: |
I was trying to use a new BLRMs c-code block that the seismic people developed, instead of Mirko's more clunky version, but putting this in crashed c1sus.
I reverted to a known good c1pem.mdl, and Jamie and I did a reboot, but c1sus is still funny - none of the models are actually running.
rtcds restart all - all the models are happy again, c1sus is fine.
But, we still need to figure out what was wrong with the c-code block.
Also, the BLRMS channels are listed in a Daq Channels block inside of the (new) library part, so they're all saved with the new CDS system which became effective as of the upgrade. (I made the Mirko copy-paste BLRMS into a library part, including a DAQ channels block before trying the c-code. This is the known-working version to which I reverted, and we are currently running.)
|
The reason I started looking at BLRMS and c1sus today was that the BLRMS striptool was totally wacky. I finally figured out that the pemepics hadn't been burt restored, so none of the channels were being filtered. It's all better now, and will be even better soon when Masha finishes updating the filters (she'll make her own elog later). |
14719
|
Tue Jul 2 16:57:09 2019 |
gautam | Update | CDS | c1sus is flaky | Since the work earlier this morning, the fast c1sus model has crashed ~5 times. I tried rebooting the vertex FEs using the reboot script a few times, but the problem persisted. I'm opting to do a full hard reboot of the 3 vertex FEs to resolve this problem.
Judging by Attachment #1, the processes have been stable overnight. |
6923
|
Thu Jul 5 16:49:35 2012 |
Jenne | Update | Computers | c1sus is funny | I was trying to use a new BLRMs c-code block that the seismic people developed, instead of Mirko's more clunky version, but putting this in crashed c1sus.
I reverted to a known good c1pem.mdl, and Jamie and I did a reboot, but c1sus is still funny - none of the models are actually running.
rtcds restart all - all the models are happy again, c1sus is fine.
But, we still need to figure out what was wrong with the c-code block.
Also, the BLRMS channels are listed in a Daq Channels block inside of the (new) library part, so they're all saved with the new CDS system which became effective as of the upgrade. (I made the Mirko copy-paste BLRMS into a library part, including a DAQ channels block before trying the c-code. This is the known-working version to which I reverted, and we are currently running.) |
6026
|
Mon Nov 28 16:46:55 2011 |
kiwamu | Update | CDS | c1sus is now up | I have restarted the c1sus machine and burt-restored c1sus and c1mcs to the day before Thanksgiving, namely the 23rd of November.
Quote from #6020 |
I have restarted the c1sus machine around 9:00 PM yesterday and then shut it down around 4:00 AM this morning after a little bit of taking care of the interferometer.
|
|
7182
|
Tue Aug 14 17:47:44 2012 |
Jamie | Update | CDS | c1sus machine replaced | Rolf and Alex came back over with a replacement machine for c1sus. We removed the old machine, removed its timing, dolphin, and PCIe extension cards, and put them in the new machine. We then installed the new machine and booted it, and it came up fine. The BIOS in this machine is slightly different, and it wasn't having the same failure-to-boot-with-no-COM issue that the previous one had. The COM ports are turned off on this machine (as is the USB interface).
Unfortunately the problem we were experiencing with the old machine, that unloading certain models was causing others to twitch and that dolphin IPC writes were being dropped, is still there. So the problem doesn't seem to have anything to do with hardware settings...
After some playing, Rolf and Alex determined that for some reason the c1rfm model is coming up in a strange state when started during boot. It runs faster, but the IPC errors are there. If instead all models are stopped, the c1rfm model is started first, and then the rest of the models are started, the c1rfm model runs ok. They don't have an explanation for this, and I'm not sure how we can work around it other than knowing the problem is there and doing manual restarts after boot. I'll try to think of something more robust.
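A minimal sketch of that manual workaround (the c1sus-machine model list here is taken from the mx_stream line earlier in this log; an illustration, not the exact procedure used):
rtcds stop all        # stops all models, with the IOP stopped last
rtcds start iop       # bring the IOP (c1x02) back first
rtcds start c1rfm     # then start c1rfm before the rest
rtcds start c1sus c1mcs c1pem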
A better "fix" to the problems is to clean up all of our IPC routing, a bunch of which we're currently doing very inefficiently. We're routing things through c1rfm that don't need to be, which is introducing delays. In particular, things that can communicate directly over RFM or dolphin should just do so. We should also figure out if we can put the c1oaf and c1pem models on the same machine, so that they can communicate directly over shared memory (SHMEM). That should cut down on overhead quite a bit. I'll start to look at a plan to do that.
|
6042
|
Tue Nov 29 18:54:29 2011 |
kiwamu | Update | CDS | c1sus machine up | [Zach / Kiwamu]
Woke up the c1sus machine in order to lock PSL to MC so that we can observe the effect of not having the EOM heater. |
3157
|
Fri Jul 2 11:33:15 2010 |
josephb | Update | CDS | c1sus needs real time linux to be setup on it | I connected a monitor and keyboard to the new c1sus machine and discovered it's not running RTL Linux. I changed the root password to the usual one; however, without help from Alex I don't know where to get the right version or how to install it, since it doesn't seem to have an obvious CD-ROM drive or the like. Hopefully Tuesday I can get Alex to come over and help with the setup of it, and the other 1-2 IO chassis. |
3636
|
Fri Oct 1 16:34:06 2010 |
josephb | Update | CDS | c1sus not booting due to fb dhcp server not running | For some reason, the dhcp server running on the fb machine which assigns the IP address to c1sus (since its running a diskless boot) was down. This was preventing c1sus from coming up properly. The symptom was an error indicated no DHCP offers were made(when I plugged a keyboard and monitor in).
To check if the dhcp server is running, run ps -ef | grep dhcpd. If its not, it can be started with "sudo /etc/init.d/dhcpd start" |
6033
|
Tue Nov 29 04:47:49 2011 |
kiwamu | Update | CDS | c1sus shut down again | I have shut down the c1sus machine at 3:30 AM. |
6020
|
Mon Nov 28 06:53:30 2011 |
kiwamu | Update | CDS | c1sus shutdown | I have restarted the c1sus machine around 9:00 PM yesterday and then shut it down around 4:00 AM this morning after a little bit of taking care of the interferometer.
Quote from #6016 |
c1sus has been shut down so that the optics don't bang around. This is because the watchdogs are not working.
|
|
3687
|
Mon Oct 11 10:49:03 2010 |
josephb | Update | CDS | c1sus stability | Taking a look at the c1sus machine, it looks as if all of the front end codes it's running (c1sus - running BS, ITMX, ITMY; c1mcs - running MC1, MC2, MC3; and c1rms - running PRM and SRM) worked over the weekend, as I see no problems.
Running dmesg on c1sus reports only a single long cycle on c1x02, where it took 17 microseconds (~15 microseconds is the maximum because the c1x02 IOP process is running at 64kHz).
Both the c1sus and c1mcs models are running at around 39-42 microseconds USR time and 44-50 microseconds CPU time. It would run into problems at 60-62 microseconds.
Looking at the filters that are turned on, it looks as if these models were running with only a single optic's worth of filters turned on via the medm screens. I.e. the MC2 and ITMY filters were properly set, but not the others.
The c1rms model is running at around 10 microseconds USR time and 14-18 microseconds CPU time. However it apparently had no filters on.
It looks as if no test points were used this weekend. We'll turn on the rest of the filters and see if we start seeing crashes of the front end again.
Edit:
The filters for all the suspensions have been turned on, and all matrix elements entered. The USR and CPU times have not appreciably changed. No long cycles have been reported through dmesg on c1sus at this time. I'm going to let it run and see if it runs into problems. |
3160
|
Tue Jul 6 17:07:56 2010 |
josephb | Update | CDS | c1sus status | I talked to Alex, and he explained the steps necessary to get the real time linux kernel installed. It basically went like this: copy the files from c1iscex (the one he installed last month) in the directory /opt/rtlidk-2.2 over to c1sus locally. Then go into rtlinux_kernel_2_6 and run make and make install (or something like that - need to look at the makefile). Then edit the grub loader file to look like the one on c1iscex (located at /boot/grub/menu.lst).
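A rough sketch of those steps (paths from above; the scp-based copy, the directory layout, and the exact make targets are assumptions - the entry itself notes the makefile still needs to be checked):
scp -r controls@c1iscex:/opt/rtlidk-2.2 /opt/    # copy the RTL kit from c1iscex
cd /opt/rtlidk-2.2/rtlinux_kernel_2_6
make && make install                             # build and install the RT kernel
# then edit /boot/grub/menu.lst to match the one on c1iscex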
This will then hopefully let us try out the RCG code on c1sus and see if it works. |
3662
|
Wed Oct 6 16:16:48 2010 |
josephb, yuta | Update | CDS | c1sus status | At the moment, c1sus and c1mcs on the c1sus machine seem to be dead in the water. At this point, it is unclear to me why.
Apparently during the 40m meeting, Alex was able to get test points working for the c1mcs model. He said he "had to slow down mx_stream startup on c1sus". When we returned at 2pm, things were running fine.
We began updating all the matrix values on the medm screens. Somewhere towards the end of this the c1sus model seemed to have crashed, leaving only c1x02 and c1mcs running. There were no obvious error messages I saw in dmesg or the target/c1sus/logs/log.txt file (although that seems to be empty these days). We quickly saved two burt snapshots, one of c1sus and one of c1mcs, and saved them to the /opt/rtcds/caltech/c1/target/snapshots directory temporarily. We then ran the killc1sus script on c1sus, and then, after confirming the code was removed, ran the startup script, startc1sus. The code seemed to come back partly. It was syncing up and finding the ADC/DAC boards, but not doing any real computations. The cycle time was reporting reasonably, but the usr time (representing computation done for the model) was 0. There were no updating monitor channels on the medm screens and filters would not turn on.
At this point I tried bringing down all 3 models and restarting c1x02, then c1sus and c1mcs. Both c1sus and c1mcs came back partly, doing no real calculations. c1x02 appears to be working normally (or at least the two filter banks in that model are showing changing channels from ADCs properly). I then tried rebooting the c1sus machine. It came back in the same state: working c1x02, non-calculating c1sus and c1mcs. |
3666
|
Thu Oct 7 10:48:41 2010 |
josephb, yuta | Update | CDS | c1sus status | This problem has been resolved.
Apparently during one of Alex's debugging sessions, he had commented out the feCode function call on line 1532 of the controller.c file (located in the /opt/rtcds/caltech/c1/core/advLigoRTS/src/fe/ directory).
This function is the one that actually calls all the front-end-specific code, and without it the code just doesn't do any computations. We then had to rebuild the front end codes with the corrected file.
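A minimal sketch of that rebuild (using the rtcds wrapper commands that appear elsewhere in this log; the exact commands used at the time may have differed):
for m in c1x02 c1sus c1mcs; do
    rtcds make $m && rtcds install $m && rtcds restart $m    # rebuild, reinstall, and restart each model
done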
Quote: |
At the moment, c1sus and c1mcs on the c1sus machine seem to be dead in the water. At this point, it is unclear to me why.
Apparently during the 40m meeting, Alex was able to get test points working for the c1mcs model. He said he "had to slow down mx_stream startup on c1sus". When we returned at 2pm, things were running fine.
We began updating all the matrix values on the medm screens. Somewhere towards the end of this the c1sus model seemed to have crashed, leaving only c1x02 and c1mcs running. There were no obvious error messages I saw in dmesg or the target/c1sus/logs/log.txt file (although that seems to be empty these days). We quickly saved two burt snapshots, one of c1sus and one of c1mcs, and saved them to the /opt/rtcds/caltech/c1/target/snapshots directory temporarily. We then ran the killc1sus script on c1sus, and then, after confirming the code was removed, ran the startup script, startc1sus. The code seemed to come back partly. It was syncing up and finding the ADC/DAC boards, but not doing any real computations. The cycle time was reporting reasonably, but the usr time (representing computation done for the model) was 0. There were no updating monitor channels on the medm screens and filters would not turn on.
At this point I tried bringing down all 3 models and restarting c1x02, then c1sus and c1mcs. Both c1sus and c1mcs came back partly, doing no real calculations. c1x02 appears to be working normally (or at least the two filter banks in that model are showing changing channels from ADCs properly). I then tried rebooting the c1sus machine. It came back in the same state: working c1x02, non-calculating c1sus and c1mcs.
|
|
3668
|
Thu Oct 7 14:57:52 2010 |
josephb, yuta | Update | CDS | c1sus status | Around noon, Yuta and I were trying to figure out why we were getting no signal out to the mode cleaner coils. It turns out the mode cleaner optic control model was not talking to the IOP model.
Alex and I were working under the incorrect assumption that you could use the same DAC piece in multiple models, and simply use a subset of the channels. He finally went and asked Rolf, who said that the same DAC simulink piece in different models doesn't work. You need to use shared memory locations to move the data to the model with the DAC card. Rolf says there was a discussion (probably a long while back) where it was asked if we needed to support DAC cards in multiple models and the decision was that it was not needed.
Rolf and Alex have said they'd come over and discuss the issue.
In the meantime, I'm moving forward by adding shared memory locations for all the mode cleaner optics to talk to the DAC in the c1sus model.
Note by KA: Important fact that is worth remembering |
3673
|
Thu Oct 7 17:19:55 2010 |
josephb, alex, rolf | Update | CDS | c1sus status | As noted by Koji, Alex and Rolf stopped by.
We discussed the feasibility of getting multiple models using the same DAC. We decided that we in fact did need it (i.e. 8 optics through 3 DACs does not divide nicely), and went about changing the controller.c file so as to gracefully handle that case. Basically it now writes a 0 to the channel, rather than repeating the last output, if a particular model sharing a DAC goes down.
In a separate issue, we found that when skipping DACs in a model (say using DACs 1 and 2 only) there was a miscommunication to the IOP, resulting in the wrong DACs getting the data. The temporary solution is to have all DACs in each model, even if they are not used. This will eventually be fixed in code.
At this point, we *seem* to be able to control and damp optics. Look for a elog from Yuta confirming or denying this later tonight (or maybe tomorrow).
|
3665
|
Thu Oct 7 10:37:42 2010 |
josephb | Update | CDS | c1sus with flaky ssh | Currently trying to understand why the ssh connections to c1sus are flaky. This morning, every time I tried to make the c1sus model on the c1sus machine, the ssh session would be terminated at a random spot midway through the build process. Eventually restarting c1sus fixed the problem for the moment.
However, previously in the last 48 hours, the c1sus machine had stopped responding to ssh logins while still appearing to be running the front end code. The next time this occurs, we should attach a monitor and keyboard and see what kind of state the computer is in. It's interesting to note we didn't have these problems before we switched over to the Gentoo kernel from the real-time Linux CentOS 5.5 kernel. |
3638
|
Fri Oct 1 18:19:24 2010 |
josephb, kiwamu | Update | CDS | c1sus work | The c1sus model was split into 2, so that c1sus controls BS, PRM, SRM, ITMX, ITMY, while c1mcs controls MC1, MC2, MC3. The c1mcs model uses shared memory to tell c1sus what signals to send to the binary outputs (which control the analog whitening/dewhitening filters), since two models can't control a binary output.
This split was done because the CPU time was running above 60 microseconds (the allowable limit since we're trying to run at 16kHz). Apparently the work Alex had done getting testpoints working had put a greater load on the CPU and pushed it over an acceptable maximum. After removing the MC optics controls, the CPU time dropped to about 47 microseconds from about 67 microseconds. The c1mcs is taking about 20 microseconds per cycle.
The new model is using the top_names functionality to still call the channels C1SUS-XXX_YYY. However, the directory to find the actual medm filter modules is /opt/rtcds/caltech/c1/medm/c1mcs, and the gds testpoint screen for that model is called C1MCS-GDS_TP.adl. I'm currently in the process of updating the medm screens to point to the correct location.
Also, while plugging in the cables from the coil dewhitening boards, we realized I (Joe) had made a mistake in the assignment of channels to the binary output boards. I need to re-examine Jay's old drawings and fix the simulink model binary outputs. |
16414
|
Tue Oct 19 18:20:33 2021 |
Ian MacMillan | Summary | CDS | c1sus2 DAC to ADC test | I ran a DAC to ADC test on c1sus2 channels where I hooked up the outputs on the DAC to the input channels on the ADC. We used different combinations of ADCs and DACs to make sure that there were no errors that cancel each other out in the end. I took a transfer function across these channel combinations to reproduce figure 1 in T2000188.
As seen in the two attached PDFs, the channels seem to be working properly: they have a flat response with a gain of 0.5 (-6 dB). This is the expected response, and is the result of the DAC signal being sent single-ended while the ADC receives it as a differential input; this should result in a recorded signal at 0.5 times the amplitude of the actual output signal.
The drop-off at the high frequency end is the result of the anti-aliasing filter and the anti-imaging filter. Both of these are 8-pole elliptical filters, so when combined (16 poles at 20 dB/decade per pole) we should get a drop-off of 320 dB per decade. I measured the slope on the last few points of each filter and the averaged value was around 347 dB per decade. This is slightly steeper than expected, but since its purpose is to cut off higher frequencies it shouldn't have an effect on the operation of the system, and it is very close to the expected value.
The ripples seen before the drop off are also an effect of the elliptical filters and are seen in T2000188.
Note: the transfer function that doesn't seem to match the others is the heartbeat timing signal. |
16415
|
Tue Oct 19 23:43:09 2021 |
Koji | Summary | CDS | c1sus2 DAC to ADC test | (For a totally unrelated reason) I was checking the electronics units for the upgrade, and I realized that the electronics units at the test stand have not been properly powered.
I found that the AA/AI stack at the test stand (Attachment 1) has an unusual powering configuration (Attachment 2).
- Only the positive power supply was used
- The supply voltage is only +15V
- The GND reference is not connected to anything
For confirmation, I checked the voltage across the DC power strip (Attachments 3/4). The positive was +5.3V and the negative was -9.4V. This is subject to change depending on the earth potential.
This is not a good condition at all. The asymmetric powering of the circuit may cause damage to the op-amps, so I turned off the switches of the units.
The power configuration should be immediately corrected.
- Use both positive and negative supply (2 power supply channels) to produce the positive and the negative voltage potentials. Connect the reference potential to the earth post of the power supply.
https://www.youtube.com/watch?v=9_6ecyf6K40 [Dual Power Supply Connection / Serial plus minus electronics laboratory PS with center tap]
- These units have a DC power regulator which produces +/-15V out of +/-18V, so the DC power supplies are supposed to be set at +18V.
|
16430
|
Tue Oct 26 18:24:00 2021 |
Ian MacMillan | Summary | CDS | c1sus2 DAC to ADC test | [Ian, Anchal, Paco]
After Koji found that there was a problem with the power source, Anchal and I fixed the power and then reran the measurement. The only change this time around is that I increased the excitation amplitude to 100. In the first run the excitation amplitude was 1, which seemed to come out noise-free but is too low to give a reliable value.
link to previous results
The new plots are attached. |
17033
|
Mon Jul 25 17:58:10 2022 |
Tega | Configuration | BHD | c1sus2 IPC dolphin issue update | From the 40m wiki, I was able to use the instructions here to map out what to do to get the IPC issue resolved. Here is a summary of my findings.
I updated the /etc/dis/dishost.conf file on the frame builder machine to include the c1sus2 machine, which runs the sender model, c1hpc (see below). After this, the file became available on the c1sus2 machine (see Attachment 1), and the c1sus2 node shows up in the dxadmin GUI (see Attachment 2). However, the c1sus2 machine was not active. I noticed that the log file for the dis_nodemgr service (see Attachment 3), which is responsible for setting things up, indicated that the dis_irm service may not be up, so I checked and confirmed that this was indeed the case (see Attachment 4). I tried restarting this service but was unsuccessful. I restarted the machine but this did not help either. I have reached out to Jonathan Hanks for assistance. |
17052
|
Mon Aug 1 18:42:39 2022 |
Tega | Configuration | BHD | c1sus2 IPC dolphin issue update | [Yuta, Tega]
We decided to give the dolphin debugging another go. Firstly, we noticed that c1sus2 was no longer recognising the dolphin card, which can be checked using
lspci | grep Stargen
or by looking at the status lights on the dolphin card of c1sus2, which were orange for both ports A and B.
We decided to do a hard reboot of c1sus2 and turned off the DAQ chassis for a few minutes, then restarted c1sus2. This solved the card recognition problem as well as the 'dis_irm' driver loading issue (I think the driver does not get loaded if the system does not recognise a valid card, as I also saw the missing dis_irm driver module on c1testand).
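A quick way to check that state (a sketch; 'dis_irm' is the Dolphin kernel module named above, and related modules share the 'dis_' prefix):
lsmod | grep dis_    # confirm the Dolphin dis_irm (and related) kernel modules are loaded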

Next, we confirmed the status of all dolphin cards on fb1, using
controls@fb1$ /opt/DIS/sbin/dxadmin
It looks like the dolphin card on c1sus2 has now been configured and is available to all other nodes. We then restarted all the FE machines and models to see if we were in the clear. Unfortunately, we are not so lucky, since the problem persisted.
Looking at the output of 'dmesg', we could only identify two notable differences between the operational dolphin cards on c1sus/c1ioo/c1lsc and c1sus2, namely the card number being equal to zero and the memory addresses, which are also zero (see the attached image).
Anyway, at least we can now eliminate driver issues and will move on to debugging the models next. |
17025
|
Thu Jul 21 21:50:47 2022 |
Tega | Configuration | BHD | c1sus2 IPC update | IPC issue still unresolved.
Updated the shared memory tag so that 'SUS' -> 'SU2' in c1hpc, c1bac and c1su2. Removed obsolete 'HPC/BAC-SUS' references from the IPC file, C1.ipc. Restarted the FE models, but the c1sus2 machine froze, so I did a manual reboot. This brought down the vertex machines---which I restarted using /opt/rtcds/caltech/c1/scripts/cds/rebootC1LSC.sh---and the end machines, which I restarted manually. Everything but the BHD optics now has its previous values, so we need to burt-restore these.
# IPC file:
/opt/rtcds/caltech/c1/chans/ipc/C1.ipc
# Model file locations:
/opt/rtcds/userapps/release/isc/c1/models/isc/c1hpc.mdl
/opt/rtcds/userapps/release/sus/c1/models/c1su2.mdl
/opt/rtcds/userapps/release/isc/c1/models/isc/c1bac.mdl
# Log files:
/cvs/cds/rtcds/caltech/c1/rtbuild/3.4/c1hpc.log
/cvs/cds/rtcds/caltech/c1/rtbuild/3.4/c1su2.log
/cvs/cds/rtcds/caltech/c1/rtbuild/3.4/c1bac.log
SUS overview medm screen:
- Reduced the entire screen width
- Reverted to the old-style watchdog screen layout
|
17335
|
Mon Dec 5 12:05:29 2022 |
Anchal | Update | CDS | c1sus2 all FE models crashed spontaneously | Just a few minutes ago, all models on FE c1sus2 crashed. I'm attaching some important files that can be helpful in investigating this. CDS upgrade team, please take a look.
I fixed this by running following on c1sus2:
controls@c1sus2:~$ rtcds restart --all |
17457
|
Thu Feb 9 10:05:37 2023 |
yuta | Update | CDS | c1sus2 all FE models crashed spontaneously again | I just noticed that c1sus2 crashed again. Following 40m/17335, I fixed it by running
controls@c1sus2:~$ rtcds restart --all
"global diag reset" made all FE STATUS green.
Burt restored at 2023/Feb/8/19:19 for c1sus2 models.
Watchdogs reset for BHD optics and now all look fine. |
17606
|
Mon May 29 11:04:13 2023 |
Paco | Update | CDS | c1sus2 all FE models crashed spontaneously again | c1sus2 crashed again. Following 40m/17335, I fixed it by running
controls@c1sus2:~$ rtcds restart --all
"global diag reset" made all FE STATUS green, burt restored at 2023/May/28/00:19 for c1su2 models, and watchdogs were reset for the BHD optics; now all look fine.
Other optics, including MC1, MC2, and MC3, were not damped, so maybe c1sus crashed too. I also ran
controls@c1sus:~$ rtcds restart --all
"global diag reset" made all FE STATUS green, burt restored at 2023/May/28/00:19 for c1sus models and watchdogs reset for all other suspensions.
TT1 and TT2 were not responding, and the DAC monitors were frozen... so I ran
./opt/rtcds/caltech/c1/Git/40m/scripts/cds/restartAllModels.sh
Wed May 31 14:58:06 2023 UPDATE: I forgot to log that ITMX, ITMY and BS oplevs were centered after some nominal alignment was recovered. |
|