The camera server keeps throwing the error: failed to grab frames. Milind suggested that it might a problem with the ethernet cable, so I even unplugged it and connected it again; it still had the same issue. One more thing I noticed was, it does take snapshots sometimes with the terminal command caput C1:CAM-ETMX_SNAP 1, but produces a segmentation fault when ezca.Ezca().write(C1:CAM-ETMX_SNAP, 1) ezca.Ezca().write(CAM-ETMX_SNAP, 1) is used via ipython. When the terminal command also fails to take snapshots, I noticed that the SNAP button on the GigE medm screen remains on and doesn't switch back to OFF like it is supposed to.
Can you please be more specific about what the error is? Is this the usual instability with the camera server code? Or was it something new?
The camera server is throwing an error and is not grabbing snapshots :(
DATED, SEE ELOG14941 for the most up-to-date info on latch.py.
I wrote (/cvs/cds/caltech/target/c1iscaux3/latch.py) and tested the logic illustrated in Attachment #1. Results of a test are shown in Attachment #2, the various channels change as expected. Note that for negative values of the gain channel, the corresponding "BITS" channel will take on values like 65536 - this is because the mbboDirect data type is a 16 bit data type, and presumably the MSB is the sign bit. A bit mask is applied to this channel before the actual BIO unit bits are set - we should verify that the correct behavior happens, but I don't immediately see any problems.
To me, this is a robust logic, but it will benefit from more sets of eyes giving it a look over. The idea is to run this continuously on the Supermicro machine.
Apart from this, I also fixed some errors in the mbboDirect record syntax - so now I am able to start up the EPICS server without it throwing any error messages. It remains to verify that changing an EPICS gain slider results in the appropriate gain bits being flipped in the correct way (on the hardware side, I think the correct behavior is happening on the software end). For this testing, I turned off the old c1iscaux crate at ~10am, and started up the server on c1iscaux3. I am reverting to the nominal config now (~1pm).
Further testing will require the wiring inside the Acromag chassis to be completed. This should be the priority task for next week.
*Update 1130 22 July 2019: I've now installed the required dependencies on c1iscaux3 and setup the latch.py script to run as a systemctl process dependent on modbusIOC.service.
I'm running the MC2 loss map scripts on pianosa now. The camera server is throwing an error and is not grabbing snapshots :(
Update: I finished taking the readings for MC2 loss map. I couldn't get the snapshots with the script, so I manually took some 4-5 pictures.
See Attachment #2.
Make the MSE a subplot on the same axes as the time series for easier interpretation.
Do I need to provide any more details here?
Describe the training dataset - what is the pk-to-pk amplitude of the beam spot motion you are using for training in physical units? What was the frequency of the dither applied? Is this using a zoomed-in view of the spot or a zoomed out one with the OSEMs in it? If the excursion is large, and you are moving the spot by dithering MC2, the WFS servos may not have time to adjust the cavity alignment to the nominal maximum value.
What is the minimum detectable motion given the CCD resolution?
see attachment #4.
I wrote what I think is a handy script to observe if the frames are saturated. I thought this might be handy for if/when I collect data with higher exposure times. I assumed there was no saturation in the images because I'd set the exposure value to something low. I thought it'd be useful to just verify that. Attachment #3 has log scale on the x axis.
What is the significance of Attachment #6? I think the x-axis of that plot should also be log-scaled.
The boards arrived. I soldered on a DIN96 connector, and tested that the goemetry will work. It does . The only constraint is that the P2 interface board has to be installed before the P1 interface is installed. Next step is to confirm that the pin-mapping is correct. The pin mapping from the DIN96 connector to the DB15 was also verified.
*Maybe it isn't obvious from the picture, but there shouldn't be any space constraint even with the DB37/DB15 cables connected to the respective adapter boards.
SnapPy scripts made to work on Pianosa.
Of course rossa was the only machine in the lab that could run the python scripts to interface with the GigE camera. And it is totally bricked now. Lame.
So I installed several packages. The key was to install pypylon - if you go to the basler webpage, pypylon1.4.0 does not offer python2.7 support for x86_64 architecture, so I installed pypylon1.3.0. Here are the relevant lines from the changelog:
gstreamer-plugins-bad-0.10.23-5.el7.x86_64 Sat 20 Jul 2019 11:22:21 AM PDT
gstreamer-plugins-good-0.10.31-13.el7.x86_64 Sat 20 Jul 2019 11:22:11 AM PDT
gstreamer-plugins-ugly-0.10.19-31.el7.x86_64 Sat 20 Jul 2019 11:20:08 AM PDT
gstreamer-python-devel-0.10.22-6.el7.x86_64 Sat 20 Jul 2019 10:34:35 AM PDT
pygtk2-devel-2.24.0-9.el7.x86_64 Sat 20 Jul 2019 10:34:34 AM PDT
pygobject2-devel-2.28.6-11.el7.x86_64 Sat 20 Jul 2019 10:34:33 AM PDT
pygobject2-codegen-2.28.6-11.el7.x86_64 Sat 20 Jul 2019 10:34:33 AM PDT
gstreamer-devel-0.10.36-7.el7.x86_64 Sat 20 Jul 2019 10:34:32 AM PDT
gstreamer-python-0.10.22-6.el7.x86_64 Sat 20 Jul 2019 10:34:31 AM PDT
gtk2-devel-2.24.31-1.el7.x86_64 Sat 20 Jul 2019 10:34:30 AM PDT
libXrandr-devel-1.5.1-2.el7.x86_64 Sat 20 Jul 2019 10:34:28 AM PDT
pango-devel-1.42.4-1.el7.x86_64 Sat 20 Jul 2019 10:34:27 AM PDT
harfbuzz-devel-1.7.5-2.el7.x86_64 Sat 20 Jul 2019 10:34:26 AM PDT
graphite2-devel-1.3.10-1.el7_3.x86_64 Sat 20 Jul 2019 10:34:26 AM PDT
pycairo-devel-1.8.10-8.el7.x86_64 Sat 20 Jul 2019 10:34:25 AM PDT
cairo-devel-1.15.12-3.el7.x86_64 Sat 20 Jul 2019 10:34:25 AM PDT
mesa-libEGL-devel-18.0.5-3.el7.x86_64 Sat 20 Jul 2019 10:34:24 AM PDT
libXi-devel-1.7.9-1.el7.x86_64 Sat 20 Jul 2019 10:34:24 AM PDT
pygtk2-doc-2.24.0-9.el7.noarch Sat 20 Jul 2019 10:34:23 AM PDT
atk-devel-2.28.1-1.el7.x86_64 Sat 20 Jul 2019 10:34:21 AM PDT
libXcursor-devel-1.1.15-1.el7.x86_64 Sat 20 Jul 2019 10:34:20 AM PDT
fribidi-devel-1.0.2-1.el7.x86_64 Sat 20 Jul 2019 10:34:20 AM PDT
pixman-devel-0.34.0-1.el7.x86_64 Sat 20 Jul 2019 10:34:19 AM PDT
libXinerama-devel-1.1.3-2.1.el7.x86_64 Sat 20 Jul 2019 10:34:19 AM PDT
libXcomposite-devel-0.4.4-4.1.el7.x86_64 Sat 20 Jul 2019 10:34:19 AM PDT
libicu-devel-50.1.2-15.el7.x86_64 Sat 20 Jul 2019 10:34:18 AM PDT
gdk-pixbuf2-devel-2.36.12-3.el7.x86_64 Sat 20 Jul 2019 10:34:17 AM PDT
pygobject2-doc-2.28.6-11.el7.x86_64 Sat 20 Jul 2019 10:34:16 AM PDT
pygtk2-codegen-2.24.0-9.el7.x86_64 Sat 20 Jul 2019 10:34:15 AM PDT
Camera server is running on a tmux session on pianosa. But it keeps throwing up some gstreamer warnings/errors, and periodically (~every 20 mins) crashes. Kruthi tells me that this behavior was seen on Rossa as well, so whatever the problem is, doesn't seem to be because I missed out on installing some packages on pianosa. Moreover, if the server is in fact running, I am able to take a snapshot - but the camera client does not run.
What channels are you trying to read?
I'm not able to get trends of the TM adjustment test that Rana had asked us to perform, from the dataviewer. It's throwing the following error:
Connecting to NDS Server fb (TCP port 8088)
Server error 7: connect() failed
datasrv: DataWrite failed: daq_send: Resource temporarily unavailable
T0=19-07-20-01-27-39; Length=600 (s)
No data output.
Connecting to NDS Server fb (TCP port 8088)
Server error 7: connect() failed
datasrv: DataWrite failed: daq_send: Resource temporarily unavailable
T0=19-07-20-01-27-39; Length=600 (s)
No data output.
The database files for C1ISCAUX seem to work file - the exception being the mbbo channels for the CM board.
This was just a software test - the actual functionality of the channels will have to be tested once the Acromag crate has been installed in the rack. One change I had to make on the MEDM screen for the LSC PD whitening gains was to get rid of the "NMS" suffix on the EPICS channel names for whitening gain sliders/drop-down-menus. I suspect this has to do with the EPICS version we are using, 7.0.1. Furthermore, AS165 and POP55 no longer exist - I hold off removing them from the MEDM screen for the moment.
From the software point of view, the major steps are:
I am stopping the EPICS server on the new machine and restarting the old VME crate over the weekend.
For some reason, rossa's Xdisplay won't start up anymore. This happened right after the UPS reset. Koji and I tried ~1.5 hours of debugging, got nowhere.
I did a whole lot of hyperparameter tuning for convolutional networks (without 3d convolution). Of the results I obtained, I am attaching the best results below.
The lower the power of the error signal (difference between the true and predicted X and Y positions), essentially mse, on the test data, the better the performance of the model. Of the trained models I had, I chose the one with the lowest mse.
(Note: Attachment 6 and 7 present information regarding a fraction of the data. However, the behaviour remains the same for the rest of the data.)
Observations and analysis:
P.S. I will also try the 2D convolution followed by the 1D convolution thing now.
P.P.S. Gabriele suggested that I try average pooling instead of max pooling as this is a regression task. I'll give that a shot.
Experiment file: train_both.py
The control room UPS started making a beeping noise saying batteries need replacement. I hit the "Test" button and the beeping went away. According to the label on it, the batteries were last repalced in March 2016, so maybe it is time for a replacement, @Chub, please look into this.
Bulb replaced. Projector is back on.
Rana and I talked about some (genius) options for the large range DC bias actuation on the SOS, which do not require us to supply high-voltage to the OSEMs from outside the vacuum.
What we came up with (these are pretty vague ideas at the moment):
For the thermal option, I remembered that (exactly a year ago to the day!) when we were doing cavity mode scans, once the heaters were turned on, I needed to apply significant correction to the DC bias voltage to bring the cavity alignment back to normal. The mechanism of this wasn't exactly clear to me - furthermore, we don't have a FLIRcam picture of where the heater radiation patter was centered prior to my re-centering of it on the optic earlier this year, so we don't know what exactly we were heating. Nevertheless, I decided to look at the trend data from that night's work - see Attachment #1. This is a minute trend of some ETMY channels from 0000 UTC on 18 July 2018, for 24 hours. Some remarks:
Also see this elog by Terra.
Attachment #2 shows the results from today's heating. I did 4 steps, which are obvious in the data - I=0.6A, I=0.76A, I=0.9A, and I=1.05A.
In science, one usually tries to implement some kind of interpretation. so as to translate the natural world into meaning.
The whitening filter modules have been restored to the crates. The SMA cables have been restored and fastened by a spanner. The ribbon cable to the antialiasing board was also connected. The backplane cables have not been moved from the upper DIN96 connector to the lower one.
Everything is expected to be good, but just keep eyes on the LSC signals as the boards were not quantitatvely tested yet. If you find something suspicious, report on the elog.
[Kruthi, Yehonathan, Gautam]
Today evening, Yehonathan and I aligned the MC2 cameras. As of now there are 2 GigEs in the MC2 enclosure. For the temporary GigE (which is the analog camera's place), we are using an ethernet cable connection from the Netgear switch in 1x6. The MC2 was misaligned and the autolocker wasn't able to lock the mode cleaner. So, Gautam disabled the autolocker and manually changed the settings; the autolocker was able to take over eventually.
Now that the .db files were prepared, I wanted to test for errors. So I did the following:
All the Acromags are seen on the 192.168.114 subnet on c1iscaux3 - however, when I run the modbusIOC process, I see various errors in the logfile , so more debugging is required. Nevertheless, progress.
Update 2245: Turns out the errors were indeed due to a copy/paste error - I had changed the IP addresses for the ADCs from the .115 subnet c1susaux was using, but forgot to do so for the DACs and BIOs. Now, if I turn off the existing c1iscaux so that there aren't any EPICS clashes, the EPICS server initializes correctly. There are still some errors in the log file - these pertain to (i) the mbbo notation, which I have to figure out, and (ii) the fact that this version of EPICS, 7.0.1, does not support channel descriptions longer than 28 characters (we have several that exceed this threshold). I think the latter isn't a serious problem.
Getting closer... Note that I turned off the c1iscaux VME crate to prevent any EPICS server clashes. I will turn it back on tomorrow.
I completed the translation of the .db files for the EPICS database records from the VME notation to the Acromag/Modbus/Asyn notation. The channels are now organized into 5 database files, located in /cvs/cds/caltech/target/c1iscaux3/, for convenience:
For reasons unknown to me, the database files in the other Acromag system target directories (e.g. c1susaux, c1auxex) all had 755 level access permission - maybe this is required for systemctl to handle the EPICS serving? Anyways, I upgraded the permission level of the above 5 files using chmod.
There are almost certainly typos / other errors, and I may have missed copying over some soft/calibrated channels, but I hope that this way of grouping by subsystem will make the debugging less painful. Once Chub connects up the power lines to the Acromags, I will run the soft tests. For this purpose, I've also made a C1_ISC-AUX.cmd file and a C1_ISC-AUX.env file in the above target directory, and also made the modbusIOC.service file in /etc/systemd/system on the supermicro.
Along with the plan in ELOG 14744, the ISC PD interface and the whitening filter board have been modificed. The ISC PD I/Fs were restored to the crate and the cables were connected. The whitening filteres are still on the electronics bench for some more tests before being returned to the crate.
The updated schematics were uploaded as https://dcc.ligo.org/D1900318 and https://dcc.ligo.org/D1900319
- Modification of the ISC PD interface: Jumpers between DIN96 P1 and P2. Replace all AD797s with OP27. In fact only I/F #1 (the left most) had total 12 AD797 but the other units already had OP27s.
- Modification of the whitening filter: Jumpers between DIN96 P1 and P2.
Koji pointed out an important subtlety pertaining to the "LATCH ENABLE" signal line on the CM board. The purpose of this line is to smoothly facilitate the transition of a change in the "multi-bit-binary-outputs", a.k.a. "mbbo", that are controlled by MEDM gain sliders, to the analog electronics on the CM board. Why is this necessary? Imagine changing the gain from 7dB (=0111 in mbbo representation) to 8dB (=1000 in mbbo representation). In order to realize this change, all 4 bits have to change their state. But this almost certainly doesn't happen synchronously, because our EPICS interface isn't synchronous. So at some intermediate times, the mbbo representation could be 0100 (=4dB), or 1111 (=15dB), or many other possible values, which are all significantly different from either the initial value or the desired final state. This is clearly undesirable.
In order to protect against this kind of error, a Latched output part, 74ALS573, is used to buffer the physical digital logic levels from the switches in the analog gain stages. So in the default state, the "LATCH ENABLE" signal line is held "LOW". When a change happens in the EPICS value corresponding to a gain slider, the "LATCH ENABLE" state is quickly toggled to "HIGH", so as to enable the appropriate analog gain stages to be switched, and then again to "LOW", at which point the latch holds its output state. This logic is currently implemented by a piece of code called "latch.o", which is the compiled version of "latch.st", which may be found in /cvs/cds/caltech/target/c1iool0 where it presumably was written for the IMC servo board, but not in /cvs/cds/caltech/target/c1iool0 , which is where the CM board database files reside. The only elog reference I can find pertaining to this particular piece of code is from Alan, and doesn't say anything about the actual logic.
For the new c1iscaux, we need to implement this logic somehow. After discussion between Koji and me, we feel that a piece of python code is sufficient. This would continuously run in the background on the supermicro server machine. The channel hierarchy for each gain channes is as follows (I've taken the example of C1:LSC-CM_REFL1_GAIN):
So the logic will be that it continuously scans the EPICS channel C1:LSC-CM_REFL1_GAIN for a change in set value. When a change is detected, it has to update the C1:LSC-CM_REFL1_SET channel. In the next EPICS refresh cycle, this would result in the mbbo bits, C1:LSC-CM_REFL1_BITS , all changing to the appropriate values. After these changes have happened, we need to toggle the LATCH ENABLE in order to allow the changes to propagate to the analog gain stage switches. Need to think about what's the best way to do this.
I've taken the MC2 analog camera down and put another GigE (unit 151) in its place. This is just temporary and I'll put the analog camera back once I finish the MC2 loss map calibration. I'm using a 25mm focal length camera lens with it and it gives a view of MC2 similar to the analog camera one. But I don't think it is completely focused yet (pictures attached).
...more to follow
gautam - Attachment #3 is my (sad) attempt at finding some point scatterers - Kruthi is going to play around with photUtils to figure out the average size of some point scatterers.
Kruthi noticed that she could not login to rossa from giada.
I checked /etc/resolv.conf and it was
so obviously it is useless to refer localhost (i.e. giada) as a nameserver.
I copied our usual resolv.conf to giada as following:
Giada's ssh known_host had unupdated entry for rossa, so I had to clean it up, but after that we can connect to rossa from giada just by "ssh rossa".
[Kruthi, Gautam, Rana]
Gautam installed Atom text editor on Pianosa yesterday.
MC spot position measurement scripts (these can be found in /scripts/ASS/MC directory):
I worked on preparing for the c1iscaux upgrade a bit today.
All the required additional parts should be here by the end of the week - I'd like to aim for Wednesday 7/24 for the installation in 1Y3 and in-situ testing. While talking to Rana, I realized that we should also factor in the c1aux slow channels into this acromag crate - there is no need for a separate machine to handle the shutters and illuminators. But let's not worry about that for now, those channels can simply be added later.
Yutaro talked about the BIO bug in KAGRA elog. http://klog.icrr.u-tokyo.ac.jp/osl/?r=9536
I think I made the similar change for the 40m model somewhere (don't remember), but be aware of the presense of this bug.
There were several small/medium earthquakes in Ridgecrest and one medium one in Blackhawk CA at about 2000 UTC (i.e. ~ 2 hours ago), one of which caused BS, ITMY, and ETM watchdogs to trip. I restored the damping just now.
In addition to c1psl needing a reboot, megatron was un-ssh-able (although it was responding to ping). Clue was that the NPRO PZT control voltage was drifting a lot on the StripTool trace. Koji hard-rebooted the machine. Now IMC is locked, and FSS slow servo is also running.
Mode cleaner was not locked today. Koji came in and concluded that PSL had died. So we keyed it. Then we ran my unstick.py code. Mode cleaner is locked now.
Today, Gautam keyed the C1PSL crate and we got to test my unstick.py code. It seems to be working fine. Remarks:
Following this, we tested my PMC autolocker code. The code ran for about a minute before achieveing lock. Remarks:
I've set up network with a CNN encoder (front end) feeding into a single LSTM cell followed by the output layer (see attachment #1). The network requires significantly more memory than the previous ones. It takes around 30s for one epoch of training. Attached are the predicted yaw motion and the fft of the same. The FFT looks rather curious. I still haven't done any tuning and these are only the preliminary results.
Rana also suggested I try LSTMs today. I'll maybe code it up tomorrow. What I have in mind- A conv layer encoder, flatten, followed by an LSTM layer (why not plain RNNs? well LSTMs handle vanishing gradients, so why the hassle).
Well, what about the previous conv nets?
What I did:
What I observed:
What I think is going wrong:
What I will try now:
I made some rough measurements, using the setup I had used for CCD calibration, to get an idea of how good of a Lambertian scatterer the white paper is. Following are the values I got:
Note: All the measurements are just rough ones and are prone to larger errors than estimated.
I also measured the transmittance of the white paper sample being used (it consists of 2 white papers wrapped together). It was around 0.002
On Friday, Koji helped me find various components required for the scatterometer setup. Like he suggested, I'll first set it up on the SP table and try it out with an usual mirror. Later on, once I know it's working, I'll move the setup to the flow bench near the south arm and measure the BRDF of a spare end test mass.
On Friday, I took images for different power outputs of LED. I calculated the calibration factor as explained in my previous elog (plots attached).
Power incident on photodiode (W)
To estimate the uncertainity, I assumed an error of at most 20mV (due to stray lights or difference in orientation of GigE and photodiode) for the photodiode reading. Using the uncertainity in slope from the linear fit, I expect an uncertainity of maximum 4%. Note: I haven't accounted for the error in the responsivity value of the photodiode.
Johannes had reported CF as 0.0858E-15 W-sec/counts for 12 bit images, with measured a laser source. This value and the one I got are off by a factor of 25. Difference in the pixel formats and effect of coherence of the light used might be the possible reasons.
Optical chopper borrowed from CryoLab to 40m
All suspension watchdogs were tripped ~90mins ago. I restored the damping. IMC is locked.
ITMX was stuck. I set it free. But notice that the UL Sensor RMS is higher than the other 4? I thought ITMY UL was problematic, but maybe ITMX has also failed, or maybe it's coincidence? Something for IFOtest to figure out I guess. I don't think there is a cable switch between ITMX/ITMY as when I move the ITMX actuators, the ITMX sensors respond and I can also see the optic moving on the camera.
Took me a while to figure out what's going on because we don't have the seis BLRMS - i moved the usual projector striptool traces to the TV screen for better diagnostic ability.
Update 16 July 1515: Even though the RMS is computed from the slow readback channels, for diagnosis, I looked at the spectra of the fast PD monitoring channels (i.e. *_SENSOR_*) for ITMX - looks like the increased UL RMS is coming from enhanced BR-mode coupling and not of any issues with the whitening switching (which seems to work as advertised, see Attachment #3, where the LL traces are meant to be representative of LL, LR, SD and UR channels).
I looked at the PSL/IOO racks to check for which boards, if any, require an additional P2 interface, so that we can try and design a generic one for the IMC/CM boards and whatever else may require it. While searching the elog, I saw that Koji and Johannes had already done this, see Koji's elog in this thread. Some remarks:
Conclusion: Only the IMC Servo and CM boards need their P2 connectors connected to Acromag.It would be helpful to remove the TTFSS Interface board and figure out what exactly the pin-mapping for the backplane connectors are, but I didn't do this today because there is a "High Voltage" line going to the Interface Board and I'm not actually sure of the signal chain for the FSS servo.
Arnaud has taken 1 TT suspension from the 40m clean lab to Downs for modal testing. Estimated time of return is tomorrow evening.
I heard a popping sound in the control room; the projector lightbulb has blown out.
it will connect to a 15 pin breakout board in the Acromag chassis
It's nice and compact, and the cost of new 15-pin DSUB cables shouldn't be a factor here. What does the 15p cable connect to?
I looked into the design of the P2 interface board. The main difficulty here is geometric - we have to somehow accommodate sufficient number of D-sub connectors in the tight space between the two P-type connectors.
I think the least painful option is to stick with Johannes' design for the P1 connector. For the CM board, the P2 connector only uses 6 pairs of conductors for signals. So we can use a D-15 connector instead of 2 D-37 connectors. Then we can change the PCB shape such that the P1 connector can be accommodated (see Attachment #1). The other alternative would be to have 2 P-type connectors and 3 D-subs on the same PCB, but then we have to be extra careful about the relative positioning of the P-type connectors (otherwise they wont fit onto the Eurocrate). So I opted to still have two separate PCBs.
I took a first pass at the design, the files may be found here. I just auto-routed the connections, this is just an electrical feedthrough so I don't think we need to be too concerned about the PCB trace routing? If this looks okay, we should send out the piece for fab ASAP.
I will work on putting together the EPICS server machine (SuperMicro) this afternoon.
2. D040180 / D1500308 Common Mode Board
CM servo board itself doesn't need any modification. The CM board uses P1 and P2. So we need to manufacture a special connector for CM Board P2. (cf The adapter board for P1 T1800260). See also D1700058.
I trained a bunch (around 25 or so - to tune hyperparameters) of networks today. They were all CNNs. They all produced garbage. I also looked at lstm networks with CNN encoders (see this very useful link) and gave some thought to what kind of architecture we want to use and how to go about programming it (in Keras, will use tensorflow if I feel like I need more control). I will code it up tomorrow after some thought and discussion. I am not sure if abandoning CNNs is the right thing to do or if I should continue probing this with more architectures and tuning attempts. Any thoughts?
Right now, after speaking to Stuart (ldas_admin) I've decided on coding up the LSTM thing and then running that on one machine while probing the CNN thing on another.
Update on 10 July, 2019: I'm attaching all the results of training here in case anyone is interested in the future.
I received access today. After some incredible hassle, I was able to set up my repository and code on the remote system. Following this, Gautam wrote to Gabriele to ask him about which GPUs to use and if there was a previously set up environment I could directly use. Gabriele suggested that I use pcdev2 / pcdev3 / pcdev11 as they have good gpus. He also said that I could use source ~gabriele.vajente/virtualenv/bin/activate to use a virtualenv with tensorflow, numpy etc. preinstalled. However, I could not get that working, Therefore I created my own virtual environment with the necessary tensorflow, keras, scipy, numpy etc. libraries and suitable versions. On ssh-ing into the cluster, it can be activated using source /home/millind.vaddiraju/beamtrack/bin/activate. How do I know everything works? Well, I trained a network on it! With the new data. Attached (see attachment #1) is the prediction data for completely new test data. Yeah, its not great, but I got to observe the time it takes for the network to train for 50 epochs-
Therefore, I will carry out all training only on this machine from now.
Note to self:
Steps to repeat what you did are:
I attempted to train a bunch of networks on the new data to test if the code was alright but realised quickly that, training on my local machine is not feasilble at all as training for 10 epochs took roughly 6 minutes. Therefore, I have placed a request for access to the cluster and am waiting for a reply. I will now set up a bunch of experiments to tune hyperparameters for this data and see what the results are.
We noticed that the PRM watchdog was tripping frequently. This is a period of enhanced seismic activity. The reason PRM in particular trips often is because the SIDE OSEM has 5x increased transimpedance. We implemented a workaround by modifying the watchdog tripping condition to scale the SD channel RMS by a factor of 0.2 (relative to the UL and LL channels). We restarted the modbus process on c1susaux and tested that the new logic works. Here is the relevant snippet of code:
# PRM Side is special, see elog 14745
field(DESC,"Tests whether RMS too high")
field(INPA,"C1:SUS-PRM_ULPD_VAR NPP NMS")
field(INPB,"C1:SUS-PRM_PD_MAX_VAR NPP NMS")
field(INPC,"C1:SUS-PRM_LLPD_VAR NPP NMS")
field(INPD,"C1:SUS-PRM_SDPD_VAR NPP NMS")
The db file has a note about this as well so that future debuggers aren't mystified by a factor of 0.2.
The list of the iscaux channels and pin assignments were posted to google drive.
The spreadsheet can be viewable by the link sent to the 40m ML. It was shared with foteee@gmail for full access.
Necessary electronics modification
1. D990694 whitening filter modification (4 modules)
This module shares the fast and slow channels on the top DIN96pin (P1) connector. Also, the whitening selector (done by an analog signal per channel) is assigned over 17pin of the P1 connector, resulting in the necessity of the second DSUB cable. By migrating the fast channels, we can swap the cable from the P1 to P2. Also, the whitening selectors are concentrated on the first Dsub. (See Attachment1 P1)
3. D990543A1 LSC Photodiode Interface
PD I/F board has the DC mon channels spread over the 16pin limit. P1 21A can be connected to 6A so that we can accommdate it in the first Dsub.
Also the board uses AD797s. This is not necessary. We can replace them to OP27s. I actually don't know what is happening to those bias control, temp mon, enable, and status. These features should be disables at the I/F and the PDs. (See Attachment2 P1)
In fact the projector is still working. The lamp timer showed ~8200hrs. I just reset the timer, but not sure it was the cause of the shutdown. I also set the fan mode to be "High Altitude" to help cooling.
Arnaud and I moved one of the two spare TT suspensions from the south clean cabinet to the bake lab clean room. The main purpose was to inspect the contents of the packaging. According to the label, this suspension was cleaned to Class A standards, so we tried to be clean while handling it (frocks, gloves, masks etc). We found that the foil wrapping contained one suspension cage, with what looked like all the parts in a semi-assembled state. There were no OSEMs or electronics together with the suspension cage. Pictures were taken and uploaded to gPhoto. Arnaud is going to plan his tests, so in the meantime, this unit has been stored in Cabinet #6 in the bake lab cleanroom.
There was no green light even though the EX NPRO was on. I checked the doubling oven temperature controller and found that its power cable was loose on the rear. I reconnected it, and now there is green light again.
Last documented replacement in Nov 2018, so ~7 months, which I believe is par for the course. I am disconnecting its power supply cable.