Worked further on this. I skimmed through a few resources to look for details of what pre-processing can be done. Here (am planning to convert all these resources, particularly those I come across for GANs into either a README on the repo or a Wiki soon) are some of the useful things I found during today's reading. The work I skimmed through today mostly pointed to the use of a median filter for pre-processing, if any is to be done. I am presently using the Sequential() API in Keras to set up the neural network. I will train it tomorrow.
Begun setting up an environment (as mentioned before, on my local machine) and scripts to run experiments with Convolutional networks for beam tracking. All code has been pushed to this folder in the GigEcamera repository. I am presently looking for pre-processing techniques for the video which go beyond the usual "Crop the images! Normalize pixel values! Convert to Grayscale!".
In the previous meeting, Koji pointed out (once again) that I should determine if the displacement values and frames are synchronized before training a network. Pooja did the following last time. Koji also suggested that I first predict the motion (a series of x and y coordintates) and then slide resulting plots around until I get the best match for the original motion. This is however not possible with a neural network based approach as the network learns exactly what you show it and therefore it will learn any mismatch between the labels and the frames and predict exactly that. Therefore I came up with what Koji described as "hacky" method to achieve the same using the opencv work described previously in this elog (the only addition being the application of a mask to block out the OSEMs and work only with the beam spot) .
Hacky technique to sync frames and labels:
[Koji, Milind - 21/06/2019]
Upcoming work (in the order of priority):
Turns out, focusing the GigE is actually a bit tricky. With pylon, everytime I change the exposure or the focus, I'm running into the error I had mentioned earlier in one of my elogs; so I tried using the python scripts to interact with the GigE. But whenever I try to change the focal plane distance by rotating the lens coupler, the ethernet cable connection becomes loose and the camera server needs to be relaunched every now and then. Also, everytime we want to change the distance between the lenses, the telescope needs to be dismantled and refocused again. I'll try to come up with a better telescope design for this.
Yesterday, I had focused the GigE using a low exposure time and small aperture of iris, to make sure that we are actually seeing a sharp image of the beam spot. I'm attaching a picture of the beam spot I had clicked while focusing it, unfortunately, I forgot to take a picture after I had focused it completely. I'm also attaching a picture of the final setup for future reference.
Yesterday night, Rana asked me to lock the MC2. I figured that the PSL shutter was closed; I just opened it and was able to see the beam spot on the analog camera screen.
I discussed this with Gautam and he asked me to come up with a list of signals that I would need for my use and then design the data acquisition task at a high level before proceeding. I'm working on that right now. We came up with a very elementary sketch of what the script will do-
Tomorrow I will try and prepare a dummy script for this before the meeting at noon. Gautam asked me to familiarize myself with the awg, cdsutils (I have already used ezca before) to write the script. This will also help me do the following two tasks-
I got to speak to Gabriele about the project today and he suggested that if I am using Rana's memory based approach, then I had better be careful to ensure that the network does not falsely learn to predict a sinusoid at all points in time and that if I use the frame wise approach I try to somehow incorporate the fact that certain magnitudes and frequencies of motion are simply not physically possible. Something that Rana and Gautam emphasized as well.
I am pushing the code that I wrote for
to the GigEcamera repository.
Gautam also asked me to look at Jigyasa's report and elog 13443 to come up with the specs of a machine that would accomodate a dedicated camera server.
Yesterday, Rana asked me to look at Hiro Yamamoto's docs on the DCC to improve the simulation. I'm performing a first pass (=> Just skimming through to see if they're relevant, I will go through them more carefully soon!) and putting up stuff here for future reference. @Kruthi's help much appreciated!
The GigE is focused now (judged by eye) and I have closed the lid. I'm attaching a picture of the MC2 beam spot, captured using GigE at an exposure time of 400µs.
What was the solution to resolving the flaky video streaming during the alignment process????
-> I think, the issue was with either the poor wireless network conection or the GigE-PoE ethernet cable.
For the beam spot position tracking, I am wondering if there is any benefit to going for a wider field of view and getting the OSEMs in the frame? It may provide some "anchor points" against which whatever algorithm can calibrate the spot position against. But there are also several point scatterers visible in the current view, and perhaps the Gaussiam beam profile moving over them and tracking the scattered intensity from these point scatterers serves the same function? I don't know of a good solution to have a "switchable" field of view configuration in the already cramped camera enclosure though.
Also, I think it may be useful to have a cron job take a picture of MC2 and archive it (once a week? or daily?) to have some long term diagnostic of how the scattered light received by the camera changes over several months.
The GigE is focused now and I have closed the lid. I'm attaching a picture of the MC2 beam spot, captured using GigE at an exposure time of 400µs
And finally, a network is trained!
Result summary (TLDR :-P) : No memory was used. Model trained. Results were garbage. Will tune hyperparameters now. Code pushed to github.
More details of the experiment:
What I did:
What I saw
What I think is going wrong-
Well, what now?
Experiment file: train.py
Finding the gain of the Photodiode: The three-position rotary switch of the photodiode being used (PDA520) wasn't working, so I determined its gain by making a comparative measurement between ophir power meter and the photodiode. The photodiode has a responsitivity of 0.34 A/W at 1064 nm (obtained from the responsitivity curve given in the spec sheet using a curve digitizing software). Using the following equation, I determined the gain setting, which turned out to be 20dB.
Setup: Here a 1050nm (closest we have to 1064nm) LED is used as the light source instead of a laser to eliminate the effects caused by coherence of a laser source, which might affect our radiometric calibration. The LED is placed in a box with a hole of diameter 5mm (aperture angle = 40 degrees approx.). Suitable lenses are used to focus the light onto a white paper, which is fixed at an arbitrary angle and serves as a Lambertian scatterer. To make a comparative measurement between the photodiode (PDA520) and GigE, we need to account for their different sensor areas, 8.8mm (aperture diameter) and 3.7mm x 2.8 mm respectively . This can be done by either using an iris with a common aperture so that both the photodiode and GigE receive same amount of light , or by calculating the power incident on GigE using the ratio of sensor areas and power incident on the photodiode (here we are using the fact that power scattered by Lambertian scatterer per unit solid angle is constant).
Calibration of GigE 152 unit: I took around 50 images, starting with an exposure time of 2000 in steps of 2000, using the exposure_variation.py code. But the code doesn't allow us to take images with an exposure time greater than 100 ms, so I took few more images at higher exposures manually. From each image I subtracted a dark image (not in the sense of usual CCD calibration, but just an image with same exposure time and no LED light). These dark images do the job of usual dark frame + bias frame and also account for stray lights. A plot of pixel sum vs exposure time is attached. From a linear fit for the unsaturated region, I obtained the slope and calculated the calibration factor.
Result: CF = 1.91x 10^-16 W-sec/counts Update: I had used a wrong value for the area of photodiode. On using 61.36 mm^2 as the area, I got 2.04 x 10^-15 W-sec/counts.
I'll put the uncertainities soon. I'm also attaching the GigE spectral response curve for future reference.
I wanted to try out the unstick.py script on c1aux but kept running into timeout errors. I was also confronted by a blank GigE screen. Further, couldn't telnet into c1aux using telnet c1aux as described here. Therefore, I went in and keyed the c1aux crate (1Y1).
Today, I read a lot more about BRDF and modelling but could not make much headway regarding the implementation in the simulation. I've stopped for now and I'll take a crack at it tomorrow again.
Tried collecting data today. Was unable to keep the camera_server code running for any length of time as it threw segfaults. Will take a shot again tomorrow.
The quoted elog has figures which indicate that the network did not learn (train or generalize) on the used data. This is a scary thing as (in my experience) it indicates that something is fundamentally wrong with either the data or model and learning will not happen despite how hyperparameters are tuned. To check this, I ran the training experiment for nearly 25 hyperparameter settings (results here)with the old data and was able to successfully overfit the data. Why is this progress? Well, we know that we are on the right track and the task is to reduce overfitting. Whether, that will happen through more hyperparameter tuning, data collection or augmentation remains to be seen. See attachments for more details.
Why is the fit so perfect at the start and bad later? Well, that's because the first 90% of the test data is the training data I overfit to and the latter the validation data that the network has not generalized well to.
The beam splitter (BS1-1064-33-2037-45S) that is currently being used has an antireflection coating on the second surface and a wedge of less than 5 arcmin; yet it leads to ghosting as shown in the figure attached (courtesy: Thorlabs). I'm also attaching its spec sheet I dug up on internet for future reference.
I came across pellicle beamsplitters, that are primarily used to eliminate ghost images. Pellicle beamsplitters have a few microns thick nitrocellulose layer and superimpose the secondary reflection on the first one. Thus the ghost image is eliminated.
Should we go ahead and order them? (https://www.thorlabs.com/newgrouppage9.cfm?objectgroup_id=898
After the two earthquakes, I collected some data by dithering the optic and recording the QPD readings. Today, I set up scripts to process the data and then train networks on this data. I have pushed all the code to github. I attempted to train a bunch of networks on the new data to test if the code was alright but realised quickly that, training on my local machine is not feasilble at all as training for 10 epochs took roughly 6 minutes. Therefore, I have placed a request for access to the cluster and am waiting for a reply. I will now set up a bunch of experiments to tune hyperparameters for this data and see what the results are.
Trainng networks with memory
I set up a network to handle input volumes (stacks of frames) instead of individual frames. It still uses 2D convolution and not 3D convolution. I am currently training on the new data. However, I was curious to see if it would provide any improved performance over the results I put up in the previous elog. After a bit of hyperparameter tuning, I did get some decent results which I have attached below. However, this is for Pooja's old data which makes them, ah, not so relevant. Also, this testing isn't truly representative because the test data isn't entirely new to the network. I am going to train this network on the new data now with the following objectives (in the following steps):
I hope this looks alright? Rana also suggested I try LSTMs today. I'll maybe code it up tomorrow. What I have in mind- A conv layer encoder, flatten, followed by an LSTM layer (why not plain RNNs? well LSTMs handle vanishing gradients, so why the hassle).
you have to use a BS with a larger wedge angle (5 arcmin ~ 1 mrad) so that the beams don't overlap on the camera
I received access today. After some incredible hassle, I was able to set up my repository and code on the remote system. Following this, Gautam wrote to Gabriele to ask him about which GPUs to use and if there was a previously set up environment I could directly use. Gabriele suggested that I use pcdev2 / pcdev3 / pcdev11 as they have good gpus. He also said that I could use source ~gabriele.vajente/virtualenv/bin/activate to use a virtualenv with tensorflow, numpy etc. preinstalled. However, I could not get that working, Therefore I created my own virtual environment with the necessary tensorflow, keras, scipy, numpy etc. libraries and suitable versions. On ssh-ing into the cluster, it can be activated using source /home/millind.vaddiraju/beamtrack/bin/activate. How do I know everything works? Well, I trained a network on it! With the new data. Attached (see attachment #1) is the prediction data for completely new test data. Yeah, its not great, but I got to observe the time it takes for the network to train for 50 epochs-
Therefore, I will carry out all training only on this machine from now.
Note to self:
Steps to repeat what you did are:
I attempted to train a bunch of networks on the new data to test if the code was alright but realised quickly that, training on my local machine is not feasilble at all as training for 10 epochs took roughly 6 minutes. Therefore, I have placed a request for access to the cluster and am waiting for a reply. I will now set up a bunch of experiments to tune hyperparameters for this data and see what the results are.
I trained a bunch (around 25 or so - to tune hyperparameters) of networks today. They were all CNNs. They all produced garbage. I also looked at lstm networks with CNN encoders (see this very useful link) and gave some thought to what kind of architecture we want to use and how to go about programming it (in Keras, will use tensorflow if I feel like I need more control). I will code it up tomorrow after some thought and discussion. I am not sure if abandoning CNNs is the right thing to do or if I should continue probing this with more architectures and tuning attempts. Any thoughts?
Right now, after speaking to Stuart (ldas_admin) I've decided on coding up the LSTM thing and then running that on one machine while probing the CNN thing on another.
Update on 10 July, 2019: I'm attaching all the results of training here in case anyone is interested in the future.
On Friday, I took images for different power outputs of LED. I calculated the calibration factor as explained in my previous elog (plots attached).
Power incident on photodiode (W)
To estimate the uncertainity, I assumed an error of at most 20mV (due to stray lights or difference in orientation of GigE and photodiode) for the photodiode reading. Using the uncertainity in slope from the linear fit, I expect an uncertainity of maximum 4%. Note: I haven't accounted for the error in the responsivity value of the photodiode.
Johannes had reported CF as 0.0858E-15 W-sec/counts for 12 bit images, with measured a laser source. This value and the one I got are off by a factor of 25. Difference in the pixel formats and effect of coherence of the light used might be the possible reasons.
I've set up network with a CNN encoder (front end) feeding into a single LSTM cell followed by the output layer (see attachment #1). The network requires significantly more memory than the previous ones. It takes around 30s for one epoch of training. Attached are the predicted yaw motion and the fft of the same. The FFT looks rather curious. I still haven't done any tuning and these are only the preliminary results.
Rana also suggested I try LSTMs today. I'll maybe code it up tomorrow. What I have in mind- A conv layer encoder, flatten, followed by an LSTM layer (why not plain RNNs? well LSTMs handle vanishing gradients, so why the hassle).
Well, what about the previous conv nets?
What I observed:
What I think is going wrong:
What I will try now:
I've taken the MC2 analog camera down and put another GigE (unit 151) in its place. This is just temporary and I'll put the analog camera back once I finish the MC2 loss map calibration. I'm using a 25mm focal length camera lens with it and it gives a view of MC2 similar to the analog camera one. But I don't think it is completely focused yet (pictures attached).
...more to follow
gautam - Attachment #3 is my (sad) attempt at finding some point scatterers - Kruthi is going to play around with photUtils to figure out the average size of some point scatterers.
[Kruthi, Yehonathan, Gautam]
Today evening, Yehonathan and I aligned the MC2 cameras. As of now there are 2 GigEs in the MC2 enclosure. For the temporary GigE (which is the analog camera's place), we are using an ethernet cable connection from the Netgear switch in 1x6. The MC2 was misaligned and the autolocker wasn't able to lock the mode cleaner. So, Gautam disabled the autolocker and manually changed the settings; the autolocker was able to take over eventually.
I did a whole lot of hyperparameter tuning for convolutional networks (without 3d convolution). Of the results I obtained, I am attaching the best results below.
The lower the power of the error signal (difference between the true and predicted X and Y positions), essentially mse, on the test data, the better the performance of the model. Of the trained models I had, I chose the one with the lowest mse.
(Note: Attachment 6 and 7 present information regarding a fraction of the data. However, the behaviour remains the same for the rest of the data.)
Observations and analysis:
P.S. I will also try the 2D convolution followed by the 1D convolution thing now.
P.P.S. Gabriele suggested that I try average pooling instead of max pooling as this is a regression task. I'll give that a shot.
Experiment file: train_both.py
See Attachment #2.
Make the MSE a subplot on the same axes as the time series for easier interpretation.
Do I need to provide any more details here?
Describe the training dataset - what is the pk-to-pk amplitude of the beam spot motion you are using for training in physical units? What was the frequency of the dither applied? Is this using a zoomed-in view of the spot or a zoomed out one with the OSEMs in it? If the excursion is large, and you are moving the spot by dithering MC2, the WFS servos may not have time to adjust the cavity alignment to the nominal maximum value.
What is the minimum detectable motion given the CCD resolution?
see attachment #4.
I wrote what I think is a handy script to observe if the frames are saturated. I thought this might be handy for if/when I collect data with higher exposure times. I assumed there was no saturation in the images because I'd set the exposure value to something low. I thought it'd be useful to just verify that. Attachment #3 has log scale on the x axis.
What is the significance of Attachment #6? I think the x-axis of that plot should also be log-scaled.
This afternoon Gautam and I assessed what to do about restoring the GigE camera software. Here's what I propose:
I've started resolving the many dependencies of this code on rossa. The idea is to get a working environment on one workstation, then generate requirements files that can be used to set up the rest of the machines. I believe the dependencies have all been installed. However, many of the packages are newer versions than before, and this seems to have broken SnapPy. I'll continue debugging this tomorrow.
I have been trying a couple of HDR algorithms, all of them seem to give very different results. I don't know how suitable these algorithms are for our purpose, because they are more concerned with final display. I'm attaching the HDR image I got by modifying Jigyasa's code a bit (this image has been be modified further to make it suitable for displaying). Here, I'm trying compare the plots of images that look similar. The HDR image has a dynamic ratio of 700:1
PS: 300us_image.png file actually looks very similar to HDR image on my laptop (might be an issue with elog editor?). So I'm attaching its .tiff version also to avoid any confusion.
I upgraded Pylon, the C/C++ API for the GigE cameras, to the latest release, 5.2.0. It is installed in the same location as before, /opt/rtcds/caltech/c1/scripts/GigE/pylon5, so environment variables do not change. The old version, 5.0.12, still exists at opt/rtcds/caltech/c1/scripts/GigE/backup_pylon5.
The package contains a GUI application (/bin/PylonViewerApp) for streaming video. The old version supports saving still images, but Milind discovered that the new version supports saving video as well. This required installing a supplementary package supporting MPEG-4 output.
Basler's GUI application is launched from the terminal using the alias pylon. I've tested it and confirm it can save both videos and still-image formats. I recommend to also try grabbing images using this program and check the bit resolution. It would be a useful diagnostic to know whether it's a bug in Joe B.'s code or something deeper in the camera settings.
At the lab meeting today, Rana suggested that I use the Pylon app to collect more data if that's what I need. Following this, Jon helped me out by updating the pylon version and installing additional software to record video. Now I am collecting data at
Consequently I have dithered the MC2 optic from around 9:00 PM.
Since there are multiple SURF projects that rely on the cameras:
My changes were necessary because the grabHDR.py script was throwing python exceptions, whereas it was running just fine before Jon's changes. We can move the "new_*" dirs to the default once the SURFs are gone.
Let's freeze the camera software config in this state until next week.
Somehow I never got around to doing the pixel sum thing for the new real data from the GigE. Since I have to do it for the presentation, I'm putting up the results here anyway. I've normalized this and computed the SNR with the true readings.
SNR = (power in true readings)/ (power in error signal between true and predicted values)
Attachment #2 is SNR of best performing CNN for comparison.
I'll keep developing the camera server on a parallel track using the "new_..." directory naming convention. One thing I forgot to note is that the new pylon/pypylon packages require Python 3, so will not work with any of the 2.7 scripts. All of the environment I need to set up is exclusively Python 3. I won't change anything in the Python 2.7 environment in current use.
Also, I found the source of the bit resolution issue: Joe B's code loads a set of initialization parameters from a config file. One of them is "Frame Type = Mono8" which sets the dynamic range of the stream. I'll look into how this should be changed.
I've started setting up the last new rackmount SuperMicro as a dedicated server for the GigE cameras. The new machine is currently sitting on the end of the electronics test bench. It is assigned the hostname c1cam at IP 192.168.113.116 on the martian network. I've installed Debian 10, which will be officially supported until July 2024.
I've added the /cvs/cds NFS mount and plan to house all the client/server code on this network disk. Any dependencies that must be built from source will be put on the network disk as well. Any dependencies that can be gotten through the package manager, however, will be installed locally but in an automated way using a reqs file.
We should ask Chub to reorder several more SuperMicro rackmount machines, SSD drives, and DRAM cards. Gautam has the list of parts from Johannes' last order.
I've put the analog camera back and disconnected the 151 unit GigE. But I ran out of time and wasn't able to replace the beamsplitter. I've put all the equipments back to the place where I took them from. The chopper and beam dump mount, that Koji had got me for the scatterometer, are kept outside, on the table I was working on earlier, in the control room. The camera lenses, additional GigEs, wedge beamsplitter, 1050nm LED and all related equipments are kept in the GigE box. This box was put back into CCD cameras' cabinet near the X arm.
Note: To clean stuff up, I had entered the lab around 9.30pm on Monday. This might have affected Yehonathan's loss measurement readings (until then around 57 readings had been recorded).
Sorry for the late update.
Following the death of rossa, which was hosting the only working environment for the GigE camera software, I've set up a new dedicated rackmount camera server: c1cam (details here). The Python server script is now configured as a persistent systemd service, which automatically starts on boot and respawns after a crash. The server depends on a set of EPICS channels being available to control the camera settings, so c1cam is also running a softIOC service hosting these channels. At the moment only the ETMX camera is set up, but we can now easily add more cameras.
Instructions for connecting to a live video feed are posted here. Any machine on the martian network can stream the feed(s). The only requirement is that the client machine have GStreamer 0.10 installed (all the control room workstations satisfy this).
As much as possible, the code and dependencies are hosted on the /cvs/cds network drive instead of installed locally. The client/server code and the Pylon5, PyPylon, and PyEpics dependencies are all installed at /cvs/cds/rtcds/caltech/c1/scripts/GigE. The configuration files for the soft IOC are located at /cvs/cds/caltech/target/c1cam.
The 40m GigE camera code is a slightly-updated version of the 10+ year-old camera code in use at the sites. Consequently every one of its dependencies is now deprecated. Ultimately, we'd like to upgrade to the following:
This is a long-term project, however, as many of these APIs are very different between Python 2 and 3.
We noticed last week that the MC2 trans camera has pitch and yaw swapped; I rotated what I thought is the correct camera by 90 degrees clockwise (as viewed from above, like in the attachment), but I now have doubts. It's the camera on the right in the attachment.
The left one is analog and 90deg rotated.
See also: This issue tracker
MC2 analog camera was rotated by 90 degrees. Orientation correctness was verified by exciting the MC2 Yaw degree of freedom.
Attached before and after photos of the camera setup.
There's this elog from Stephen about better 1064 sensitivity from Basler. We should consider getting one if he finds that its actual SNR is as good as we would expect from the QE improvement.
Might allow for better scatter measurements - not that we need more signal, but it could allow us to use shorter exposure times and reduce blurring due to the wobbly beams.
Nice, and we should also permanently install the camera server (c1cam) which is still sitting on the electronics bench. It is running an adapted version of the Python 2/Debian 8 site code. Maybe if COVID continues long enough I'll get around to making the Python 3 version we've long discussed.
Note from Stephen on more sensitive Baslers.
We found Mon7 in control room dead today afternoon. It's front power on green light is not lighting up. All other monitors are working as normal.
This monitor was used for looking at IMC camera analog feed. It is one of the most important monitors for us, so we should replace it with a different monitor.
Yehonathan and Paco disconnected the monitor and brought it down. We put it under the back table if anyone wants to fix it. Paco has ordered a BNC to VGA/HDMI converter to put in any normal monitor up there. It will happen this Wednesday. Meanwhile, I have changed the MON4 assignment from POP to Quad2 to be used for IMC.
We replaced the Mon 7 with an LCD monitor from back bench. It is fed the analog signal from BNC converted into VGS with a converter box that Paco bought. We can replace this monitor with another monitor if it is required on the back bench. For now, we definitely need a monitor to show IMC camera's up there.
Tested the Nikon batteries for the camera. they are supposed to be 7V batteries but they don't hold a charge. I confirmed this with multi-meter after charging for days. Ordered new ones Nikon EN-EL9
I believe that the Nikon has an exposure problem and that's why we bought the Canon.
I swapped the 1 inch BS and lenses along the POP beam to clear the apertures and avoid clipping this beam. The results are illustrated by the attached pictures; this was done right after Yuta had optimized IFO alignment so it's hopefully a good reference from now on. Yuta also tuned the alignment of BHDC path in ITMY table, which mostly improved the alignment to DCPD A (90-ish counts improved to 100-ish counts with ITMY single bounce).
Coming in this morning, I found ITMX Camera malfunctioning.
I found that an old BNC cable for ITMXF video existed so I first tried swapping both ends of the cable, one on the ITMX viewport and the other one in the video MUX input in the rear. This didn't fix the issue.
I searched around in the CCD cabinet by XARM and found an identical analog camera so I swapped it and got the same image ...
I then searched for a AC/DC supply cable, but couldn't find one.
The issue was the power supply.
Thus far, the software needed for the Magewell video encoder has been successfully installed on Donatella. OBS studio has also been installed and works correctly. OBS will be the video recording software that can be interfaced via command line once the SDI video encoder starts working. (https://github.com/muesli/obs-cli)
So far, the camera can not be connected to the Magewell encoder. The encoder continues to have a pulsing error light that indicates "no signal" or "signal not locked". I have begun testing on a secondary camera, directly connected to the Magewell encoder with similar errors. This may be able to be resolved once more information about the camera and its specifications/resolution is uncovered. At this time I have not found any details on the LCL-902K by Watec that was given to me by Koji. I will begin looking into the model used in the 40 meter next.