After the two earthquakes, I collected some data by dithering the optic and recording the QPD readings. Today, I set up scripts to process the data and then train networks on this data. I have pushed all the code to github. I attempted to train a bunch of networks on the new data to test if the code was alright but realised quickly that, training on my local machine is not feasilble at all as training for 10 epochs took roughly 6 minutes. Therefore, I have placed a request for access to the cluster and am waiting for a reply. I will now set up a bunch of experiments to tune hyperparameters for this data and see what the results are.
Trainng networks with memory
I set up a network to handle input volumes (stacks of frames) instead of individual frames. It still uses 2D convolution and not 3D convolution. I am currently training on the new data. However, I was curious to see if it would provide any improved performance over the results I put up in the previous elog. After a bit of hyperparameter tuning, I did get some decent results which I have attached below. However, this is for Pooja's old data which makes them, ah, not so relevant. Also, this testing isn't truly representative because the test data isn't entirely new to the network. I am going to train this network on the new data now with the following objectives (in the following steps):
- Train on data recorded at one frequency, generalize/ test on unseen data of the same frequency, large amplitude of motion
- Train on data recorded at one frequency, generallize/ test on unseen data of a different frequency, large amplitude of motion
- Train on data recorded at one frequency, generalize/ test on unseen data of same/ different frequency, small amplitude of motion
- Train on data at different frequencies and generalize/ test on data with a mixture of frequencies at small amplitudes - Gautam pointed out that the network would truly be superb (good?) if we can just predict the QPD output from the video of the beam spot when nothing is being shaken.
I hope this looks alright? Rana also suggested I try LSTMs today. I'll maybe code it up tomorrow. What I have in mind- A conv layer encoder, flatten, followed by an LSTM layer (why not plain RNNs? well LSTMs handle vanishing gradients, so why the hassle).
Quote: |
The quoted elog has figures which indicate that the network did not learn (train or generalize) on the used data. This is a scary thing as (in my experience) it indicates that something is fundamentally wrong with either the data or model and learning will not happen despite how hyperparameters are tuned. To check this, I ran the training experiment for nearly 25 hyperparameter settings (results here)with the old data and was able to successfully overfit the data. Why is this progress? Well, we know that we are on the right track and the task is to reduce overfitting. Whether, that will happen through more hyperparameter tuning, data collection or augmentation remains to be seen. See attachments for more details.
Why is the fit so perfect at the start and bad later? Well, that's because the first 90% of the test data is the training data I overfit to and the latter the validation data that the network has not generalized well to.
|
|