As seen from earlier elogs, PID control of the temperature of the cavities seems problematic: the time taken for the temperature to converge to the set-point is very large, and the PID parameters may require non-trivial tuning that varies with the desired set-point. Intelligent controls, specifically neural networks, are an attractive upgrade to PID, since such a network can learn the non-linearities of the model for itself and predict the optimal actuation. More precisely, in this system, the requirement for the intelligent control is to predict, at each time-step, the amount of heat actuation that gives the fastest convergence to the set-point temperature. In its final form, this prediction would be implemented as a series of matrix multiplications, with optimised weights (matrix coefficients), approximating the non-linear function that maps the current state of the system to the required actuation.
(Refer figure) A neural network consists of layers of nodes: an input layer, followed by one or more hidden layers, and finally an output layer. Each layer (represented as an n-dimensional vector, where n is the number of nodes in that layer) is obtained from the previous m-dimensional layer by multiplication with an m-by-n matrix. Each node also has an associated activation function, which for the hidden layers is preferably non-linear (ReLU, tanh, etc.) so that the network can capture the non-linearities of the model. An optimisation algorithm then attempts to find a 'fit' for the components of all the matrices. To reach the final form, the weights (the coefficients of all the matrices that transform one layer into the next) must be optimised by a learning algorithm that learns 'from experience'. For this, a loss or cost function is computed from the current state, roughly representing the distance of the current state from the set-point. This is fed to the optimiser, which moves through the parameter space of the weights towards a point closer to the optimum. The weights predict an actuation, which is applied to the system to give a new state, and the process is repeated until the minimum is reached. In our case, this learning algorithm can be implemented using either reinforcement learning (RL) or supervised learning (SL).
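As a concrete illustration of the "series of matrix multiplications" above, the following is a minimal NumPy sketch of a forward pass. The layer sizes (2 inputs, e.g. current temperature and set-point; 8 hidden nodes; 1 output, the predicted heating power) and the random initial weights are illustrative assumptions, not the actual network used.

```python
import numpy as np

# Illustrative layer sizes (assumed): 2 inputs -> 8 hidden nodes -> 1 output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 8))   # 2-by-8 weight matrix: input layer -> hidden layer
b1 = np.zeros(8)               # bias for the hidden layer
W2 = rng.normal(size=(8, 1))   # 8-by-1 weight matrix: hidden layer -> output layer
b2 = np.zeros(1)               # bias for the output layer

def relu(x):
    # Non-linear activation applied element-wise at the hidden layer.
    return np.maximum(0.0, x)

def predict(state):
    """Forward pass: each layer is the previous layer multiplied by a
    weight matrix, followed by an activation function."""
    hidden = relu(state @ W1 + b1)
    return hidden @ W2 + b2     # linear output: the predicted actuation

state = np.array([295.0, 300.0])   # current temperature and set-point [K]
action = predict(state)            # 1-dimensional output vector
```

Training would then adjust the entries of `W1`, `b1`, `W2`, `b2` so that `predict` approximates the optimal actuation; the loss function and optimiser described above act on exactly these parameters.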
In order to train and test RL algorithms, and possibly SL algorithms as well, the physical system can be simulated as a 'game environment' on which the neural net learns the optimal action at each step. OpenAI, an organisation that develops open-source Artificial Intelligence (AI) tools, provides Gym, a set of game environments on which RL algorithms can be trained and tested, and Baselines, a set of high-performance implementations of RL algorithms.
An initial model of the system includes only the vacuum can and the heat conduction through the foam surrounding it. The dynamics are described by a first-order differential equation, so the evolution can be predicted from the temperature of the can alone (assuming all system parameters are known accurately). The action, or actuation, corresponds to a specific heating power applied to the can during the next time-step. To formulate this as a Gym game environment in Python, on which RL algorithms (such as those in Baselines) may be trained and tested, the standard Gym methods are to be defined: reset(), which returns the environment to its initial state and gives the first observation, and step(action), which applies the actuation for one time-step and returns the new observation, the reward, and whether the episode is done.
In addition, the environment has the standard Gym attributes action_space and observation_space, which specify the sets of valid actions and observations.
The action is given externally and must belong to the space of valid actions; in our case, a learning algorithm with a neural network would feed it into the game at every time-step.
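The pieces above can be sketched as a minimal environment for the single-body thermal model (can plus foam). A real version would subclass gym.Env and declare action_space and observation_space with gym.spaces.Box; plain Python is used here so the sketch runs stand-alone, and all parameter values (heat capacity, foam conductance, heater limit, time-step) are illustrative assumptions, not measured ones.

```python
import numpy as np

class CanHeatingEnv:
    """Gym-style environment for the first-order thermal model:
    C dT/dt = P - k (T - T_env)."""
    C = 1000.0     # heat capacity of the can [J/K] (assumed value)
    k = 0.5        # conductance through the foam [W/K] (assumed value)
    T_env = 293.0  # ambient lab temperature [K]
    dt = 10.0      # length of one time-step [s]
    P_max = 50.0   # maximum heater power [W]; bounds the action space

    def __init__(self, set_point=300.0):
        self.set_point = set_point
        self.T = self.T_env

    def reset(self):
        """Restart an episode and return the initial observation."""
        self.T = self.T_env
        return np.array([self.T])

    def step(self, action):
        """Apply heating power `action` [W] for one time-step
        (explicit Euler integration of the first-order dynamics)."""
        P = float(np.clip(action, 0.0, self.P_max))
        self.T += self.dt * (P - self.k * (self.T - self.T_env)) / self.C
        # Reward is the negative distance from the set-point, so maximising
        # the cumulative reward corresponds to converging as fast as possible.
        reward = -abs(self.T - self.set_point)
        done = abs(self.T - self.set_point) < 0.1
        return np.array([self.T]), reward, done, {}

env = CanHeatingEnv()
obs = env.reset()                          # start at ambient temperature
obs, reward, done, info = env.step(50.0)   # one step at full heater power
```

The learning algorithm then closes the loop: the network reads the observation, predicts a power in [0, P_max], and the environment returns the next state and reward.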