Message ID: 2196     Entry time: Mon Jun 4 18:00:59 2018
Author: shruti 
Type: Notes 
Category: TempCtrl 
Subject: RL, SL, OpenAI Gym 

Upgrade from PID to intelligent controls:

As seen in earlier elogs, PID control of the cavity temperature seems problematic: the temperature takes a very long time to converge to the set-point, and the PID parameters may require non-trivial tuning that varies with the desired set-point. Intelligent controls, specifically neural networks, seem like an attractive upgrade to PID because such a network could learn the non-linearities of the system for itself and predict the optimal actuation.

More precisely, in this system the intelligent controller must predict, at each time-step, the amount of heat actuation that brings the system to the set-point temperature fastest. In its final form, this prediction would be implemented as a series of matrix multiplications with optimal weights (matrix coefficients), approximating the non-linear function that takes the current state of the system as input and returns the required actuation.


Neural networks:

(See Attachment 1.) A neural network consists of layers of nodes: an input layer, followed by one or more hidden layers, and finally an output layer. Each layer (represented as an n-dimensional vector, where n is the number of nodes in the layer) can be obtained from the previous layer by applying an n-by-m matrix to the previous m-dimensional layer (taken as a column vector). Each node also has an associated activation function, which for the hidden layers is preferably non-linear (ReLU, tanh, etc.) to capture the non-linearities in the model. An optimisation algorithm then attempts to find a 'fit' for the components of all the matrices.
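The layer-by-layer picture above can be sketched in a few lines of plain Python. This is only an illustration: the layer sizes (2 inputs, 4 hidden nodes, 1 output), the random weights, the input values and all function names are assumptions, not the network intended for the experiment.

```python
import math
import random

def identity(x):
    """Linear activation, used here for the output layer."""
    return x

def layer_output(weights, inputs, activation):
    """Apply an n-by-m weight matrix to an m-vector, then the activation to each node."""
    return [activation(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(layers, state):
    """Propagate a state vector through a list of (weights, activation) layers."""
    out = state
    for weights, activation in layers:
        out = layer_output(weights, out, activation)
    return out

def random_weights(n_out, n_in, rng):
    """Untrained n_out-by-n_in matrix with coefficients drawn from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(0)
layers = [
    (random_weights(4, 2, rng), math.tanh),   # input (2) -> hidden (4), non-linear
    (random_weights(1, 4, rng), identity),    # hidden (4) -> output (1), linear
]
# Hypothetical normalised inputs, e.g. current temperature and set-point
actuation = forward(layers, [0.3, 0.5])
```

With untrained weights the output is meaningless; the optimiser described next is what would turn it into a useful actuation.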

In order to achieve the final form, the weights need to be optimised via some learning algorithm that learns ‘from experience’. For this, a loss or cost function is calculated as a function of the current state, roughly representing the distance between the current state and the set-point. This is fed to the optimiser, which moves through the parameter space of the weights associated with the nodes in the neural network (the coefficients of all matrices that transform one layer into the next) to a point closer to the optimum. The weights then predict an actuation, which is applied to the system to give a new state, and the process is repeated until the minimum is reached. In our case this learning algorithm can be implemented using either reinforcement learning (RL) or supervised learning (SL).
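A minimal sketch of this loss-then-update loop is given below, under strong simplifications that are all assumptions: the 'network' is reduced to a single weight (heater power = weight x distance from the set-point), the system is a toy first-order thermal model with made-up parameters, and the optimiser is simple random hill climbing rather than back-propagation or any algorithm from Baselines.

```python
import random

def rollout_loss(weight, setpoint=30.0, start=20.0, steps=50):
    """Mean squared distance from the set-point over one simulated run."""
    # Hypothetical model parameters: J/K, W/K, deg C, s
    heat_capacity, conductance, ambient, dt = 500.0, 0.5, 20.0, 10.0
    temp, total = start, 0.0
    for _ in range(steps):
        # One-weight stand-in for the network; the heater cannot cool
        power = max(0.0, weight * (setpoint - temp))
        temp += dt * (power - conductance * (temp - ambient)) / heat_capacity
        total += (temp - setpoint) ** 2
    return total / steps

def hill_climb(loss, weight=0.0, iterations=200, step_size=1.0, seed=0):
    """Propose random changes to the weight and keep those that lower the loss."""
    rng = random.Random(seed)
    best = loss(weight)
    for _ in range(iterations):
        candidate = weight + rng.gauss(0.0, step_size)
        candidate_loss = loss(candidate)
        if candidate_loss < best:
            weight, best = candidate, candidate_loss
    return weight, best

weight, final_loss = hill_climb(rollout_loss)
```

Each accepted proposal moves the weight to a point with lower loss, which is the same role the optimiser plays for the full set of matrix coefficients.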

Reinforcement Learning (RL): RL deals with game-like problems in which observations of the state are made and the algorithm learns to find the optimal action to perform based on the state. Each action has an associated reward, which provides the basis for the back-propagation or feedback used to predict future actions. To implement this, neither the internal workings of the game nor a set of a priori 'correct actions' need be known.

Supervised Learning (SL): SL operates on a set of labelled data, split into a training set and a testing set, each consisting of input states and their corresponding correct outputs. The algorithm learns to predict the output from the training set without any prior knowledge of the mechanism by which the outputs are obtained. To use SL in our system, a labelled data set must be obtained. This can be done initially using the model for the thermal dynamics of our system, and later by taking real experimental data.
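As a rough illustration of generating labelled data from a model, the sketch below labels each set-point with the steady-state heater power that would hold it, splits the data into training and testing sets, and fits a straight line in place of a neural network. The choice of label, the model parameters and the function names are all assumptions made for the example.

```python
import random

# Hypothetical model parameters: thermal conductance (W/K) and ambient temperature (deg C)
CONDUCTANCE, AMBIENT = 0.5, 20.0

def holding_power(setpoint):
    """Label: steady-state heater power that holds the can at the set-point."""
    return CONDUCTANCE * (setpoint - AMBIENT)

rng = random.Random(0)
setpoints = [rng.uniform(20.0, 40.0) for _ in range(100)]
data = [(s, holding_power(s)) for s in setpoints]
train, test = data[:80], data[80:]   # training and testing sets

def fit_line(pairs):
    """Ordinary least-squares fit of y = a*x + b (stand-in for training a network)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    return a, mean_y - a * mean_x

a, b = fit_line(train)
# Largest prediction error on data the fit has not seen
test_error = max(abs(a * x + b - y) for x, y in test)
```

Because the labels here come from a linear formula, the fit recovers them essentially exactly; real experimental data would not be this clean.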


OpenAI Gym:

In order to train and test RL algorithms, and possibly also SL algorithms, the physical system can be simulated as a 'game environment' on which the neural net would learn the optimal action at each step. OpenAI provides two open-source packages for this: Gym, a set of games on which RL algorithms can be trained and tested, and Baselines, a set of high-performance RL algorithms.


Our particular system as a gym environment:

An initial model of the system includes only the vacuum can and the heat conduction through the foam surrounding it. The dynamics are represented by a first-order differential equation, so the evolution can be predicted from the temperature of the can alone (assuming all system parameters are known accurately). The action or actuation corresponds to a specific value of heating power applied to the can during the next time-step.
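One possible form of such a first-order model is C dT/dt = P - k (T - T_ambient), with heater power P, heat capacity C and foam conductance k. The sketch below advances it by one forward-Euler time-step; the equation itself and every parameter value are assumptions for illustration, not measured properties of the can.

```python
def can_temperature_step(temp, heater_power, dt=1.0,
                         heat_capacity=500.0, conductance=0.5, ambient=20.0):
    """Advance the can temperature by one forward-Euler step of
    C dT/dt = P - k (T - T_ambient).

    Units assumed: temp and ambient in deg C, heater_power in W,
    heat_capacity in J/K, conductance in W/K, dt in s.
    """
    dT_dt = (heater_power - conductance * (temp - ambient)) / heat_capacity
    return temp + dt * dT_dt

# At T = ambient + P/k the heating balances the conduction loss,
# so the temperature does not change.
steady = can_temperature_step(30.0, 5.0)
```

Only the current temperature and the chosen power enter the update, which is why the temperature alone can serve as the observed state.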

To formulate this as a Gym game environment in Python, on which RL algorithms (such as those in Baselines) may be trained and tested, the following methods are to be defined:

step(), reset(), seed(). 

render() and close() may also be used to visualise the gameplay. 


reset() begins a new game and returns an observation or initial state, deterministically or randomly as per choice.

def reset(self):
    # set the initial state (fixed or random) and return it as the observation
    return observation


step() accepts an action and returns a tuple consisting of the next state (observation), reward received after previous action (reward), boolean determining whether the game is over (done) and a dictionary for additional information, if any (info). This method is one time step of the evolution.

def step(self, action):
    # advance the system by one time-step under the given action
    return (observation, reward, done, info)

seed() sets the seeds for the random number generators used in the program.


In addition, the environment also has the following attributes:

action_space - space of valid actions

observation_space - space of valid states or observations

reward_range - tuple of min and max possible reward


The action is given externally and should belong to the space of valid actions. In our case a learning algorithm, with a neural network, would feed this into the game at every time-step.
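Putting the methods and attributes above together, a toy environment might look like the sketch below. It deliberately does not subclass gym.Env, so it runs without Gym installed; the spaces are plain (low, high) tuples rather than Gym space objects, and the model parameters, the reward (negative distance from the set-point) and the proportional policy in the usage loop are all assumptions standing in for the real system and the neural network.

```python
import random

class CanTempEnv:
    """Toy temperature-control environment with reset(), step() and seed()."""

    # Hypothetical parameters
    HEAT_CAPACITY = 500.0   # J/K
    CONDUCTANCE = 0.5       # W/K
    AMBIENT = 20.0          # deg C
    DT = 10.0               # s per time-step
    MAX_POWER = 10.0        # W
    MAX_STEPS = 200         # length of one game

    def __init__(self, setpoint=30.0):
        self.setpoint = setpoint
        self.action_space = (0.0, self.MAX_POWER)   # valid heater powers
        hottest = self.AMBIENT + self.MAX_POWER / self.CONDUCTANCE
        self.observation_space = (self.AMBIENT, hottest)   # reachable temperatures
        self.reward_range = (-(hottest - self.AMBIENT), 0.0)
        self.rng = random.Random()
        self.temp = None
        self.steps = 0

    def seed(self, seed=None):
        self.rng.seed(seed)
        return [seed]

    def reset(self):
        low, high = self.observation_space
        self.temp = self.rng.uniform(low, high)   # random initial state
        self.steps = 0
        return self.temp

    def step(self, action):
        low, high = self.action_space
        power = min(max(action, low), high)   # clip to the valid actions
        loss = self.CONDUCTANCE * (self.temp - self.AMBIENT)
        self.temp += self.DT * (power - loss) / self.HEAT_CAPACITY
        self.steps += 1
        reward = -abs(self.temp - self.setpoint)
        done = self.steps >= self.MAX_STEPS
        return self.temp, reward, done, {"power": power}

env = CanTempEnv()
env.seed(0)
obs = env.reset()
done = False
while not done:
    action = 2.0 * (env.setpoint - obs)   # placeholder policy, not a trained network
    obs, reward, done, info = env.step(action)
```

A learning algorithm would replace the placeholder policy line, choosing each action from the observation and using the reward to update its weights.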


Attachment 1: intelligentcontrol.png  167 kB  Uploaded Mon Jun 4 22:31:41 2018