40m QIL Cryo_Lab CTN SUS_Lab TCS_Lab OMC_Lab CRIME_Lab FEA ENG_Labs OptContFac Mariner WBEEShop
  40m Log, Page 283 of 341  Not logged in ELOG logo
ID Date Author Type Categorydown Subject
  3606   Fri Sep 24 22:58:40 2010 josephbUpdateCDSModified front end medm screens

To startup medm screens for the new suspension front end, do the following:

1) From a control room machine, log into megatron

ssh -X megatron

2) Move to the new medm directory, and more specifically the master sub-directory

cd /opt/rtcds/caltech/c1/medm/master/

3) Run the basic sitemap in that directory

medm -x sitemap.adl

 

The new matrix of filters replacing the old ULPOS, URPOS, etc type filters is now on the screens.  This was previously hidden.  I also added the sensor input matrix entry for the side sensor.

Lastly, the C1SUS.txt filter bank was updated to place the old ULPOS type filters into the correct matrix filter bank.

 

The suspension controls still need all the correct values entered into the matrix entries (along with gains for the matrix of filter banks), as well as the filters turned on.  I hope to have some time tomorrow morning to do this, which basically involves looking at the old screens copying the values over.  The watch dogs are still controlled by the old control screens.  This will be fixed on Monday when I finish switching the front ends over from their sub-network to the main network, at which point logging into megatron will no longer be necessary.

  3609   Sun Sep 26 18:29:23 2010 rana, JohnUpdateCDSModified front end medm screens

Issues I notice on first glance:

  1. The OSEM Sensor Input matrix and the DOF2COIL Output matrix screens should be their own screens and linked from the overview. Right now they are not. Where is the input matrix?
  2. The SIDE GAIN looks like zero on the main screen, but the side OSEM signal seems to be getting through to the SIDE filter bank. . I think the wiring of the SIDE signal through the input matrix is bogus.
  3. The OUTPUT matrix seems to be the transpose of the previous OUTPUT matrix and we have lost the wires that connect the inputs and outputs to the matrix. We ought to think about how best to represent things on the OVERVIEW screen; probably only need to have a minimal representation and allow power users to open up the detailed screen.
  4. The TIME string is whited out. How will this be done? Does each FE display its local time on its EPICS screens?
  5. So far unable to get any channels on DV. How do we look at channels / test points?
  6. As far as we can tell, there is no connection between the output of the SUSPOS, etc. filter banks and the OUTPUT MATRIX. So....nothing actually goes to the coil driver. Its hard to imagine that this new SUS could have ever worked. Is there any evidence that the damping actually worked in the past, or was it something like "well, the watchdog values came down to small numbers eventually..." ???
  7. We are trying to debug the simulink file, but....the wiki entry on how to do this is out of date (yet updated as recently as August!) some path stuff just probably needs to be edited.

 megamind-poster3.jpg

Basically the suspensions are not functioning yet and we can't attempt locking of the MC.

  3612   Mon Sep 27 17:35:13 2010 josephbUpdateCDSUpdated Suspension screens/Megatron now c1ioo/Further work on fb

The medm screens have been updated further, with the hidden matrices added in bright colors.  An example screen shot is attached.

Megatron has been renamed c1ioo and moved to martian network.  Similarly, c1sus and c1iscex are also on the martian network.  Medm screens can be run on any of the control machines and they will work.

Currently the suspension controller is running on c1sus.

The frame builder is currently running on the fb machine *however* it is not working well.  Test points and daq channels on the new front ends tended to crash it when Alex started the mx_stream to the fb via our new DAQ network (192.168.114.XXX, accessible through the front ends or fb - has a dedicated 1 gigabit network with up to 10 gigabit for the fb).  So for the moment, we're running without front end data. Alex will be back tomorrow to work on it. 

Alex claimed to have left the frame builder in a state where it should be recording slow data, however, I can't seem to access recent trends (i.e. since we started it today).  The frame builder throws up an error "Couldn't open raw minute trend file '/frames/trend/minute_raw/C1:Vac-P1_pressure', for example.  Realtime seems to work for slow channels however.  Remember to connect to fb, not fb40m. So it seems the fb is still in a mostly non-functional state.

Alex also started a job to convert all the old trends to the correct new data format, which should finish by tomorrow.

RA: Nice screen work. The old screens had a 'slow' slider effect when ramping the bias so that we couldn't whack the optic too hard. Is the new one instantaneous?

Attachment 1: MC1_Example_Screen.png
MC1_Example_Screen.png
  3615   Tue Sep 28 10:07:29 2010 josephbUpdateCDSUpdated Suspension screens/Megatron now c1ioo/Further work on fb

Quote:

RA: Nice screen work. The old screens had a 'slow' slider effect when ramping the bias so that we couldn't whack the optic too hard. Is the new one instantaneous?

 Looking at the sliders, I apparently still need to connect them properly.  There's a mismatch between the medm screen channel name and the model name.  At the moment there is no "slow" slider effect implemented, so they are effectively instantaneous.  Talking with Alex, he suggests writing a little c-code block and adding it to the model.  I can use the c code used in the filter module ramps as a starting point.

  3619   Wed Sep 29 11:18:36 2010 josephbUpdateCDSApps code changes

After asking Alex specifically what he did yesterday after I left, he indicated he copied  a bunch of stuff from Hanford, including the latest gds, fftw, libframe, root.  We also now have the new dtt code as well.  But those apparently were for the Gentoo build   After asking Alex about the ezca tools this morning, he discovered they weren't complied in the gds code he brought over.  We are in the process of getting the source over here and compiling the ezca tools. 

 

Alex is indicating to me that the currently compiled new gds code may not run on the Centos 5.5 since it was compiled Gentoo (which is what our new fb is running and apparently what they're using for the front ends at Hanford).  We may need to recompile the source on our local Centos 5.5 control machines to get some working gds code.  We're in the process of transferring the source code from Hanford.  Apparently this latest code is not in SVN yet, because at some point he needs to merge it with some other work other people have been doing in parallel and he hasn't had the time yet to do the work necessary for the merge.

For the moment, Alex is undoing the soft link changes he did pointing gds at the latest gds code he copied, and pointing back at the original install we had.

  3620   Wed Sep 29 12:08:28 2010 josephb, alexSummaryCDSLast burt save of old controls

This is being recorded for posterity so we know where to look for the old controls settings.

The last good burt restore that was saved before turning off scipe25 aka c1dcuepics was on September 29, 11:07.

  3625   Thu Sep 30 11:07:20 2010 josephb, alexUpdateCDStest points starting to work

The centos 5.5 compiled gds code is currently living on rosalba in the /opt/app directory (this is local to Rosalba only).  It has not been fully compiled properly yet.  It is still missing ezcaread/write/ and so forth.  Once we have a fully working code, we'll propagate it to the correct directories on linux1.

So to have a working dtt session with the new front ends, log into rosalba, go to opt/apps/, and source gds-env.bash in /opt/apps (you need to be in bash for this to work, Alex has not made a tcsh environment script yet).  This will let get testpoints and be able to make transfer function measurements, for example

Also, to build the latest awgtpman, got to fb, go to /opt/rtcds/caltech/c1/core/advLigoRTS/src/gds, and type make.  This has been done and mentioned just as reference.

The awgtpman along with the front end models should startup automatically on reboot of c1sus (courtesy of the /etc/rc.local file).

  3628   Thu Sep 30 16:29:35 2010 josephb, alexUpdateCDSfb update

There currently seems to be a timing issue with  the frame builder.  We switched over to using a symmetricom card to get an IRIG-B signal into the fb machine, but the gps time stamp is way off (~80 years Alex said).

If there is a frame buiilder issue, its currently often necessary to kill the associated mx_stream processes, since they don't seem to restart gracefully.  To fix it the following steps should be taken:

Kill frame builder, kill the two mx_stream processes, then /etc/restart_streams/, then restart the frame builder (usual daqd -c ./daqdrc >& ./daqd.log in /opt/rtcds/caltech/c1/target/fb).

To restart (or start after a boot) the nds server, you need to go to /opt/rtcds/caltech/c1/target/fb and type

./nds /opt/rtcds/caltech/c1/target/fb/pipe

At this time, testpoints are kind of working, but timing issues seem to be preventing useful work being done with it.  I'm leaving with Alex working on the code.

 

  3629   Thu Sep 30 17:11:01 2010 alex iUpdateCDSDAQ system update

The frame builder is timed from the Symmetricom GPS card now, which is getting the IRIGB timecode from the freq. distribution amplifier (from the VME GPS receiver card).

I have adjusted the GPS seconds to match the real GPS time and the DTT seems to be happy: sweeping MC2 MCL filter module produces nice plot.

Test points are working on SUS.

Excitations are working on SUS.

I am leaving the frame builder running and acquiring the data.

 

Alex

 

  3631   Thu Sep 30 21:55:31 2010 ranaUpdateCDSDAQ sys update

Its pretty exciting to see that Joe got Alex to actually use the ELOG. Its a proof that even rare events occur if you are patient enough.

1) I fixed the MEDM links to point to the new sitemap.adl in /opt/rtcds. There is a link on the new sitemap which points to the old sitemap so that there is nothing destroyed yet.

2) Some of the fields in the screen are white. These are from the new c1sus processor, not issues with the slow controls. I think its just stuff that has not yet been created in the C1SUS simulink module.

Untitled.png

3) The PZT steering controls are gone. Without this we cannot get the beam down the arm. Must fix before aliging things after the MC. Since PZT used to be controlled by ASC, we'll have to wire the Piezo Jena PZT controls in from a different VME 4116. Possibly c1iool0's crate?

4) Also, the IPANG and IPPOS are somehow not working right. I guess this is because they are part of the ASC / the old ETMX system. We'll have to wire the IPANG QPD into the new ETMY ADC system if we want to get the initial alignment into the Y-arm correct.

5) I've started migrating things over from the old SITEMAP. Please just use the new SITEMAP. it has a red link to the old one, but eventually everything on the new one will work after Joe, Alex, me, and Kiwamu are done tweaking.


 

  3632   Fri Oct 1 10:56:30 2010 josephb,alexUpdateCDSfb work continued

Alex fixed the time issue with the IRIG-B signal being far off, apparently their IRIG-B signal in downs seems to be different.  He simply corrected for the difference in the two signals in the code.

For debugging purposes we uncommented the following line in the feCodeGen.pl script (in /opt/rtcds/caltech/c1/advLigoRTS/src/epics/util/):

print EPICS "test_points ONE_PPS $dac_testpoint_names $::extraTestPoints\n" 

This is to make every ADC testpoint available from the IOP (such as c1x02).

  3633   Fri Oct 1 11:33:15 2010 josephb, alexConfigurationCDSChanging gds code to the new working version

Alex is installing the newly compiled gds code (compiled on Centos 5.5 on Rosalba) which does in fact include the ezca type tools. 

At the moment we don't have a solaris compile, although that should be done at somepoint in the future.  It means the gds tools (diaggui, foton, etc) won't work on op440m.  On the bright side, this newer gds code has a foton that doesn't seem to crash all the time on Linux.

 

  3634   Fri Oct 1 11:53:42 2010 josephbConfigurationCDSAdded RCG simlink files to the 40m svn

I've added a new directory in /opt/rtcds/caltech/c1/core called rts_simlink.  This directory is now in the 40m svn.  Unfortunately, the simlink files used to generate the front end c codes live in a directory controlled by the CDS svn.  So I've copied the .mdl files from /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/simLink/ into this new directory and added them into the 40m svn.  When making changes to the simlink files, please copy them to this new directory and check them in so we can a useful history of the models.

 

  3635   Fri Oct 1 14:13:29 2010 josephb, alexUpdateCDSfb work that still needs to be done

1) Need to check 1 PPS signal alignment

2) Figure out why 1PPS and ADC/DAC testpoints went away from feCodeGen.pl?

3) Fix 1PPS testpoint giving NaN data

4) Figure out why is daqd printing "making gps time correction" twice?

5) Need to investigate why mx_streams are still getting stuck

6) Epics channels should not go out on 114 network (seen messages when doing
burt restore/save).

7) Dataviewer leaves test points hanging, daqd does not deallocate them
(net_Writer.c shutdown_netwriter call)

8) Need to install wiper scripts on fb

9) Need to install newer kernel on fb to avoid loading myrinet firmware
(avoid boot delay)

  3636   Fri Oct 1 16:34:06 2010 josephbUpdateCDSc1sus not booting due to fb dhcp server not running

For some reason, the dhcp server running on the fb machine which assigns the IP address to c1sus (since its running a diskless boot) was down.  This was preventing c1sus from coming up properly.  The symptom was an error indicated no DHCP offers were made(when I plugged a keyboard and monitor in).

To check if the dhcp server is running, run ps -ef | grep dhcpd.  If its not, it can be started with "sudo /etc/init.d/dhcpd start"

  3638   Fri Oct 1 18:19:24 2010 josephb, kiwamuUpdateCDSc1sus work

The c1sus model was split into 2, so that c1sus controls BS, PRM, SRM, ITMX, ITMY, while c1mcs controls MC1, MC2, MC3.  The c1mcs uses shared memory to tell c1sus what signals to the binary outputs (which control analog whitening/dewhitening filters), since two models can't control a binary output.

This split was done because the CPU time was running above 60 microseconds (the limit allowable since we're trying to run at 16kHz). Apparently the work Alex had done getting testpoints working had put a greater load on the cpu and pushed it over an acceptable maximum.    After removing the MC optics controls, the CPU time dropped to about 47 microseconds from about 67 microseconds.  The c1mcs is taking about 20 microseconds per cycle.

The new model is using the top_names functionality to still call the channels C1SUS-XXX_YYY.  However, the directory to find the actual medm filter modules is /opt/rtcds/caltech/c1/medm/c1mcs, and the gds testpoint screen for that model is called C1MCS-GDS_TP.adl.  I'm currently in the process of updating the medm screens to point to the correct location.

Also, while plugging in the cables from the coil dewhitening boards, we realized I (Joe) had made a mistake in the assignment of channels to the binary output boards.  I need to re-examine Jay's old drawings and fix the simulink model binary outputs.

  3639   Fri Oct 1 18:53:33 2010 josephb, kiwamuUpdateCDSThings needing to be done next week

We realized we cannot build code with the current RCG compiler on c1ioo or c1iscex, since these are not Gentoo machines.  We need either to get a backwards compatible code generator, or change the boot priority (removing the harddrives also probably works) for c1ioo and c1iscex so they do the diskless Gentoo thing.  This would involve adding some MAC address to the framebuilder dhcpd.conf file in /etc/dhcp along with the computer IPs, and then modifying the /diskless/root/etc/rtsystab with the right machine names and models to start.

I also need to bring some of the older, neglected models up to current build standards. I.e. use cdsIPCx_RFM instead of cdsIPCx and so forth. 

Need to fix the binary outputs for c1sus/c1mcs.  Need to actually get the RFM running, since Kiwamu was having some issues with his green RFM test model.  We have the latest checkout from Rolf, but we have no proof that it actually works.

  3642   Mon Oct 4 11:20:45 2010 josephbUpdateCDSFixed Suspension binary output list and sus model

I've updated the CDS wiki page listing the wiring of the 40m suspensions with the correct binary output channels.  I previously had confused the wiring of the Auxillary crate XY220 (watchdogs) with the SOS coil dewhitening bypasses.  So I had wound up with the wrong channels (the physical cables we plugged in were correct, just what I thought was going on channel X of that cable was wrong).  This has been corrected in the plan now.  The updated channel/cable list is at http://lhocds.ligo-wa.caltech.edu:8000/40m/Upgrade_09/CDS/Suspension_wiring_to_channels

  3644   Mon Oct 4 15:28:10 2010 josephbUpdateCDSTrying to get c1ioo booting as Gentoo.

I modified the dhcpd.conf file in /etc/dhcp on the fb machine.  I added a entry for c1ioo, listing its MAC address and ip number near the bottom of the file.  I then restarted the dhcp server using "sudo /etc/init.d/dhcpd restart" while on the fb machine.

I also modified the rtsystab, which is used to determine which front end codes start on boot up of a machine.  I added a line: c1ioo   c1x03  c1ioo

I am now in the process of getting c1ioo to come up as a Gentoo machine so I can build a model with an RFM connection in it and test the communication between c1sus and c1ioo.  This involves removing the hard drives and checking to make sure the boot priority is correct (i.e. it checks for a network boot).

  3648   Tue Oct 5 13:46:26 2010 josephb, alexUpdateCDSRestarted fb trending

Fb is now once again actually recording trends.

A section of the daqdrc file (located in /opt/rtcds/caltech/c1/target/fb/ directory) had been commented out by Alex and never uncommented.  This section included the commands which actually make the fb record trends.

The section now reads as:

# comment out this block to stop saving data

#
start frame-saver;
sync frame-saver;
start trender;
start trend-frame-saver;
sync trend-frame-saver;
start minute-trend-frame-saver;
sync minute-trend-frame-saver;
start raw_minute_trend_saver;
#start frame-writer "225.225.225.1" broadcast="131.215.113.0" all;
#sleep 5;

  3649   Tue Oct 5 13:52:15 2010 ranaUpdateCDSproof of trend

Untitled.png

  3651   Tue Oct 5 14:11:09 2010 josephb, alexUpdateCDSGoing to from rtlinux to Gentoo requires front end code clean out

Apparently when updating front end codes from rtlinux to the patched Gentoo, certain files don't get deleted when running make clean, such as the sysfe.rtl files in the advLigoRTS/src/fe/sys directories.  This fouls the start up scripts by making it think it should be configured for rtlinux rather than the Gentoo kernel module.

  3652   Tue Oct 5 16:30:00 2010 josephb, yutaHowToCDSScreen settings and medm screens for new system

You can find the sitemap medm screen in

/opt/rtcds/caltech/c1/medm/master

The settings for the screens were last saved by burt in the original system on Sept 29, 2010 at 11:07.  So go to the

/cvs/cds/caltech/burt/autoburt/snapshots/2010/Sep/29/11:07

directory.  You can grep for the channels in the files in this directory.

You can also then use the autoBurt.req file in the /opt/rtcds/caltech/c1/target/sysname/sysnameepics (c1sus/c1susepics) to backup the settings entered.  Save to the /opt/rtcds/caltech/c1/target/snapshots directory for now.

 

 

  3653   Tue Oct 5 16:58:41 2010 josephb, yutaUpdateCDSc1sus front end status

We moved the filters for the mode cleaner optics over from the C1SUS.txt file in /opt/rtcds/caltech/c1/chans/ to the C1MCS.txt file, and placed SUS_ on the front of all the filter names.  This has let us load he filters for the mode cleaner optics.

At the moment, we cannot seem to get testpoints for the optics (i.e. dtt is not working, even the specially installed ones on rosalba). I've asked Yuta to enter in the correct matrix elements and turn the correct filters on, then save with a burt backup.

  3658   Wed Oct 6 11:03:51 2010 josephb, yutaHowToCDSHow to start diaggui for right now

I'm hoping to get a proper install this week done, but for now, this a stop gap.

To start diagnostic test tools, go to rosalba.  (Either sit at it, or ssh -X rosalba).

cd /opt/apps

type "bash", this starts a bash shell

source gds-env.bash

diaggui

 --------- Debugging section ------

If that throws up errors, try looking with "diag -i" and see if there's a line that starts with nds.  In the case last night, Alex had not setup a diagconf configuration file in the /etc/xinetd.d directory, which setups up the diagconf service under the xinit service.  To restart that service (if for example the nds line doesn't show up), go to /etc/init.d/ and type "sudo xinit start" (or restart).

Other problems can include awg and/or tpman not running for a particular model on the front end machine.  I.e. diag -i should show 3 results from 192.168.113.85 (c1x02, c1sus, c1mcs) at the moment , for both awg and tp.  If not, that means awg and tpman need to be restarted for those.

These can be started manually by going to the front end, to the /opt/rtcds/caltech/c1/target/gds/bin/ directory, and running awgtpman -s sysname (or in the case of IOP files [c1x02, c1x03, etc], awgtpman -s sysname -4.  Better is probably to run the start scripts which live /opt/rtcds/caltech/c1/scripts/ which kills and restarts all the process for you.

 

 

  3659   Wed Oct 6 12:00:23 2010 josephb, yuta, kiwamuUpdateCDSFound and fixed filter sampling rate problem with suspensions

While we looking using dtt and going over the basics of its operation, we discovered that the filter sample rates for the suspensions were still set to 2048 Hz, rather than 16384 Hz which is the new front end.  This caused the filters loaded into the front ends to not behave as expected.

After correcting the sample rate, the transfer functions obtained from dtt are now looking like the bode plots from foton.

We fixed the C1SUS.txt and C1MCS.txt files in the /opt/rtcds/caltech/c1/chans/ directory, by changing the SAMPLING lines to have 16384 rather than 2048.

  3661   Wed Oct 6 15:56:14 2010 josephbHowToCDSHow to load matrices quickly and easily

Awhile ago I wrote several scripts for reading in medm screen matrix settings and then writing them out.  It was meant as kind of a mini-burt just for matrices for switching between a couple of different setups quickly.

Yuta has expressed interest in having this instruction available.

In /cvs/cds/caltech/scripts/matrix/ there are 4 python scripts:

saveMatrix.py, oldSaveMatrix.py, loadMatrix.py, oldLoadMatrix.py

The saveMatrix.py and loadMatrix.py are for use with the current front end codes (which start counting from 1), where as the old*.py files are for the old system.

To use saveMatrix.py you need to specify the number of inputs, outputs, and the base name of the matrix (i.e. C1:LSP-DOF2PD_MTRX is the base of C1:LSP-DOF2PD_MTRX_0_0_GAIN for example), as well as an ouptut file where the values are stored.

So to save the BS in_matrix setting you could do (from /cvs/cds/caltech/scripts/matrix/)

./saveMatrix.py -i 4 -o 5 -n "C1:SUS-BS_TO_COIL_MTRX" -f -d ./to_coil_mtrx.txt

The -i 4 indicates 4 inputs, the -o 5 indicates 5 outputs, -n "blah blah" indicates the base channel name, -f indicates a matrix bank of filters (if its just a normal matrix with no filters, drop the -f flag), and -d ./to_coil_mtrx.txt indicates the destination file.

To write the matrix, you do virtually the same thing:

./loadMatrix.py -n "C1:SUS-PRM_TO_COIL_MTRX" -f -d ./to_coil_mtrx.txt

In this case, you're writing the saved values of the BS, to the PRM.  This method might be faster if you're trying to fill in a bunch of new matrices that are identical rather than typing 1's and -1's 20 times for each matrix.

I'll have Yuta add his how-to of this to the wiki.

  3662   Wed Oct 6 16:16:48 2010 josephb, yutaUpdateCDSc1sus status

At the moment, c1sus and c1mcs on the c1sus machine seem to be dead in the water.  At this point, it is unclear to me why.

Apparently during the 40m meeting, Alex was able to get test points working for the c1mcs model.  He said he "had to slow down mx_stream startup on c1sus".   When we returned at 2pm, things were running fine. 

We began updating all the matrix values on the medm screens.  Somewhere towards the end of this the c1sus model seemed to have crashed, leaving only c1x02 and c1mcs running.  There were no obvious error messages I saw in dmesg and the target/c1sus/logs/log.txt file (although that seems to empty these days).  We quickly saved to burt snap shots, one of c1sus and one of c1mcs and saved them to /opt/rtcds/catlech/c1/target/snapshots directory temporarily.  We then ran the killc1sus script on c1sus, and then after confirming the code was removed, ran the startup script, startc1sus.  The code seemed to come back partly.  It was syncing up and finding the ADC/DAC boards, but not doing any real computations.  The cycle time was reporting reasonably, but the usr time (representing computation done for the model) was 0.  There were no updating monitor channels on the medm screens and filters would not turn on.

At this point I tried bringing down all 3 models, and restarting c1x02, then c1sus and c1mcs.  At this point, both c1sus and c1mcs came back partly, doing no real calculations.  c1x02 appears to be working normally (or at least the two filter banks in that model are showing changing channels from ADCs properly).  I then tried rebooting the c1sus machine.  It came back in the same state, working c1x02, non-calculating c1sus and c1mcs.

  3665   Thu Oct 7 10:37:42 2010 josephbUpdateCDSc1sus with flaky ssh

Currently trying to understand why the ssh connections to c1sus  are flaky.  This morning, every time I tried to make the c1sus model on the c1sus machine, the ssh session would be terminated at a random spot midway through the build process.  Eventually restarting c1sus fixed the problem for the moment.

However, previously in the last 48 hours, the c1sus machine had stopped responding to ssh logins while still appearing to be running the front end code.  The next time this occurs, we should attach a monitor and keyboard and see what kind of state the computer is in.  Its interesting to note we didn't have these problems before we switched over to the Gentoo kernel from the real-time linux Centos 5.5 kernel.

  3666   Thu Oct 7 10:48:41 2010 josephb, yutaUpdateCDSc1sus status

This problem has been resolved.

Apparently during one of Alex's debugging sessions, he had commented out the feCode function call on line 1532 of the controller.c file (located in /opt/rtcds/caltech/c1/core/advLigoRTS/src/fe/ directory).

This function is the one that actually calls all the front end specific code and without it, the code just doesn't do any computations.  We had to then rebuild the front end codes with this corrected file.

Quote:

At the moment, c1sus and c1mcs on the c1sus machine seem to be dead in the water.  At this point, it is unclear to me why.

Apparently during the 40m meeting, Alex was able to get test points working for the c1mcs model.  He said he "had to slow down mx_stream startup on c1sus".   When we returned at 2pm, things were running fine. 

We began updating all the matrix values on the medm screens.  Somewhere towards the end of this the c1sus model seemed to have crashed, leaving only c1x02 and c1mcs running.  There were no obvious error messages I saw in dmesg and the target/c1sus/logs/log.txt file (although that seems to empty these days).  We quickly saved to burt snap shots, one of c1sus and one of c1mcs and saved them to /opt/rtcds/catlech/c1/target/snapshots directory temporarily.  We then ran the killc1sus script on c1sus, and then after confirming the code was removed, ran the startup script, startc1sus.  The code seemed to come back partly.  It was syncing up and finding the ADC/DAC boards, but not doing any real computations.  The cycle time was reporting reasonably, but the usr time (representing computation done for the model) was 0.  There were no updating monitor channels on the medm screens and filters would not turn on.

At this point I tried bringing down all 3 models, and restarting c1x02, then c1sus and c1mcs.  At this point, both c1sus and c1mcs came back partly, doing no real calculations.  c1x02 appears to be working normally (or at least the two filter banks in that model are showing changing channels from ADCs properly).  I then tried rebooting the c1sus machine.  It came back in the same state, working c1x02, non-calculating c1sus and c1mcs.

 

  3668   Thu Oct 7 14:57:52 2010 josephb, yutaUpdateCDSc1sus status

Around noon, Yuta and I were trying to figure out why we were getting no signal out to the mode cleaner coils.  It turns out the mode cleaner optic control model was not talking to the IOP model. 

Alex and I were working under the incorrect assumption that you could use the same DAC piece in multiple models, and simply use a subset of the channels.  He finally went and asked Rolf, who said that the same DAC simulink piece in different models doesn't work.  You need to use shared memory locations to move the data to the model with the DAC card.  Rolf says there was a discussion (probably a long while back) where it was asked if we needed to support DAC cards in multiple models and the decision was that it was not needed.

Rolf and Alex have said they'd come over and discuss the issue.

In the meantime, I'm moving forward by adding shared memory locations for all the mode cleaner optics to talk to the DAC in the c1sus model.

 

Note by KA: Important fact that is worth remembering

  3671   Thu Oct 7 16:21:02 2010 KojiOmnistructureCDSBig 3

Both Rolf and Alex (at least his elbow) together visited the 40m to talk with Joe for the CDS.

40m is the true front line of the CDS development!!!

Attachment 1: IMG_3642.jpg
IMG_3642.jpg
  3673   Thu Oct 7 17:19:55 2010 josephb, alex, rolfUpdateCDSc1sus status

As noted by Koji, Alex and Rolf stopped by.

We discussed the feasibility of getting multiple models using the same DAC.  We decided that we infact did need it. (I.e. 8 optics through 3 DACs does not divide nicely), and went about changing the controller.c file so as to gracefully handle that case.  Basically it now writes a 0 to the channel rather than repeating the last output if a particular model goes down that is sharing a DAC.

In a separate issue, we found that when skipping DACs  in a model (say using DACs 1 and 2 only) there was a miscommunication to the IOP, resulting in the wrong DACs getting the data.  the temporary solution is to have all DACs in each model, even if they are not used.  This will eventually be fixed in code.

At this point, we *seem* to be able to control and damp optics.  Look for a elog from Yuta confirming or denying this later tonight (or maybe tomorrow).

 

  3675   Thu Oct 7 23:24:44 2010 yutaUpdateCDSchecking MC1 suspension damping

Background:
 The new CDS is currently being set up.
 We want to see if the damping servo of the suspensions are working correctly.
 But before that, we have to see if the sensors and the coils are working correctly.
 Among the 8 optics, MCs take top priority because of the green beam. for the alignment of the in-vac optics.

What I did:
 By seeing the 5 sensor outputs (C1:SUS-MC1_XXSEN_IN1, XX=UL,UR,LR,LL,SD) with the Data Viewer, I checked if all the coils are kicking in the supposed direction and all the sensors are sensing that kick correctly.

 All the matrices elements were set to the ideal values(-1 or 0 or 1) this time.

Result:
 They were perfect.
1. POSITION seemed to be POSITION
 When the offset(C1:SUS-MC1_SUSPOS_OFFSET) was added, all the sensor output moved to the same direction.
2. PITCH seemed to be PITCH
 When the offset(C1:SUS-MC1_PIT_COMM) was changed, UL&UR and LL&LR went to the different direction.
3. YAW seemed to be YAW
 When the offset(C1:SUS-MC1_YAW_COMM) was changed, UL&LL and UR&LR went to the different direction.
4. SIDE seemed to be SIDE
 When the offset(C1:SUS-MC1_SDSEN_OFFSET) was added, DC level of the SD sensor output changed.

Notes:

 c1mcs crashed many times during the investigation, and I had to kill and restart it again and again.
 It seemed to be easily crashed when filters are on, and so I couldn't check whether the damping servo is working correctly or not today.

Next work:

  - fix c1mcs (and maybe others)
  - check the damping servo by comparing the displacements of each 4 degrees of freedom when servo in off and on.

  3678   Fri Oct 8 12:21:11 2010 josephbUpdateCDSchecking MC1 suspension damping

Upon investigation, it appears that the c1mcs model was (and still is) timing out after a random amount of time. Or in other words, it at some point it was taking too long to do all the calculations for a single cycle and falling behind. The evidence for this is from the dmesg command when run on c1sus.

There's a bunch of lines like:

[ 8877.438002] c1mcs: cycle 568 time 62; adcWait 0; write1 0; write2 0; longest write2 0

[ 8877.438002] c1mcs: cycle 569 time 62; adcWait 0; write1 0; write2 0; longest write2 0

With a final line like: [ 8877.439003] c1mcs: ADC TIMEOUT 1 2405 37 2277

This last line indicates in fell so far behind it gave up.

However, this doesn't actually seem to be related to the amount of computation going on with the front end. I restarted the c1mcs model this morning by logging into the c1sus machine, and changing to the /opt/rtcds/caltech/c1/target/c1mcs/bin directory and running:

lsmod

sudo rmmod c1mcsfe

sudo insmod c1mcsfe.ko

The first line just lists the running modules. The second removes the c1mcs module, and the third starts it up again.

I proceeded to turn all the filters and and set all the matrix values while keeping an eye on the C1MCS-GDS_TP.adl screen and its timing indicator. It was running fine with all these turned on. Then I ran a dtt session from rosalba (going to /opt/apps/, typing bash, then source gds-env.bash, and finally diaggui) as well as a dataviewer and looked at 6 test point channels. It received data fine.

However, about 2 minutes after I had stopped doing this (although the dataviewer realtime session was still going) the USR timing jumped from about 20 microseconds to 35 microseconds, and the CPU Max timing jumped to the 50-60 microsecond range. At this point dmesg started reporting things like:

[54143.465613] c1mcs: cycle 1076 time 62; adcWait 0; write1 0; write2 0; longest write2 0

[54143.526004] c1mcs: cycle 2064 time 62; adcWait 0; write1 0; write2 0; longest write2 0

About a minute later the code gave up and reported a ADC timeout via dmesg. None of the other front ends seem to be affected.

I had to clear the test points manually after stopping dataviewer and dtt by going to rosalaba,using the sourced gds-env.bash, and running diag -l. I then typed "tp clear 36 *" to clear all the test points on the model with FEC number 36 (corresponding to c1mcs).

I removed and restarted c1mcs again, trying to turn on a few things at a time and letting it run for a few minutes to see if I could narrow down if its one particular filter perhaps reaching an underflow and starting to bog down the computations. However, after about 45 minutes of this, the model is still running and I've turned all the filters on and have been running about 8 test points with no problem, so the problem is clearly intermittent.

Quote:

Notes:
 c1mcs crashed many times during the investigation, and I had to kill and restart it again and again.
 It seemed to be easily crashed when filters are on, and so I couldn't check whether the damping servo is working correctly or not today.

Next work:

  - fix c1mcs (and maybe others)
  - check the damping servo by comparing the displacements of each 4 degrees of freedom when servo in off and on.

 

  3680   Fri Oct 8 15:57:32 2010 josephbUpdateCDSc1ioo status

I've been trying to figure out why the c1ioo machine crashes when I try to run the c1ioo front end.

I tried removing some RFM components from the c1ioo model, and then the c1gpt model (Kiwamu's green locking model) as an easier test case.  Both cause the machine to lock up once they start running.  Lastly, I tried running the c1x02 and c1sus models on the c1ioo machine instead of the c1sus machine, after first turning off all models on c1sus.  This led to the same lockup. 

Since those models run fine on the c1sus machine, I could only conclude that a recent change in the fe code generator or the Gentoo kernel and the Sun X4600 computer don't work together at the moment.

After talking to Alex, he got the idea to check if the monitor() and mwait() were supported on the c1ioo machine.  These function calls were added relatively recently, and are   used to poll a memory location to see if something has been written there, and then do something when it is.  Apparently the Sun X4600 computers do not support this call.  Alex has modified the code to not use these functions calls, at least for now.  He'd like to add a check to the code so it does use those calls on machines which have them supported.

After this change, the c1ioo and c1gpt front end codes do in fact run.

  3681   Fri Oct 8 17:35:24 2010 josephbUpdateCDSstatus of c1ioo, c1sus and rfm

RFM is still not working.  I can see data on a filter just before it reaches the RFM sending code, but I see only zeros on the receiving side.

c1sus machine and c1x02, c1sus, c1mcs, c1rms are running.  At the moment, the c1mcs model is running at about 42 microseconds for USR time and 56 microseconds for CPU MAX, which is close to the 61 microsecond limit.  This is with MC filters on.  So far it has not been late, but its not clear to me if its going to stay that way.  So far I haven't been able to isolate why it sometimes slows down after a few minutes.  Also, it was running faster earlier in the day (around 30-ish microseconds) and I believe it has slowed down slightly in the last hour or two.

c1ioo machine and c1x03, c1ioo are running. However its not doing very much good as I can't get any data transferred from it to any of the optic suspensions. I need to spend some more time debugging this and then grab Alex I think.

  3687   Mon Oct 11 10:49:03 2010 josephbUpdateCDSc1sus stability

Taking a look at the c1sus machine, it looks as if all of the front end codes its running (c1sus - running BS, ITMX, ITMY, c1mcs - running MC1, MC2, MC3, and c1rms - running PRM and SRM) worked over the weekend.  As I see no

Running dmesg on c1sus reports on a single long cycle on c1x02, where it took 17 microseconds (~15 microseconds i maximum because the c1x02 IOP process is running at 64kHz).

Both the c1sus and c1mcs models are running at around 39-42 microseconds USR time and 44-50 microseconds CPU time.  It would run into problems at 60-62 microseconds.

Looking at the filters that are turned on, it looks as it these models were running with only a single optic's worth of filters turned on via the medm screens.  I.e. the MC2 and ITMY filters were properly set, but not the others.

The c1rms model is running at around 10 microseconds USR time and 14-18 microseconds CPU time.  However it apparently had no filters on.

It looks as if no test points were used this weekend.  We'll turn on the rest of the filters and see if we start seeing crashes of the front end again.

Edit:

The filters for all the suspensions have been turned on, and all matrix elements entered.  The USR and CPU times have not appreciably changed.  No long cycles have been reported through dmesg on c1sus at this time.  I'm going to let it run and see if it runs into problems.

  3690   Mon Oct 11 17:31:44 2010 yutaUpdateCDSActivation of DAQ channels for 8 optics

(Joe, Yuta)

Background:
 We need DAQ channels activated to measure Q-values of the ringdowns for each DOF, each optics with the Dataviewer.

What we did:
1. Activated the following DAQ using daqconfig (in /cvs/cds/rtcds/caltech/c1/scripts).
 C1:SUS-XX_AASEN_IN1
 C1:SUS-XX_SUSBBB_IN1
 C1:RMS-YYY_AASEN_IN1
 C1:RMS-YYY_SUSBBB_IN1
 C1:MCS-ZZZ_AASEN_IN1
 C1:MCS-ZZZ_SUSBBB_IN1
  (XX=BS,ITMX,ITMY  YYY=PRM,SRM  ZZZ=MC1,MC2,MC3  AA=UL,UR,LR,LL,SD  BBB=POS,PIT,YAW)

2. Set datarate to 2048 for each DAQ.

3. Restarted fb(frame builder).

Result:
 We succeeded in making DAQ channels appear in the Dataviewer signal list, but we can't start the measurement because c1mcs is still flaky.

Note:
 We found that c1mcs crashes everytime when turning off all the damping servo (using "Damp" buttons on the medm screen).
 It doesn't crash when all the filters are off.

  3691   Mon Oct 11 20:52:00 2010 ranaUpdateCDSActivation of DAQ channels for 8 optics

Bah!  We tried to get some data but totally failed. It seems like there is some rudimentary functionality in the FE process of the MC, but no useful DAQ at all.

Neither Dataview or DTT had anything. We looked and the NDS process and the DAQD process were not running on FB.

I restarted them both (./daqd -c daqdrc) & (./nds pipe > nds.log) but couldn't get anything.

There's a bunch of errors in the daqd.log like this:

CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"
    Context: "Channel: "C1:SUS-MC1_SUSPOS_INMON", Connecting to: c1susdaq:57416, Ignored: c1sus.martian:57416"
    Source File: ../cac.cpp line 1208
    Current Time: Mon Oct 11 2010 18:25:15.475629328
..................................................................
CA.Client.Exception...............................................
    Warning: "Identical process variable names on multiple servers"
    Context: "Channel: "C1:SUS-MC1_SUSPIT_INMON", Connecting to: c1susdaq:57416, Ignored: c1sus.martian:57416"
    Source File: ../cac.cpp line 1208
    Current Time: Mon Oct 11 2010 18:25:15.475900124

  3692   Mon Oct 11 22:04:28 2010 yutaUpdateCDSdamping for MCs are basically working

Background:
 Even if we can't use the Dataviewer to get the time series of each 4 DOF displacements, we can still use StripTool to monitor the ringdowns and see if the damping servo is working.

What I did:
1. Excited vibration by kicking the mirror randomly (by putting some offsets randomly and turing the filters on and off randomly).

2. Turned all the servo off by clicking "ShutDown" button.

3. Turned all the servo on by clicking "Normal" button.

3. Monitored each 4 DOF displacements with StripTool and see if there are any considerably low-Q ringdown after turning on the servo.
 The values I monitored are as follows;
  C1:SUS_MCX_SUSPOS_INMON
  C1:SUS_MCX_SUSPIT_INMON
  C1:SUS_MCX_SUSYAW_INMON
  C1:SUS_MCX_SDSEN_INMON  (X=1,2,3)

All the settings I used for this observation are automatically saved here;
  /cvs/cds/caltech/burt/autoburt/snapshots/2010/Oct/11/21:07/c1mcs.epics

Result:
 Attached is the screenshots of StripTool Graph window for each 3 MCs.
 As you can see, the dampings for each DOF, each MCs are basically working.

Note:
 Do NOT turn off all the damping servo by clicking "Damp" buttons or setting the SUSXXX_GAIN to 0. It crashes c1mcs.

Next work:
 - check and relate the signal sign with the actual moving direction of the optics
 - fix data aquisition system
 - measure Q-values when the servo is on and off using DAQ and Dataviewer

Attachment 1: SUS-MC1.png
SUS-MC1.png
Attachment 2: SUS-MC2.png
SUS-MC2.png
Attachment 3: SUS-MC3.png
SUS-MC3.png
  3696   Tue Oct 12 13:05:00 2010 josephb, alexUpdateCDSfb still flaky, models time out fixed

Interesting information from Alex.  We're limited to 2 Megabytes per second per front end model.  Assuming all your channels are running at a 2kHz rate, we can have at most 256 channels being set to the frame builder from the front end (assuming 4 byte data).  We're fine for the moment, but perhaps useful to keep in mind.

I talked to Alex this morning and he said the frame builder is being flaky (it crashed on us twice this morning, but the third time seemed to stay up when requesting data).  I've added a new wiki page called "New Computer Restart Prodecures" under Computers and Scripts, found here. It includes all the codes that need to be running, and also a start order if things seem to be in a particularly bad state.  Unfortunately, there were no fixes done to the frame builder but it is on Alex's list of things to do.

In regards to the timing out of the front ends, Alex came over to the 40m this morning and we sat down debugging.  We tried several things, such as removing all filters from C1MCS.txt file in the chans directory, and watching the timing as we pressed various medm control buttons. We traced it to a filters used by the DAC in the model talking to the IOP front end, which actually sends the data to the physical DAC card.  The filter is used when converting between sample rates, in this case between the 16 kHz of the front end model and the 64 kHz of the IOP.  Sending it raw zeros after having had real data seemed to cause this filter to eat up an usually large amount of CPU time. 

We modified the /opt/rtcds/caltech/c1/core/advLigoRTS/src/include/drv/fm10Gen.c file.

We reverted a change that was done between version 908 and 929, where underflows (really small numbers) were dealt with by adding and then subtracting a very small number.  We left the adding and subtracting, but also restored the hard limits on the history.

So instead of relying on just:

input += 1e-16;
junk = input;
input -= 1e-16;

we also now use

if((new_hist < 1e-20) && (new_hist > -1e-20)) new_hist = new_hist<0 ? -1e-20: 1e-20;

Thus any filter value who's absolute value is less than 1e-20 will be clamped to -1e-20 or 1e-20.  On the bright side, we no longer crash the front ends when we turn something off.

 

  3706   Wed Oct 13 17:04:34 2010 josephb, yutaUpdateCDSburt restore now working with filter settings

Previously, burt restoring was not setting filters on and off, which was required us to constantly go through all the filter banks and turn them on.  We figured it out that the autoBurt.req file doesn't seem to be setup to restore those values, so that snapshots made with the default .req file don't restore either.

So I went to each of the suspension directories (/opt/rtcds/caltech/c1/target/c1SYS/c1SYSepics/   where SYS is sus, mcs, or rms) and modified teh autoBurt.req files found there with the following incantation:

sed -i 's/RO \(.*SW[12]R.*\)/\1/' autoBurt.req

This removes the RO at the beginning of the lines which have SW1R or SW2R in them.  These are the channels which correspond to filter bank switches.  As far as I can tell, the RO means to leave it alone.  Unfortunately, I didn't see an obvious autoBurt file specification in the DCC or on google in the 2 minutes I took to look for it.

We need to talk to Alex to figure out how that autoBurt.req file is generated and get it so it doesn't default to not restoring filter bank settings.

  3707   Wed Oct 13 17:12:33 2010 josephb, yutaUpdateCDSFilter name length problem found and fixed

The missing filter files for ULPOS, URPOS, and so forth for the mode cleaner optics was due to the length of the names of the filters. 

This was not a problem for the c1sus model because it was using its own name as the first 3 letters of its designation.  A filter for the sus model would be called something like BS_TO_COIL_MTRX_0_0, while for the mcs it would be called SUS_MC1_TO_COIL_MTRX_0_0, an extra 4 characters.

However, the c1mcs model used the "top_name" feature which uses a subsystem box within the simlink model to rename all the channels.  Apparently in the filter file, this means it has to add the top name to the front of everything, adding an additional 3 characters.  This pushed things over the length limit.

A hard cap of 18 characters has been added to the FiltMuxMatrix.pm file (located in /cvs/cds/caltech/c1/, so that it will prevent this type of problem in the future by stopping at compile time and presenting a helpful error message.

I also fixed a bug with too many spaces in the feCodeGen.pl file when dealing with top_names and the filtMuxMatrix.pm preventing some .adl files from being generated.

Also of interest, MC3 appears to never have had F2A filters.  For the moment we're running without them, but since they're just a fine tuning it shouldn't affect locking tonight.

 

Improbability factor of mode cleaner suspensions working tonight: ~20

 

  3708   Wed Oct 13 18:01:43 2010 yutaHowToCDSeditting all the similar medm screens

(Joe, Yuta)

Say, you want to edit all the similar medm screens named C1SUS_NAME_XXX.adl.

1. Go to /opt/rtcds/caltech/c1/medm/master and edit C1SUS_DEFAULTNAME_XXX.adl as you like.

2. Run generate_master_screens.py.

That's it!!

  3712   Thu Oct 14 00:54:12 2010 ranaHowToCDSBad PSL Quad, so I edited the SUS MEDM screens

We found today that there was some (unforgivably un-elogged) PSL-POS & PSL-ANG work today.

The wedge splitter at the end of the PSL table is so terribly dogged that it actually moves around irreversibly if you touch the post a little. Pictures have been recorded for the hall of shame. This should be replaced with a non-pedestal / fork mount.

I was so sad about the fork clamp, that I edited the SUS Master screen instead according to Joe-Yuta instructions; see attached.Untitled.png

There's clearly several broken/white links on there, but I guess that Yuta is going to find out how to fix these from Joe in the morning.

  3720   Thu Oct 14 14:09:30 2010 josephbUpdateCDSNDS 2 server status

[Joe, John]

The nds 2 server is about 50% of the way there.  You can connect to it and get channel lists, but its having issues actually serving the data.  The errors we're getting basically say it can't find the source data for the channel

John had to go get lunch at 2pm, but he said he'd log in remotely later and try to configure it properly.

  3728   Fri Oct 15 15:32:35 2010 josephbUpdateCDSslow DAQ channels added

I added all the _OUT16, _INMON, _EXCMON channels associated with the BS, ITMX, and ITMY channels in SUS_SLOW.ini in the /opt/rtcds/caltech/c1/chans/daq directory.  Similarly, I added the channels associated with MC1, MC2, and MC3 to MCS_SLOW.ini and those associated with PRM and the SRM to RMS_SLOW.ini. 

Lines pointing to these files were then added to the /opt/rtcds/caltech/c1/target/fb/master file and the frame builder restarted.  It took about 4 tries before the frame builder stayed up using ./daqd -c ./daqdrc.

I generated the .ini files with a python script.  Its at /opt/rtcds/caltech/c1/scripts/daqscripts/create_inmon_out16_daq_ini.py. It checks the C0EDCU.ini to find what slow channels already exist, then goes through a medm directory given at the command line looking for all _OUT16, _INMON, and _EXCMON channels, adding them if they aren't already accounted for, and then writing out the new file to the location of your choice.

  3732   Mon Oct 18 08:44:36 2010 ranaUpdateCDSslow DAQ channels added

Why EXCMON? I think that we should modify the filter coefficients so that the decimation filter that makes OUT16 is a 2nd order Butterworth at 1 Hz instead of whatever bogus thing it is now. Then we can delete the EXCMON from the DAQ and add either OUTPUT or OUTMON. That way we will have a low frequency signal (OUT16) and a sort of aliased RMS (OUTMON).

  3735   Mon Oct 18 15:33:00 2010 josephb, alexUpdateCDSPossibly broken timing cards

It looks as though we may have two IO chassis with bad timing cards.

Symptoms are as follows:

We can get our front end models writing data and timestamps out on the RFM network.

However, they get rejected on the receiving end because the timestamps don't match up with the receiving front end's timestamp.  Once started, the system is consistently off by the same amount. Stopping the front end module on c1ioo and restarting it, generated a new consistent offset.  Say off by 29,000 cycles in the first case and on restart we might be 11,000 cycles off.  Essentially, on start up, the IOP isn't using the 1PPS signal to determine when to start counting.

We tried swapping the spare IO chassis (intended for the LSC) in ....

# Joe will finish this in 3 days.

# Basically, in conclusion, in a word, we found that c1ioo IO chassis is wrong.

ELOG V3.1.3-