ID |
Date |
Author |
Type |
Category |
Subject |
4208
|
Wed Jan 26 12:04:31 2011 |
josephb | Update | CDS | Explanation of why c1sus and c1lsc models crash when the other one goes down | So apparently with the current Dolphin drivers, when one of the nodes goes down (say c1lsc), it causes all the other nodes to freeze for up to 20 seconds.
This 20-second freeze pushes a model well past its 60 microsecond limit and is long enough to make the FE time out. Alex and Rolf have been working with the vendors to get this problem fixed, as having all your front ends go down because you rebooted a single computer is bad.
[40184.120912] c1rfm: sync error my=0x3a6b2d5d00000000 remote=0x0
[40184.120914] c1rfm: sync error my=0x3a6b2d5d00000000 remote=0x0
[44472.627831] c1pem: ADC TIMEOUT 0 7718 38 7782
[44472.627835] c1mcs: ADC TIMEOUT 0 7718 38 7782
[44472.627849] c1sus: ADC TIMEOUT 0 7718 38 7782
[44472.644677] c1rfm: cycle 1945 time 17872; adcWait 15; write1 0; write2 0; longest write2 0
[44472.644682] c1x02: cycle 7782 time 17849; adcWait 12; write1 0; write2 0; longest write2 0
[44472.646898] c1rfm: ADC TIMEOUT 0 8133 5 7941
The workaround for the moment is either to start the computers at exactly the same time, so the Dolphin is up before the front ends, or to restart the models by hand once the computer is up and Dolphin is running (that is, after the models have timed out). The manual restart is done by:
sudo rmmod c1SYSfe
sudo insmod /opt/rtcds/caltech/c1/target/c1SYS/bin/c1SYSfe.ko
Alex and Rolf have been working with the vendors to get this fixed, and we may simply need to update our Dolphin drivers. I'm trying to get in contact with them and see if this is the case. |
4212
|
Thu Jan 27 15:16:43 2011 |
josephb | Update | CDS | Updated generate_master_screens.py | I modified the generate_master_screens.py script in /opt/rtcds/caltech/c1/medm/master/ to handle changing the MCL (and MC_L) listings to ALS for the two ETM suspension screens and associated sub-screens.
The relevant added code is:
custom_optic_channels = ['ETMX',
                         {'MCL':'ALS','MC_L':'ALS'},
                         'ETMY',
                         {'MCL':'ALS','MC_L':'ALS'}]
# Entries alternate: optic name, then a dict of {old string: new string}.
# Integer division (//) keeps range() happy in both Python 2 and 3.
for index in range(len(custom_optic_channels)//2):
    if optic == custom_optic_channels[index*2]:
        for swap in custom_optic_channels[index*2+1]:
            sed_command = start_sed_string + swap + "/" + custom_optic_channels[index*2+1][swap] + middle_sed_string + optic + file
            os.system(sed_command)
When run, it generates the correctly named C1:SUS-ETMX_ALS channels, and replaces MCL and MC_L with ALS in the matrix screens.
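The same per-optic swap can be written with a dictionary keyed by optic name, which avoids the index arithmetic. This is only a sketch of the idea, not the code in generate_master_screens.py: it replaces text in a Python string instead of calling sed, and `apply_swaps` and the sample text are hypothetical.

```python
# Per-optic replacements: {optic: {old string: new string}}.
CUSTOM_OPTIC_CHANNELS = {
    "ETMX": {"MCL": "ALS", "MC_L": "ALS"},
    "ETMY": {"MCL": "ALS", "MC_L": "ALS"},
}


def apply_swaps(optic, screen_text):
    """Return screen_text with the swaps for this optic applied.

    Optics without an entry in CUSTOM_OPTIC_CHANNELS are left unchanged.
    """
    for old, new in CUSTOM_OPTIC_CHANNELS.get(optic, {}).items():
        screen_text = screen_text.replace(old, new)
    return screen_text


if __name__ == "__main__":
    print(apply_swaps("ETMX", "C1:SUS-ETMX_MCL"))
```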
|
4219
|
Fri Jan 28 11:08:44 2011 |
josephb | Update | Green Locking | no transmission of ALS signals | As you've correctly noted, the source of the C1:GCV-SCX_ETMX_ALS channels is in the c1gcv model. The first 3 letters of the channel name indicate this (GCV).
The destination of this channel is c1scx, the 2nd 3 letters indicate this (SCX). If it passed through the c1rfm model, it would be written like C1:GCV-RFM_ETMX_ALS.
This particular channel doesn't pass through the c1rfm model, because the computers these two run on (c1ioo and c1scx) are directly connected via our old VMIC 5565 RFM cards, and don't need to pass through the c1sus computer. This is in contrast to all communications going to or from the c1lsc machine, since that is only connected to the c1sus machine by the Dolphin RFM. The c1rfm model also handles a bunch of RFM reads from the mode cleaner WFS, since each read eats up 3-4 microseconds and I didn't want to slow the c1mcs model by 24 microseconds (and ~50 microseconds before the c1sus/c1scx computer switch).
So basically c1rfm is only used for LSC communications and for some RFM reads for local suspensions on c1sus.
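The naming rule described above (first three letters are the source model, next three the destination) can be checked mechanically. A minimal sketch, assuming every IPC channel follows the pattern IFO:SRC-DST_SIGNAL exactly as in the two examples in this entry; `parse_ipc_name` is a hypothetical helper.

```python
import re

# IFO prefix, 3-character source, 3-character destination, then the signal.
IPC_NAME = re.compile(
    r"^(?P<ifo>[A-Z]\d):(?P<src>[A-Z0-9]{3})-(?P<dst>[A-Z0-9]{3})_(?P<signal>\w+)$"
)


def parse_ipc_name(name):
    """Split an IPC channel name into (ifo, source, destination, signal)."""
    match = IPC_NAME.match(name)
    if match is None:
        raise ValueError("not an IPC-style channel name: " + name)
    return (match.group("ifo"), match.group("src"),
            match.group("dst"), match.group("signal"))


if __name__ == "__main__":
    print(parse_ipc_name("C1:GCV-SCX_ETMX_ALS"))
```

A destination of RFM in the result would indicate a channel that is routed through the c1rfm model.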
As for the reason we have no transmission, that looks to be a problem on c1ioo's end. I'm also noticing that MCL is not updating on the MC2 suspension screen as well as no changes to MC PIT and YAW channels, which suggests we're not transmitting properly.
I rebooted the c1ioo machine and then did a burt restore of the c1ioo and c1gcv models. These are now up and running, and I'm seeing both MCL and ALS data being transmitted now.
It's possible that when we were working on the c1gfd (green frequency divider) model on the c1ioo machine we disturbed the RFM communication somehow, although exactly how, I'm not sure.
Quote: |
No signal is transmitted from C1:GCV-SCX_ETMX_ALS (on c1gcv) to C1:GCV-SCX_ETMX_ALS (on c1scx)
I can't find RFM definition for ALS channels in c1rfm. Where are they???
|
|
4220
|
Fri Jan 28 12:15:58 2011 |
josephb | Update | CDS | Updating conlog channel list/ working on "HealthCheck" script | I've updated the scan_adls script (currently located in /cvs/cds/caltech/conlog/bin) to look at new location of our medm screens. I made a backup of the old conlog channel list as /cvs/cds/caltech/conlog/data/conlog_channels.old-2011-01-28.
I then ran the update_chanlist script in the same directory, which calls the scan_adl script. After about 5 minutes it finished updating the channel list. I restarted the conlogger just to be sure, and checked that our new model channels showed up in the conlog (which they do).
I have added a cron job to the op340m crontab to run the update_conlog script once a day at 7am.
Next, I'm working on a HealthCheck script which looks at the conlog channel list, checks whether channels are actually changing over short time scales, and then reports possibly non-functioning channels to the user. |
4231
|
Mon Jan 31 10:31:30 2011 |
josephb | Update | WienerFiltering | Improvement in H1 Wiener FF prediction by using weights and taps | Rossa is a rather beefy machine. It effectively has 8 Intel i7 cores (2.67 GHz each) and 12 GB of RAM. Megatron only has 8 GB of RAM and just 8 Opterons (1 GHz each). Rosalba has 4 Quad Core2 (2.4 GHz) with only 4 GB of RAM.
CDS status table (cell colors not preserved): MC damp | dataviewer | diaggui | AWG | c1ioo | c1sus | c1iscex | RFM | The Dolphins | Sim.Plant | Frame builder | TDS
|
4241
|
Wed Feb 2 15:07:20 2011 |
josephb | Update | CDS | activateDAQ.py now includes PEM channels | [Joe, Jenne]
We modified the activateDAQ.py script to handle the C1PEM.ini file (defining the PEM channels being recorded by the frame builder) in addition to all the optics channels. Jenne will be modifying it further so as to rename more channels. |
4246
|
Thu Feb 3 16:45:28 2011 |
josephb | Update | CDS | General CDS updates | Updated the FILTER.adl file to have the yellow button moved up, and replaced the symbol in the upper right with a white A with black background. I made a backup of the filter file called FILTER_BAK.adl. These are located in /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/util.
I also modified the Makefile in /opt/rtcds/caltech/c1/core/advLigoRTS/ to make the startc1SYS scripts it makes take in an argument. If you type in:
sudo startc1SYS 1
it automatically writes 1 to the BURT RESTORE channel so you don't have to open the GDS_TP screen and by hand put a 1 in the box before the model times out.
The scripts also point to the correct burtwb and burtrb files, so they should stop complaining about not finding them, and they now actually put a time-stamped burt snapshot in the /tmp directory when the kill or start scripts are run. The Makefile was also backed up to Makefile_bak.
|
4247
|
Thu Feb 3 17:25:03 2011 |
josephb | Update | Computers | rsync script was not really backing up /cvs/cds | So today, after an "rm" error while working with the autoburt.pl script and burt restores in general, I asked Dan Kozak how to actually look at the backup data. He said there's no way to actually look at it at the moment. You can reverse the rsync command or ask him to grab the data/file if you know what you want. However, in the course of this, we realized there was no /cvs/cds data backup.
Turns out, the rsync command line in the script had a "-n" option. This means do a dry run. Everything *but* the actual final copying.
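A backup script could guard against this by checking its own rsync arguments before running. The sketch below is an assumption about how such a check might look, not something in the existing script: `is_dry_run` is a hypothetical helper, and it deliberately ignores the corner case of short options that take an attached argument.

```python
def is_dry_run(argv):
    """Return True if an rsync argument list asks for a dry run.

    rsync accepts both "-n" and "--dry-run"; short options may be
    bundled, as in "-avn".
    """
    for arg in argv:
        if arg == "--":
            break  # everything after "--" is a path, not an option
        if arg == "--dry-run":
            return True
        if arg.startswith("-") and not arg.startswith("--") and "n" in arg[1:]:
            return True
    return False


if __name__ == "__main__":
    print(is_dry_run(["rsync", "-avn", "src/", "dst/"]))
```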
I have removed the -n from the script and started it on nodus, so we're backing up as of 5:22pm today.
I'm thinking we should have a better way of viewing the backup data, so I may ask Dan and Stewart about a better setup where we can login and actually look at the backed up files.
In addition, tomorrow I'm planning to add cron jobs which will put changes to files in the /chans and /scripts directories into the SVN on a daily basis, since the backup procedure doesn't really provide a history for those, just a 1 day back backup. |
4249
|
Fri Feb 4 13:31:16 2011 |
josephb | Update | CDS | FE start scripts moved to scripts/FE/ from scripts/ | All start and kill scripts for the front end models have been moved into the FE directory under scripts: /opt/rtcds/caltech/c1/scripts/FE/. I modified the Makefile in /opt/rtcds/caltech/c1/core/advLigoRTS/ to update and place new scripts in that directory.
This was done by using
sed -i 's[scripts/start$${system}[scripts/FE/start$${system}[g' Makefile
sed -i 's[scripts/kill$${system}[scripts/FE/kill$${system}[g' Makefile
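The same edit can be expressed as a plain string replacement, which sidesteps any question of how sed treats `$` and `{`. A minimal sketch, assuming the Makefile contains the literal text `scripts/start$${system}` and `scripts/kill$${system}`; `move_scripts_to_fe` is a hypothetical helper and the sample line in the usage is made up.

```python
def move_scripts_to_fe(makefile_text):
    """Point the start/kill script paths at scripts/FE/ instead of scripts/."""
    for verb in ("start", "kill"):
        old = "scripts/" + verb + "$${system}"
        new = "scripts/FE/" + verb + "$${system}"
        makefile_text = makefile_text.replace(old, new)
    return makefile_text


if __name__ == "__main__":
    sample = "target/scripts/start$${system}"  # hypothetical Makefile fragment
    print(move_scripts_to_fe(sample))
```

Running the function a second time leaves the text unchanged, because the rewritten path no longer contains the old pattern.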
|
4250
|
Fri Feb 4 13:45:25 2011 |
josephb | Update | Computers | Temporarily removed cronjob for rsync.backup | I removed the rsync backup from nodus' crontab temporarily so as to not have multiple backup jobs running. The job I started from yesterday was still running. Hopefully the backup will finish by Monday.
The line I removed was:
0 5 * * * /opt/rtcds/caltech/c1/scripts/backup/rsync.backup
CDS status:
MC damp | dataviewer | diaggui | AWG | c1lsc | c1ioo | c1sus | c1iscex | c1iscex | RFM | The Dolphins | Sim.Plant | Frame builder | TDS | Cabling
blue | green | blue | yellow | orange | yellow | blue | yellow | yellow | blue | blue | red | blue | orange | orange
|
4251
|
Fri Feb 4 15:03:20 2011 |
josephb | Update | Computers | Modified cshrc.40m | Removed some lines from the PATH environment variable since they point to old codes which no longer work with the new frame builder and setup.
The change was:
#setenv PATH $LINUXPATH/bin:$GDSPATH/bin:$ROOTSYS/bin:$TDSPATH/bin:${SCRIPTPATH}:$PATH
setenv PATH $TDSPATH/bin:${SCRIPTPATH}:$PATH
|
4256
|
Mon Feb 7 10:37:28 2011 |
josephb | Update | Computers | Temporarily removed cronjob for rsync.backup | The backup appears to have finished on nodus, and I've put the rsync job back in the crontab.
Quote: |
I removed the rsync backup from nodus' crontab temporarily so as to not have multiple backup jobs running. The job I started from yesterday was still running. Hopefully the backup will finish by Monday.
The line I removed was:
0 5 * * * /opt/rtcds/caltech/c1/scripts/backup/rsync.backup
|
|
4262
|
Tue Feb 8 16:04:58 2011 |
josephb | Update | CDS | Hard coded decimation filters need to be fixed | [Joe, Rana]
Filter definitions for the decimation filters to epics readback channels (like _OUT16) can be found in the fm10Gen.c code (in /opt/rtcds/caltech/c1/core/advLigoRTS/src/include/drv).
At the moment, the code is broken for systems running at 32k and 64k, as they appear to default to the 16k filter. I'd also like to figure out the notation and plot the actual filter used for the 16k.
Rana has suggested a 2nd order, 2 dB ripple low pass Cheby1 filter at 1 Hz.
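One way to start on plotting the existing 16k filter is to evaluate its response directly from the sixteenKAvgCoeff values in fm10Gen.c. The sketch below is only a guess at the notation: it assumes the array holds an overall gain followed by two second-order sections, each stored as a1, a2, b1, b2, and that the model runs at 16384 Hz. The `response` helper is hypothetical and not part of fm10Gen.c.

```python
import cmath
import math

# Values copied from sixteenKAvgCoeff in fm10Gen.c.
SIXTEEN_K_AVG_COEFF = [
    1.9084759e-12,
    -1.99708675982420, 0.99709029700517, 2.00000005830747, 1.00000000739582,
    -1.99878510620232, 0.99879373895648, 1.99999994169253, 0.99999999260419,
]


def response(coeffs, f_hz, fs_hz=16384.0):
    """Complex response at f_hz under the ASSUMED coefficient layout.

    Assumed layout: [gain, a1, a2, b1, b2, a1, a2, b1, b2], with each
    section H(z) = (1 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).
    """
    zinv = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    h = complex(coeffs[0])
    for i in range(1, len(coeffs), 4):
        a1, a2, b1, b2 = coeffs[i:i + 4]
        h *= (1 + b1 * zinv + b2 * zinv ** 2) / (1 + a1 * zinv + a2 * zinv ** 2)
    return h


if __name__ == "__main__":
    for f in (0.0, 1.0, 10.0, 100.0):
        print(f, abs(response(SIXTEEN_K_AVG_COEFF, f)))
```

Under this assumed layout the magnitude at 0 Hz comes out close to 1 and the magnitude at the Nyquist frequency is essentially zero, which is at least consistent with a low-pass averaging filter.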
#if defined(SERVO16K) || defined(SERVOMIXED) || defined(SERVO32K) || defined(SERVO64K) || defined(SERVO128K) || defined(SERVO256K)
static double sixteenKAvgCoeff[9] = {1.9084759e-12,
    -1.99708675982420, 0.99709029700517, 2.00000005830747, 1.00000000739582,
    -1.99878510620232, 0.99879373895648, 1.99999994169253, 0.99999999260419};
#endif

#if defined(SERVO2K) || defined(SERVOMIXED) || defined(SERVO4K)
static double twoKAvgCoeff[9] = {7.705446e-9,
    -1.97673337437048, 0.97695747524900, 2.00000006227141, 1.00000000659235,
    -1.98984125831661, 0.99039139954634, 1.99999993772859, 0.99999999340765};
#endif

#ifdef SERVO16K
#define avgCoeff sixteenKAvgCoeff
#elif defined(SERVO32K) || defined(SERVO64K) || defined(SERVO128K) || defined(SERVO256K)
#define avgCoeff sixteenKAvgCoeff
#elif defined(SERVO2K)
#define avgCoeff twoKAvgCoeff
#elif defined(SERVO4K)
#define avgCoeff twoKAvgCoeff
#elif defined(SERVOMIXED)
#define filterModule(a,b,c,d) filterModuleRate(a,b,c,d,16384)
#elif defined(SERVO5HZ)
#else
#error need to define 2k or 16k or mixed
#endif |
4265
|
Wed Feb 9 15:26:22 2011 |
josephb | Update | CDS | Updated c1scx with lockin, c1gcv for green transmission pd | Updated the c1scx model to have two Lockin demodulators (C1:SUS-ETMX_LOCKIN1 and C1:SUS-ETMX_LOCKIN2). There is a matrix C1:SUS-ETMX_INMUX which directs signals to the inputs of LOCKIN1 and LOCKIN2. Currently GREEN_TRX is the only signal going into this matrix; the other 3 inputs are grounded. The clocks themselves had to be at the top level (they don't work inside blocks) and are thus named C1:SCX-ETMX_LOCKIN1_OSC and C1:SCX-ETMX_LOCKIN2_OSC.
There is a signal (IPC name is C1:GCV-SCX_GREEN_TRX) going from the c1gcv model to the c1scx model, which will carry the output from Jenne's green transmission PD once that PD is installed. I've placed a filter bank on it in the c1gcv model as a monitor point, and it corresponds to C1:GCV-GREEN_TRX.
The suspension control screens were modified to have a screen for the Matrix feeding signals into the two lockin demodulators. The green medm screen was also modified to have readbacks for the GREEN_TRX and GREEN_TRY channels.
So on the board, the top channel (labeled 1, corresponds to code ADC_0_0) is MCL.
Channel 2 (ADC_0_1) is assigned to frequency divided green signal.
Channel 3 (ADC_0_2) is assigned to the beat PD's DC output.
Channel 4 (ADC_0_3) is assigned to the green power transmission for the x-arm.
Channel 5 (ADC_0_4) is assigned to the green power transmission for the y-arm. |
4270
|
Thu Feb 10 14:07:18 2011 |
josephb | Update | CDS | Updating dolphin drivers to eliminate timeouts when one dolphin card is shutdown | [Joe,Alex]
Alex came over and we installed the new Dolphin drivers so that the front ends using the Dolphin PCIe RFM network don't pause for a long time when one of the other nodes in the network goes down. Generally this pause would cause the code to time out and quit. Now you can take c1lsc or c1sus down without the other having problems.
We did note on reboot, however, that the Dolphin_wait script sometimes (not always) seems to hang. This script is run at boot up to ensure the Dolphin card has had enough time to allocate memory space for the IOP process to read and write data, so if it hangs, nothing else in the startup script gets run. In this case, running "pkill dolphin_wait" may be necessary.
Note that you may still have problems if you hit the power button to force a shutdown (i.e. holding it for 4 seconds for immediate power off), but as long as you do a "reboot" or "shutdown -r now" type command, it should come down gracefully.
What was done:
Alex grabbed the code from his server, and put it in /home/controls/DIS/ on fb.
He ran the following commands in that directory to build the code.
./configure '--with-adapter=DX' '--prefix=/opt/DIS'
make
sudo make install
He proceeded to modify the /diskless/root/etc/rc.local to have the line:
insmod /lib/modules/2.6.34.1/kernel/drivers/dis/dis_kosf.ko
In that same file he commented out
cd /root
and
exec /bin/bash
He then modified the run levels in /diskless/root/etc/inittab. Level 0, level 3, and level 6 were changed:
l0:0:wait:/etc/rc.halt
l3:3:wait:/etc/rc.level3
l6:6:wait:/etc/rc.reboot
Then he created the scripts those lines refer to:
rc.level3 is just:
exec /bin/bash
rc.halt is:
/opt/DIS/sbin/dxtool prepare-shutdown 0
sleep 3
halt -p
rc.reboot is:
reboot
Basically rc.halt calls a special code which prepares the Dolphin RFM card to shutdown nicely. This is why just hitting the power button for 4 seconds will cause problems for the rest of the dolphin network.
We then checked out of svn the latest dolphin.c in /opt/rtcds/caltech/c1/core/advLigoRTS/src/fe
The Dolphin RFM cards have a new numbering scheme. 4 is reserved for special broadcasts to everyone, so the Dolphin node IDs now start at 8. So we needed to change the c1lsc and c1sus Dolphin node IDs.
To change them we went to /etc/dis/dishosts.conf on the fb machine, and changed the following lines:
HOSTNAME: c1sus
ADAPTER: c1sus_a0 4 0 4
HOSTNAME: c1lsc
ADAPTER: c1lsc_a0 8 0 4
to
HOSTNAME: c1sus
ADAPTER: c1sus_a0 8 0 4
HOSTNAME: c1lsc
ADAPTER: c1lsc_a0 12 0 4
The FE models for the c1lsc and c1sus machines were recompiled and then the computers were rebooted. After they came back up, we tested that there was no time out by shutting down c1lsc and watching c1sus. We then reversed roles and shut down c1sus while watching c1lsc. No problems occurred. Currently they are up and communicating fine.
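The node ID change above can be sketched as a small text rewrite of dishosts.conf. This assumes the first number after the adapter name on each ADAPTER line is the Dolphin node ID, which is what the before/after lines suggest; `set_node_ids` is a hypothetical helper, not a Dolphin tool.

```python
def set_node_ids(conf_text, node_ids):
    """Rewrite the node ID on each ADAPTER line for the given hosts.

    node_ids maps host name to new node ID, e.g. {"c1sus": 8}.
    Lines for hosts not in node_ids are left untouched.
    """
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ADAPTER:":
            host = fields[1].rsplit("_", 1)[0]  # "c1sus_a0" -> "c1sus"
            if host in node_ids:
                fields[2] = str(node_ids[host])  # assumed node ID field
                line = " ".join(fields)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    old = "HOSTNAME: c1sus\nADAPTER: c1sus_a0 4 0 4"
    print(set_node_ids(old, {"c1sus": 8}))
```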
|
4291
|
Mon Feb 14 18:27:39 2011 |
josephb | Update | CDS | Began updating to latest CDS svn, reverted to previous state | [Joe, Alex]
This morning I began the process of bringing our copy of the CDS code up to date with the version installed at Livingston. The motivation was to pick up fixes to various parts, the oscillator part among them. This would let us clean up the front end model .mdl files so we no longer have to pass clk, sin, cos channels for every optic through 3 layers of simulink boxes.
I also began the process of using a similar startup method, which involved creating /etc/init.d/ start and stop scripts for the various processes which get run on the front ends, including awgtpman and mx_streams. This allows the monitor software called monit to remotely restart those processes or provide a web page with a real time status of those processes. A cleaner rc.local file utilizing sub-scripts was also adapted.
I did some testing of the new codes on c1iscey. This testing showed a problem with the timing part of the code, with cycles going very long. We think it has something to do with the code not accounting for the fact that we do not have IRIG-B timing cards in the IO chassis providing GPS time, which the sites do have. We rely on the computer clock and ntpd.
At the moment, we've reverted to svn revision 2174 of the CDS code, and I've put the previously working version of the c1scy and c1x05 (running on the c1iscey computer) back. It's from the /opt/rtcds/caltech/c1/target/c1x05/c1x05_11014_163146 directory. I've put the old rc.local file back in the /diskless/root/etc/ directory on the fb machine. Currently running code on the other front end computers was not touched. |
4300
|
Tue Feb 15 11:56:17 2011 |
josephb | Update | CDS | Updated some DAQ channel names | That is my fault for not running the activateDAQ.py script after a round of rebuilds. I have run the script this morning, and confirmed that the oplev channels are showing up in dataviewer.
Quote: |
Although Joe and Kiwamu claim that they have inserted the correct DAQ names for the OPLEVs (e.g. PERROR and YERROR) back in Jan. 11, when I look today, I see that these channels are missing!
I want my PERROR/YERRORs back!
|
|
4302
|
Tue Feb 15 15:06:25 2011 |
josephb | Update | CDS | CDS todo list for tomorrow morning | Currently, there is a test directory called /opt/rtcds/caltech/c1/new_core where we have the latest svn checkout. Tomorrow (after everything works), it will become the core directory.
1) Modify on the fb machine the /diskless/root/etc/ld.so.cache file. This is done by logging into fb, going to /etc/ld.so.conf.d/, modifying epics-x86_64.conf to only have .10 stuff , and running sudo /sbin/ldconfig. Copy the newly generated /etc/ld.so.cache file to /diskless/root/etc/.
2) Modify the rc.local file on the fb machine in /diskless/root/etc/ to take advantage of the new subscripts and init.d/ start scripts.
3) Add the no_rfm_dma to all the iop models (c1x01,c1x02,c1x03,c1x04,c1x05).
4) Rebuild all front end models with new code. Install.
5) Build awgtpman and mx_streams with new code.
6) Rerun activateDAQ.py (to fix channel names from all the rebuilt code).
7) Double check Burt request files have the switch fix.
8) Restart the front ends.
9) Restart the frame builder.
10) Check channels, excitations, RFM connections.
11) Check Monit is working. |
4308
|
Wed Feb 16 12:16:14 2011 |
josephb | Update | CDS | Fixed Optical level SUM channel names | [Joe,Valera]
Valera pointed out the OPLEV SUM channels were incorrect. We were renaming the OPLEV sum channel to _OPLEV_SUM when it should have been OL_SUM. This has been fixed in the activateDAQ.py script. |
4311
|
Thu Feb 17 11:20:04 2011 |
josephb | Update | CDS | start scripts no longer need sudo | I've modified the rc.local file to run the IOC codes as controls, which means they no longer write root permission log files on startup.
The awgtpman, which was the other permission issue with the start scripts, is started by a run script now. This new version seems to be content to keep the permissions of the current log file, which is set to controls.
This should prevent the issue of sudo wiping your path environment variable for just that command. (Try "sudo which burtwb" versus "which burtwb" for example). This is apparently a security feature of sudo.
If you should happen to use sudo to run a start script, the easiest way to fix the permissions is to go to the target directory (type "target") and run "sudo chown controls:controls -R *" on one of the workstations (the front ends don't handle the groups properly at the moment).
This should allow the scripts to properly use burtrb and burtwb to write and backup burt files. |
4312
|
Thu Feb 17 11:49:48 2011 |
josephb | Update | CDS | Front end start/stop scripts go to /scripts/FE again | I modified the core/advLigoRTS/Makefile to once again place the startc1SYS and killc1SYS scripts in the scripts/FE/ directory.
It had been reverted in the SVN update. |
4313
|
Thu Feb 17 11:51:14 2011 |
josephb | Update | CDS | Lockin filter names too long - broke loading | Problem:
Could not load filters into the C1:SUS-ETMX_LOCKIN1_SIG filter bank.
Reason:
Apparently the filter bank name was too long. I'm not sure why this isn't caught by the real time code generator; I'm planning on asking Alex and Rolf about it today.
Solution:
Reduce the name of the components. Basically LOCKIN1 needs to become something like LOCK1 or LIN1.
In related news, it looks like the initial filters are hard coded to be 2048 Hz. Given that they start out empty they won't cause things to break immediately, and if you're editing the file you can update the rate as you add the filter. I'll also bring this up with Alex and Rolf and see if the RCG can't be more intelligent about its filter generation.
|
4320
|
Thu Feb 17 23:56:53 2011 |
josephb | Update | CDS | Daqd was rebuilt, now reverted. | As one of the troubleshooting steps for the daqd (i.e. framebuilder) I rebuilt the daqd executable. My guess is that somewhere in the build code there is some kind of GPS offset to make the time correct given our lack of an IRIG-B signal.
The actual daqdrc file was left untouched when I did the new install, so the symmetricom gps offset is still the same, which confuses me.
I'll take a look at the SVN diffs tomorrow to see what changed in that code that could cause a 300000000 or so offset to the GPS time.
|
4323
|
Fri Feb 18 13:41:22 2011 |
josephb | Update | CDS | CDS fixes | I talked to Alex today and had two things fixed:
First the maximum length of filter names (in the foton C1SYS.txt files in /chans) has been increased to 40, from 20. This does not increase EPICS channel name length (which is longer than 20 anyways).
This should prevent running into the case where the model doesn't complain when compiled, but we can't load filters.
Additionally, we modified the feCodeGen.pl script in /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/util/ to correctly generate names for filters in all cases. There was a problem where the C1 was being left off the file name when in the simulink .mdl file the filter was located in a box which had "top_names" set. |
4343
|
Wed Feb 23 10:37:02 2011 |
josephb | Summary | IOO | Myterious data loss: FB needs investigation | Friday:
In addition to the other fixes, Alex rebuilt the daqd process. I failed to elog this. When he rebuilt it, he needed to change the symmetricom gps offset in the daqdrc file (located in /opt/rtcds/caltech/c1/target/fb).
On Friday night, Kiwamu contacted me and let me know the frame builder had core dumped after a seg fault. I had him temporarily disable the c1ass process (the only thing we changed that day), and then replaced Alex's rebuilt daqd code with the original daqd code and restarted it. However, I did not change the symmetricom offset at this point. Finally, I restarted the NDS process. At that point testpoints and trends seemed to be working.
Sunday:
The daqd process was restarted sometime on Sunday night (by Valera, I believe). Apparently this restart finally had the symmetricom gps offset kick in (perhaps because it was the first restart after the NDS was restarted?). So data was being written to a future gps time.
Monday:
Kiwamu had problems with testpoints and trends and contacted me. I tracked down the gps offset and fixed it, but the original daqd process only started successfully once; after that it segfaulted and core dumped non-stop. I tried Alex's rebuilt daqd (along with setting the gps offset to the correct value for it), and it worked. Test points, trends, and excitations were checked at that point and found working.
I still do not understand the underlying causes of all these segmentation faults with both the old and new daqd codes. Alex has suggested some new open mx drivers be installed today.
Quote: |
Looks like there was a mysterious loss of data overnight; since there's nothing in the elog I assume that its some kind of terrorism. I'm going to call Rolf to see if he can come in and work all night to help diagnose the issue.
|
|
4344
|
Wed Feb 23 11:53:30 2011 |
josephb | Update | CDS | Updated mx drivers | Alex and I updated the open mx drivers from 1.3.3 to 1.3.901 (1.4 release candidate). We downloaded the drivers from: http://open-mx.gforge.inria.fr/
We put them in /root/open-mx-1.3.901 on the fb machine (so they are mounted by all the front ends). We then ran configure, make, and make install.
We did a quick check with /opt/mx/bin/mx_info on the fb machine at this point and realized the MAC addresses and host names were all messed up, including two listings for c1iscex with different mac addresses (neither of which was c1iscex).
We then brought all the front ends mx_streams down, brought the fb down, then cleared all the peer names with mx_hostname. We then brought the fb up, and the front end mx_stream processes.
/opt/mx/bin/mx_info now shows a clean and correct set of hostnames and mac addresses. Testpoints and trends are working. |
4364
|
Mon Feb 28 11:22:40 2011 |
josephb | Summary | General | to do list |
Quote: |
- Where is the CDS TO DO ==> Joe
|
CDS To Do:
1) Get ETMY working - figure out why signals are not getting past the AI board (D000186) to the coils.
2) Get TDS and command line AWG stuff working
3) Get c1ass and new c1lsc (with Koji) fully integrated with the rest of the system.
4) Get CDS software instructions up to date and well organized.
5) Redo cabling and generally make it a permanent installation instead of hack job:
a) Measure cable lengths, check connectors, wire with good routes and ensure strain relief. Ensure proper labeling
b) Get correct length fiber for c1sus RFM and timing.
c) Fix up final BO adapter box and DAC boxes.
d) Make boxes for the AA filter adapters which are currently just hanging.
e) Get two "faceplates" for the cards in the back of the ETMY IO chassis so they can be screwed down properly.
f) Remove and properly store old, unused cables, boards, and anything else.
6) Create new documentation detailing the current 40m setup, both DCC documents and interactive wiki.
7) Setup an Ubuntu work station using Keith's wiki instructions
Simulated Plant To Do:
1) Create simulated plant to interface with current end mass controls (say scx).
2) Create proper filters for pendulum and noise generation, test suspension.
3) Propagate to all other suspensions.
4) Work on simulated IFO plant to connect to LSC. Create filters for near locked (assume initial green control perhaps) state.
5) Test LSC controls on simulated IFO.
6) Fix c code so there's seamless switching between simulated and real controls.
CDS Status:
MC damp | dataviewer | diaggui | AWG | c1lsc | c1ioo | c1sus | c1iscex | c1iscey | RFM | The Dolphins | Sim.Plant | Frame builder | TDS | Cabling
|
4380
|
Mon Mar 7 17:22:39 2011 |
josephb | Update | CDS | New simulated plant work | [Joe, Jamie]
We modified the c1scx model to have a switch to go between simulated and real plants. The channel is currently C1:SCX-SIM_SWITCH.
When this channel is zero, the simulated plant channels are going to the ADCs and zeros are going out to the real DACs. When this channel is one, the real ADCs are coming in, and real data is going out to the DACs.
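The switch behaviour described above can be written out as a small routing function. This is a sketch of the logic only, not the Simulink implementation: `route_signals` and its argument names are hypothetical, and rejecting values other than 0 and 1 is an assumption since the entry only defines those two states.

```python
def route_signals(sim_switch, real_adc, sim_adc, control_out):
    """Return (adc_value_seen_by_controls, value_sent_to_real_dac).

    sim_switch == 0: controls see the simulated plant, real DAC gets zero.
    sim_switch == 1: controls see the real ADC, real DAC gets the output.
    """
    if sim_switch == 0:
        return sim_adc, 0.0
    if sim_switch == 1:
        return real_adc, control_out
    raise ValueError("C1:SCX-SIM_SWITCH is expected to be 0 or 1")


if __name__ == "__main__":
    print(route_signals(0, real_adc=12.0, sim_adc=3.0, control_out=5.0))
    print(route_signals(1, real_adc=12.0, sim_adc=3.0, control_out=5.0))
```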
Jamie will be adding a big green/red light to the suspension screens which indicate the state of the simulated plant. We will also eventually add this to the overall status screen.
A control screen for the simulated plant is located at /opt/rtcds/caltech/c1/medm/c1spx/master/C1SUP_ETMX.adl. These are currently a work in progress. |
4388
|
Tue Mar 8 16:59:47 2011 |
josephb | Update | CDS | Simulated Plant Work | The screens for the simplified c1spx model have been updated. I re-introduced the suspension point information into the sensor output matrix so we can take into account the fact that as the entire supporting structure moves, the OSEMs move relative to the optic.
Master screens for the noise filters (i.e. 60 Hz, suspension point motion, and optic noise) have been created.
I have currently set the matrix values of the c1spx model to handle just longitudinal motion. I.e. Coils drive only in the POS degree of freedom and sensor read outs are also only in the POS degree of freedom. I've turned off all the noise inputs.
I added a simple double pole at 1 Hz in the C1:SUP_ETMX_PL_F2P_0_0 filter bank. |
4396
|
Thu Mar 10 13:44:56 2011 |
josephb | Update | CDS | Added digitization noise to the c1spy model for simulated ADCs/DACs | To simulate digitization noise, the easiest way I found was to use the MathFunction block, found in the CDS_PARTS model, under simLinkParts.
The MathFunction block supports square of input value, square root of input value, reciprocal of input value, and modulo of two input values.
The last is useful because it casts the input values as integers before taking the modulo. By placing this block after the saturation block (set to +/- 32768), adding 32768.5, choosing the 2nd input to be larger than 2 * 32768 (100,000 in this case), and then subtracting 32768, we wind up with a rounding function.
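As a rough sketch of why this chain rounds, here is the same arithmetic in Python. It assumes the modulo part truncates its inputs to integers, as described above; the function and constant names are made up for illustration and are not part of the RCG.

```python
ADC_LIMIT = 32768      # saturation block setting (+/- 32768)
MODULUS = 100_000      # 2nd modulo input, anything larger than 2 * 32768


def digitize(x: float) -> int:
    """Mimic saturation -> +32768.5 -> integer modulo -> -32768."""
    # Saturation block: clip to the ADC/DAC range.
    clipped = max(-ADC_LIMIT, min(ADC_LIMIT, x))
    # Shift so the value is never negative; the extra 0.5 turns the
    # integer truncation below into round-to-nearest.
    shifted = clipped + ADC_LIMIT + 0.5
    # Assumed behaviour of the modulo part: cast to int, then take the
    # modulo. The modulus is large enough that it never wraps.
    wrapped = int(shifted) % MODULUS
    # Undo the shift.
    return wrapped - ADC_LIMIT
```

Because the shifted value is always non-negative, truncation and floor agree, so negative inputs round the same way as positive ones.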
The above method has been applied to the c1spy model in the CI and SO out sub-blocks. |
4404
|
Fri Mar 11 11:33:24 2011 |
josephb | Update | CDS | Fixed mistake in Matrix of Filter banks naming convention | While fixing up some medm screens and getting spectra of the simulated plant, I realized that the naming convention for the Matrices of Filter banks was backwards when compared to that of the normal matrices (and the rest of the world). The naming was incorrectly column, row.
This has several ramifications:
1) I had to change the suspensions screens for the TO_COIL output filters.
2) I had to change the filters for the suspension with regards to the TO_COIL output filters so they go in the correct filter banks.
3) Burt restores to times before around noon on March 11th will put your TO_COIL output filters in a funny state that will need to be fixed.
4) The simplant RESPONSE filters had to be moved to the correct filter banks.
5) If you have some model I'm not aware of that uses the FiltMuxMatrix piece, it is going to correctly build now, but you're going to have to move filters you may have created with foton. |
4409
|
Sun Mar 13 16:46:48 2011 |
josephb | Update | CDS | ETMY Sim work | 4. The blue Output Filters section has been changed to agree with the new matrix of filters row, column labeling. My fault for not testing it and realizing it was broken. The change was made in /opt/rtcds/caltech/c1/medm/master/C1SUS_DEFAULTNAME.adl and then ./generate_master_screens.py was run, updating all the screens.
5. I have swapped the logic for the sensor filter banks (ULSEN, URSEN, etc). It now sends a "1" to the Binary Output board controlling the OSEM analog whitening when the FM1 filter is ON. This has been done for all the suspensions (BS, ITMX, ITMY, SRM, PRM, MC1, MC2, MC3, ITMX, ITMY).
I am also updating the first sensor filter banks for the BS, ITMX, ITMY, SRM, PRM, MC1, MC2, MC3, called "3:30", to match the Y and X ends.
8. I can't find any documentation on how to get a momentary button press to toggle states. I could stick a filter bank in and use the on/off feature of that part, but that feels like a silly hack. I've decided for the moment to split the TM offset button into 2, one for ON, one for OFF. I'll put it on the list of things to have added to the RCG code (either a method, or documentation if it already exists).
EDIT: TM offset still doesn't work. Will worry about it next week.
9. Fixed a connection in the SPY/SPX models where the side sensor path was missing a constant input to a modulo block.
Quote: |
I did some work on the ETMY real and Sim.
- Set SUS coil gains to have the same quadrupole arrangement as the magnets do (-1, 1, 1, -1) so that POS = POS instead of pringle.
- Set the Sim Magnet polarities to match this. These are the ETMY_CI filter banks.
- Found that the Xycom cable from the ETMY slow controls was unplugged at the Xycom side. This was preventing enabling the ETMY coil driver and so there was no real damping of the suspension going on. I plugged it in and checked that the mirror could now be moved.
- The C1SUS_ETMY master screen's BLUE output filter area is now mis-labeled. If you trust the screen you would set it up to drive the suspension incorrectly. This MUST be fixed along with all of the other misleading features of the screen.
- ETMY SUSSIDE filter bank had a 2048 Hz sample rate and was making the damping not work correctly. Fixed to 16384 Hz.
- 12 Hz, 4th order Cheby low pass added and turned on for the local damping filtering. This is not optimum, but it's just there to give us some filtering without introducing some instability via phase lag around 3 Hz.
- ETMY OL beam re-aligned on ISCT-EX.
- TM Offset buttons not working on the main overview screen.
It seems like there is still a problem with the input whitening filters. I believe the Xycom logic is set such that the analog whitening of the OSEM signals is turned ON only when the FM1 is turned OFF. Joe has got to fix this (and elog it) so that we can damp the suspension correctly. For now, the damping of the ETMY and the SETMY require different servo gains and signs, probably because of this.
|
|
4410
|
Fri Mar 18 11:29:36 2011 |
josephb | Update | CDS | Minute trend issues | [Joe, Alex]
Steve pointed out to me today he couldn't get trends for his PEM slow channels like C1:PEM-count_full.
I experimented a bit and found for long time requests (over 20 days), it would produce minute trends up to the current time, but only if they started far enough back. So the data was being written, but something was causing a problem for dataviewer/NDS to find it.
On further investigation, it looks like incorrect time stamps at several points in the last few months are causing the problems. Basically, when Alex and I made mistakes in the GPS time stamp settings for the frame builder (daqd) code, the wrong time got written for hours to the raw minute trend data files.
So Alex is going to be running a script to go through the roughly 180 gigabytes of affected trend data to write new files with the correct time stamps. Once it is done, we'll move the files over. We'll probably lose a few hours' worth of recent trend data, depending on how quickly the scripts run, but after that minute trends should work as they are supposed to. |
4415
|
Fri Mar 18 17:25:21 2011 |
josephb | Update | CDS | Lockins in c1sus update, suspension screens updated | I updated our lockin simulink pieces to use the newer, more streamlined lockin piece that is currently in CDS_PARTS (with new documentation block!). It means we are no longer passing clock signals through three levels of boxes.
In order to use the piece, you need to right click on it after copying from CDS_PARTS and go to Link Options->Disable Link. This forces the .mdl to save all the relevant information about the block rather than just a pointer to the library. I talked with Rolf and Alex today and we discussed setting up another model file, non-library format for putting generically useful user blocks into, rather than using the CDS_PARTS library .mdl.
The BS, ITMX, ITMY, PRM, SRM, ETMX, ETMY now have working lockins, with the input matrix to them having the 2nd input coming from LSC_IN, the 3rd from the oplev pitch, and the 4th from oplev yaw.
This necessitated a few name changes in the medm screens. I also changed the lockin clock on/off switch to a direct amplitude entry, which turns green when a non-zero value is entered.
Currently, the Mode cleaner optic suspension screens have white lockins on them. I started modifying a new set of screens just for them, and will modify the generate_master_screens.py script. Unfortunately, this requires modifying two sets of suspension screens going forward - the main interferometer optics and the MC optics. |
4431
|
Wed Mar 23 10:34:17 2011 |
josephb | Update | CDS | Trend issue fixed | [Joe, Alex]
Yesterday during the day, Alex ran a script to fix the time stamps in the trends files we had messed up back during the daqd change overs around Feb 17th and 23rd. See this elog for more information on the trend problem.
Due to how the script runs, basically taking all the data and making a new copy with the correct time stamps, the data collected while the script was running didn't get converted over. So when he did the final copy of the corrected data, it created a several hour gap in the data from yesterday during the day time.
The original files still exist on the fb machine in /frames/trend/minute_raw_22mar2011 directory.
|
4438
|
Thu Mar 24 13:56:05 2011 |
josephb | Update | elog | elog restarted at 1:55pm | Restarted elog. |
4444
|
Fri Mar 25 11:16:19 2011 |
josephb | Frogs | Green Locking | digital frequency counting | I modified the c1gfd.mdl simulink model. I made a backup as c1gfd_20110325.mdl.
The first change was to use a top_names block to put everything in. The block is labeled ALS. So all the channels will now be C1:ALS-GFD_SOMETHING. This means medm channel names will need to be updated. Also, the filter modules need to be updated in foton because of this.
I then proceeded to add the suggested changes made by Matt. To avoid a divide by zero case, I added a saturation part which saturates at 1e-9 (note this is positive) and 1e9.
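For reference, a minimal Python sketch of the normalized delay-line discriminator quoted below, including the divide-by-zero saturation. The single-pole low-pass filters, the one-sample delay, and all function names are assumptions for illustration; they are not taken from c1gfd.mdl, and the input is assumed to be high-passed already.

```python
import math


def one_pole_lowpass(alpha):
    """Return a stateful single-pole low-pass (unity DC gain)."""
    state = 0.0

    def step(x):
        nonlocal state
        state += alpha * (x - state)
        return state

    return step


def dfd(samples, delay=1, alpha=1e-3, floor=1e-9, ceiling=1e9):
    """output = LP2[ s(t) * s(t - dt) / LP1[ s(t) * s(t) ] ]."""
    lp1 = one_pole_lowpass(alpha)
    lp2 = one_pole_lowpass(alpha)
    out = []
    for n, s in enumerate(samples):
        delayed = samples[n - delay] if n >= delay else 0.0
        power = lp1(s * s)
        # Saturation: keep the normalization positive and bounded so
        # the division below can never be a divide by zero.
        power = min(max(power, floor), ceiling)
        out.append(lp2(s * delayed / power))
    return out
```

For a steady sine at frequency f sampled at fs, the settled output of this sketch is close to cos(2*pi*f*delay/fs) and does not depend on the input amplitude, which is the point of the normalization.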
Quote: |
Today we tried the Schmitt trigger DFD, and while it works it does not improve the noise performance. At least part of our problem is coming from the discrete nature of our DFD algorithm, so I would propose that an industrious day job person codes up a new DFD which avoids switching. We can probably do this by mixing the input signal (after high-passing) with a time-delayed copy of itself... as we do now, but without the comparator. This has the disadvantage of giving an amplitude dependent output, but since we are working in the digital land we can DIVIDE. If we mix the signal with itself (without delay) to get a rectified version, and low-pass it a little, we can use this for normalization. The net result should be something like:
output = LP2[ s(t) * s(t - dt) / LP1[ s(t) * s(t) ]],
where s(t) is the high-passed input and LP is a low-pass filter. Remember not to divide by zero.
|
|
Attachment 1: C1GFD.png
|
|
4445
|
Mon Mar 28 15:18:04 2011 |
josephb | Update | CDS | CDS updates on Friday | Last Friday, we discovered a bug in the RCG where the delay part was not actually delaying. We reported this to Alex who promptly put a fix in the same day. This allowed Matt's newly proposed frequency discriminator to work properly.
It also required a checkout of the latest RCG code (revision 2328), and a rebuild of the various codes. We backed up all the kernel modules and executables first, such as mbuf.ko and awgtpman.
We did the following:
1) Log into the fb machine.
2) Go to /opt/rtcds/caltech/c1/core/advLigoRTS/src/drv/mbuf and run make. Copy the newly built mbuf.ko file to /diskless/root/modules/2.6.34.1/kernel/drivers/mbuf/mbuf.ko on the fb machine.
3) Use "sudo cp" to copy the newly built mbuf.ko file to /diskless/root/modules/2.6.34.1/kernel/drivers/mbuf/
4) Go to /cvs/cds/rtcds/caltech/c1/core/advLigoRTS/src/gds and run make.
5) Copy the newly built awgtpman executable to /opt/rtcds/caltech/c1/target/gds/bin/
6) Go to /opt/rtcds/caltech/c1/core/advLigoRTS/src/mx_stream/ and run make.
7) Copy the newly built mx_stream executable to /opt/rtcds/caltech/c1/target/fb/ |
4446
|
Mon Mar 28 15:49:18 2011 |
josephb | Update | CDS | Lessons from LST | [Koji,Joe]
PART 1:
Koji was unable to build his c1lst model first thing this morning. It turns out there was a bug in the RCG parser that was introduced on Friday when we did the RCG updates. We talked to Alex, who did a quick fix by commenting out the offending lines. The diff is as follows:
Index: Parser3.pm
===================================================================
--- Parser3.pm (revision 2328)
+++ Parser3.pm (working copy)
@@ -1124,8 +1124,8 @@
 print "Flattening the model\n";
 flatten_nested_subsystems($root);
 print "Finished flattening the model\n";
- CDS::Tree::do_on_nodes($root, \&remove_tags, 0, $root);
- print "Removed Tags\n";
+ #CDS::Tree::do_on_nodes($root, \&remove_tags, 0, $root);
+ #print "Removed Tags\n";
 #print "TREE\n";
 #CDS::Tree::print_tree($root);
 CDS::Tree::do_on_nodes($root, \&remove_busses, 0, $root);
This was some code to remove TAGs from the .mdl file for some reason which I do not understand at this time. I will ask tomorrow in person so I can understand the full story.
PART 2:
Koji then rebuilt and started the c1lst process. This is his new test version of the LSC code. We discovered (again) that when you activate too many DAQ channels (simply uncommenting them, not even recording them with activate=1 in the .ini file), the frame builder crashes. In addition, the c1lsc machine, which the code was running on, also hard crashed.
When a channel gets added to the .ini file (or uncommented) it is sent to the framebuilder, regardless of whether it is recorded by the frame builder. There is only about 2 megabytes per second of bandwidth per computer. In this case we were trying to do something like 200 channels * 16384 Hz * 4 bytes = 13 megabytes per second.
The maximum number of 16384 Hz channels is roughly 30, with little to no room for anything else. In addition, test points use the same allocated memory structure, so that if you use up all the capacity with channels, you won't be able to use testpoints on that computer (or that's what Alex has led me to believe).
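The arithmetic behind those two numbers, as a quick sketch. The 2 MB/s budget and 4 bytes per sample are the figures quoted above; the function names are just for illustration.

```python
LINK_BUDGET_BYTES_PER_S = 2_000_000   # ~2 MB/s per computer, as quoted
BYTES_PER_SAMPLE = 4


def daq_rate(n_channels, sample_rate_hz, bytes_per_sample=BYTES_PER_SAMPLE):
    """Bytes per second sent to the frame builder for these channels."""
    return n_channels * sample_rate_hz * bytes_per_sample


def max_channels(sample_rate_hz, budget=LINK_BUDGET_BYTES_PER_S,
                 bytes_per_sample=BYTES_PER_SAMPLE):
    """Largest whole number of channels that fits in the budget."""
    return budget // (sample_rate_hz * bytes_per_sample)


if __name__ == "__main__":
    print(daq_rate(200, 16384))   # 13107200 bytes/s, i.e. about 13 MB/s
    print(max_channels(16384))    # 30
```

The same check could be run against the uncommented channels in a model's .ini file before restarting, to catch this before the frame builder does.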
The daqd process then core dumped and was causing all sorts of martian network slowdowns. At the same time, the c1lsc computer crashed hard, and all of the front end processes except for the IOP on c1sus crashed.
We rebooted c1lsc, and restarted the c1sus processes using the startc1SYS scripts. However, the c1susfe.ko apparently got stuck in a weird state. We were completely unable to damp the optics and were in general ringing them up severely. We tried debugging, including several burt restores and single path checks.
Eventually we decided to reboot the c1sus machine after a bit of debugging. After doing a burt restore after the reboot, everything started to damp and work happily. My best guess is the kernel module crashed in a bad way and remained in memory when we simply did the restart scripts.
|
4499
|
Thu Apr 7 13:14:23 2011 |
josephb | Update | CDS | Proposed plan for ITMX/ITMY control switch | Problem:
The controls (fast and slow both) think ITMX is ITMY and ITMY is ITMX.
Solution:
After some poking around today, I have convinced myself it is sufficient to simply swap all instances of ITMX for ITMY in the C1_SUS-AUX1_ITMX.db file, and then rename it to C1_SUS-AUX1_ITMY.db (after having moved the original C1_SUS-AUX1_ITMY.db to a temporary holding file).
A similar process is then applied to the original C1_SUS-AUX1_ITMY.db file. These files live in /cvs/cds/caltech/target/c1susaux. This will fix all the slow controls.
To fix the fast controls, we'll modify the c1sus.mdl file located in /opt/rtcds/caltech/c1/core/advLigoRTS/src/epics/simLink/ so that the ITMX suspension name is changed to ITMY and vice versa. We'll also need to clean up some of the labeling.
At Kiwamu and Bryan's request, this will either be done tomorrow morning or on Monday.
So the steps in order are:
1) cd /cvs/cds/caltech/target/c1susaux
2) mv C1_SUS-AUX1_ITMX.db C1_SUS-AUX1_ITMX.db.20110408
3) mv C1_SUS-AUX1_ITMY.db C1_SUS-AUX1_ITMY.db.20110408
4) sed 's/ITMX/ITMY/g' C1_SUS-AUX1_ITMX.db.20110408 > C1_SUS-AUX1_ITMY.db
5) sed 's/ITMY/ITMX/g' C1_SUS-AUX1_ITMY.db.20110408 > C1_SUS-AUX1_ITMX.db
6) models
7) matlab
8) Modify c1sus model to swap ITMX and ITMY names while preserving wiring from ADCs/DACs/BO to and from those blocks.
9) code; make c1sus; make install-c1sus
10) Disable all watchdogs
11) Restart the c1susaux computer and the c1sus computer
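Steps 4 and 5 rename in one direction per file, which is enough as long as each .db file only mentions its own optic. If a file ever mentioned both names, a one-directional sed would merge them. A hedged sketch of a single-pass, two-way swap that avoids this (not what is planned above, just an alternative; the function name is hypothetical):

```python
import re

SWAP = {"ITMX": "ITMY", "ITMY": "ITMX"}
_PATTERN = re.compile("|".join(SWAP))


def swap_optic_names(text: str) -> str:
    """Exchange ITMX and ITMY in one pass so neither overwrites the other."""
    return _PATTERN.sub(lambda m: SWAP[m.group(0)], text)


if __name__ == "__main__":
    print(swap_optic_names("C1:SUS-ITMX_ULPD_VAR"))
```

Applying the function twice returns the original text, which makes it easy to back out.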
|
4515
|
Tue Apr 12 12:01:30 2011 |
josephb | Update | General | IFO controls, now with 10% less lying (ITMX/ITMY controls swapped) | The ITMX/ITMY control swap is complete.
The steps from this elog were followed.
In addition, I did a burt restore of c1sus, c1mcs.
I then swapped all the gain settings from ITMX to ITMY, and reenabled the watchdogs.
I did some basic kick tests (1000 counts into UL coil) and confirmed channels like C1:SUS-ITMX_ULPD_VAR (watchdogs mV readback) corresponded to the correct optic. I also checked that the POS, PIT, YAW, SIDE produced reasonable damping when engaged. |
4516
|
Tue Apr 12 16:01:33 2011 |
josephb | Update | General | RFM errors | Problem:
Currently the c1scy, c1mcs, and c1rfm models are reporting an error with receiving some data sent over the GE Fanuc Reflected memory cards.
To be more exact, the C1:SUS-ETMY_ALS signal from the c1gcv FE code on the c1ioo computer going to the Y end is not being received. However, the C1:SUS-ETMY_LSC signal is. So the physical RFM card seems to be working.
Similarly, the TRY signal is being sent correctly from the Y end computer. The X end is working fine and receiving both LSC and ALS signals.
The c1mcs and c1rfm models also receive data from the c1ioo computer and are reporting receive errors.
Theory:
Because the RFM cards are transmitting and receiving at least some channels, I'm guessing changes were made to the C1.ipc file, which defines the memory locations of these various channels on the RFM network. When one model was rebuilt, a different one still using the previous IPC file was not, and thus one of the computers is going to the wrong place to either read or write data.
Tomorrow, I'm planning on the following:
1) Clean out the C1.ipc file (/opt/rtcds/caltech/c1/chans/ipc/)
2) Rebuild all models
3) Run activate_daq.py script
4) Restart models via script
If this doesn't clear up the problem, I'll continue to bug hunt. |
4518
|
Wed Apr 13 11:34:07 2011 |
josephb | Update | CDS | Fixed IFO_ALIGN.adl | Problem:
I switched the ITMX and ITMY control channels yesterday, but forgot to update the IFO_ALIGN.adl file (/opt/rtcds/caltech/c1/medm/c1ifo/) which had the control labels swapped to make life easier.
Solution:
I swapped ITMX and ITMY control locations on the screen.
Question:
Are there any other screens involving ITMX and ITMY that had controls reversed to make life easier? |
4524
|
Thu Apr 14 12:57:15 2011 |
josephb | Update | CDS | RFM network happy again | [Joe, Alex]
Problem Symptoms:
There were red lights on the status screen indicating RFM errors for the c1scy, c1mcs and c1rfm processes.
The c1iscey, c1sus machines were receiving data sent over the RFM network from the c1ioo computer with a bad time stamp, a few cycles too late. The c1iscex computer was receiving data from c1ioo fine.
Problem:
The c1iscex RFM card had gotten into a bad state and was somehow slowing things down/corrupting data. It didn't affect itself, but due to the loop topology was messing everyone else up. Basically the only one who wasn't throwing an error was the culprit.
Solution:
Hard power cycling the c1iscex computer reset the RFM card and fixed the problem. |
4543
|
Tue Apr 19 15:48:43 2011 |
josephb | Update | CDS | MEDM screens and Front Ends updated to new Matrices | Problem:
The original matrix naming convention for the front ends was broken. It used _11, _12,...,_1e, _1f, _110, _111 and so forth. The code was changed to use _1_1, _1_2,...,_1_16,_1_17, and so on.
In addition, the matrix of filter banks was modified to use the same naming convention (instead of starting at zero, it now starts with one).
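To illustrate why the old names were broken, here is a small sketch. It assumes the old suffix was simply the row and column written in hex and concatenated, which is inferred from the _1e, _1f, _110 sequence above rather than checked against the RCG source; the function names are hypothetical.

```python
from collections import defaultdict


def old_name(row, col):
    """Old suffix: hex row and hex column run together (assumed)."""
    return f"_{row:x}{col:x}"


def new_name(row, col):
    """New suffix: decimal row and column with a separator."""
    return f"_{row}_{col}"


def collisions(n):
    """Old-style names shared by more than one element of an n x n matrix."""
    seen = defaultdict(list)
    for row in range(1, n + 1):
        for col in range(1, n + 1):
            seen[old_name(row, col)].append((row, col))
    return {name: cells for name, cells in seen.items() if len(cells) > 1}


if __name__ == "__main__":
    print(old_name(1, 16), new_name(1, 16))   # _110 _1_16
    print(collisions(16))                     # {}
    print(collisions(17))                     # includes '_111'
```

Under this assumption the old names stay unique up to 16x16 and first collide at 17x17, where element (1, 17) and element (17, 1) both become _111.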
Work Done:
I rebuilt all the models, and restarted them all.
I wrote a simple script to modify the burt restore files to have the correct names for all the stored matrix values.
I also modified all the suspension screens, by modifying the default screens in /opt/rtcds/caltech/c1/medm/master/
The C1SUS, C1SCX, C1SPX, C1SCY, C1SPY, and C1MCS models had their foton filter files modified to put filters into the newly renamed filter banks. |
4545
|
Wed Apr 20 11:02:18 2011 |
josephb | Update | CDS | MEDM screens and Front Ends updated to new Matrices | We simply didn't have any matrices larger than 16x16. If we had, then that matrix would not have worked properly from the beginning.
Quote: |
Just a curiosity:
I just wonder how you distinguished between _111 and _111.
On their own they are identical. Did you look at the context of the lines?
Or did you just not have any matrix larger than 16x16?
|
|
4580
|
Thu Apr 28 10:53:50 2011 |
josephb | Update | CDS | Adventures in Hyper-threading | What was done:
1) Turn off MC1, MC2, MC3, BS, ITMX, ITMY, PRM, SRM watchdogs.
2) Turn c1sus computer off (sudo shutdown now)
3) Go connect monitor and keyboard to c1sus. Turn c1sus on.
4) Hit "del" key at the right time to go to setup (BIOS).
5) Go to BIOS advanced tab, CPU options, enable Multi-threading.
6) Hit F10 to save and let the computer continue booting.
What went wrong:
Once c1sus was up, I noticed several red lights and dead keep alives for the c1sus models.
Typing dmesg on c1sus revealed many messages like:
[ 107.583420] c1x02: cycle 33737 time 20; adcWait 10; write1 0; write2 0; longest write2 0
[ 107.583771] c1x02: cycle 33760 time 19; adcWait 11; write1 0; write2 0; longest write2 0
This indicates the Input/Output Processor (IOP) is not completing its duties within the 15 microseconds (1/64 kHz) that it has. These lines indicate it is taking 20 or 19 microseconds. (I saw messages ranging from 16 to 22 microseconds).
So this seems to agree with Rolf's observations that hyperthreading can cause a 5-10 microsecond increase in computation time.
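A small sketch for pulling the cycle times out of dmesg lines like the ones above and comparing them with the per-cycle budget. It assumes the "time" field is in microseconds (as read above) and that "64 kHz" means 65536 Hz; the regular expression and function names are made up for this example.

```python
import re

LINE_RE = re.compile(
    r"\[\s*(?P<stamp>[\d.]+)\]\s+(?P<model>\w+): "
    r"cycle (?P<cycle>\d+) time (?P<time_us>\d+);"
)

SAMPLE = [
    "[ 107.583420] c1x02: cycle 33737 time 20; adcWait 10; write1 0; write2 0; longest write2 0",
    "[ 107.583771] c1x02: cycle 33760 time 19; adcWait 11; write1 0; write2 0; longest write2 0",
]


def cycle_budget_us(rate_hz):
    """Time available per cycle, in microseconds."""
    return 1e6 / rate_hz


def overruns(lines, rate_hz):
    """Return (model, cycle, time_us) for every line over budget."""
    budget = cycle_budget_us(rate_hz)
    found = []
    for line in lines:
        match = LINE_RE.search(line)
        if match and int(match.group("time_us")) > budget:
            found.append((match.group("model"),
                          int(match.group("cycle")),
                          int(match.group("time_us"))))
    return found


if __name__ == "__main__":
    print(round(cycle_budget_us(65536), 2))   # 15.26
    print(overruns(SAMPLE, 65536))
```

The same helper gives about 61 microseconds for a 16384 Hz model and about 488 microseconds for a 2048 Hz one.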
So the next thing to do is modify which core the codes are running on, and try to get them paired up on the same physical core. |
4581
|
Thu Apr 28 12:25:11 2011 |
josephb | Update | CDS | Further adventures in Hyper-threading | First, I disabled front end starts on boot up, and brought c1sus up. I rebuilt the models for the c1sus computer so they had new specific_cpu numbers, making the assumption that 0-1 were one real core, 2-3 were another, etc.
Then I ran the startc1SYS scripts one by one to bring up the models. Upon just loading the c1x02 on "core 2" (the IOP), I saw it fluctuate from about 5 to 12. After bringing up c1sus on "core 3", I saw the IOP settle down to about 7 consistently. Prior to hyper-threading it was generally 5.
Unfortunately, the c1sus model was between 60 and 70 microseconds, and was producing error messages a few times a second:
[ 1052.876368] c1sus: cycle 14432 time 65; adcWait 0; write1 0; write2 0; longest write2 0
[ 1052.936698] c1sus: cycle 15421 time 74; adcWait 0; write1 0; write2 0; longest write2 0
Bringing up the rest of the models (c1mcs on 4, c1rfm on 5, and c1pem on 6), saw c1mcs occasionally jumping above the 60 microsecond line, perhaps once a minute. It was generally hovering around 45 microseconds. Prior to hyper-threading it was around 25-28 microseconds.
c1rfm was rock solid at 38, which it was prior to hyper-threading. This is most likely due to the fact it has almost no calculation and only RFM reads slowing it down.
c1pem continued to use negligible time, 3 microseconds out of its 480.
I tried moving c1sus to core 8 from core 3, which seemed to bring it to the 58 to 65 microsecond range, with long cycles every few seconds.
I built 5 dummy models (dua on 7, dub on 9, duc on 10, dud on 11, due on 1) to ensure that each virtual core had a model on it, to see if it helped with stabilizing things. The models were basically copies of the c1pem model.
Interestingly, c1mcs seemed to get somewhat better, taking only 30-32 microseconds, although still not as good as its pre-hyper-threading 25-28. Over the course of several minutes it was no longer having a long cycle.
c1sus got worse again, and was running long cycles 4-5 times a second.
At this point, without surgery on which models are controlling which optics (i.e. splitting the c1sus model up) I am not able to have hyper-threading on and have things working. I am proceeding to revert the control models and c1sus computer to the pre-hyper-threading state.
|
4608
|
Tue May 3 10:41:35 2011 |
josephb | Update | CDS | Morning maintenance | 1) Filled in the C1SUS_BS_OLMATRIX properly so as to make the BS oplev work for Steve.
2) Turned on the ITMX damping. Apparently it had tripped this morning, possibly due to work in the lab area.
3) The ETMX FE controller (c1scx) had an ADC timeout and died sometime around 8:30 am. The c1x01 model (the IOP on the ETMX computer) was also indicating a FB status error (mismatch in DAQ channels).
The reported error in dmesg on c1iscex was:
[1628690.250002] c1spx: ADC TIMEOUT 0 3541 21 3605
[1628690.250002] c1scx: ADC TIMEOUT 0 3541 21 3605
Just to be safe, I rebuilt the c1x01 and c1scx models, ran ./activateDAQ.py, and used the scripts killc1spx, killc1scx, and killc1x01.
I finally restarted the process with startc1x01, startc1scx, and startc1spx. Everything is currently alive and indicating all green. |
4609
|
Tue May 3 10:59:31 2011 |
josephb | Update | CDS | 1Y2 binary output adapter board now powered | I temporarily turned off the power to the 1Y2 rack this morning while wiring in the binary output adapter board power (+/- 15V) into the cross connects.
The board is now powered and we can proceed to testing whether we can actually control the LSC whitening filters. |
|